Bug 691559 - bad xml output
Summary: bad xml output
Status: RESOLVED FIXED
Alias: None
Product: MuPDF
Classification: Unclassified
Component: apps
Version: unspecified
Hardware: All All
Importance: P4 critical
Assignee: Tor Andersson
URL:
Keywords:
Duplicates: 691641
Depends on:
Blocks:
 
Reported: 2010-08-16 21:50 UTC by mupdfxmltracebug
Modified: 2010-09-23 21:43 UTC
CC List: 1 user

See Also:
Customer:
Word Size: ---


Attachments

Description mupdfxmltracebug 2010-08-16 21:50:22 UTC
pdfdraw -x filename.pdf
The xml trace output does not escape the xml strings, producing invalid xml.
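
A minimal sketch of the failure, assuming each glyph is written as a <g> element with a ucs attribute as in the printf quoted below. The helper name format_glyph_raw and the closing " />" are hypothetical, since the real format string is longer:

```c
#include <stdio.h>

/* Hypothetical helper: format one glyph element the way the unescaped
 * trace does, copying the character straight into the attribute value. */
static int format_glyph_raw(char *out, size_t size, int ucs)
{
	return snprintf(out, size, "<g ucs=\"%c\" />", ucs);
}
```

Passing '<' or '"' to this helper puts a raw markup character inside the attribute value, which an XML parser will reject.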

I fixed mine by modifying mupdf\fitz\res_text.c (there is another place that has the same error, but it didn't affect my usage).

fz_debugtext prints the character with 
printf("<g ucs=\"%c\" ..., test->els[i].ucs, ...

I replaced that with 
printf("<g ucs=\"%s\" ..., GetXmlCharacter( test->els[i].ucs), ...

and this really fast utility function:

/* Return the XML-escaped form of a single character.
 * The five XML special characters map to their predefined entities;
 * any other character is returned as a one-character string.
 * The default case returns a pointer to a static buffer, so the result
 * is overwritten by the next call and the function is not thread-safe. */
const char * GetXmlCharacter (char ucs){
	/* Static storage is zero-initialized, so unescaped[1] is always '\0'
	 * and no explicit first-call memset is needed. */
	static char unescaped [2];

	switch (ucs){
		case '&':
			return "&amp;";
		case '\'':
			return "&apos;";
		case '>':
			return "&gt;";
		case '<':
			return "&lt;";
		case '\"':
			return "&quot;";
		default:
			unescaped[0] = ucs;
			return unescaped;
	}
}
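
The function above takes a char and returns a pointer to a shared static buffer. A possible variant, sketched here under the assumption that the ucs field holds a full Unicode code point as an int, writes into a caller-supplied buffer and emits numeric character references for anything outside printable ASCII. The names xml_escape_ucs and is_xml_char are hypothetical and not part of MuPDF; replacing disallowed characters with U+FFFD is likewise just one possible choice.

```c
#include <stdio.h>

/* Hypothetical helper: 1 if c matches the XML 1.0 "Char" production. */
static int is_xml_char(int c)
{
	return c == 0x9 || c == 0xA || c == 0xD ||
		(c >= 0x20 && c <= 0xD7FF) ||
		(c >= 0xE000 && c <= 0xFFFD) ||
		(c >= 0x10000 && c <= 0x10FFFF);
}

/* Hypothetical helper: write the XML-escaped form of one code point
 * into buf (at least 11 bytes are needed) and return buf. */
static const char *xml_escape_ucs(int ucs, char *buf, size_t size)
{
	switch (ucs) {
	case '&':  snprintf(buf, size, "&amp;");  break;
	case '\'': snprintf(buf, size, "&apos;"); break;
	case '>':  snprintf(buf, size, "&gt;");   break;
	case '<':  snprintf(buf, size, "&lt;");   break;
	case '"':  snprintf(buf, size, "&quot;"); break;
	default:
		if (!is_xml_char(ucs))
			/* Not representable in XML 1.0: substitute U+FFFD. */
			snprintf(buf, size, "&#xFFFD;");
		else if (ucs >= 0x20 && ucs <= 0x7E)
			/* Printable ASCII passes through unchanged. */
			snprintf(buf, size, "%c", ucs);
		else
			/* Everything else becomes a numeric character reference. */
			snprintf(buf, size, "&#x%X;", (unsigned int)ucs);
		break;
	}
	return buf;
}
```

A caller would declare a small buffer such as char buf[16] and pass xml_escape_ucs(ucs, buf, sizeof buf) as the %s argument. Because each call uses its own buffer, several escaped values can appear in one printf.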
Comment 1 Kenny Ostrom 2010-09-23 21:43:33 UTC
*** Bug 691641 has been marked as a duplicate of this bug. ***