Bug 691559

Summary: bad xml output
Product: MuPDF
Reporter: mupdfxmltracebug
Component: apps
Assignee: Tor Andersson <tor.andersson>
Status: RESOLVED FIXED
Severity: critical
CC: kennyostrom
Priority: P4
Version: unspecified
Hardware: All
OS: All
Customer:
Word Size: ---

Description mupdfxmltracebug 2010-08-16 21:50:22 UTC
pdfdraw -x filename.pdf
The XML trace output does not escape special characters in strings, producing invalid XML.
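
A minimal sketch of the failure mode, assuming each glyph is written with a plain `%c` format as in the printf call quoted later in this report. `format_glyph_raw` is a hypothetical helper for illustration, not a MuPDF function:

```c
#include <stdio.h>

/* Hypothetical helper (not part of MuPDF): formats one glyph element
 * with the character inserted verbatim through %c, with no escaping. */
static int format_glyph_raw(char *buf, size_t size, char ucs)
{
	return snprintf(buf, size, "<g ucs=\"%c\" />", ucs);
}
```

Passing `'&'`, `'<'` or `'"'` to this helper yields an attribute value that an XML parser rejects, because those characters are written into the attribute unchanged.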

I fixed mine by modifying mupdf\fitz\res_text.c (there is another place that has the same error, but it didn't affect my usage).

fz_debugtext prints the character with 
printf("<g ucs=\"%c\" ..., test->els[i].ucs, ...

I replaced that with 
printf("<g ucs=\"%s\" ..., GetXmlCharacter( test->els[i].ucs), ...

and this really fast utility function:

const char * GetXmlCharacter (char ucs){
	static const char * strAmpersand  = "&amp;";
	static const char * strApostrophe = "&apos;";
	static const char * strGreater    = "&gt;";
	static const char * strLess       = "&lt;";
	static const char * strQuote      = "&quot;";
	/* Static storage is zero-initialized, so unescaped[1] is always '\0'
	   and no explicit memset is needed. The buffer is shared between
	   calls, so this function is not reentrant. */
	static char unescaped [2];

	switch (ucs){
		case '&':
			return strAmpersand;
		case '\'':
			return strApostrophe;
		case '>':
			return strGreater;
		case '<':
			return strLess;
		case '\"':
			return strQuote;
		default:
			/* No escaping needed: return the character as a one-character string. */
			unescaped[0] = ucs;
			return unescaped;
	}
}
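
The same idea can be sketched without the shared static buffer. This is only an illustration under stated assumptions, not the fix that was committed: `xml_entity` and `xml_escape_char` are hypothetical names, the character is assumed to be available as an `int` code point, and anything outside printable ASCII is written as a numeric character reference.

```c
#include <stdio.h>

/* Hypothetical helper: returns the predefined XML entity for ucs,
 * or NULL if the character needs no entity. */
static const char *xml_entity(int ucs)
{
	switch (ucs) {
	case '&':  return "&amp;";
	case '\'': return "&apos;";
	case '>':  return "&gt;";
	case '<':  return "&lt;";
	case '"':  return "&quot;";
	default:   return NULL;
	}
}

/* Hypothetical helper: writes ucs into the caller's buffer as XML-safe
 * text and returns the snprintf result. XML 1.0 forbids most control
 * characters even as references; this sketch does not filter them. */
static int xml_escape_char(char *buf, size_t size, int ucs)
{
	const char *entity = xml_entity(ucs);

	if (entity)
		return snprintf(buf, size, "%s", entity);
	if (ucs >= 0x20 && ucs < 0x7f)
		return snprintf(buf, size, "%c", ucs);
	/* Anything else becomes a numeric character reference. */
	return snprintf(buf, size, "&#x%X;", (unsigned)ucs);
}
```

Because the caller supplies the buffer, two escaped characters can be used in the same printf call without one result overwriting the other, which the static `unescaped` buffer in GetXmlCharacter does not allow.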
Comment 1 Kenny Ostrom 2010-09-23 21:43:33 UTC
*** Bug 691641 has been marked as a duplicate of this bug. ***