Bug 692586

Summary: XML output from pdfdraw not parsable by LibXML2
Product: MuPDF
Reporter: Alec Taylor <alec.taylor6>
Component: apps
Assignee: Tor Andersson <tor.andersson>
Status: RESOLVED FIXED    
Severity: normal
CC: robin.watts
Priority: P4    
Version: unspecified   
Hardware: All   
OS: All   
Customer:
Word Size: ---
Attachments: Proposed fix for the problem...

Description Alec Taylor 2011-10-13 03:33:13 UTC
Good afternoon,

Unfortunately output from pdfdraw is not parsable by LibXML2.


= How to reproduce error =
[From console]
> wget http://archive.org/download/lawofthehayes00ewinrich/lawofthehayes00ewinrich.pdf && pdfdraw -ttt lawofthehayes00ewinrich.pdf > law.xml
> wget http://pastebin.ca/raw/2089594 -O main.c
// compile and link to the LibXML2 libraries with -o parser
> parser law.xml
law.xml:9: parser error : Opening and ending tag mismatch: char line 0 and span
</span>
       ^
law.xml : failed to parse

Please patch the library to fix this bug.

Thanks,

Alec Taylor
Comment 1 Sebastian Rasmussen 2011-10-13 11:26:18 UTC
Created attachment 7991
Proposed fix for the problem...
Comment 2 Alec Taylor 2011-10-13 11:37:45 UTC
After patch, here is my output:

> parser law5.xml 1> log1.txt 2> log2.txt
log1.txt = http://pastebin.com/ZtPVdbEq
log2.txt = http://pastebin.com/aBKW1x3k

Please fix this as soon as you can.

Thanks for the effort!

Alec Taylor
Comment 3 Robin Watts 2012-03-19 17:38:49 UTC
The output for this file passes http://www.xmlvalidation.com just fine.

If the latest version from git still goes wrong, please reopen this bug and attach the .xml and the errors again. If it's possible to restrict it to a single page (or a smaller page range) that goes wrong, that would help too.

Thanks.
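
For anyone re-testing, a minimal well-formedness check can help narrow down where the output breaks. The sketch below uses Python's standard-library parser (expat), not LibXML2, so the wording of its messages will differ from the log in the description. The error quoted there indicates that a <char> element was still open when </span> was reached; the two sample fragments are hypothetical illustrations of that situation, not actual pdfdraw output, and the helper name check_well_formed is made up for this example.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text is well-formed, else (line, column, message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple.
        line, column = err.position
        return line, column, str(err)
    return None


# Hypothetical fragment: <char> is opened but never closed before </span>.
broken = '<span>\n<char c="a">\n</span>'
# Same fragment with <char> written as an empty element.
fixed = '<span>\n<char c="a"/>\n</span>'

print(check_well_formed(broken))  # a (line, column, message) tuple
print(check_well_formed(fixed))   # None
```

To check a real file, read it and pass its contents to the same function, for example check_well_formed(open("law.xml", encoding="utf-8").read()). The reported line number then points at the first place the parser gave up, which may help when cutting the input down to a single page.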