Bug 692586 - XML output from pdfdraw not parsable by LibXML2
Status: RESOLVED FIXED
Alias: None
Product: MuPDF
Classification: Unclassified
Component: apps
Version: unspecified
Hardware: All All
Importance: P4 normal
Assignee: Tor Andersson
 
Reported: 2011-10-13 03:33 UTC by Alec Taylor
Modified: 2012-03-19 17:38 UTC



Attachments
Proposed fix for the problem... (360 bytes, patch)
2011-10-13 11:26 UTC, Sebastian Rasmussen

Description Alec Taylor 2011-10-13 03:33:13 UTC
Good afternoon,

Unfortunately, the XML output from pdfdraw is not parsable by LibXML2.


= How to reproduce error =
[From console]
> wget http://archive.org/download/lawofthehayes00ewinrich/lawofthehayes00ewinrich.pdf && pdfdraw -ttt lawofthehayes00ewinrich.pdf > law.xml
> wget http://pastebin.ca/raw/2089594 -O main.c
// compile and link to the LibXML2 libraries with -o parser
> parser law.xml
law.xml:9: parser error : Opening and ending tag mismatch: char line 0 and span
</span>
       ^
law.xml : failed to parse
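
For reference, this class of error can be shown without LibXML2. The sketch below uses Python's standard-library parser (expat) instead, and the two fragments are hypothetical: only the element names "span" and "char" are taken from the error message above, and they are not real pdfdraw output. Any conforming XML parser should reject the first fragment, because a "char" element is still open when "</span>" is reached.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments, NOT real pdfdraw output. Only the element names
# "span" and "char" come from the parser error quoted in this report.
BROKEN = "<span><char></span>"   # <char> is never closed
FIXED = "<span><char/></span>"   # <char/> is self-closing


def is_well_formed(text):
    """Return (True, None) if text parses, else (False, error message)."""
    try:
        ET.fromstring(text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)


if __name__ == "__main__":
    for name, doc in (("broken", BROKEN), ("fixed", FIXED)):
        ok, message = is_well_formed(doc)
        print(name, "OK" if ok else "failed: " + message)
```

The exact wording of the message differs between expat and LibXML2, but both report a mismatch between the open element and the closing tag.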

Please patch the library to fix this bug.

Thanks,

Alec Taylor
Comment 1 Sebastian Rasmussen 2011-10-13 11:26:18 UTC
Created attachment 7991
Proposed fix for the problem...
Comment 2 Alec Taylor 2011-10-13 11:37:45 UTC
After applying the patch, here is my output:

> parser law5.xml 1> log1.txt 2> log2.txt
log1.txt = http://pastebin.com/ZtPVdbEq
log2.txt = http://pastebin.com/aBKW1x3k

Please fix this as soon as you can.

Thanks for the effort!

Alec Taylor
Comment 3 Robin Watts 2012-03-19 17:38:49 UTC
The output for this file passes http://www.xmlvalidation.com just fine.

If the latest version from git still goes wrong, please reopen this bug and attach the .xml and the errors again. If it's possible to restrict it to a single page (or a smaller page range) that goes wrong, that would help too.

Thanks.
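
One way to narrow a failing file down before attaching it is to find where the first well-formedness error occurs and copy only the nearby lines. The following is a minimal sketch using Python's standard-library parser rather than LibXML2. The helper names `first_parse_error` and `context` are made up for illustration, and the sample markup (including the "page" element) is hypothetical, not actual pdfdraw output.

```python
import xml.etree.ElementTree as ET


def first_parse_error(text):
    """Return (line, column, message) for the first parse error, or None."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position  # 1-based line, 0-based column
        return line, column, str(err)
    return None


def context(text, line, radius=2):
    """Return the lines within `radius` of a 1-based line number."""
    lines = text.splitlines()
    start = max(line - 1 - radius, 0)
    return lines[start:line + radius]


if __name__ == "__main__":
    # Hypothetical sample: only "span" and "char" come from the reported error.
    sample = "<page>\n<span>\n<char>\n</span>\n</page>\n"
    found = first_parse_error(sample)
    if found is not None:
        line, column, message = found
        print(message)
        print("\n".join(context(sample, line)))
```

Run against the contents of the failing .xml file, the reported line number indicates which part of the output to attach.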