(A) Passing the GS documentation through an HTML validator yields hundreds of non-conformance errors. Some of these also make browsers miss text. (B) There are also problems with text that looks like SVN keywords and needs to remain as-is, but SVN expands it, sometimes altering the files' semantics.
Attached are a number of patches meant to fix these problems. I tried to divide the changes into categories that are as small as possible. Please don't be scared by the number (18) or total size (364K) of these diffs; the changes are simple, but repetitive.
Note that these patches affect some of the files used during the release process; I hope you will consider these changes before releasing 8.56, otherwise the number of problems will increase.
To apply the patches:
- Use "patch -p1 -i difffile" in the GS root directory.
- Apply them in comment order, from 1 to 18; since these often modify the same files, changing the order may make patch not find the needed context and thus reject some hunks.
- The diff in comment #18 touches lines containing $Id$ keywords. Since patch does not know about keyword expansion, and your copy may have these expanded differently, some manual intervention may be needed.
Created attachment 2756 [details] "</small>" is mandatory, part #1/2: script generating incorrect HTML The closing "</small>" tag is mandatory. This patch fixes toolbin\makeset.tcl which generates HTML without it.
Created attachment 2757 [details] "</small>" is mandatory, part #2/2: fix existing HTML The closing "</small>" tag is mandatory. This patch: - adds it to existing HTMLs; - suppresses existing "</small>" from the few files that contained it on a different line than the opening "<small>"; these files would become incorrect after using the script fixed in comment #1.
Created attachment 2758 [details] Character entities for "<>&", part #1/3: script generating wrong HTML Text coming from log messages may (and does) contain "<>&", which have special meaning in HTML. The script that converts these messages into HTML does not html-escape them. This patch corrects the script.
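A minimal sketch of the escaping step, written in Python because split_changelog.py is a Python script. The helper name `escape_log_text` is hypothetical and is not taken from the script; `html.escape` is the standard-library call.

```python
import html


def escape_log_text(text: str) -> str:
    """Escape the characters that have special meaning in HTML text.

    Hypothetical helper for illustration; not the function used in
    toolbin/split_changelog.py.
    """
    # quote=False leaves quotation marks alone: they only need escaping
    # inside attribute values, not in element content.
    return html.escape(text, quote=False)


if __name__ == "__main__":
    print(escape_log_text("compare a < b && c > d"))
```

Only "&", "<" and ">" are replaced, so ordinary log text passes through unchanged.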
Created attachment 2759 [details] Character entities for "<>&", part #2/3: fix "automatic" HTML This patch replaces "&"/"<"/">" => "&amp;"/"&lt;"/"&gt;" in doc\Changes.htm, doc\Details.htm, doc\History8.htm, and doc\Details8.htm, which (I think) were automatically generated by the incorrect version of toolbin\split_changelog.py. Note: These changes were done automatically by a script that relies on the particular way that split_changelog.py generates the HTML.
Created attachment 2760 [details] Character entities for "<>&", part #3/3: manual fixes to existing HTML Mostly a continuation of the previous patch:
- Replaces "&"/"<"/">" => "&amp;"/"&lt;"/"&gt;" in the other HTML files;
- Changes/Details.html contain a "<pre>" as text; this needs to be replaced by "&lt;pre&gt;";
- Fixes a few incorrect character entities ("&gt;"/ "&lt;") in doc\Ps2pdf.htm.
Created attachment 2761 [details] Double "</title>", part #1/2: script generating incorrect HTML toolbin\split_changelog.py generates double closing "</title>" for the detailed changelog. This patch fixes the script.
Created attachment 2762 [details] Double "</title>", part #2/2: fix existing html toolbin\split_changelog.py generates double closing "</title>" for the detailed changelog. This patch fixes the already-generated HTML.
Created attachment 2763 [details] Invalid HTML comments There is a common impression that HTML comments start with "<!--" and end with "-->". In reality, a comment starts and ends with "--" and sits inside an SGML markup declaration, which starts with "<!" and ends with ">". The bottom line is that a comment cannot contain "--".
Created attachment 2764 [details] Missing/misplaced "<pre>"/ "</pre>" in doc\History6/7.htm doc\History6.htm has lots of "<pre/>" elements that are not closed. There are also a few misplaced opening "<pre>" tags in History6.htm, and a few misplaced closing "</pre>" tags in History7.htm ("<pre/>" elements cannot contain "<hr/>", "<h1/>", or "<h2/>").
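A rough way to find such comments, sketched in Python. This is not the method used to prepare the patch; the function name is made up for illustration, and the regular expression only approximates SGML comment syntax (it ignores edge cases such as a comment body ending in "-").

```python
import re
from typing import List

# "<!--", then the shortest possible body, then "-->".
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def invalid_comments(html_text: str) -> List[str]:
    """Return the bodies of comments that contain "--" (hypothetical helper)."""
    return [
        match.group(1)
        for match in COMMENT_RE.finditer(html_text)
        if "--" in match.group(1)
    ]


if __name__ == "__main__":
    sample = "<!-- fine --><p>text</p><!-- not -- fine -->"
    print(invalid_comments(sample))
```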
Created attachment 2765 [details] Incorrectly nested "<b/>"+"<tt/>"[+"<em/>"] Wrong: "<b><tt> ... </b></tt>" OK: "<b><tt> ... </tt></b>" There are many occurrences of this inversion. Occasionally, there is also an "<em/>" tag involved.
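Elements must be closed in the reverse of the order in which they were opened. The sketch below, which assumes nothing about how the patch was actually produced, uses Python's standard `html.parser` to flag end tags that do not match the most recently opened element. The class and function names are hypothetical, and only the three inline elements named above are tracked.

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Record end tags that do not close the most recently opened element."""

    TRACKED = {"b", "tt", "em"}  # only the inline elements discussed here

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tracked elements, innermost last
        self.errors = []  # (line number, offending end tag) pairs

    def handle_starttag(self, tag, attrs):
        if tag in self.TRACKED:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.TRACKED:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
            return
        # The end tag closes something other than the innermost element.
        self.errors.append((self.getpos()[0], tag))
        if tag in self.stack:
            self.stack.remove(tag)  # resynchronize and keep checking


def misnested(html_text):
    """Return a list of (line, tag) pairs for badly nested end tags."""
    checker = NestingChecker()
    checker.feed(html_text)
    checker.close()
    return checker.errors


if __name__ == "__main__":
    print(misnested("<b><tt>gs -h</tt></b>"))
    print(misnested("<b><tt>gs -h</b></tt>"))
```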
Created attachment 2766 [details] Don't use lists for indenting "<dl/>" can contain only "<dt/>" and "<dd/>" elements, and "<ul/>" only "<li/>". Not being allowed arbitrary contents, they should not be used for indenting; use "<blockquote/>" instead.
Created attachment 2767 [details] Miscellaneous missing/ misplaced/ misspelled/ extraneous tags This patch includes an assortment of fixes, related to tags, that did not fit into other patches:
- A few typos and forgotten/ extraneous tags;
- Some places that had list items ("<li/>"/ "<dt/>"+"<dd/>") without being enclosed in a list element ("<ol/>"/ "<dl/>");
- Lists can only have specific elements as direct descendants. Put the "<p/>"s (used for spacing) and the anchors ("<a name=../>") inside, not between, list elements, or before the whole list;
- Inline elements, like "<i/>" or "<tt/>", cannot contain block elements like "<p/>" or "<ul/>".
  wrong: "<i><p> .. </p><p> .. </p></i>"
  ok: "<p><i> .. </i></p><p><i> .. </i></p>"
Created attachment 2768 [details] 'align="middle"' should be 'align="center"' "Middle" is only for vertical alignment ("valign="); for the horizontal alignment use "center".
Created attachment 2769 [details] Typos in "href="s to anchors A few links were missing the "#". Also quotes are needed ('href="file#anchor"'), because in this case the attribute's value contains special characters (the "#").
Created attachment 2770 [details] Don't use utf-8 text [without declaring the charset] doc\Language.htm uses utf-8 text, but it does not declare the charset ('<meta http-equiv="content-type" content="text/html; charset=..">'). A browser that does not default to utf-8 will not display the text "samples × components" correctly. The patch replaces the "×" with a character entity, so that the charset is no longer an issue. (The patch also inserts a missing space.)
Created attachment 2771 [details] For utf-8 HTML, declare the charset toolbin\split_changelog.py encodes text coming from log messages as utf-8. The patch changes it to output a "<meta/>" element specifying the charset used, otherwise browsers that are not set for utf-8 by default won't necessarily display the file correctly [if message logs contain extended characters]. Note: limiting the HTML to 7-bit ASCII would be a better idea.
Created attachment 2772 [details] Protect text-"$Id$" from expansion, part #1/2: script generating HTML (17.A) If a commit log contains the text "$Id$", or some other text that looks like an SVN keyword, toolbin\split_changelog.py outputs it as-is to HTML, and SVN later expands it as a keyword. As a result, the displayed text differs from the original log message. This patch extends the substitutions done for character entities (comment #3) to include "$" => "&#36;"; there won't be any "$"-as-text, and thus no "$Keyword..$"-as-text, anymore. (17.B) The patch also modifies a function in doc\gsdoc.el to do a similar substitution.
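The extended substitution could look roughly like the following Python sketch. The helper name is hypothetical and the code is not copied from split_changelog.py; it assumes the decimal character reference "&#36;" is used for the dollar sign.

```python
import html


def escape_for_versioned_html(text: str) -> str:
    """Escape HTML specials, then hide "$" from SVN keyword expansion.

    Hypothetical helper for illustration only.
    """
    # Escape "&", "<", ">" first. Doing it the other way round would turn
    # the "&" of "&#36;" into "&amp;" and break the reference.
    escaped = html.escape(text, quote=False)
    # "&#36;" is the decimal character reference for "$" (U+0024).
    return escaped.replace("$", "&#36;")


if __name__ == "__main__":
    keyword = "$" "Id$"  # written this way so this file stays SVN-safe too
    print(escape_for_versioned_html("removed " + keyword + " & <b>"))
```

A browser renders the reference as a plain dollar sign, while the file on disk no longer contains anything SVN would recognize as a keyword.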
Created attachment 2773 [details] Protect text-"$Id$" from expansion, part #2/2: fix existing files This patch changes files that contain "$Id..." as text in order to protect the apparent keyword from being expanded by Subversion. The methods used are as follows:
- for HTML: use the numeric entity "&#36;" instead of "$".
- for Emacs Lisp: '"$Id$"' => '(concat "$" "Id$")'.
- for C and Python: if the parser finds 2 consecutive string literals, separated by nothing but whitespace, it automatically treats them as a single, longer string obtained by concatenating the 2 original ones. Thus, the patch replaces '"$Id$"' => '"$" "Id$"'.
- for Tcl: I used "[$]Id", since it was in a regular expression.
Important note: Please review especially the change to toolbin\3way.tcl. This file looks damaged since the beginning of the Ghostscript repository, and I don't really know what was there.
I notice that Internet Explorer interprets some of the patches as HTML, although they are "unified diffs". I don't know about other browsers. However, after doing a "Save target as", the content is OK.
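The string-literal and character-class tricks can be tried out in Python as sketched below. The variable and function names are invented for the example; the character-class pattern is shown with Python's `re` module only as an analogue of the Tcl regular expression mentioned above.

```python
import re

# Two adjacent literals are joined by the compiler into one string, so the
# source file never contains the keyword pattern SVN looks for, yet the
# runtime value is the usual four-character keyword text.
ID_KEYWORD = "$" "Id$"

# Inside a character class "$" is an ordinary character, so "[$]Id" matches
# a literal dollar sign followed by "Id" without spelling out the keyword.
ID_PATTERN = re.compile(r"[$]Id")


def mentions_id_keyword(text: str) -> bool:
    """Return True if the text contains the start of the Id keyword."""
    return ID_PATTERN.search(text) is not None


if __name__ == "__main__":
    print(len(ID_KEYWORD))
    print(mentions_id_keyword("header: " + ID_KEYWORD))
```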
I have applied the fixes of comment #17 and comment #18 to the doc/gsdoc.el files in the trunk and in the gs-esp-gpl-merger branch now (SVN rev 7934).
Confirm and re-assign bugs with attached/in-line patches for another person to review.
Hin-Tak, why did you assign this bug to me? Did I break something? Having applied one doc-related patch three years ago does not make me a doc expert.
Created attachment 6930 [details] Patches updated as of TRUNK rev 11894 (ZIP file) The patches posted 3 years ago cannot be applied as they are, so here I attach updated ones. The numbers correspond directly to comments #1 to #18 because the problems are mostly the same (those comments can serve as the basis for log messages). The notable exceptions are as follows:
Patch 01: After revision #8323, we need to fix makehist.tcl and not makeset.tcl.
Patch 03: The problem from comment #3 was solved in revision #9411. There is still a patch for this part, but it only removes a comment that is no longer applicable.
Patch 04: split_changelog.py was fixed in the meantime and GS’s major revision number was also incremented. The ‘automatic’ fixes now touch only details8/history8.htm; details.htm and changes.htm need no change from the point of view of comment #4.
Patch 10: Revision #9030 partly replaced nested <B/>+<TT/> with styled <CODE/>. So, the attached patch 10 does the rest of such replacements instead of fixing the nesting of <B/> and <TT/>.
Patch 11 and patch 14: There are no patches 11 and 14, because the problems from comment #11 and comment #14 were either solved or obsoleted by revisions #9033 (various fixes) and #8687 (removal of testing.htm).
Patch 17: The changes to gsdoc.el from comment #17 are already committed in revision #7934.
Patch 18: The changes to gsdoc.el from comment #17 are already committed in revision #7934, and file 3way.tcl was removed in revision #8322.
Important addition connected to comment #18, which cannot be expressed as a diff: File ijs\ijs_spec.pdf is damaged in a fresh svn checkout, both the ‘normal’ copy and the pristine copy in .svn\text-base\. From the way I understand svn stores its working copy, I conclude this file is damaged in the repository. Please:
- remove its svn:eol-style and svn:keywords properties;
- replace it with a good copy;
- commit the changes.
Because these patches often modify the same files over and over, they need to be applied (‘patch -p1 -i file.diff.txt’) in numeric order. Just in case this could be useful for the review, here is a list of revisions that I found to affect the old patches:
#7934 <http://svn.ghostscript.com/viewvc?view=rev&revision=7934> (the changes to gsdoc.el from comment #17 and comment #18)
#8322 <http://svn.ghostscript.com/viewvc?view=rev&revision=8322> (removal of 3way.tcl)
#8323 <http://svn.ghostscript.com/viewvc?view=rev&revision=8323> (merge of makeset.tcl into makehist.tcl)
#8687 <http://svn.ghostscript.com/viewvc?view=rev&revision=8687> (removal of testing.htm)
#9030 <http://svn.ghostscript.com/viewvc?view=rev&revision=9030> (replaces many <b/>+<tt/> with <code/>)
#9033 <http://svn.ghostscript.com/viewvc?view=rev&revision=9033> (various fixes concerning html conformance)
#9411 <http://svn.ghostscript.com/viewvc?view=rev&revision=9411> (solves the problem mentioned in comment #3)
Created attachment 6932 [details] ‘text-weight’ should be ‘font-weight’ For completeness, I’m attaching a fix to GS’s CSS stylesheet: Fix a typo in Ghostscript’s CSS style sheet, allowing code fragments in the documentation to appear in bold type, as intended.
Created attachment 6975 [details] A good copy of ijs_spec.pdf In comment #23 I mentioned the file ‘ijs\ijs_spec.pdf’ is damaged. I found a backup of the last CVS version, as it was when the source repository switched to SVN. This PDF was fine then, so I’m attaching a good copy of it. In the 1st revision I have after the switch to SVN this file is already damaged.
Created attachment 6991 [details] Remove remaining notices/ warnings (2 patches in a ZIP) Two more patches to clean up some notices/ warnings signaled by the W3C Validator. I did not previously address these because they are not errors, but after all it’s better to have 100% clean and 100% compatible HTML files.
Patch 20: Specify the charset for all files by using a <META/> element. Files Changes.htm, Details.htm, History[89].htm, and Details[89].htm already contain such a specification (added by patch 16), because they either already use utf-8 chars or may do so in the future. For all others, ‘us-ascii’ is enough. There were only 2 non-ASCII characters: a typo in Ps2pdf.htm, which this patch removes, and a lowercase o with dieresis, which gets replaced with an HTML entity.
Patch 21: Do not use SHORTTAGS. The W3C Validator signals these as not recommended due to problems with some browsers.
Hello SaGS, I'd rather you send me a public key and I'll give you write access to ghostscript svn and you can commit these documentation fixes directly. Code changes have to be regression tested but these should be okay. We can start the process with you emailing me a public key: ssh-keygen -t dsa
All patches updated and committed as SVN revisions #12346 to #12365 (inclusive). <http://svn.ghostscript.com/viewvc?view=rev&revision=12346>
SaGS, I believe you have committed patches for this and collected the bounty, if not please let me know.
*** Bug 688440 has been marked as a duplicate of this bug. ***