Bug 689093 - HTML [non-]conformance and SVN keywords cleanup
Summary: HTML [non-]conformance and SVN keywords cleanup
Status: RESOLVED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: Documentation
Version: master
Hardware: PC All
Importance: P4 normal
Assignee: Henry Stiles
URL:
Keywords: bountiable
Duplicates: 688440
Depends on:
Blocks:
 
Reported: 2007-02-18 08:09 UTC by SaGS
Modified: 2011-05-11 23:07 UTC
CC List: 3 users

See Also:
Customer:
Word Size: ---


Attachments
"</small>" is mandatory, part #1/2: script generating incorrect HTML (751 bytes, patch)
2007-02-18 08:10 UTC, SaGS
"</small>" is mandatory, part #2/2: fix existing HTML (18.51 KB, patch)
2007-02-18 08:11 UTC, SaGS
Character entities for "<>&", part #1/3: script generating wrong HTML (865 bytes, patch)
2007-02-18 08:11 UTC, SaGS
Character entities for "<>&", part #2/3: fix "automatic" HTML (241.68 KB, patch)
2007-02-18 08:13 UTC, SaGS
Character entities for "<>&", part #3/3: manual fixes to existing HTML (15.96 KB, patch)
2007-02-18 08:13 UTC, SaGS
Double "</title>", part #1/2: script generating incorrect HTML (575 bytes, patch)
2007-02-18 08:14 UTC, SaGS
Double "</title>", part #2/2: fix existing html (593 bytes, patch)
2007-02-18 08:15 UTC, SaGS
Invalid HTML comments (1.62 KB, patch)
2007-02-18 08:15 UTC, SaGS
Missing/misplaced "<pre>"/ "</pre>" in doc\History6/7.htm (11.39 KB, patch)
2007-02-18 08:16 UTC, SaGS
Incorrectly nested "<b/>"+"<tt/>"[+"<em/>"] (34.00 KB, patch)
2007-02-18 08:16 UTC, SaGS
Don't use lists for indenting (1.62 KB, patch)
2007-02-18 08:17 UTC, SaGS
Miscellaneous missing/ misplaced/ missplelled/ extraneous tags (13.98 KB, patch)
2007-02-18 08:17 UTC, SaGS
'align="middle"' should be 'align="center"' (1.44 KB, patch)
2007-02-18 08:18 UTC, SaGS
Typos in "href="s to anchors (1.75 KB, patch)
2007-02-18 08:18 UTC, SaGS
Don't use utf-8 text [without declaring the charset] (430 bytes, patch)
2007-02-18 08:19 UTC, SaGS
For utf-8 HTML, declare the charset (506 bytes, patch)
2007-02-18 08:19 UTC, SaGS
Protect text-"$Id$" from expansion, part #1/2: script generating HTML (868 bytes, patch)
2007-02-18 08:19 UTC, SaGS
Protect text-"$Id$" from expansion, part #2/2: fix existing files (18.02 KB, patch)
2007-02-18 08:20 UTC, SaGS
Patches updated as of TRUNK rev 11894 (ZIP file) (158.74 KB, application/x-zip-compressed)
2010-11-21 14:22 UTC, SaGS
‘text-weight’ should be ‘font-weight’ (292 bytes, patch)
2010-11-21 16:35 UTC, SaGS
A good copy of ijs_spec.pdf (51.96 KB, application/pdf)
2010-11-30 20:08 UTC, SaGS
Remove remaining notices/ warnings (2 patches in a ZIP) (3.93 KB, application/x-zip-compressed)
2010-12-04 07:48 UTC, SaGS

Description SaGS 2007-02-18 08:09:55 UTC
(A) By passing the GS documentation through an HTML validator, one 
    gets hundreds of non-conformance errors. Some of these also make 
    browsers drop text.
(B) There are also problems with text that looks like SVN keywords 
    and needs to remain as-is: SVN expands it, sometimes altering 
    the files' semantics.

Attached are a number of patches meant to fix these problems. I tried 
to divide the changes into categories as small as possible. Please 
don't be scared by the number (18) or total size (364K) of these 
diffs; the changes are simple, but repetitive.

Note that these patches affect some of the files used during the 
release process; I hope you will consider these changes before 
releasing 8.56, otherwise the number of problems will increase.

To apply the patches:
- Use "patch -p1 -i difffile" in the GS root directory.
- Apply them in comment order, from 1 to 18; since these often 
  modify the same files, changing the order may prevent patch from 
  finding the needed context, so it would reject some hunks.
- The diff in comment #18 touches lines containing $Id$ keywords.
  Since patch does not know about keyword expansion, and your copy 
  may have these expanded differently, some manual intervention may 
  be needed.
Comment 1 SaGS 2007-02-18 08:10:41 UTC
Created attachment 2756
"</small>" is mandatory, part #1/2: script generating incorrect HTML

The closing "</small>" tag is mandatory. This patch fixes 
toolbin\makeset.tcl, which generates HTML without it.
Comment 2 SaGS 2007-02-18 08:11:16 UTC
Created attachment 2757
"</small>" is mandatory, part #2/2: fix existing HTML

The closing "</small>" tag is mandatory. This patch:
- adds it to existing HTMLs;
- removes the existing "</small>" from the few files that had it 
  on a different line than the opening "<small>"; these files would 
  become incorrect after using the script fixed in comment #1.
Comment 3 SaGS 2007-02-18 08:11:51 UTC
Created attachment 2758
Character entities for "<>&", part #1/3: script generating wrong HTML

Text coming from log messages may (and does) contain "<>&", which 
have special meaning in HTML. The script that converts these messages 
into HTML does not html-escape them. This patch corrects the script.
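A minimal sketch of the escaping such a converter needs, in Python (the language of toolbin\split_changelog.py). The function name escape_log_text is hypothetical and not taken from the actual script. The point it shows is ordering: "&" must be replaced first, otherwise the "&" introduced by "&lt;"/"&gt;" would be escaped a second time. Python's standard html.escape(text, quote=False) performs the same three substitutions.

```python
def escape_log_text(text: str) -> str:
    """Escape the characters that are special in HTML text content.

    "&" is replaced first so that the "&" of "&lt;" and "&gt;" is
    not itself turned into "&amp;".
    """
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;"))


if __name__ == "__main__":
    # Prints: Fix &lt;pre&gt; handling when a &amp;&amp; b
    print(escape_log_text("Fix <pre> handling when a && b"))
```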
Comment 4 SaGS 2007-02-18 08:13:08 UTC
Created attachment 2759
Character entities for "<>&", part #2/3: fix "automatic" HTML

This patch replaces "&"/"<"/">" => "&amp;"/"&gt;"/"&lt;" in 
doc\Changes.htm, doc\Details.htm, doc\History8.htm, and 
doc\Details8.htm, which (I think) were automatically generated by 
the incorrect version of toolbin\split_changelog.py.

Note:
    These changes were done automatically by a script that relies on 
    the particular way that split_changelog.py generates the HTML.
Comment 5 SaGS 2007-02-18 08:13:39 UTC
Created attachment 2760
Character entities for "<>&", part #3/3: manual fixes to existing HTML

Mostly a continuation of the previous patch:
- Replaces "&"/"<"/">" => "&amp;"/"&gt;"/"&lt;" in the other HTMLs;
- Changes/Details.html contain a "<pre>" as text, this needs to be 
  replaced by "&lt;pre&gt;";
- a few incorrect character entities ("&GT"/ "&LT") in doc\Ps2pdf.htm.
Comment 6 SaGS 2007-02-18 08:14:08 UTC
Created attachment 2761
Double "</title>", part #1/2: script generating incorrect HTML

toolbin\split_changelog.py generates double closing "</title>" for 
the detailed changelog. This patch fixes the script.
Comment 7 SaGS 2007-02-18 08:15:00 UTC
Created attachment 2762
Double "</title>", part #2/2: fix existing html

toolbin\split_changelog.py generates double closing "</title>" for 
the detailed changelog. This patch fixes the already-generated HTML.
Comment 8 SaGS 2007-02-18 08:15:33 UTC
Created attachment 2763
Invalid HTML comments

There is a common impression that HTML comments start with "<!--" and 
end with "-->". In reality, comments start/end with "--" and sit inside 
SGML markup, which starts with "<!" and ends with ">". The bottom 
line is that they cannot contain "--".
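A rough way to find such comments, sketched in Python under the assumption that every comment on the page is written as "<!-- ... -->"; bad_comments is a hypothetical helper, not part of the GS tree, and the regular expression only approximates the SGML rule:

```python
import re

# Matches "<!--" ... "-->" non-greedily, across line breaks.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def bad_comments(html_text: str) -> list[str]:
    """Return the bodies of comments that contain "--".

    Under SGML rules the inner "--" ends the comment early, so a
    validator reports the rest of the text as an error.
    """
    return [m.group(1) for m in COMMENT_RE.finditer(html_text)
            if "--" in m.group(1)]
```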
Comment 9 SaGS 2007-02-18 08:16:04 UTC
Created attachment 2764
Missing/misplaced "<pre>"/ "</pre>" in doc\History6/7.htm

doc\History6.htm has lots of "<pre/>" elements that are not closed. 
There are also a few misplaced opening "<pre>" tags in History6.htm, 
and a few misplaced closing "</pre>" tags in History7.htm ("<pre/>" 
elements cannot contain "<hr/>", "<h1/>", or "<h2/>").
Comment 10 SaGS 2007-02-18 08:16:35 UTC
Created attachment 2765
Incorrectly nested "<b/>"+"<tt/>"[+"<em/>"]

Wrong: "<b><tt> ... </b></tt>"
OK:    "<b><tt> ... </tt></b>"
There are many occurrences of this inversion. Occasionally, there is 
also an "<em/>" tag involved.
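A sketch of a stack-based check for this inversion, using Python's standard html.parser module; nesting_errors and the set of tracked tags are illustrative choices, not an existing GS tool. Each end tag must close the element that was opened most recently:

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Report end tags that do not close the most recently opened element.

    Only a few inline elements are tracked; a validator checks them all.
    """

    TRACKED = {"b", "tt", "em", "i", "code"}

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []
        self.errors: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.TRACKED:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.TRACKED:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
            return
        expected = self.stack[-1] if self.stack else None
        self.errors.append(f"</{tag}> found, expected </{expected}>")
        # Recover by forgetting the innermost open element with this name.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i] == tag:
                del self.stack[i]
                break


def nesting_errors(html_text: str) -> list[str]:
    checker = NestingChecker()
    checker.feed(html_text)
    checker.close()
    return checker.errors
```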
Comment 11 SaGS 2007-02-18 08:17:00 UTC
Created attachment 2766
Don't use lists for indenting

"<dl/>" can contain only "<dt/>" and "<dd/>" elements, and "<ul/>" 
only "<li/>". Since they cannot hold arbitrary content, they should not 
be used for indenting; use "<blockquote/>" instead.
Comment 12 SaGS 2007-02-18 08:17:34 UTC
Created attachment 2767
Miscellaneous missing/ misplaced/ missplelled/ extraneous tags

This patch includes an assortment of tag-related fixes that 
did not fit into other patches:
- A few typos and forgotten/ extraneous tags;
- Some places that had list items ("<li/>"/ "<dt/>"+"<dd/>")
  without being enclosed in a list element ("<ol/>"/ "<dl/>");
- Lists can only have specific elements as direct descendants. Put 
  the "<p/>"s (used for spacing) and the anchors ("<a name=../>") 
  inside, not between, list elements, or before the whole list;
- Inline elements, like "<i/>" or "<tt/>", cannot contain block 
  elements like "<p/>" or "<ul/>".
  wrong: "<i><p> .. </p><p> .. </p></i>"
  ok:    "<p><i> .. </i></p><p><i> .. </i></p>"
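The last two rules can be checked mechanically. A sketch in Python using html.parser, with a deliberately partial list of inline and block elements; structure_errors is a hypothetical helper, and a real validator knows the full HTML 4 content models and optional end tags:

```python
from html.parser import HTMLParser

# Rough element classes for HTML 4; only the tags relevant here.
INLINE = {"a", "b", "i", "em", "tt", "code", "small"}
BLOCK = {"p", "ul", "ol", "dl", "pre", "blockquote", "h1", "h2", "hr", "div"}
VOID = {"hr", "br", "img", "meta", "link"}  # elements without end tags
# Elements allowed as direct children of each list type.
LIST_CHILDREN = {"ul": {"li"}, "ol": {"li"}, "dl": {"dt", "dd"}}


class StructureChecker(HTMLParser):
    """Flag block elements inside inline ones, and stray list children."""

    def __init__(self) -> None:
        super().__init__()
        self.open_tags: list[str] = []  # open elements, outermost first
        self.errors: list[str] = []

    def handle_starttag(self, tag, attrs):
        parent = self.open_tags[-1] if self.open_tags else None
        if tag in BLOCK and any(t in INLINE for t in self.open_tags):
            self.errors.append(f"<{tag}> inside inline element")
        if parent in LIST_CHILDREN and tag not in LIST_CHILDREN[parent]:
            self.errors.append(f"<{tag}> directly inside <{parent}>")
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close the element and anything still open inside it.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def structure_errors(html_text: str) -> list[str]:
    checker = StructureChecker()
    checker.feed(html_text)
    checker.close()
    return checker.errors
```

The same checker also catches the "<dl/>"-for-indenting pattern from comment #11, e.g. a "<p>" placed directly inside a "<dl>".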
Comment 13 SaGS 2007-02-18 08:18:03 UTC
Created attachment 2768
'align="middle"' should be 'align="center"'

"Middle" is only for vertical alignment ("valign="); for the 
horizontal alignment use "center".
Comment 14 SaGS 2007-02-18 08:18:30 UTC
Created attachment 2769
Typos in "href="s to anchors

A few links were missing the "#". Quotes are also needed 
('href="file#anchor"'), because in this case the attribute's value 
contains a character ("#") that is not allowed in unquoted values.
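A sketch for spotting the quoting problem, assuming links are written as href=... inside ordinary tags; unquoted_hrefs_with_hash is a hypothetical helper and the regular expression is only an approximation. Links that lack the "#" cannot be found this way, because both "anchor" and "#anchor" are syntactically valid URLs.

```python
import re

# An href whose value is not enclosed in quotes: href=value
UNQUOTED_HREF = re.compile(r"""\bhref\s*=\s*([^"'\s>]+)""", re.IGNORECASE)


def unquoted_hrefs_with_hash(html_text: str) -> list[str]:
    """Return unquoted href values that contain "#"."""
    return [v for v in UNQUOTED_HREF.findall(html_text) if "#" in v]
```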
Comment 15 SaGS 2007-02-18 08:19:01 UTC
Created attachment 2770
Don't use utf-8 text [without declaring the charset]

doc\Language.htm uses utf-8 text, but it does not declare the charset 
('<meta http-equiv="content-type" content="text/html; charset=..">').
A browser that does not default to utf-8 will not display the text 
"samples x components" correctly. The patch replaces the "x" with 
a character entity, so that the charset is no longer an issue.
(The patch also inserts a missing space.)
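More generally, every non-ASCII character can be turned into a numeric character reference, after which the charset no longer matters. A sketch in Python using the standard "xmlcharrefreplace" codec error handler; to_ascii_html is a hypothetical name, the example assumes the character in question is the multiplication sign (U+00D7), and the patch itself may use a named entity such as "&times;" instead:

```python
def to_ascii_html(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference.

    The result is plain 7-bit ASCII, so it displays the same whatever
    charset the browser assumes.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")
```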
Comment 16 SaGS 2007-02-18 08:19:28 UTC
Created attachment 2771
For utf-8 HTML, declare the charset

toolbin\split_changelog.py encodes text coming from log messages as 
utf-8. The patch changes it to output a "<meta/>" element specifying 
the charset used; otherwise, browsers that do not default to utf-8 
won't necessarily display the file correctly [if the log messages 
contain extended characters]. Note: limiting the HTML to 7-bit 
ASCII would be a better idea.
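A sketch of the kind of header such a script could emit, in Python; html_head is a hypothetical function, not the code in split_changelog.py, and the title is assumed to be escaped already. The page must then be written in the same encoding the "<meta/>" element declares, e.g. with open(path, "w", encoding="utf-8").

```python
def html_head(title: str, charset: str = "utf-8") -> str:
    """Return an HTML 4 <head> whose <meta> element declares the charset.

    The <meta> element comes first so that the browser knows the
    encoding before it reaches any non-ASCII text. Exactly one
    </title> is emitted.
    """
    return (
        "<head>\n"
        '<meta http-equiv="content-type" '
        f'content="text/html; charset={charset}">\n'
        f"<title>{title}</title>\n"
        "</head>\n"
    )
```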
Comment 17 SaGS 2007-02-18 08:19:59 UTC
Created attachment 2772
Protect text-"$Id$" from expansion, part #1/2: script generating HTML

(17.A)
If a commit log contains the text "$Id$", or some other text that 
looks like an SVN keyword, toolbin\split_changelog.py outputs it as-is 
to HTML and SVN later expands it as a keyword. As a result, the 
displayed text differs. This patch extends the substitutions done for 
character entities (comment #3) to include "$" => "&#36;"; there 
won't be any "$"-as-text, and thus no "$Keyword..$"-as-text, anymore.
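The extended substitution can be sketched as a table applied in order, in Python; escape_for_svn_html is a hypothetical name. "&" still comes first, so the "&" of "&#36;" is not escaped again:

```python
# Same substitutions as for "<>&", plus "$" -> "&#36;", so that no
# keyword-like text survives in the generated HTML for SVN to expand.
_SUBSTITUTIONS = [("&", "&amp;"), ("<", "&lt;"), (">", "&gt;"), ("$", "&#36;")]


def escape_for_svn_html(text: str) -> str:
    for char, entity in _SUBSTITUTIONS:
        text = text.replace(char, entity)
    return text
```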

(17.B)
The patch also modifies a function in doc\gsdoc.el to do a similar
substitution.
Comment 18 SaGS 2007-02-18 08:20:33 UTC
Created attachment 2773
Protect text-"$Id$" from expansion, part #2/2: fix existing files

This patch changes files that contain "$Id..." as text in order to 
protect the apparent keyword from being expanded by Subversion. The 
methods used are as follows:
- for HTML: use the numeric entity "&#36;" instead of "$".
- for Emacs Lisp: '"$Id$"' => '(concat "$" "Id$")'.
- for C and Python: if the parser finds 2 consecutive string 
  literals, separated by nothing but whitespace, it automatically 
  treats them as a single, longer string obtained by concatenating 
  the 2 original ones. Thus, the patch replaces '"$Id$"' => '"$" "Id$"'.
- for TCL: I used "[$]Id", since it was in a regular expression.
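The literal-concatenation trick and the regular-expression trick can both be illustrated in Python (a sketch; the names KEYWORD_TEXT and KEYWORD_RE are hypothetical). Neither source line contains the keyword text itself, so Subversion leaves them alone:

```python
import re

# Adjacent string literals are joined by the compiler, so the value is
# the complete keyword text while the source never spells it out.
KEYWORD_TEXT = "$" "Id$"

# "[$]" is a character class matching a literal dollar sign, so this
# pattern finds expanded or unexpanded Id keywords without containing one.
KEYWORD_RE = re.compile(r"[$]Id[:$]")
```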

Important note:
    Please review especially the change to toolbin\3way.tcl. This 
    file looks damaged since the beginning of the Ghostscript 
    repository, and I don't really know what was there.
Comment 19 SaGS 2007-02-18 08:31:25 UTC
I notice that Internet Explorer interprets some of the patches as HTML, 
although they are unified diffs. I don't know about other browsers. However, 
with "Save Target As" the content is saved correctly.
Comment 20 Till Kamppeter 2007-05-09 03:15:08 UTC
I have applied the fixes of comment #17 and comment #18 to the doc/gsdoc.el
files in the trunk and in the gs-esp-gpl-merger branch now (SVN rev 7934).
Comment 21 Hin-Tak Leung 2010-08-02 00:17:44 UTC
Confirm and re-assign bugs with attached/in-line patches for another person to review.
Comment 22 Till Kamppeter 2010-11-08 22:19:15 UTC
Hin-Tak, why did you assign this bug to me? Did I break something? Just because I applied a doc-related patch three years ago doesn't make me a doc expert.
Comment 23 SaGS 2010-11-21 14:22:53 UTC
Created attachment 6930
Patches updated as of TRUNK rev 11894 (ZIP file)

The patches posted 3 years ago cannot be applied as they are, so here 
I attach updated ones. The numbers correspond directly to comments #1 
to #18 because the problems are mostly the same (those comments can serve 
as the basis for log messages). The notable exceptions are as follows:

    Patch 01: After revision #8323, we need to fix makehist.tcl and 
    not makeset.tcl.

    Patch 03: The problem from comment #3 was solved in revision #9411.
    There is still a patch for this part, but it only removes a
    comment that no longer applies.

    Patch 04: split_changelog.py was fixed in the meantime, and GS’s 
    major version number was incremented. The ‘automatic’ fixes 
    now touch only details8/history8.htm; details.htm and changes.htm 
    need no change from the point of view of comment #4.

    Patch 10: Revision #9030 partly replaced nested <B/>+<TT/> with 
    styled <CODE/>. So, the attached patch 10 does the rest of such 
    replacements instead of fixing the nesting of <B/> and <TT/>.

    Patch 11 and patch 14: There are no patches 11 and 14, because 
    the problems from comment #11 and comment #14 were either solved 
    or obsoleted by revisions #9033 (various fixes) and #8687 (removal 
    of testing.htm).

    Patch 17: The changes to gsdoc.el from comment #17 are already 
    committed in revision #7934.

    Patch 18: The changes to gsdoc.el from comment #18 are already 
    committed in revision #7934, and file 3way.tcl was removed in 
    revision #8322.

Important addition connected to comment #18, which cannot be 
expressed as a diff:

    File ijs\ijs_spec.pdf is damaged in a fresh svn checkout, both the 
    ‘normal’ copy and the pristine copy in .svn\text-base\. From what I 
    understand of how svn stores its working copy, I would conclude that 
    this file is damaged in the repository itself.

    Please:
    - remove its svn:eol-style and svn:keywords properties;
    - replace it with a good copy;
    - commit the changes.

Because these patches often modify the same files over and over, they 
need to be applied (‘patch -p1 -i file.diff.txt’) in numeric order.
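Unless the numbers are zero-padded, a plain text sort puts "10" before "2". A sketch of applying the set in numeric order, in Python; it assumes each file name starts with its patch number (e.g. a hypothetical "04-entities.diff.txt"), which is not stated above, and patch_number/apply_in_order are hypothetical helpers:

```python
import re
import subprocess


def patch_number(filename: str) -> int:
    """Return the leading number of a patch file name."""
    match = re.match(r"\d+", filename)
    if match is None:
        raise ValueError(f"no leading number in {filename!r}")
    return int(match.group())


def apply_in_order(filenames: list[str]) -> None:
    """Apply the patches one by one, stopping at the first failure."""
    for name in sorted(filenames, key=patch_number):
        # Same invocation as above: patch -p1 -i file.diff.txt
        subprocess.run(["patch", "-p1", "-i", name], check=True)
```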

Just in case this could be useful for the review, here is a list of 
revisions that I found to affect the old patches:

    #7934 <http://svn.ghostscript.com/viewvc?view=rev&revision=7934>
          (the changes to gsdoc.el from comment #17 and comment #18)

    #8322 <http://svn.ghostscript.com/viewvc?view=rev&revision=8322>
          (removal of 3way.tcl)

    #8323 <http://svn.ghostscript.com/viewvc?view=rev&revision=8323>
          (merge of makeset.tcl into makehist.tcl)

    #8687 <http://svn.ghostscript.com/viewvc?view=rev&revision=8687>
          (removal of testing.htm)

    #9030 <http://svn.ghostscript.com/viewvc?view=rev&revision=9030>
          (replaces many <b/>+<tt/> with <code/>)

    #9033 <http://svn.ghostscript.com/viewvc?view=rev&revision=9033>
          (various fixes concerning html conformance)

    #9411 <http://svn.ghostscript.com/viewvc?view=rev&revision=9411>
          (solves the problem mentioned in comment #3)
Comment 24 SaGS 2010-11-21 16:35:47 UTC
Created attachment 6932
‘text-weight’ should be ‘font-weight’

For completeness, I’m attaching a fix to GS’s CSS stylesheet:

Fix a typo in Ghostscript’s CSS style sheet, allowing code fragments in the documentation to appear in bold type, as intended.
Comment 25 SaGS 2010-11-30 20:08:00 UTC
Created attachment 6975
A good copy of ijs_spec.pdf

In comment #23 I mentioned that the file ‘ijs\ijs_spec.pdf’ is damaged.
I found a backup of the last CVS version, as it was when the source 
repository switched to SVN. The PDF was fine then, so I’m attaching 
a good copy of it. In the first revision I have after the switch to 
SVN, this file is already damaged.
Comment 26 SaGS 2010-12-04 07:48:48 UTC
Created attachment 6991
Remove remaining notices/ warnings (2 patches in a ZIP)

Two more patches to clean up some notices/ warnings flagged by the W3C 
Validator. I did not address these earlier because they are not errors, 
but after all it’s better to have 100% clean and 100% compatible HTML.

    Patch 20:
    Specify the charset for all files by using a <META/> element.
    Files Changes.htm, Details.htm, History[89].htm, and Details[89].htm 
    already contain such a specification (added by patch 16), because they 
    either already use utf-8 chars or may do so in the future. For all 
    others, ‘us-ascii’ is enough. There were only 2 non-ASCII characters:
    a typo in Ps2pdf.htm, which this patch removes, and a lowercase o with
    diaeresis, which is replaced with an HTML entity.

    Patch 21:
    Do not use SHORTTAG constructs. The W3C Validator flags these as not
    recommended because some browsers handle them poorly.
Comment 27 Henry Stiles 2011-03-27 20:43:36 UTC
Hello SaGS, I'd rather you send me a public key and I'll give you write access to ghostscript svn and you can commit these documentation fixes directly.  Code changes have to be regression tested but these should be okay.  We can start the process with you emailing me a public key:

ssh-keygen -t dsa
Comment 28 SaGS 2011-04-03 21:30:30 UTC
All patches updated and committed as SVN revisions #12346 to #12365 (inclusive).
<http://svn.ghostscript.com/viewvc?view=rev&revision=12346>
Comment 29 Henry Stiles 2011-05-11 22:41:01 UTC
SaGS, I believe you have committed patches for this and collected the bounty, if not please let me know.
Comment 30 Henry Stiles 2011-05-11 23:07:09 UTC
*** Bug 688440 has been marked as a duplicate of this bug. ***