The FOSS server https://fossies.org, which also supports "GhostPDL", offers a feature named "Source code misspelling reports": https://fossies.org/features.html#codespell

Such reports are normally generated only on request, but as Fossies administrator I have just created one (for testing purposes) for the current "GhostPDL" release: https://fossies.org/linux/misc/ghostpdl/codespell.html That version-independent URL should always redirect to the latest report (if available), so currently to https://fossies.org/linux/misc/ghostpdl-9.52.tar.gz/codespell.html

Although some obviously wrong matches ("false positives") were already filtered out (ignored) after a first review, please let me know if you find more of them so that I can force a new, improved check if applicable. Granted, some spelling suggestions are debatable, some apparent spelling errors may be intentional (e.g. in variable names), and others may be in third-party code.

For information, there are also three supplemental pages: https://fossies.org/linux/misc/ghostpdl/codespell_conf.html shows the "codespell" configurations used, https://fossies.org/linux/misc/ghostpdl/codespell_fps.html shows all resulting obvious "false positives", and https://fossies.org/linux/misc/ghostpdl/codespell_hist.html shows changes in the spelling errors found compared to previously analyzed versions.

Perhaps more meaningful is a similar, continuously updated report for the GhostPDL Git "master" version. It is available within a special restricted "test" folder that isn't really integrated into the standard Fossies services and should also not be accessible to search engines: https://fossies.org/linux/test/ghostpdl-master.tar.gz/codespell.html That version-independent URL hopefully always redirects to the report for the latest "master" version, identified by the short commit ID and a year-month-day string (YYMMDD) representing the corresponding git pull date (usually the same as the commit date).
The refresh is currently done only once a day (a few minutes after midnight CET). If useful, the refresh rate can be increased. Although correcting misspellings and typos is probably not a top priority, I hope that the report can nevertheless be of some use. Regards, Jens
Assigning to Henry to decide what, if anything, we want to do with this info. I've determined that running codespell in 'interactive' (-i 3) mode, going through a file and deciding whether or not to accept each recommended change, is pretty quick. One of the larger files was doc/History9.htm, and it took less than 3 minutes to go through. There was one place where it wanted to replace 'grat' with 'great', which I refused since it was (from context) not an error. Running it on the files changed by a commit as part of the cluster tests, or as a pre-commit hook with git (on the changed files AND the log message), might help us going forward. Changing to CONFIRMED since I was able to run codespell and see this issue.
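To make the pre-commit idea concrete, here is a minimal sketch in Python of what such a hook could look like. It is not something that exists in the GhostPDL tree: the function names and the list of suffixes to ignore are hypothetical, and it assumes that `git` and `codespell` are on the PATH.

```python
import subprocess
from typing import List

# Hypothetical choice: suffixes of files that are not worth spell-checking.
BINARY_SUFFIXES = (".png", ".jpg", ".pdf", ".ttf", ".icc")


def staged_files() -> List[str]:
    """Return the paths that are added, copied or modified in the git index."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def files_to_check(paths: List[str]) -> List[str]:
    """Drop files whose suffix marks them as obviously not text."""
    return [p for p in paths if not p.lower().endswith(BINARY_SUFFIXES)]


def codespell_command(paths: List[str]) -> List[str]:
    """Build a report-only codespell call (no -w, so no file is rewritten)."""
    return ["codespell", *paths]


def main() -> int:
    """Check the staged files and return codespell's exit status."""
    paths = files_to_check(staged_files())
    if not paths:
        return 0
    # codespell is expected to exit non-zero when it reports misspellings,
    # which would make git abort the commit if this status is passed on.
    return subprocess.run(codespell_command(paths)).returncode
```

A hook script would end with `sys.exit(main())`. The log message is not available to a pre-commit hook; git passes the path of the message file to a commit-msg hook as its first argument, so the message could be checked the same way from there.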
I would *strongly* suggest we do not want to correct doc/History9.htm - the bulk of the contents of that are trawled directly from the commit comments, and I think we really want to keep the two in sync.
That "problem" is similar to the one with the ChangeLog file in many other FOSS projects. If it would help, I could generate a new report excluding the doc/History9.htm file.
Jens, no need to generate a new report. I can do that readily enough. Chris, I agree that in order to keep History9.htm synced up with the git commit log, we don't want a hand-corrected one to be used. If we do anything, we could auto-correct History9.htm after it is generated during the release process. My exercise in going through that one was to see how long it would take to go through a file with LOTS of typos in interactive mode. BTW, I only found 1 change that I had to reject (grat). Apart from that, there were two words where it offered two choices (one was 'ths', choosing between 'the' and 'this'). I don't recall the other one.
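As a rough sketch of the two options discussed here (a report that leaves out History9.htm, and an automatic correction pass at release time that keeps 'grat'), the codespell invocations could be built as below. The helper names are hypothetical; `--skip`, `--write-changes` and `--ignore-words-list` are existing codespell options, but which skip pattern actually matches may depend on the codespell version.

```python
from typing import Iterable, List


def report_command(root: str, skip: Iterable[str]) -> List[str]:
    """Build a report-only codespell call on `root` that skips some files."""
    cmd = ["codespell"]
    patterns = list(skip)
    if patterns:
        # --skip takes a comma-separated list of file names or glob patterns.
        cmd += ["--skip", ",".join(patterns)]
    return cmd + [root]


def autofix_command(path: str, keep_words: Iterable[str]) -> List[str]:
    """Build a codespell call that rewrites `path` but leaves some words alone."""
    cmd = ["codespell", "--write-changes"]
    keep = sorted(set(keep_words))
    if keep:
        # Words listed here are never reported or changed (e.g. 'grat').
        cmd += ["--ignore-words-list", ",".join(keep)]
    return cmd + [path]
```

For example, `report_command(".", ["History9.htm"])` gives a tree-wide report without the history file, and `autofix_command("doc/History9.htm", ["grat"])` is the call a release script could run after generating the file. Words with more than one suggestion (such as 'ths') are, as far as I can tell, not changed by a non-interactive run and would still need a manual decision.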
We agreed not to take action on this now; changing the status to "Later".