Adding markup to the pages to improve the index
The keyword-finding program is very aggressive about splitting things up. In some cases, you want to keep a group of words together. You can do this explicitly by putting the words inside a <span> element:
<span>Principal Components Analysis</span>
<span>Adam Smith</span>
<span>Autonomous system number</span>
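As a rough illustration of how a program might treat such spans, the following Python sketch collects the text of each class-less <span> as a single phrase. It is not the actual keyword-finding program: the names PhraseCollector and find_phrases are hypothetical, and it relies only on the standard library html.parser module.

```python
from html.parser import HTMLParser


class PhraseCollector(HTMLParser):
    """Collect the text of each class-less <span> as one phrase (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.open_spans = []  # one flag per open <span>: True if it has no class
        self.parts = []       # text pieces of the phrase being collected
        self.phrases = []     # finished phrases, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.open_spans.append(all(name != "class" for name, _ in attrs))

    def handle_endtag(self, tag):
        if tag == "span" and self.open_spans:
            was_plain = self.open_spans.pop()
            # Emit a phrase only when the outermost plain span closes.
            if was_plain and not self.open_spans:
                phrase = " ".join("".join(self.parts).split())
                if phrase:
                    self.phrases.append(phrase)
                self.parts = []

    def handle_data(self, data):
        # Keep text only while every enclosing span is class-less.
        if self.open_spans and all(self.open_spans):
            self.parts.append(data)


def find_phrases(html):
    """Return the phrases marked with plain <span> elements."""
    collector = PhraseCollector()
    collector.feed(html)
    collector.close()
    return collector.phrases


sample = ('<p>We apply <span>Principal Components Analysis</span> to the data of '
          '<span>Adam Smith</span>. <span class="dont-index">20% error rate</span></p>')
print(find_phrases(sample))
```

Spans that carry a class attribute are ignored here, so a phrase is kept together only when it sits in a bare <span>.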
It is also possible to mark material so that it is explicitly excluded from the index:
<span class="dont-index">Text to exclude from the index.</span>
<span class="dont-index">20% error rate</span>
You can also indicate that something is an inline reference (with the "-r" flag, these are also excluded from the indexing):
<span class="inline-ref">[1] J. Ioannidis and G. Maguire, '<i>Coherent File Distribution Protocol</i>',
Internet Request for Comments, vol. RFC 1235 (Experimental), Jun. 1991 [Online].
DOI: <a href="https://doi.org/10.17487/RFC1235">10.17487/RFC1235</a></span>
<span class="inline-ref">(see Smith, Figure 10, on page 99.)</span>
Quite often, I have an inline reference of the type shown just above in the ALT text for a figure (and in the figure caption). In this case, you need to write:
<span class='inline-ref'>(see Smith, Figure 10, on page 99.)</span>
Note the use of single quote marks. This is necessary because the ALT text is itself inside double quotes, such as:
alt="<span class='dont-index'>zoomed into middle of first figure</span>"
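To see why the inner quotes must be single, here is a small Python sketch that reads the alt attribute back with the standard library html.parser module. The names AltReader and first_alt and the file name fig1.png are hypothetical and serve only to illustrate the point.

```python
from html.parser import HTMLParser


class AltReader(HTMLParser):
    """Record the alt attribute of the first <img> element seen (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.alt = None

    def handle_starttag(self, tag, attrs):
        if tag == "img" and self.alt is None:
            self.alt = dict(attrs).get("alt")


def first_alt(html):
    """Return the alt text of the first <img> in html, or None."""
    reader = AltReader()
    reader.feed(html)
    reader.close()
    return reader.alt


markup = "<span class='dont-index'>zoomed into middle of first figure</span>"
good = '<img src="fig1.png" alt="' + markup + '">'  # fig1.png is a made-up name
bad = good.replace("'", '"')  # double quotes nested inside a double-quoted attribute

# With single quotes inside, the whole span survives as the alt text.
print(first_alt(good) == markup)
# With double quotes inside, the attribute ends at the first inner quote,
# so the span markup is not recovered intact.
print(first_alt(bad) == markup.replace("'", '"'))
```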
I find that an easy way to do this type of markup is to store the markup strings in Emacs registers and then insert them where you want them. You can do this editing in the copy of the page that you made in the temporary directory. Then either copy and paste the HTML into the page using the RCE editor, or use one of my programs to upload the modified file.
I can also exclude from the indexing any text that is below an <hr> or <hr /> element, as this is where I put notes and references on each page. Exclusion of this material and of the two classes above is controlled by the "-r" option to the program that finds keywords and phrases.
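The exclusion rules described above can be sketched in Python as follows. This is a minimal illustration rather than the actual program: IndexTextExtractor, indexable_text, and the exclude parameter (standing in for the "-r" option) are hypothetical names, and the sketch assumes that everything after the first <hr> is notes and references.

```python
from html.parser import HTMLParser

EXCLUDED_CLASSES = {"dont-index", "inline-ref"}


class IndexTextExtractor(HTMLParser):
    """Gather page text, optionally skipping excluded material (hypothetical helper)."""

    def __init__(self, exclude=True):
        super().__init__()
        self.exclude = exclude    # stand-in for the "-r" option
        self.span_stack = []      # one flag per open <span>: True if excluded
        self.after_rule = False   # becomes True once an <hr> has been seen
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # <hr /> also arrives here: the default handle_startendtag calls this method.
        if tag == "hr":
            self.after_rule = True
        elif tag == "span":
            classes = (dict(attrs).get("class") or "").split()
            self.span_stack.append(bool(EXCLUDED_CLASSES.intersection(classes)))

    def handle_endtag(self, tag):
        if tag == "span" and self.span_stack:
            self.span_stack.pop()

    def handle_data(self, data):
        hidden = self.after_rule or any(self.span_stack)
        if not (self.exclude and hidden):
            self.parts.append(data)


def indexable_text(html, exclude=True):
    """Return the whitespace-normalized text that would be passed on for indexing."""
    extractor = IndexTextExtractor(exclude=exclude)
    extractor.feed(html)
    extractor.close()
    return " ".join("".join(extractor.parts).split())


page = ('Routing uses <span>Autonomous system number</span> values '
        '<span class="inline-ref">(see Smith, Figure 10, on page 99.)</span> daily.'
        '<hr />Notes: draft')
print(indexable_text(page))
print(indexable_text(page, exclude=False))
```

With exclude=True, the inline reference and the text below the rule are dropped; with exclude=False, all of the text is kept.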