 Overview
SearchEngine: Overview

This chapter explains how the SearchEngine works, and the iterative process of generating, and later regenerating, the final applet word database.

The purpose of the SearchEngine

The SearchEngine reads one or more HTML files, parses the words within the markup tags, and then parses all linked HTML files. Each word is checked for word removal and word reduction, and the resulting word list for the HTML file is stored internally. When all the linked HTML files have been parsed, the word database is constructed, together with the applet tag for the HTML applet search page.
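The parsing step described above can be sketched in Python. This is only an illustration under stated assumptions, not the SearchEngine's actual implementation: the class `WordAndLinkCollector`, the function `collect`, and the rule for what counts as a word are all hypothetical, and the sketch relies on the standard `html.parser` module.

```python
import re
from html.parser import HTMLParser


class WordAndLinkCollector(HTMLParser):
    """Hypothetical sketch: collect words and linked URLs from one HTML page."""

    def __init__(self):
        super().__init__()
        self.words = []  # words found in the text between markup tags
        self.links = []  # href targets of <a> tags, candidates for parsing next

    def handle_starttag(self, tag, attrs):
        # Record each hyperlink so that the linked file can be parsed later.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Assumption: a "word" is a run of ASCII letters or digits, lowercased.
        self.words.extend(w.lower() for w in re.findall(r"[A-Za-z0-9]+", data))


def collect(html_text):
    """Return (words, links) for a single HTML document given as a string."""
    parser = WordAndLinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.words, parser.links
```

A driver would call `collect` on the root file, then repeat for each link it has not yet visited, which mirrors the "parse all linked HTML files" step.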
| -f filename | the root HTML filename (required) | 
| -gw filename | generate Web applet files | 
| -lu filename | list dependency URLs to filename | 
| -lw filename | list words to filename | 
| -nt | exclude <TITLE> tagged words from database | 
| -nh | exclude <H1..H6><CAPTION> tagged words from database | 
| -nl | exclude <DT><LI> tagged words from database | 
| -nb | exclude <BODY> tagged words from database | 
| -p filepath | intermediate data filepath | 
| -r filename | execute response file | 
| -s | suppress HTML syntax error reporting | 
| -u url | the WWW URL equivalent of the root HTML document | 
| -xn | exclude numbers from word list | 
| -xu url | exclude URL from dependency list | 
| -xwf filename | word exclusion HTML filename | 
| -xwu url | exclude URL from word list | 
| -l | The file with language dependent messages. | 
| -c | The character set to use when reading input. If this option is used, it must be the first option. The default is the local character set. | 
| -h | File containing text to make the output from the application language dependent. | 
Options are separated by white space. If a filename or URL contains a white space character, you must enclose that parameter in double quotes:
| -lu | "/html/Site dependency list" | 
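The quoting rule can be illustrated with a small tokenizer. This is a minimal sketch, assuming that double quotes simply group characters and that no escape sequences are recognized (so backslashes in Windows paths pass through unchanged); the function name `split_options` is hypothetical and not part of the SearchEngine.

```python
import re

# Matches either a double-quoted parameter or a run of non-white-space characters.
_TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def split_options(command_line):
    """Split an option string on white space, keeping quoted parameters whole."""
    tokens = []
    for match in _TOKEN.finditer(command_line):
        quoted, bare = match.group(1), match.group(2)
        # Group 1 is set when the token was quoted; the quotes are dropped.
        tokens.append(quoted if quoted is not None else bare)
    return tokens
```

For example, `split_options('-lu "/html/Site dependency list"')` yields two tokens: the option and the full path with its spaces intact.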
| -f filename | the root HTML filename (required) | 
| -u url | the WWW URL equivalent of the root HTML document | 
| -xu url | exclude URL from dependency list | 
The resulting dependency list can be output to a file using:
| -lu filename | list dependency URLs to filename | 
The intermediate parsed data files are stored in the directory specified by:
| -p filepath | intermediate data filepath | 
If this argument is not specified, the current working directory is used.
These options are further explained in the chapter Building the dependency list.
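As a rough illustration of how `-xu` exclusions might prune the dependency list, the sketch below filters URLs against wildcard patterns such as `*.zip` (patterns of this kind appear in the response file extract later in this chapter). The exact matching rules of the SearchEngine are not described here, so the sketch assumes shell-style wildcards in which `*` matches any run of characters, including `/`; the function name `filter_dependencies` is hypothetical.

```python
from fnmatch import fnmatchcase


def filter_dependencies(urls, exclude_patterns):
    """Return the URLs that match none of the exclusion patterns.

    Assumption: patterns use shell-style wildcards and are case-sensitive.
    """
    return [
        url
        for url in urls
        if not any(fnmatchcase(url, pattern) for pattern in exclude_patterns)
    ]
```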
| -nt | exclude <TITLE> tagged words from database | 
| -nh | exclude <H1..H6><CAPTION> tagged words from database | 
| -nl | exclude <DT><LI> tagged words from database | 
| -nb | exclude <BODY> tagged words from database | 
| -xwf filename | word exclusion HTML filename | 
| -xwu url | exclude URL from word list | 
| -xn | exclude numbers from word list | 
The resulting word list can be output to a file using:
| -lw filename | list words to filename | 
These options are further explained in the chapter Eliminating words.
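The effect of the word elimination options can be sketched as a simple filter. This is an assumption-laden illustration rather than the tool's real logic: it treats a "number" as a token made up entirely of digits (for `-xn`), compares words against the exclusion list without regard to case (for `-xwf`), and uses the hypothetical name `filter_words`.

```python
def filter_words(words, excluded_words=(), exclude_numbers=False):
    """Drop excluded words and, optionally, purely numeric tokens."""
    # Assumption: exclusion is case-insensitive.
    excluded = {word.lower() for word in excluded_words}
    kept = []
    for word in words:
        if exclude_numbers and word.isdigit():
            continue  # corresponds to -xn under the all-digits assumption
        if word.lower() in excluded:
            continue  # corresponds to a word listed in an exclusion file
        kept.append(word)
    return kept
```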
| -gw filename | generate Web applet files | 
This option is explained in the chapter Building the applet database.
Since the SearchEngine acts on a series of options, these options can be placed, for convenience, in one or more text files, called response files. In addition to reducing keystrokes, these files can also contain comments. The following is an extract from the response file used to build the database for this manual:
Response file for the SearchEngine manual
(where on the hard disk)
-f \www\rational\application\search\search\TOC.html
(where on the World Wide Web)
-u http://www.ruptools.com/rup/rational/application/search/search/TOC.html
Dependency exclusions:
(ignore any links to zip files, java files, and the link to my java page)
-xu *.zip
-xu */javapage.html
-xu *.java
Word count exclusions:
(ignore the search page, and table of contents)
-xwu */docsearch.html
-xwu */TOC.html
Standard word exclusion filters:
(ignore all numbers)
-xn
(standard english language exclusion list)
-xwf exclude.english.html
(specific exclusion list for the manual)
-xwf search.exclude.html
The SearchEngine parses a response file, ignoring every line whose first non-white space character is not a hyphen. Any valid SearchEngine option can appear in a response file; invalid or illegal options produce an error message.
The -r filename option itself can also appear in a response file, so that, for example, you can create standard dependency or word file exclusion filters, which can be used to generate multiple databases.
Each option and its associated parameters must appear on a single, separate line of the response file.
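The response file rules above (comment lines ignored, one option per line, nested `-r` allowed) can be sketched as follows. This is a hypothetical reading of those rules, not the SearchEngine's own parser: the function names are invented, nested paths are assumed to be relative to the including file, quoted parameters are assumed to follow the command-line quoting rule, and option validation is left out.

```python
import re
from pathlib import Path

# Matches either a double-quoted parameter or a run of non-white-space characters.
_TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def parse_response_lines(lines):
    """Return a list of (option, parameters) tuples from response file lines."""
    options = []
    for line in lines:
        stripped = line.strip()
        # Lines whose first non-white-space character is not a hyphen are comments.
        if not stripped.startswith("-"):
            continue
        tokens = [
            m.group(1) if m.group(1) is not None else m.group(2)
            for m in _TOKEN.finditer(stripped)
        ]
        options.append((tokens[0], tokens[1:]))
    return options


def load_response_file(path, _active=frozenset()):
    """Read a response file, expanding nested -r options in place."""
    path = Path(path).resolve()
    if path in _active:
        raise ValueError(f"response file includes itself: {path}")
    options = []
    for option, params in parse_response_lines(path.read_text().splitlines()):
        if option == "-r" and params:
            # Assumption: a nested filename is relative to the including file.
            nested = path.parent / params[0]
            options.extend(load_response_file(nested, _active | {path}))
        else:
            options.append((option, params))
    return options
```

Tracking the chain of active files guards against a response file that includes itself, while still allowing the same shared filter file to be included from several places.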
The SearchEngine can generate several output files, as well as HTML syntax error messages to the standard output device. 