SearchEngine: Frequently Asked Questions

  Some common problems have occurred when using the SearchEngine. This chapter
  lists these problems and their solutions. Questions have been divided into two
  categories: the SearchEngine and the Search applet.

The FAQ index
  Files are not being excluded
    The SearchEngine is reading files excluded with the -xu flag.

  SearchEngine: tags or tag attributes are being stored in the database
    The SearchEngine is storing words which look suspiciously like tags or tag
    attributes.

  SearchEngine: keywords in titles and headers are missing
    The SearchEngine is not storing words which appear in HTML
    tags like <TITLE>, <H1..H6>, etc.

  SearchEngine: runs fine for a while, then slows down
    The SearchEngine parses the first few hundred files, then slows down and
    starts thrashing (repeatedly accessing) the hard disk.

  SearchEngine: stops with an OutOfMemoryException
    The SearchEngine parses the first few hundred files, then displays a long
    list of error messages, starting with OutOfMemoryException.

  SearchEngine: stops with a 'Too many files for the search applet database' message
    The SearchEngine parses many hundreds of files, then displays a 'Too many
    files for the search applet database' message.

  Applet: Search button remains gray, or an error message appears
    The applet starts up, but after a few seconds, the search button appears
    grayed out, or an error message is displayed.

  Applet: Clicking on a title causes the browser to issue a 'document not found' error
    When the user double-clicks on a found document title, the browser issues
    a 'document not found' error message instead of opening the document.

Questions about the SearchEngine
  Files are not being excluded
    The SearchEngine is reading files excluded with the -xu flag.

      Take care when using the wildcard character '*'.
        The wildcard character '*' can appear at the start of the URL, at the
        end of the URL, or both. Anywhere else it is treated as an ordinary
        character. No other combinations of the wildcard character '*' are
        valid. A filter definition of */extawt/*remove.* will result in a
        (probably useless) filter that ignores all URLs containing the literal
        text /extawt/*remove., and not the probable intention of ignoring all
        URLs containing /extawt/ and also remove.

      The SearchEngine uses case-sensitive URLs when filtering.
        Some operating systems (Windows) are case-insensitive to file names;
        the SearchEngine, however, is not. If, for example, the filter

-xu *.zip

        is used, then all files ending in .zip will be excluded, but files
        ending in .ZIP will not. Use both lower case and upper case to
        filter file extensions:

-xu *.zip
-xu *.ZIP
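
The two rules above (wildcards only at the start or end, and case-sensitive comparison) can be illustrated with a minimal Java sketch. This is not the SearchEngine's actual code: the class name UrlFilterSketch and the method matches are hypothetical, and the handling of a pattern with no wildcard (exact match) or of a lone '*' (match everything) is an assumption that the text above does not confirm.

```java
// Hypothetical sketch of the -xu filter rules described above.
// It is NOT the SearchEngine implementation; names and edge cases are assumptions.
class UrlFilterSketch {

    /** Returns true if the URL would be excluded by the given pattern. */
    static boolean matches(String pattern, String url) {
        if (pattern.equals("*")) {
            return true; // assumption: a lone wildcard matches every URL
        }
        boolean leading = pattern.startsWith("*");
        boolean trailing = pattern.endsWith("*");
        // Strip only the first and last '*'; any other '*' stays an ordinary character.
        String literal = pattern.substring(leading ? 1 : 0,
                pattern.length() - (trailing ? 1 : 0));
        if (leading && trailing) {
            return url.contains(literal);   // *text*  -> URL contains text
        }
        if (leading) {
            return url.endsWith(literal);   // *text   -> URL ends with text
        }
        if (trailing) {
            return url.startsWith(literal); // text*   -> URL starts with text
        }
        return url.equals(literal);         // assumption: no wildcard means exact match
    }

    public static void main(String[] args) {
        // Comparison is case-sensitive, so .ZIP needs its own filter.
        System.out.println(matches("*.zip", "docs/archive.zip"));   // true
        System.out.println(matches("*.zip", "docs/ARCHIVE.ZIP"));   // false
        // The inner '*' is literal, so an ordinary URL is not excluded.
        System.out.println(matches("*/extawt/*remove.*", "docs/extawt/old/remove.html")); // false
    }
}
```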

  Tags or tag attributes are being stored in the database
    The SearchEngine is storing words which look suspiciously like tags or tag
    attributes.

      The HTML documents may indeed contain the tag keywords as text, if
      the subject of the document is HTML.
        Check the documents for the offending keywords, and determine whether
        or not they are inside HTML markup. Watch out for incorrectly formed
        comment syntax.

      The HTML document may have syntax errors, which caused the SearchEngine
      to store the words in the body, or ignore them completely.
        Check the documents for the offending keywords, and ensure that they
        are inside the correct HTML markup. Watch out for incorrectly formed
        comment syntax.

  Keywords in titles and headers are missing
    The SearchEngine is not storing words which appear in HTML
    tags like <TITLE>, <H1..H6>, etc.

      The HTML document may have syntax errors, which caused the SearchEngine
      to store the words in the body, or ignore them completely.
        Check the documents for the offending keywords, and ensure that they
        are inside the correct HTML markup. Watch out for incorrectly formed
        comment syntax.

  Runs fine for a while, then slows down
    The SearchEngine parses the first few hundred files, then slows down and
    starts thrashing (repeatedly accessing) the hard disk.

      The SearchEngine is running out of virtual memory.
        The SearchEngine requires virtual memory of about 1.5 to 2.0 times
        the size of the documents being parsed. If, say, you have 9 MB of
        documents, then you will require about 13.5 to 18 MB of virtual memory.
        Start the Java interpreter with as much virtual memory as needed using
        the -mx switch (the default is 16 MB):

java -mx24m ruptools.SearchEngine ...

      Not enough virtual memory.
        Possible solutions are:

          Split the files up into sub-groups, and create a database for each.
          Remove word groups with -nb, -nl, -nh (in that order).
          Do both: a restricted global search, with complete sub-searches.
          Increase the word exclusion list (english.exclude.html is very generic).
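
The sizing rule given above (virtual memory of roughly 1.5 to 2.0 times the size of the documents) can be turned into a quick estimate before choosing a -mx value. The following is only a sketch: the class HeapEstimateSketch and its methods are hypothetical helpers, not part of the SearchEngine, and rounding up to a whole megabyte is an assumption made for convenience.

```java
// Hypothetical helper applying the 1.5x to 2.0x sizing rule described above.
// Not part of the SearchEngine; rounding up to whole megabytes is an assumption.
class HeapEstimateSketch {

    static final double LOW_FACTOR = 1.5;
    static final double HIGH_FACTOR = 2.0;

    /** Lower estimate of virtual memory in MB, rounded up. */
    static long lowMb(double documentsMb) {
        return (long) Math.ceil(documentsMb * LOW_FACTOR);
    }

    /** Upper estimate of virtual memory in MB, rounded up. */
    static long highMb(double documentsMb) {
        return (long) Math.ceil(documentsMb * HIGH_FACTOR);
    }

    public static void main(String[] args) {
        double documentsMb = 9.0; // total size of the documents to be parsed
        System.out.println("Estimated range: " + lowMb(documentsMb)
                + " to " + highMb(documentsMb) + " MB");
        // Use the upper estimate for the -mx switch.
        System.out.println("java -mx" + highMb(documentsMb) + "m ruptools.SearchEngine ...");
    }
}
```

If the upper estimate is more than the machine can provide, the other solutions listed above apply.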

  Stops with an OutOfMemoryException
    The SearchEngine parses the first few hundred files, then displays a long
    list of error messages, starting with OutOfMemoryException.
     
      The SearchEngine ran out of virtual memory.
        The SearchEngine requires virtual memory of about 1.5 to 2.0 times
        the size of the documents being parsed. If, say, you have 9 MB of
        documents, then you will require about 13.5 to 18 MB of virtual memory.
        Start the Java interpreter with as much virtual memory as needed using
        the -mx switch (the default is 16 MB):

java -mx24m ruptools.SearchEngine

      Not enough virtual memory.
        Possible solutions are:

          Split the files up into sub-groups, and create a database for each.
          Remove word groups with -nb, -nl, -nh (in that order).
          Do both: a restricted global search, with complete sub-searches.
          Increase the word exclusion list (english.exclude.html is very generic).

  Stops with a 'Too many files for the search applet database' message
    The SearchEngine parses many hundreds of files, then displays a 'Too many
    files for the search applet database' message.

      The SearchEngine exceeded the maximum number of files the applet
      database can hold.
        The applet database can hold information on up to a maximum of 4096
        HTML documents.

Questions about the Search applet

  Search button remains gray, or an error message appears
    The applet starts up, but after a few seconds, the search button appears
    grayed out, or an error message is displayed. The cause of this problem
    is that the applet failed to find or load the database.

      Check that the file path is correct.
        The applet will look in the path made up from the codebase
        plus the database parameter value. Suppose the applet definition is:

<applet codebase=".." archive="Search.zip"
 code="ruptools.Search.class" width=100 height=20>
<param name=database value="docsearch">

        and assuming the applet file is in the /search directory, then the
        applet will look for the file in /search/../classes/docsearch.ws
        or, when reduced, /classes/docsearch.ws.
        If this is not the correct location of the database file, then either
        copy the database to that location, or change the database
        parameter value.
        Remember that the database file must appear in the codebase
        path of the applet, otherwise some browsers may refuse access to the
        file, causing the applet to fail.

      Check the file path for spelling.
        On some operating systems the filename is case-insensitive (Windows),
        whilst on others it is not (Unix). Ensure that the codebase
        path and database parameter path have the same case as the
        directories and filename. The database file extension is .ws,
        in lower case.

      Check that the file path is within the codebase path.
        As when checking the file path, ensure that the reduced file path is
        the same as, or a child directory of, the codebase; otherwise
        some browsers may refuse access to the file, causing the applet to fail.

      Check the database file.
        The database file may have become corrupt, or have been replaced.
        Recompile the database, copy the file, then try running the applet
        again in the browser or appletviewer.

  Clicking on a title causes the browser to issue a 'document not found' error
    When the user double-clicks on a found document title, the browser issues
    a 'document not found' error message instead of opening the document.
     
      The path parameter is probably wrong or missing.
        The path parameter is used to correct the database document
        URL with respect to the search applet HTML file URL.
        If, for example, when compiling the database the root file is specified
        as:

-f /rational/application/search/doc/index.htm

        and the root URL as:

-u http://www.ruptools.com/rup/rational/application/search/doc/index.htm

        then the root file URL will be stored in the database as:

rational/application/search/doc/index.htm

        which corresponds to the identical trailing path in both options:

-f /rational/application/search/doc/index.htm
-u http://www.ruptools.com/rup/rational/application/search/doc/index.htm

        If we now suppose the search applet HTML file to be at:

/rational/application/search/doc/docsearch.htm

        for the local file, or

http://www.ruptools.com/rup/rational/application/search/doc/docsearch.htm

        for the Internet URL, then we need to correct the document
        URL references in the applet database file to move back
        four directories:

<param name=path value="../../../../">

        Now, when the user clicks on a link, the browser will construct the
        URL as follows:

/rational/application/search/doc/../../../../rational/application/search/doc/index.htm

        for the local file, or

http://www.ruptools.com/rup/rational/application/search/doc/../../../../rational/application/search/doc/index.htm

        for the Internet URL, which reduces to:

/rational/application/search/doc/index.htm

        for the local file, or

http://www.ruptools.com/rup/rational/application/search/doc/index.htm

        for the Internet URL.
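
A candidate path value can be checked before deploying the applet by reproducing the reduction above with the standard java.net.URI class. This is only a sketch of the URL arithmetic, under the assumption that the browser resolves the path parameter plus the stored document path relative to the applet's HTML page. The class PathParamSketch and the method documentUrl are hypothetical and are not part of the Search applet.

```java
import java.net.URI;

// Hypothetical check of a candidate "path" parameter value.
// Assumption: the final URL is (path parameter + stored document path)
// resolved relative to the search applet HTML page, as in the example above.
class PathParamSketch {

    static URI documentUrl(String appletPageUrl, String pathParam, String storedPath) {
        // URI.resolve applies relative resolution and removes the "../" segments.
        return URI.create(appletPageUrl).resolve(pathParam + storedPath);
    }

    public static void main(String[] args) {
        String stored = "rational/application/search/doc/index.htm";
        String path = "../../../../"; // one "../" per directory to move back

        // Internet URL case
        System.out.println(documentUrl(
                "http://www.ruptools.com/rup/rational/application/search/doc/docsearch.htm",
                path, stored));
        // prints http://www.ruptools.com/rup/rational/application/search/doc/index.htm

        // Local file case
        System.out.println(documentUrl(
                "/rational/application/search/doc/docsearch.htm", path, stored));
        // prints /rational/application/search/doc/index.htm
    }
}
```

If the printed URL does not point at the real document, the number of "../" segments in the path value is the first thing to adjust.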

Copyright © 1987 - 2001 Rational Software Corporation