# DIRECTIVES COMMON to HTTP and FILESYSTEM METHODS ################################################### IndexDir /home/yourusername/public_html # For the FileSystem Method: # This is a space-separated list of files and # directories you want indexed. You can specify # more than one of these directives. # # For the HTTP Method: # Use the URL's from which you want the spidering # to begin. # NOTE: use hmtl files rather than directories # for this method. IndexFile /home/yourusername/public_html/swish.index # This is what the generated index file will be. IndexName "Site Index" #IndexDescription "Default Index" #IndexPointer "http://www.cgi101.com/" #IndexAdmin "Webmaster (webmaster@yoursite.com)" # Extra information you can include in the index file. # MetaNames first author # List of all the meta names used in the file to index, must be on one line. # If no metanames DO NOT deleted the line. IndexReport 3 # This is how detailed you want reporting. You can specify numbers # 0 to 3 - 0 is totally silent, 3 is the most verbose. FollowSymLinks no # Put "yes" to follow symbolic links in indexing, else "no". #UseStemming no # Put yes to apply word stemming algorithm during indexing, # else no. See the manual for info about stemming. Default is # no. #PropertyNames author # List of meta tags names that can be retrieved with the -p option. # Index size increases as by the formula in the manual. # Comment out if no PropertyNames. Case insensitive IgnoreTotalWordCountWhenRanking yes # Put yes to ignore the total number of words in the file # when calculating ranking. Often better with merges and # small files. Default is no. #ReplaceRules remove "modules/" #ReplaceRules replace "[a-z_0-9]*_m.*\.html" "index.html" # ReplaceRules allow you to make changes to file pathnames # before they're indexed. This directive uses C library # regex.h regular expressions. # NOTE: do not use replace "" to remove a string, # use remove instead - you might get a core dump otherwise. #MinWordLimit 5 # Set the minimum length of an indexable word. Every shorter word # will not be indexed. # Commenting out the line will give the defaults #MaxWordLimit 5 # Set the maximum length of an indexable word. Every longer word # will not be indexed. # Commenting out the line will give the defaults #WordCharacters abcdefghijklmnopqrstuvwxyz\&#;0123456789.@|,-'"[](~!@$%^{}_+? # WORDCHARS is a string of characters which SWISH permits to # be in words. Any strings which do not include these characters # will not be indexed. You can choose from any character in # the following string: # # abcdefghijklmnopqrstuvwxyz0123456789_\|/-+=?!@$%^'"`~,.[]{}() # # Note that if you omit "0123456789&#;" you will not be able to # index HTML entities. DO NOT use the asterisk (*), lesser than # and greater than signs (<), (>), or colon (:). # # Including any of these four characters may cause funny things to happen. # NOTE: Do not escape \ nor " and they cannot be the first letter in the string # Commenting out the line will give the defaults #BeginCharacters m" # Of the characters that you decide can go into words, this is # a list of characters that words can begin with. It should be # a subset of (or equal to) WordCharacters # Same rule of syntax as for WordCharacters #EndCharacters \"\ # Of the characters that you decide can go into words, this is # a list of characters that words can end with. It should be # a subset of (or equal to) WordCharacters # Same rule of syntax as for WordCharacters #IgnoreLastChar # Array that contains the char that, if considered valid in the middle of # a word need to be disreguarded when at the end. It is important to also # set the given char's in the ENDCHARS array, otherwise the word will not # be indexed because considered invalid. # Commenting out the line will give the defaults # NOTE: if " is the first char in the string it needs to be escaped with \ # Do not escape otherwise #IgnoreFirstChar # Array that contains the char that, if considered valid in the middle of # a word need to be disreguarded when at the beginning. This was to solve # the problem of parenthesis when there is no space between ( and the # beginning of the word. # Remember to add the char's to the BEGINCHARS list also. # Commenting out the line will give the defaults # NOTE: if " is the first char in the string it needs to be escaped with \ # Do not escape otherwise # IgnoreLimit 80 1000 # This automatically omits words that appear too often in the files # (these words are called stopwords). Specify a whole percentage # and a number, such as "80 256". This omits words that occur in # over 80% of the files and appear in over 256 files. Comment out # to turn of auto-stopwording. #IgnoreWords SwishDefault # The IgnoreWords option allows you to specify words to ignore. # Comment out for no stopwords; the word "SwishDefault" will # include a list of default stopwords. Words should be separated by spaces # and may span multiple directives. IndexComments 0 # This option allows the user decide if to index the comments in the files # default is 1. Set to 0 if comment indexing is not required. ################################## # DIRECTIVES for FILESYSTEMS ONLY # Comment out if using HTTP ################################### IndexOnly .html # Only files with these suffixes will be indexed. NoContents .gif .xbm .au .mov .mpg .pdf .ps .jpg .png # Files with these suffixes will not have their contents indexed - # only their file names will be indexed. #FileRules pathname contains .*dir1 #FileRules filename contains # % ~ .bak .orig .old old. #FileRules title contains construction example pointers #FileRules directory contains .htaccess #FileRules filename is index # Files matching the above criteria will *not* be indexed. # The pattern matching uses the C library regex.h