# MiniSearch v0.2
# Personal Website Search Engine
# http://www.dansteinman.com/minisearch/
#
# Copyright (C) 1999 Dan Steinman
# Distributed under the terms of the GNU General Public License
# You may modify this program, redistribute it in its entirety,
# but any significant improvements must be made public
# Read the LICENSE file for more details

ABOUT

MiniSearch is a website search engine that I built for my own website,
http://www.dansteinman.com. I have released it under the free GNU
license in the hopes that it will be useful to others. This is not
commercial software, so there is no warranty or support.

DISCLAIMER

MiniSearch is a small, efficient, but not very sophisticated search
engine. It uses a simple search/index scheme to search a website.
First, the 'searchindex' script generates index files with all the
words in all the files. Then, the 'search' script examines this index
to return the results of a query. This scheme is not ideal for very
large websites (more than 1000 files), but it does handle several
hundred files fairly quickly without problems. There is no ranking
algorithm, so your results will be returned in an arbitrary order.

REQUIREMENTS:

The scripts, search and searchindex, are written for Perl 5.x and have
only been tested on Linux/Apache.

    http://www.apache.org/ (Apache webserver)
    http://www.perl.com/ (Perl interpreter)

These scripts don't require any Perl modules or libraries and thus
should work with any fairly recent version of Perl 5. They may or may
not work with the Win32 (Windows 9x/NT) port of Perl 5.

If you have no idea what I'm referring to by Perl or CGI, then this
script should not be your starting point. I would suggest you
familiarize yourself by reading a few tutorials or buying a good book
on the subject. There are a few tutorials listed at:

    http://www.gustavo.net/programming/cgi__tutorials.shtml

INSTALLATION

Copy the 2 scripts and the /searchdata/ directory somewhere in your
webserver's /cgi-bin/ directory.
If your webserver only handles .pl or .cgi scripts, you can rename both
files; they will still work.

Your initial directory structure should look something like this:

    /cgi-bin/
    /cgi-bin/search (search script)
    /cgi-bin/searchindex (index script)
    /cgi-bin/searchdata
    /cgi-bin/searchdata/templates
    /cgi-bin/searchdata/templates/default-prompt.txt (prompt template)
    /cgi-bin/searchdata/templates/default-results.txt (results template)

You must set the permissions of the search and searchindex scripts to
755. Most ftp software will allow you to change the permissions of
files. If not, you can run the following commands at a telnet/shell
prompt:

    chmod 755 search
    chmod 755 searchindex

Now visit the searchindex script via the webserver:

    http://yourdomain/cgi-bin/searchindex

If it says "A configuration file has been generated..." it worked. Use
the temporary password 'minisearch' to enter the administration page.

If it says "There was a problem generating the configuration file."
then the script wasn't able to write to the /searchdata/ directory. In
this case, try setting the permissions of the /searchdata/ directory to
777:

    chmod 777 searchdata

If after that you still cannot get the script to work, you'll have to
do some detective work: double check that your files are in the proper
location, their permissions, your webserver configuration, etc.

ADMINISTRATION / CONFIGURATION

The administration page gives you all the controls you'll need to
change how MiniSearch will search your website. It should be fairly
obvious. Try to index your site to see what results you get. It will
give you a listing of all the files it found.

If you set the Search Directories to "/" and have "recurse" on, then it
will index everything. Any "Exclude" directories will be skipped.

By default it will only index the following filetypes:

    html, htm, phtml, shtml, php3, asp

To add more you'll have to edit the searchindex script. The line to
edit is marked with the comment "#### extensions to search".
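
To give a rough idea of how the "recurse", "Exclude" and filetype
settings interact, here is a minimal Perl sketch. It is not the actual
searchindex code: the sub names (is_indexable, is_excluded,
collect_files), the @extensions variable and the ./htdocs paths are made
up for illustration, and only the default extension list is taken from
this README.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical names throughout; the real searchindex script may differ.
# Default filetypes as listed in this README.
my @extensions = qw(html htm phtml shtml php3 asp);
my %wanted     = map { lc($_) => 1 } @extensions;

# Return 1 if the filename ends in one of the wanted extensions.
sub is_indexable {
    my ($file) = @_;
    return 0 unless $file =~ /\.([^.\/]+)$/;
    return $wanted{ lc($1) } ? 1 : 0;
}

# Return 1 if the path is, or lies inside, one of the excluded directories.
sub is_excluded {
    my ($path, @exclude) = @_;
    for my $dir (@exclude) {
        (my $prefix = $dir) =~ s{/+$}{};    # drop trailing slashes
        return 1 if $path eq $prefix || index($path, "$prefix/") == 0;
    }
    return 0;
}

# Collect indexable files below $dir; descend into subdirectories
# only when $recurse is true.
sub collect_files {
    my ($dir, $recurse, @exclude) = @_;
    my @found;
    opendir(my $dh, $dir) or return @found;
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    for my $entry (sort @entries) {
        my $path = "$dir/$entry";
        next if is_excluded($path, @exclude);
        if (-d $path) {
            push @found, collect_files($path, $recurse, @exclude) if $recurse;
        }
        elsif (is_indexable($entry)) {
            push @found, $path;
        }
    }
    return @found;
}

# Example: list everything under ./htdocs except ./htdocs/private.
print "$_\n" for collect_files('./htdocs', 1, './htdocs/private');
```

In this sketch, adding a filetype just means adding it to @extensions.
The extension match is case-insensitive, so a file named PAGE.HTM would
also be picked up.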

Templates:

Two generic templates are provided:

    /templates/default-prompt.txt
    /templates/default-results.txt

The 'prompt' template is the file that will be displayed when you first
visit your search page (e.g. http://yourdomain/cgi-bin/search). This is
a static file and can be edited as needed. Take a good look at how the
example FORM fields are laid out; they'll show you how you can add
extra parameters such as which directories to search in (the "where"
parameter). Also note that you can copy the FORM fields and insert them
into any of your other webpages.

The 'results' template is the file that will be displayed when you get
results from the search. It contains variables such as ##QUERY## that
the search script replaces with values; you'll see how it works. You'll
probably want to keep those fields, but all the HTML tags around them
can be edited as needed.

The 'default' templates determine the look of the pages when you visit
the search script by itself (e.g. http://yourdomain/cgi-bin/search).
You can add more templates if you're like me and have several websites.
You can use the same search scripts, but have different templates for
each site. If you add another template set named "mytemplate", you can
view that template by going to:

    http://yourdomain/cgi-bin/search?view=mytemplate

Forum Support:

This search script can search messages in the discussion forum that I
wrote, DForum (formerly named Dan Steinman's Forum). It is available
at:

    http://www.dansteinman.com/dforum/

Just type in the forumdata directory (e.g. /cgi-bin/forumdata) and the
url (e.g. /cgi-bin/forum.pl), and MiniSearch will handle the rest. In
your "prompt" template, if you want to specify a forum you need to
enter the url; an example is provided in the default-prompt.txt file.

Other than that, everything should be self-explanatory. There shouldn't
be any need for you to edit the search or index scripts unless they are
causing you some real trouble or you want to play around with them.
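
If you do want to play around, the ##QUERY## style of substitution used
by the 'results' template can be pictured with a short Perl sketch. This
is an illustration only, not the code from the search script: the sub
names fill_template and escape_html are invented here, and ##COUNT## is
a hypothetical placeholder (only ##QUERY## is mentioned in this README).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace each ##NAME## placeholder with its value from the hash ref.
# Placeholders without a value are left as-is so they are easy to spot.
sub fill_template {
    my ($template, $values) = @_;
    $template =~ s{(##([A-Z_]+)##)}
                  { exists $values->{$2} ? $values->{$2} : $1 }ge;
    return $template;
}

# Escape user input before echoing it back inside HTML.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first, before other entities appear
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# ##COUNT## is hypothetical; check default-results.txt for the real names.
my $template = "<h1>Results for ##QUERY##</h1>\n"
             . "<p>##COUNT## files matched.</p>\n";

print fill_template($template, {
    QUERY => escape_html('perl & cgi'),
    COUNT => 3,
});
```

Escaping the query before it goes into the page is a design choice of
this sketch: it keeps a search for something like <b> from being
interpreted as markup in the results page.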

I'd certainly appreciate any suggestions, improvements, comments, etc.
If you successfully implement it on a website I would be most pleased
to link to it from my website as an example of how it can be used.

Good luck,

Dan Steinman
dan@dansteinman.com
www.dansteinman.com