Penetrator - Personal search engine
Copyright (C) 2000-2002 Angel Ortega <angel@triptico.com>
Home Page: http://www.triptico.com/software/penetrator.html
This software is GPL. NO WARRANTY. See file 'COPYING' for details.
The SQL table layout of Penetrator 3.x has changed from the 2.x series, so you'll need to recreate the tables (-z option) or nothing will work. If you want to preserve their current contents, you'll need to dump your data and restore it manually; consult your database's documentation for how to do this. Also be sure to read the penetratorrc.sample file included here, as it contains very important information about database configuration that has changed. Penetrator 3.x relies more heavily on database features than 2.x did. If you use any of the DBM drivers, none of this applies to you.
Apart from this, Penetrator 3.x includes many improvements over 2.x: query results sorted by relevance (DBI driver only), much faster indexing, a query cache and more. Read on.
Penetrator is a tool for indexing big trees of text files, such as your local HTML documentation or your home directory. It can store its index in DBM files or DBI databases, and can be used either as a command line tool or as a CGI. The files to be indexed can be selected by extension or by using external file identification programs such as /usr/bin/file.
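As an illustration of selecting files by extension, here is a minimal Python sketch. It is not Penetrator's code (Penetrator is a Perl program), and the function name select_files and its parameters are hypothetical; how Penetrator's configuration lists the extensions is not modeled here, nor is the alternative of asking an external program such as /usr/bin/file.

```python
# Hypothetical sketch: walk a tree and keep files whose extension is wanted.
# Not Penetrator's implementation; names and parameters are illustrative.
import os


def select_files(root, extensions):
    """Return the sorted paths of regular files under `root` whose
    extension (compared case-insensitively) is in `extensions`."""
    wanted = {ext.lower() for ext in extensions}
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            _base, ext = os.path.splitext(name)
            # Skip files with unwanted extensions and non-regular files.
            if ext.lower() in wanted and os.path.isfile(path):
                selected.append(path)
    return sorted(selected)
```

For example, select_files("/usr/share/doc", [".html", ".txt"]) would list the HTML and text files under that directory.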
Penetrator's console version tries to emulate grep (it shares many of that command's arguments), while the CGI behaves like a typical web search engine.
This program is not meant to be Google; that is, it is not designed to index remote sites, because it needs filesystem access to the documents it indexes. This may change, but not soon.
NOTE: the installation method has changed in version 3.1.x.
Penetrator installs like any other Perl module:
$ perl Makefile.PL
$ make
# make install
This installs Penetrator.pm in your Perl module directory and the penetrator script in /usr/local/bin. If you want to use the CGI interface, you'll have to copy the script to your CGI directory manually and rename it to penetrator.cgi (a symbolic link will also work).
To make it work, you need to create a configuration file. It can live in /etc/penetratorrc (the only one of these two locations used by the CGI) or in $(HOME)/.penetratorrc, or you can specify another file with a command line option. The package includes a sample file that you can tune to fit your needs. It contains a lot of information; read it carefully, as it covers details not explained here.
When you have finished, you must first create the index by using
$ penetrator -z -v
(The -v flag just enables verbose output; you can safely omit it.) This step is not strictly necessary if you use the DBM drivers, but as it is mandatory for the DBI driver, it's a good habit to get into.
Then you must build the index by using
$ penetrator -r -v
Again, -v just enables verbose output. Depending on the size of the trees being indexed, the first run can be very time consuming; later rebuilds are faster, as only changed files are re-indexed. You can run this command from a crontab to keep the index up to date.
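The incremental rebuild can be pictured with a small Python sketch that compares each file's modification time with the one recorded at the previous run. This is only an assumption about the mechanism: the text says changed files are re-indexed but not how changes are detected, and the names plan_reindex and stored_mtimes are hypothetical.

```python
# Hypothetical sketch of change detection for an incremental rebuild.
# stored_mtimes stands in for whatever the index really records (assumption).
import os


def plan_reindex(paths, stored_mtimes):
    """Compare current modification times with those from the last run.

    Returns (to_index, to_remove): files that are new or whose mtime
    changed, and previously indexed files that no longer exist.
    """
    to_index = []
    current = set()
    for path in paths:
        try:
            mtime = os.stat(path).st_mtime
        except FileNotFoundError:
            continue  # file vanished between listing and stat
        current.add(path)
        if stored_mtimes.get(path) != mtime:
            to_index.append(path)  # new file, or modified since last run
    to_remove = sorted(p for p in stored_mtimes if p not in current)
    return to_index, to_remove
```

Under this model, a cron job running the rebuild only pays the indexing cost for files touched since the previous run.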
Once the index is built, you can start searching. If you run Penetrator from the command line (recommended for a first test), just run
$ penetrator <search words>
If you give more than one word, only the documents matching all of them are shown. This command tends to be quite verbose, as it shows every line that contains any of the search words; its output is very much like grep's. If you only want to see the lines that contain all the search words, use
$ penetrator <search words> -s
which makes the output less verbose. If you only want the names of the files that match all the keywords, use
$ penetrator <search words> -l
which, again, is very similar to the output of grep -l.
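The three output modes can be modeled in Python as below. This is a hypothetical sketch, not Penetrator's implementation: the function report, the strict and names_only parameters (standing in for -s and -l), the case-insensitive whole-word matching and the grep-like "path:line" output format are all assumptions.

```python
# Hypothetical model of the default, -s and -l output modes.
# Word tokenization and output format are assumptions, not Penetrator's.
import re

WORD = re.compile(r"\w+")


def words_in(text):
    """Lowercased set of the words in `text`."""
    return {w.lower() for w in WORD.findall(text)}


def report(path, text, terms, strict=False, names_only=False):
    """Return grep-like output lines for one document.

    The document must contain every term.  Default: lines with ANY term.
    strict=True (like -s): lines with ALL terms.  names_only=True (like -l):
    just the file name.
    """
    wanted = {t.lower() for t in terms}
    if not wanted <= words_in(text):
        return []  # document does not contain all the search words
    if names_only:
        return [path]
    out = []
    for line in text.splitlines():
        hits = words_in(line) & wanted
        keep = hits == wanted if strict else bool(hits)
        if keep:
            out.append(f"{path}:{line}")
    return out
```

With doc = "perl and dbi\nonly perl here", report("f.txt", doc, ["perl", "dbi"]) keeps both lines, while strict=True keeps only the first.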
Words can also be prefixed with - or ~ to exclude the documents containing them from the results.
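Exclusion combines with the all-words rule. A hypothetical Python sketch of this document-level filtering, assuming case-insensitive whole-word matching and treating - and ~ identically, might look like this (parse_query and search are illustrative names, not Penetrator functions):

```python
# Hypothetical sketch of query parsing with excluded words.
# Case-insensitive whole-word matching is an assumption.
import re

WORD = re.compile(r"\w+")


def parse_query(query):
    """Split a query into (required, excluded) sets of lowercase words."""
    required, excluded = set(), set()
    for token in query.split():
        if token[0] in "-~":
            if len(token) > 1:  # ignore a lone '-' or '~'
                excluded.add(token[1:].lower())
        else:
            required.add(token.lower())
    return required, excluded


def search(documents, query):
    """Return the sorted names of documents that contain every required
    word and none of the excluded ones; `documents` maps name -> text."""
    required, excluded = parse_query(query)
    hits = []
    for name, text in documents.items():
        words = {w.lower() for w in WORD.findall(text)}
        if required <= words and not (excluded & words):
            hits.append(name)
    return sorted(hits)
```

For instance, with one document about "Perl DBI" and another about "Perl CGI", the query "perl -cgi" returns only the first.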
The penetrator.cgi script works like most CGIs, so just give it a try. The CGI also looks for a 'penetratorrc' file in its current directory, so it's a good idea to configure one for it. Again, see the sample configuration file for what you can do.
You can add a search box that calls penetrator.cgi to a web page with the following HTML:
<form action='/cgi-bin/penetrator.cgi'>
<i>Penetrator</i> Search
<input name=query value=''>
<input type=submit value='Search'>
</form>
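Since the form sets no method attribute, the browser submits it with GET, so a search is simply a request for /cgi-bin/penetrator.cgi with the words in the query parameter. A small Python helper that builds such a URL shows what the form sends; the helper name search_url and the example host are hypothetical.

```python
# Hypothetical helper: build the GET URL the search form submits.
# Only the script path and the 'query' field name come from the form.
from urllib.parse import urlencode


def search_url(base, query):
    """Return the URL for searching `query` on the server at `base`."""
    return base.rstrip("/") + "/cgi-bin/penetrator.cgi?" + urlencode({"query": query})
```

For example, search_url("http://localhost", "perl dbi -cgi") returns "http://localhost/cgi-bin/penetrator.cgi?query=perl+dbi+-cgi".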