search pdf-files WITHOUT using pdftotext.exe

Posted by kashmir 
search pdf-files WITHOUT using pdftotext.exe
March 02, 2008 11:44AM
Re: search pdf-files WITHOUT using pdftotext.exe
March 03, 2008 12:09AM
Re: search pdf-files WITHOUT using pdftotext.exe
March 21, 2008 08:03PM
This is great. I am hosted on Yahoo and they do not allow pdftotext.exe to run, so your solution will be superb if it works. Could you post a step-by-step guide on how to get this working?
Re: search pdf-files WITHOUT using pdftotext.exe
March 21, 2008 08:13PM
I am very interested too.
Re: search pdf-files WITHOUT using pdftotext.exe
March 21, 2008 11:17PM
Re: search pdf-files WITHOUT using pdftotext.exe
March 22, 2008 07:55AM
I tried Sphider-Plus. In most respects it is better than the regular Sphider (but it needs PHP 5.x). However, as far as pdf files are concerned, all it does is include pdftotext.exe in the converter package. That exe still needs to run on the server, so if the server prohibits exe files, you are back where you started.

This is why a PHP converter would be great.

A step-by-step guide would be greatly appreciated.
hmc
Re: search pdf-files WITHOUT using pdftotext.exe
March 24, 2008 07:02AM
A "how to" list that worked for me (eventually!).

(Note: I still have to find a way to force admin.php to reindex an entire site without having to "change" every html page - there's some help in the forum that I'll try.)

Make a backup copy of your working sphiderfuncs.php and conf.php files

Copy kashmir's post of 3 Mar 08, 0009h (12:09am) (from <?php onwards, of course) and save it to the admin folder, overwriting the existing sphiderfuncs.php

Change conf.php as follows:

Suggest this line be up to date:
$version_nr = '1.3.4b';

Set logging on so you can see what is happening:
$keep_log = 1;

Set pdf on:
// Index pdf files
$index_pdf = 1;

Comment out:
//executable path to pdf converter
//$pdftotext_path = 'c:\temp\pdftotext.exe';

Transfer the amended conf.php and sphiderfuncs.php to your domain host

Ensure at least one v1.2 pdf file exists on your domain, using download3k.com if needed to create a v1.2 pdf from a later version. It is best if this pdf has at least one word in it that does not exist elsewhere on the site, so you can prove to yourself the pdf indexing routines are working properly.
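
Before uploading, it may help to confirm which version a pdf actually declares. The following is only a minimal sketch, not part of kashmir's script: the function name pdf_header_version() and the file name test.pdf are made up for illustration. It reads the "%PDF-x.y" header that normally sits at the very start of the file. Note that files of v1.4 or later can also declare a version inside the document itself, which this sketch does not look at.

```php
<?php
// Hypothetical helper (not from the thread): returns the version declared in
// the "%PDF-x.y" file header as a string such as "1.2", or null if no header
// is found in the bytes supplied.
function pdf_header_version($bytes)
{
    // The header is normally the first line of the file; search the first
    // kilobyte so that a little leading junk does not defeat the check.
    if (preg_match('/%PDF-(\d+\.\d+)/', substr($bytes, 0, 1024), $m)) {
        return $m[1];
    }
    return null;
}

// Usage on a local file (the path is a placeholder); only the first kilobyte
// is read, so large pdfs are not loaded into memory:
// $version = pdf_header_version(file_get_contents('test.pdf', false, null, 0, 1024));
// if ($version !== '1.2') { echo "Convert this file to v1.2 first\n"; }
```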

Note: The pdf you use as a test must have recognisable text in it, in other words it cannot just be a scanned image, it has to be a pdf file that has been produced either by using Acrobat Distiller, or some other pdf creation utility from a text or doc file, or has been "recognised" (to use Adobe's terminology).
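
To illustrate the difference between a text pdf and a scanned image, here is a rough sketch of how a pure-PHP extractor can pull words out of a pdf. This is NOT kashmir's script (which is not reproduced in this thread), and the function name pdf_extract_text_sketch() is made up. It assumes the page content sits in "stream ... endstream" blocks that are either uncompressed or zlib-compressed (FlateDecode, which needs PHP's zlib extension), and it only picks up plain "(some text) Tj" operators. A scanned page has no such operators, so nothing comes back and nothing can be indexed.

```php
<?php
// Hypothetical sketch (not the script from this thread): collects literal
// strings shown with the Tj operator from every stream in the pdf bytes.
function pdf_extract_text_sketch($pdfBytes)
{
    $text = '';
    // Grab the body of every "stream ... endstream" block.
    if (!preg_match_all('/stream\r?\n(.*?)\r?\nendstream/s', $pdfBytes, $m)) {
        return $text;
    }
    foreach ($m[1] as $body) {
        // FlateDecode streams are zlib data; if inflating fails, assume the
        // stream was stored uncompressed and use the raw bytes instead.
        $data = @gzuncompress($body);
        if ($data === false) {
            $data = $body;
        }
        // Match "(literal string) Tj", allowing backslash escapes such as \( and \).
        if (preg_match_all('/\(((?:\\\\.|[^\\\\()])*)\)\s*Tj/', $data, $t)) {
            foreach ($t[1] as $s) {
                $text .= stripcslashes($s) . ' ';
            }
        }
    }
    return trim($text);
}
```

A real converter has to cope with much more than this (TJ arrays, hex strings, font encodings, other stream filters, very large files), so treat the sketch as an explanation of the idea rather than a replacement for a tested script.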

Transfer the v1.2 pdf to your domain host, and create a new link to it on any existing page - this will ensure the re-indexing process will see the new link and index it and the v1.2 pdf file, irrespective of your re-indexing settings and even if your site has previously been re-indexed.

Then re-index your site and look at the log. Did the test pdf file get indexed? It should have been! All of the words in the test pdf will have been parsed and added as needed to the Sphider database.

Then do a search for that unique word in the v1.2 test pdf online within your domain to prove all is well.

If it all works, the next step is to convert all of your site's pdf files back to v1.2!
Re: search pdf-files WITHOUT using pdftotext.exe
March 26, 2008 10:23PM
Thanks very much. It works very well with sphider. I tested on localhost (win xp) and on the yahoo server (BeOS). The files were nicely indexed.

However, it did not work with Sphider-Plus. Not only were the pdf files not indexed, but the changes to the sphiderfuncs and conf files also broke the existing installation.

Any ideas on how to make it compatible with Sphider-Plus?
Re: search pdf-files WITHOUT using pdftotext.exe
April 15, 2009 06:30PM
This looks like an excellent way to index pdfs without the need for an exe on the server.

However, has anybody managed to get this working with a pdf version 1.4 file?

Thanks in advance

Robert
Re: search pdf-files WITHOUT using pdftotext.exe
July 04, 2011 07:51PM
Refer to this for PDF file search:
[techpdf.in]
Re: search pdf-files WITHOUT using pdftotext.exe
January 31, 2012 10:35PM
Thanks hmc. But I can no longer see the PHP script from kashmir's post of 3 Mar 08 (12:09am) to which you refer. Where can I find it?