Sphider simply will NOT index my PDF files. It's a conspiracy, somebody call the cops!

Posted by cobman 
I used to enjoy sanity. Now I barely remember it.

I've spent three frustrating days working with Sphider, downloading Xpdf binaries, reading every scrap of related advice Google conjures up, and experimenting until my eyes bleed! Sphider absolutely refuses to index my PDF files.

I have tested a hundred different "pdftotext" variations, extensions, paths, locations, code strings, settings and permissions.
I have deleted the entire system - every file and process - and started from scratch twice!
I am ready to kill small animals.

a) My computer runs Windows 7 on a 64-bit platform
b) My websites are hosted on a commercial Linux server accessed via Cpanel
c) My PDF files contain clean copy-able text and are NOT password protected
d) I have carefully read and tested every related tip, trick, technique and tutorial in this forum and many others.

Here's the problem:

Sphider reads my HTML pages perfectly, but returns the ridiculous PDF error message "Page contains less than 10 words" every time.

I am going to tie concrete blocks to my head and jump off the ferry.

I am not a programmer, but a reasonably well-rounded website builder.

Any help would be hugely appreciated!!
Re: Sphider simply will NOT index my PDF files. It's a conspiracy, somebody call the cops!
January 28, 2011 11:10PM
Unfortunately, I don't know what the problem might be, so I can't be of any help, but I'd like to let you know that I admire your determination.

I hope someone else will show up and give you the tip that will keep you from insanity.
Thanks Willy, appreciate the sentiment. I hope someone helps me out, too.

In the meantime, I've joined a lawn bowling club ... sanity is optional.

I wonder where the support team is. It's been a week or so since you posted this, and no support is coming? Hmmm.
I am revisiting this forum again.

There is no support team in this Sphider forum. We just help each other when someone has a good idea.

I do not have a problem indexing PDF files.

Make sure you have installed the small piece of software recommended by the author (Ando) and configured it correctly on the server. Review the documentation again.

My company's Sphider has already indexed more than 90 GB of websites.

Virtue
I hope you've found a solution in the meantime, but here's some feedback, just in case.

Have you checked your web server's MIME types? From a quick search of the Sphider source, it expects PDF files to be reported as “application/pdf”. If the web server's MIME types are not set up correctly, Sphider will not know that a particular file is a PDF.
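For anyone who wants to test this from their own machine, here is a minimal Python sketch (not part of Sphider) that asks the server what Content-Type it reports for a file. The function names are made up for illustration, and it assumes the server answers HEAD requests.

```python
import sys
import urllib.request


def is_pdf_content_type(header_value):
    """Return True if a Content-Type header value means application/pdf."""
    if not header_value:
        return False
    # Drop parameters such as "; charset=binary" and ignore case.
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type == "application/pdf"


def reported_content_type(url, timeout=10):
    """Send a HEAD request and return the Content-Type header, or None."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type")


if __name__ == "__main__":
    # Usage: python check_mime.py <url-of-a-pdf> [more urls...]
    for url in sys.argv[1:]:
        content_type = reported_content_type(url)
        verdict = "OK" if is_pdf_content_type(content_type) else "NOT application/pdf"
        print(url, content_type, verdict)
```

If the verdict is "NOT application/pdf" on an Apache host, a line such as `AddType application/pdf .pdf` in .htaccess is the usual remedy, assuming the host allows that override.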

HTH - Pete
I have the same problem.
I put pdftotext.exe in the Sphider directory and set the code in conf.php to read c:\sphider\pdftotext.exe.

Is that the correct location and path, and why is it coded like that? It is NOT the C drive but a directory on an ISP's server, and the Sphider directory is not in the root.

Given that the above has received no reply for 12 months, is there any point in even bothering to get Sphider to work? It seems that there is no support for it and it is badly broken. The PDF indexing does not work, and the link on their web site to the doc file indexer is broken.
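On the path question: if the host is a Linux server, as in the original post, a Windows .exe cannot run there and a c:\ path cannot exist, so the setting would need to point at a Linux build of pdftotext using a server-side path. Below is a rough Python sketch (not Sphider code) for sanity-checking a candidate path from a shell on the server. The function names and the example path /home/youruser/bin/pdftotext are hypothetical, and it assumes the host gives you shell access and Python.

```python
import os
import subprocess


def looks_like_windows_path(path):
    """True for drive-letter paths or .exe names, which will not work on Linux."""
    has_drive_letter = len(path) >= 2 and path[0].isalpha() and path[1] == ":"
    return has_drive_letter or path.lower().endswith(".exe")


def check_converter(path):
    """Return a list of problems found with the configured converter path."""
    problems = []
    if looks_like_windows_path(path):
        problems.append("looks like a Windows path or a Windows .exe")
    if not os.path.isfile(path):
        problems.append("no file exists at this path")
    elif not os.access(path, os.X_OK):
        problems.append("file is not executable (chmod 755 may be needed)")
    return problems


def count_words(converter, pdf_file):
    """Run pdftotext with '-' as the output file (stdout) and count the words."""
    result = subprocess.run([converter, pdf_file, "-"],
                            capture_output=True, timeout=60)
    if result.returncode != 0:
        raise RuntimeError(result.stderr.decode(errors="replace"))
    return len(result.stdout.split())


if __name__ == "__main__":
    candidate = "/home/youruser/bin/pdftotext"  # hypothetical path, adjust to your host
    for problem in check_converter(candidate):
        print("Problem:", problem)
```

If check_converter reports nothing but count_words on one of your PDFs comes back under 10, that would be consistent with the "Page contains less than 10 words" message and points at the conversion step rather than at Sphider's indexing.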
Yes, pmolsen is right.

Indexing PDF files is not working.
Re: virtue
August 11, 2012 06:15PM
How did you get to 90GB?! Mine crashes silently after less than a gig!