<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel>
        <title>Indexing PDFs on Linux with Sphider-plus (solved for me)</title>
        <description> After several hours I found a working way to index PDFs with Sphider-plus on a shared Linux host. Here is what I did:

Sphider-plus includes a PDF converter, but it does not work on Linux because it is an .exe file, and .exe files cannot be run on Linux.

Instead of using the built-in converter, try this:
1. Download the pre-compiled Linux binary of pdftotext, which is part of the xpdf bundle, from: www.foolabs.com/xpdf/download.html
2. Unzip/untar the package and keep only the pdftotext file (it has no extension; that is fine)
3. Rename &amp;quot;pdftotext&amp;quot; to &amp;quot;pdftotext.script&amp;quot;
4. Upload this file via FTP to the &amp;quot;converter&amp;quot; directory of Sphider-plus
5. Find the physical (absolute) path of your web site on the server (your hosting provider should be able to give you this)
6. Create an empty text file and write these two lines into it:
#!/bin/sh
/PATH/TO/YOUR/WEB/DOWN/TO/converter/pdftotext.script $1 -
7. Adapt the full path above to your setup and use single forward slashes (not double backslashes)
The second line begins with a slash and ends WITH the minus sign! The minus sign makes pdftotext send the extracted text to standard output.
(Thanks to the user who posted this hint some time ago)
8. Save this file as &amp;quot;pdftotext&amp;quot; (without the quotes)
9. Upload it to the converter directory
10. Set the permissions of both pdftotext and pdftotext.script to 755 or 777 (whatever is needed for them to run)
11. Set the permissions of the converter directory to 777! Otherwise indexing fails because pdftotext cannot write the temporary file it needs.
12. Finally, change the pdftotext path in conf.php to:
$pdftotext_path = '/PATH/TO/YOUR/WEB/DOWN/TO/converter/pdftotext';

Now it should work fine. For me it does.</description>
        <link>http://www.sphider.eu/forum/read.php?2,4413,4413#msg-4413</link>
        <lastBuildDate>Fri, 10 Sep 2010 18:37:23 +0300</lastBuildDate>
        <generator>Phorum 5.2.10</generator>
        <item>
            <guid>http://www.sphider.eu/forum/read.php?2,4413,7399#msg-7399</guid>
            <title>Re: Indexing PDFs on Linux with Sphider-plus (solved for me)</title>
            <link>http://www.sphider.eu/forum/read.php?2,4413,7399#msg-7399</link>
            <description><![CDATA[ Yes, if possible, please let us know how you integrated it into Sphider. Thanks.]]></description>
            <dc:creator>LorraineGodsey</dc:creator>
            <category>Sphider Support</category>
            <pubDate>Tue, 27 Apr 2010 15:19:37 +0300</pubDate>
        </item>
        <item>
            <guid>http://www.sphider.eu/forum/read.php?2,4413,6638#msg-6638</guid>
            <title>Re: Indexing PDFs on Linux with Sphider-plus (solved for me)</title>
            <link>http://www.sphider.eu/forum/read.php?2,4413,6638#msg-6638</link>
            <description><![CDATA[ How did you integrate the function into Sphider?<br />
<br />
It might be good info for others running into issues with the PDFtoText converters.]]></description>
            <dc:creator>jbenton</dc:creator>
            <category>Sphider Support</category>
            <pubDate>Wed, 07 Oct 2009 02:10:13 +0300</pubDate>
        </item>
        <item>
            <guid>http://www.sphider.eu/forum/read.php?2,4413,6552#msg-6552</guid>
            <title>Re: Indexing PDFs on Linux with Sphider-plus (solved for me)</title>
            <link>http://www.sphider.eu/forum/read.php?2,4413,6552#msg-6552</link>
            <description><![CDATA[ If you are using Debian / Ubuntu as your server, it's very easy:<br />
<br />
# apt-get install xpdf<br />
<br />
Then set &quot;Full executable path to PDF converter&quot; to /usr/bin/pdftotext<br />
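<br />
Before saving that setting, it may help to confirm that the converter really exists at that path and is executable. A minimal sketch, assuming a POSIX shell; the helper name check_converter is made up for illustration and is not part of Sphider:<br />

```shell
#!/bin/sh
# check_converter PATH: succeed only if PATH is an existing regular file
# that the current user may execute.
# (check_converter is a hypothetical helper, not a Sphider command.)
check_converter() {
    if [ -f "$1" ] && [ -x "$1" ]; then
        echo "ok: $1"
    else
        echo "missing or not executable: $1" >&2
        return 1
    fi
}
```

Call it as: check_converter /usr/bin/pdftotext<br />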
<br />
That's it!]]></description>
            <dc:creator>joepc</dc:creator>
            <category>Sphider Support</category>
            <pubDate>Thu, 10 Sep 2009 16:29:10 +0300</pubDate>
        </item>
        <item>
            <guid>http://www.sphider.eu/forum/read.php?2,4413,6519#msg-6519</guid>
            <title>Re: Indexing PDFs on Linux with Sphider-plus (solved for me)</title>
            <link>http://www.sphider.eu/forum/read.php?2,4413,6519#msg-6519</link>
            <description><![CDATA[ Hello,<br />
I am completely new to all of this.<br />
Goal: create an accessible repository of PDF files (exclusively) that visitors can search with full-text search.<br />
<br />
Work done so far:<br />
- Created the MySQL database.<br />
- Filled in the database.php file.<br />
- Installed Sphider on a shared server (DreamHost). The database installed properly. No problems so far.<br />
<br />
- Downloaded the pdftotext file. Renamed it to pdftotext.script. Created the pdftotext file and changed the name of pdftotext.script. Placed those files in the converter directory.<br />
- Set the converter directory to 777 and the files to 775 with chmod.<br />
- Changed the user and password in auth.php.<br />
- Created a datapdf directory. Pointed to it using /cmxxx.com/searchpub/datapdf<br />
- Modified conf.php:<br />
  $pdftotext_path = '/mydomain.com/converter/pdftotext'; I got this path from FileZilla (and replaced mydomain with cmxxxxxx.com).<br />
- Modified the spider.php file to check whether the file exists. Launched IndexAll and got the "file does not exist" message.<br />
<br />
Now, one problem. Obviously, I do not know how to set the path properly. I tried [<a href="http://www.cmxxxx.com/converter/" rel="nofollow" >www.cmxxxx.com</a>] and it failed.<br />
<br />
Can any of you nice people help me here?]]></description>
            <dc:creator>alexdp</dc:creator>
            <category>Sphider Support</category>
            <pubDate>Fri, 28 Aug 2009 20:27:43 +0300</pubDate>
        </item>
        <item>
            <guid>http://www.sphider.eu/forum/read.php?2,4413,6511#msg-6511</guid>
            <title>Re: Indexing PDFs on Linux with Sphider-plus (solved for me)</title>
            <link>http://www.sphider.eu/forum/read.php?2,4413,6511#msg-6511</link>
            <description><![CDATA[ You're a genius.]]></description>
            <dc:creator>txstate2005</dc:creator>
            <category>Sphider Support</category>
            <pubDate>Fri, 28 Aug 2009 00:57:14 +0300</pubDate>
        </item>
        <item>
            <guid>http://www.sphider.eu/forum/read.php?2,4413,6144#msg-6144</guid>
            <title>Re: Indexing PDFs on Linux with Sphider-plus (solved for me)</title>
            <link>http://www.sphider.eu/forum/read.php?2,4413,6144#msg-6144</link>
            <description><![CDATA[ Hi to all!<br />
I have the same problem as many others: Sphider 1.3.4 doesn't index PDF files, and I tried the solutions posted on this forum without success. So I considered using a PHP function to convert them. I found this:<br />
[<a href="http://community.livejournal.com/php/295413.html" rel="nofollow" >community.livejournal.com</a>]<br />
and it works!<br />
So now, how can we include that script in spiderfuncs.php? If pdftotext fails (maybe because of server settings; on my PC it works fine...), it is probably easier to use something else! ;)]]></description>
            <dc:creator>RedWolf</dc:creator>
            <category>Sphider Support</category>
            <pubDate>Sat, 30 May 2009 06:03:00 +0300</pubDate>
        </item>
        <item>
            <guid>http://www.sphider.eu/forum/read.php?2,4413,4519#msg-4519</guid>
            <title>Re: Indexing PDFs on Linux with Sphider-plus (solved for me)</title>
            <link>http://www.sphider.eu/forum/read.php?2,4413,4519#msg-4519</link>
            <description><![CDATA[ Thanks for sharing your solution.<br />
I am totally new to both Sphider and Sphider-plus, and I set up PDF indexing using your method (on SUSE Linux with Sphider-plus). It works well.<br />
<br />
But what about indexing Word documents?<br />
I downloaded catdoc-0.94.2, but I don't know how to make it run in a Linux environment.<br />
<br />
If you have any ideas, please help. I searched the forum for this but have not found a proper solution yet.]]></description>
            <dc:creator>nemo_anhoa</dc:creator>
            <category>Sphider Support</category>
            <pubDate>Thu, 10 Jul 2008 06:59:01 +0300</pubDate>
        </item>
        <item>
            <guid>http://www.sphider.eu/forum/read.php?2,4413,4421#msg-4421</guid>
            <title>Re: Indexing PDFs on Linux with Sphider-plus (solved for me)</title>
            <link>http://www.sphider.eu/forum/read.php?2,4413,4421#msg-4421</link>
            <description><![CDATA[ I'm new to Sphider, but adapting your instructions, I got the converter working with regular Sphider (not Sphider-Plus).  Thanks.<br />
<br />
My only problem is that it doesn't seem to work when I index from the command line, although it does work through the web interface. Unfortunately, this client's web host seems to have a ridiculously low memory limit, so any PDF over 3 MB generates a fatal error, and I have no access to php.ini.]]></description>
            <dc:creator>Dekortage</dc:creator>
            <category>Sphider Support</category>
            <pubDate>Tue, 01 Jul 2008 22:58:26 +0300</pubDate>
        </item>
        <item>
            <guid>http://www.sphider.eu/forum/read.php?2,4413,4413#msg-4413</guid>
            <title>Indexing PDFs on Linux with Sphider-plus (solved for me)</title>
            <link>http://www.sphider.eu/forum/read.php?2,4413,4413#msg-4413</link>
            <description><![CDATA[ After several hours I found a working way to index PDFs with Sphider-plus on a shared Linux host. Here is what I did:<br />
<br />
Sphider-plus includes a PDF converter, but it does not work on Linux because it is an .exe file, and .exe files cannot be run on Linux.<br />
<br />
Instead of using the built-in converter, try this:<br />
1. Download the pre-compiled Linux binary of pdftotext, which is part of the xpdf bundle, from: www.foolabs.com/xpdf/download.html<br />
2. Unzip/untar the package and keep only the pdftotext file (it has no extension; that is fine)<br />
3. Rename &quot;pdftotext&quot; to &quot;pdftotext.script&quot;<br />
4. Upload this file via FTP to the &quot;converter&quot; directory of Sphider-plus<br />
5. Find the physical (absolute) path of your web site on the server (your hosting provider should be able to give you this)<br />
6. Create an empty text file and write these two lines into it:<br />
#!/bin/sh<br />
/PATH/TO/YOUR/WEB/DOWN/TO/converter/pdftotext.script $1 -<br />
7. Adapt the full path above to your setup and use single forward slashes (not double backslashes)<br />
The second line begins with a slash and ends WITH the minus sign! The minus sign makes pdftotext send the extracted text to standard output.<br />
(Thanks to the user who posted this hint some time ago)<br />
8. Save this file as &quot;pdftotext&quot; (without the quotes)<br />
9. Upload it to the converter directory<br />
10. Set the permissions of both pdftotext and pdftotext.script to 755 or 777 (whatever is needed for them to run)<br />
11. Set the permissions of the converter directory to 777! Otherwise indexing fails because pdftotext cannot write the temporary file it needs.<br />
12. Finally, change the pdftotext path in conf.php to:<br />
$pdftotext_path = '/PATH/TO/YOUR/WEB/DOWN/TO/converter/pdftotext';<br />
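<br />
If your host gives you shell access instead of FTP only, steps 6 to 11 can be sketched as one small shell function. This is only an illustration under assumptions: make_wrapper is a made-up helper name (not part of Sphider-plus or xpdf), and unlike the two lines in step 6 it puts quotes around the path and around $1 so that file names containing spaces survive.<br />

```shell
#!/bin/sh
# make_wrapper DIR: write the two-line wrapper from step 6 to DIR/pdftotext
# and apply the permissions from steps 10 and 11.
# (make_wrapper is a hypothetical helper name used only for this sketch.)
make_wrapper() {
    dir=$1
    # Steps 6-8: the wrapper hands the PDF path ($1) to the real binary and
    # passes "-" so that the extracted text goes to standard output.
    printf '#!/bin/sh\n"%s/pdftotext.script" "$1" -\n' "$dir" > "$dir/pdftotext" || return 1
    # Step 10: make the wrapper executable, and the binary too if it is
    # already in place.
    chmod 755 "$dir/pdftotext" || return 1
    if [ -f "$dir/pdftotext.script" ]; then
        chmod 755 "$dir/pdftotext.script" || return 1
    fi
    # Step 11: make the converter directory writable for everyone.
    chmod 777 "$dir"
}
```

Call it with the same absolute path you use in conf.php, for example: make_wrapper /PATH/TO/YOUR/WEB/DOWN/TO/converter<br />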
<br />
Now it should work fine. For me it does.]]></description>
            <dc:creator>rasc</dc:creator>
            <category>Sphider Support</category>
            <pubDate>Mon, 30 Jun 2008 10:55:31 +0300</pubDate>
        </item>
    </channel>
</rss>
