<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel>
        <title>1.3.5 does not obey robots.txt</title>
        <description> Hi,

I updated my 1.3.4 install by replacing the changed files.

I noticed that 1.3.5 does not completely follow my robots.txt (it obeys some rules but still indexes some disallowed pages!), while 1.3.4 followed it fully.

Can anybody please guide me in solving this problem? Do I need PHP 5+ to use 1.3.5? Thanks.</description>
        <link>http://www.sphider.eu/forum/read.php?2,6969,6969#msg-6969</link>
        <lastBuildDate>Wed, 22 May 2013 09:25:35 +0300</lastBuildDate>
        <generator>Phorum 5.2.10</generator>
        <item>
            <guid>http://www.sphider.eu/forum/read.php?2,6969,10154#msg-10154</guid>
            <title>Re: 1.3.5 does not obey robots.txt</title>
            <link>http://www.sphider.eu/forum/read.php?2,6969,10154#msg-10154</link>
            <description><![CDATA[ Thank you very much! Works very well now.<br />
<br />
I owe you a pint.<br />
<br />
(tu)]]></description>
            <dc:creator>docbear</dc:creator>
            <category>Sphider Support</category>
            <pubDate>Sun, 27 Jan 2013 09:08:35 +0200</pubDate>
        </item>
        <item>
            <guid>http://www.sphider.eu/forum/read.php?2,6969,9022#msg-9022</guid>
            <title>Re: 1.3.5 does not obey robots.txt</title>
            <link>http://www.sphider.eu/forum/read.php?2,6969,9022#msg-9022</link>
            <description><![CDATA[ Kudos on that patch, Matt. I've been pounding my head against this for days thinking I was doing something wrong. You're my hero.]]></description>
            <dc:creator>halfacre</dc:creator>
            <category>Sphider Support</category>
            <pubDate>Fri, 02 Dec 2011 02:41:09 +0200</pubDate>
        </item>
        <item>
            <guid>http://www.sphider.eu/forum/read.php?2,6969,8241#msg-8241</guid>
            <title>Re: 1.3.5 does not obey robots.txt</title>
            <link>http://www.sphider.eu/forum/read.php?2,6969,8241#msg-8241</link>
            <description><![CDATA[ Hello Everyone Thank you :):)]]></description>
            <dc:creator>berbatov</dc:creator>
            <category>Sphider Support</category>
            <pubDate>Fri, 21 Jan 2011 01:20:22 +0200</pubDate>
        </item>
        <item>
            <guid>http://www.sphider.eu/forum/read.php?2,6969,8228#msg-8228</guid>
            <title>Re: 1.3.5 does not obey robots.txt</title>
            <link>http://www.sphider.eu/forum/read.php?2,6969,8228#msg-8228</link>
            <description><![CDATA[ Thanks heaps for your patch, that solved a big hassle I was having..!<br />
<br />
Ross..]]></description>
            <dc:creator>rossv</dc:creator>
            <category>Sphider Support</category>
            <pubDate>Thu, 13 Jan 2011 10:04:49 +0200</pubDate>
        </item>
        <item>
            <guid>http://www.sphider.eu/forum/read.php?2,6969,7887#msg-7887</guid>
            <title>Re: 1.3.5 does not obey robots.txt</title>
            <link>http://www.sphider.eu/forum/read.php?2,6969,7887#msg-7887</link>
            <description><![CDATA[ Some sites set different rules for robots in their META tags, regardless of the robots.txt file, and some are not open to Sphider unless those tags say &quot;index, follow&quot;.]]></description>
            <dc:creator>ClickRaider</dc:creator>
            <category>Sphider Support</category>
            <pubDate>Sun, 29 Aug 2010 07:40:56 +0300</pubDate>
        </item>
        <item>
            <guid>http://www.sphider.eu/forum/read.php?2,6969,7819#msg-7819</guid>
            <title>Re: 1.3.5 does not obey robots.txt</title>
            <link>http://www.sphider.eu/forum/read.php?2,6969,7819#msg-7819</link>
            <description><![CDATA[ Thanks man, perfect!]]></description>
            <dc:creator>Malcolm</dc:creator>
            <category>Sphider Support</category>
            <pubDate>Fri, 06 Aug 2010 05:14:04 +0300</pubDate>
        </item>
        <item>
            <guid>http://www.sphider.eu/forum/read.php?2,6969,7261#msg-7261</guid>
            <title>Re: 1.3.5 does not obey robots.txt</title>
            <link>http://www.sphider.eu/forum/read.php?2,6969,7261#msg-7261</link>
            <description><![CDATA[ Hi again all, there's yet another problem in the check_robot_txt function within spiderfuncs.php.<br />
<br />
Around line 217, it has:<br />
<pre class="bbcode">
return null;</pre>
<br />
This makes the function return null and exit as soon as a Disallow rule has an empty value. That is a huge problem, because most robots.txt files start with a general &quot;disallow none&quot; rule and then restrict individual directories.<br />
<br />
i.e,<br />
<pre class="bbcode">
User-agent: *
Disallow:
Disallow:  /cgi-bin/ 
Disallow: /private/</pre>
<br />
And so on. The way this is written, the function exits when it sees the first, empty Disallow rule, so the rules after it are never read. So change the return null line (line 217) to this:<br />
<pre class="bbcode">
continue;</pre>
<br />
This tells the script to simply move on and check the next line within the loop.<br />
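<br />
To illustrate why continue is the right choice, here is a minimal sketch of such a loop. It is not the Sphider code: collect_disallowed() is a hypothetical helper, the patterns are assumptions, and it only handles the &quot;User-agent: *&quot; group:<br />

```php
<?php
// Hypothetical helper, NOT Sphider's check_robot_txt(): collects the
// Disallow paths that apply to all crawlers ("User-agent: *").
function collect_disallowed($robotsTxt)
{
    $applies = false; // true while we are inside a "User-agent: *" group
    $paths = [];

    foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {
        $line = trim($line);

        if (preg_match('/^user-agent:\s*(.*)$/i', $line, $m)) {
            $applies = (trim($m[1]) === '*');
            continue;
        }

        if ($applies && preg_match('/^disallow:\s*(.*)$/i', $line, $m)) {
            $path = trim($m[1]);
            if ($path === '') {
                // An empty Disallow restricts nothing. Skip this line and
                // keep reading; returning here would drop the rules below.
                continue;
            }
            $paths[] = $path;
        }
    }

    return $paths;
}

$robots = "User-agent: *\nDisallow:\nDisallow: /cgi-bin/\nDisallow: /private/\n";
print_r(collect_disallowed($robots));
```

With a return in place of the inner continue, the function would stop at the empty Disallow line and never reach /cgi-bin/ or /private/.<br />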
<br />
Now it finally parses and applies rules from robots.txt correctly.<br />
<br />
When you index now, you should see the disallowed rules at the top of the output/logs, e.g.:<br />
<pre class="bbcode">
Disallowed files and directories in robots.txt:
http: //example.com/cgi-bin/
http: //example.com/private/</pre>
and so on. I added a space after http: because the forum software keeps trying to linkify the text.<br />
<br />
I have updated the patch at [<a href="http://mdj.us/media/spiderfuncs.patch" rel="nofollow" >mdj.us</a>]]]></description>
            <dc:creator>matt</dc:creator>
            <category>Sphider Support</category>
            <pubDate>Thu, 04 Mar 2010 20:34:27 +0200</pubDate>
        </item>
        <item>
            <guid>http://www.sphider.eu/forum/read.php?2,6969,7243#msg-7243</guid>
            <title>Re: 1.3.5 does not obey robots.txt</title>
            <link>http://www.sphider.eu/forum/read.php?2,6969,7243#msg-7243</link>
            <description><![CDATA[ Thanks Matt,<br />
<br />
A tiny but significant change...(tu)]]></description>
            <dc:creator>Willy</dc:creator>
            <category>Sphider Support</category>
            <pubDate>Thu, 25 Feb 2010 19:40:51 +0200</pubDate>
        </item>
        <item>
            <guid>http://www.sphider.eu/forum/read.php?2,6969,7231#msg-7231</guid>
            <title>Re: 1.3.5 does not obey robots.txt</title>
            <link>http://www.sphider.eu/forum/read.php?2,6969,7231#msg-7231</link>
            <description><![CDATA[ Thanks, Matt, for taking the time to help. I really appreciate it.]]></description>
            <dc:creator>t-p</dc:creator>
            <category>Sphider Support</category>
            <pubDate>Mon, 22 Feb 2010 23:39:28 +0200</pubDate>
        </item>
        <item>
            <guid>http://www.sphider.eu/forum/read.php?2,6969,7230#msg-7230</guid>
            <title>Re: 1.3.5 does not obey robots.txt</title>
            <link>http://www.sphider.eu/forum/read.php?2,6969,7230#msg-7230</link>
            <description><![CDATA[ Hey guys,<br />
<br />
I discovered the same thing. It looks like when the author replaced the eregi functions with preg_match, he forgot to add the case-insensitive modifier. <br />
<br />
Basically, if your robots.txt has User-agent instead of user-agent, the line is not matched.<br />
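<br />
The effect of the missing modifier can be seen in a minimal sketch. The pattern and the matches_line() helper below are assumptions for illustration, not copied from spiderfuncs.php:<br />

```php
<?php
// Hypothetical helper: true when $line matches $pattern.
function matches_line($pattern, $line)
{
    return preg_match($pattern, $line) === 1;
}

$line = 'User-agent: *';

// Without the "i" modifier the match is case-sensitive, so the
// capitalised "User-agent" is not recognised.
var_dump(matches_line('/^user-agent:\s*(.*)$/', $line));  // bool(false)

// With the "i" modifier the same pattern matches any capitalisation,
// which is how the older eregi() behaved.
var_dump(matches_line('/^user-agent:\s*(.*)$/i', $line)); // bool(true)
```

The only difference between the two calls is the trailing i after the closing delimiter of the pattern.<br />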
<br />
Anyways, I have created a patch for it here; [<a href="http://mdj.us/media/spiderfuncs.patch" rel="nofollow" >mdj.us</a>]<br />
<br />
If you don't know what the hell a patch file is, just open spiderfuncs.php, look for the preg_match calls, and add the &quot;i&quot; modifier to the regular expression wherever it's missing. There are three instances to change.]]></description>
            <dc:creator>matt</dc:creator>
            <category>Sphider Support</category>
            <pubDate>Mon, 22 Feb 2010 19:34:18 +0200</pubDate>
        </item>
        <item>
            <guid>http://www.sphider.eu/forum/read.php?2,6969,7167#msg-7167</guid>
            <title>Re: 1.3.5 does not obey robots.txt</title>
            <link>http://www.sphider.eu/forum/read.php?2,6969,7167#msg-7167</link>
            <description><![CDATA[ t-p Wrote:<br />
-------------------------------------------------------<br />
&gt; Hi,<br />
&gt; <br />
&gt; I updated my 1.3.4 by replacing the changed files.<br />
&gt; <br />
&gt; <br />
&gt; I noticed that the 1.3.5 does not completely<br />
&gt; follow (it follows some and indexed some!) my<br />
&gt; robots.txt, while the 1.3.4 did follow fully.<br />
&gt; <br />
&gt; Can anybody guide me please to solve this problem?<br />
&gt; Do I have to have php5+ to use 1.3.5? Thanks.<br />
<br />
<br />
t-p,<br />
<br />
I installed 1.3.5 just today and also noticed that it does not obey robots.txt. I would be interested if anyone has a fix for this as well. I have two existing installs running 1.3.4 (although they show 1.3.3 in admin...)<br />
<br />
BTW, I'm using PHP 5 on my servers, so that is not the problem...]]></description>
            <dc:creator>Convergence</dc:creator>
            <category>Sphider Support</category>
            <pubDate>Thu, 11 Feb 2010 03:13:05 +0200</pubDate>
        </item>
        <item>
            <guid>http://www.sphider.eu/forum/read.php?2,6969,6969#msg-6969</guid>
            <title>1.3.5 does not obey robots.txt</title>
            <link>http://www.sphider.eu/forum/read.php?2,6969,6969#msg-6969</link>
            <description><![CDATA[ Hi,<br />
<br />
I updated my 1.3.4 install by replacing the changed files. <br />
<br />
I noticed that 1.3.5 does not completely follow my robots.txt (it obeys some rules but still indexes some disallowed pages!), while 1.3.4 followed it fully.<br />
<br />
Can anybody please guide me in solving this problem? Do I need PHP 5+ to use 1.3.5? Thanks.]]></description>
            <dc:creator>t-p</dc:creator>
            <category>Sphider Support</category>
            <pubDate>Thu, 07 Jan 2010 00:11:46 +0200</pubDate>
        </item>
    </channel>
</rss>
