HELP Sphider will not index my website

Posted by starprogrammer 
January 06, 2012 07:01AM

Sphider will not index my website.

Could someone please help? I am new to Sphider and have got it to find only about 10 of my 350 pages. Could someone please look at my site? I believe the issue is with the links on the pages. I first tried using %20 rather than a space, and some people mention something about the /. I have tried everything and am starting to get a headache.

My site is http://www.easylinksrus.com

kind regards
January 06, 2012 11:20PM
Do I need the full URL?

I have a base tag and cannot figure out what is wrong. I have not changed the Sphider files, so I think it must be a problem with my site. It is in HTML5; is that the problem?

Do I need to clear the SQL tables and repopulate them when I index or reindex the site? That doesn't sound right. I'm putting hours in and having no luck. I have seen a few people with similar problems, but theirs were solved through a full URL, %20 instead of a space, a base tag, etc. None of these have worked for me.

January 06, 2012 11:37PM
The main problem is that Sphider does not follow the links in the nav bar. It has only been able to index:

Pages: 1

I don't know what was different about these pages that allowed them to be indexed. I think the pages have since been changed.

Now I try to reindex and get:

Spidering http://www.easylinksrus.com/

1. Retrieving: http://www.easylinksrus.com/ at 18:35:52.
Size of page: 6.44kb. Starting indexing at 18:35:52. MD5 sum checked. Page content not changed
Links found: 0. New links: 0

Completed at 18:35:52.

I have links on the page, in the nav bars, but it will not find new links. I am a noob, so it is probably bad code.

January 08, 2012 10:31PM
Solved. I needed Sphider plus.
July 31, 2015 04:08PM
If you're using the free Sphider, you can fix this. I took some time to find the issue. Look at \admin\spiderfuncs.php, then look at the get_links function. Within that function are regular expressions that parse the links. They do not allow for a space in the URL.

If you modify the expression to, for example, this:
preg_match_all("/href\s*=\s*[\'\"]?([+:%\/\?~=&;\\\(\),._a-zA-Z0-9- ]*)(#[.a-zA-Z0-9-]*)?[\'\" ]?(\s*rel\s*=\s*[\'\"]?(nofollow)[\'\"]?)?/i", $file, $regs, PREG_SET_ORDER);


(Line ~239)

It will work.
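
To see what the extra space changes, here is a minimal sketch you can run outside Sphider. The helper name capture_hrefs and the sample HTML are made up for illustration, and the second pattern is simply the expression above with the space removed, which I am assuming behaves like the stock one.

```php
<?php
// Hypothetical helper, not part of Sphider: returns what group 1
// (the URL part before any #fragment) captures for each href found.
function capture_hrefs(string $pattern, string $html): array
{
    preg_match_all($pattern, $html, $regs, PREG_SET_ORDER);
    return array_map(function ($m) { return $m[1]; }, $regs);
}

// Modified expression from this post: note the space before the closing "]".
$withSpace = "/href\s*=\s*[\'\"]?([+:%\/\?~=&;\\\(\),._a-zA-Z0-9- ]*)(#[.a-zA-Z0-9-]*)?[\'\" ]?(\s*rel\s*=\s*[\'\"]?(nofollow)[\'\"]?)?/i";

// Same expression without the space (assumed to mirror the unmodified behavior).
$withoutSpace = "/href\s*=\s*[\'\"]?([+:%\/\?~=&;\\\(\),._a-zA-Z0-9-]*)(#[.a-zA-Z0-9-]*)?[\'\" ]?(\s*rel\s*=\s*[\'\"]?(nofollow)[\'\"]?)?/i";

// Made-up sample link whose file name contains a space.
$html = '<a href="about us.html">About</a>';

print_r(capture_hrefs($withoutSpace, $html)); // link is cut off at the space
print_r(capture_hrefs($withSpace, $html));    // whole file name is captured
```

Without the space, the capture stops at "about", so the spider would go looking for a page that does not exist. One thing to watch: with the space allowed, an unquoted link such as href=a.html class=x gets captured as "a.html class=x", so this change is safest when your href values are quoted.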

So yeah, hope that helps someone in the future.
