
Add or index single page?

Posted by Vertikal 
May 21, 2015 11:41PM
I have installed Sphider on a web site that used to run an ancient and heavily modified phpdig.
The new search engine works like a charm and was both fast and easy to install and adapt.
The first index run took about 20 hours, but managed to run to the end. Subsequent runs have taken almost as long.

My problem is that I do not want to run a complete reindex whenever the search index needs updating. I know which specific (and typically few) pages need to be added or updated, and I want to index only those, following no links or at most one level of links, so that the changes are added incrementally to the existing index through short, manually started index runs.

I can't seem to find anywhere where this is mentioned in the docs or here - or anywhere else for that matter.

Is this at all possible?

I don't want to divide my site/search into sections or categories. One large index suits me fine, but does that mean that a complete run is necessary every time I add or change a single article?

Martin
Tec
Re: Add or index single page?
May 23, 2015 05:29PM
<<< Subsequent runs have taken almost as long >>>
Invalid statement.
During the first indexation, Sphider extracts all links and keywords of each page of your site. Additionally, the MD5 checksum of each page is stored individually.
Then, during re-indexing, only the MD5 checksum of each page is verified. As long as the content has not been modified, the checksum will (of course) be the same as during the first indexation. This test runs very fast, much faster than extracting all links and text content of each page again and checking whether they are already known and stored in the database. This is one of the tricks Ando implemented when developing this search engine.
Only if a new link is added to a page, or the content has been modified since the last index run, will the MD5 change. During re-indexing, only the pages with a different, or unknown, MD5 are processed completely. New links and new keywords from these pages are added to the database and thereby become searchable together with the already existing keywords.
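A minimal Python sketch of the checksum test described above may make it concrete. This is not Sphider's actual PHP code, and the function names `md5_of` and `pages_to_reindex` are hypothetical. It assumes you already have each page's current content and the MD5 stored at the previous run. Note that every page still has to be downloaded to compute its checksum, so the check is cheap per page but still touches the whole site.

```python
import hashlib


def md5_of(content: str) -> str:
    """Return the hex MD5 digest of a page's content."""
    return hashlib.md5(content.encode("utf-8")).hexdigest()


def pages_to_reindex(pages: dict[str, str], stored: dict[str, str]) -> list[str]:
    """Return the URLs whose checksum is unknown or differs from the stored one.

    pages  -- hypothetical mapping URL -> current page content
    stored -- hypothetical mapping URL -> MD5 recorded at the previous index run
    """
    changed = []
    for url, content in pages.items():
        # A missing entry (new page) or a different digest (modified page)
        # means the page must be fully processed again.
        if stored.get(url) != md5_of(content):
            changed.append(url)
    return changed
```

Only the URLs returned here would go through full link and keyword extraction; all others are skipped after the cheap digest comparison.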

Tec
Re: Add or index single page?
May 28, 2015 02:40AM
Tec,

>Invalid statement.

With all due respect: There's nothing invalid there! I don't know how you know what time it took. I was here, you weren't...

Sure, the pages won't be reindexed if they are unchanged, and sure, the MD5 check is quicker than indexing, but... the pages do change!
Not the main content on each page, but lots of other things like calendar event lists, related content, user comments and such. So the MD5 check sees an updated page even though the indexed content (everything outside the <!--sphider_noindex--> ... <!--/sphider_noindex--> blocks) is exactly the same. And rightfully so. The page HAS changed. It just doesn't need to be reindexed.
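One way to avoid these false positives would be to checksum only the indexable part of the page. The Python sketch below is a hypothetical modification, not something stock Sphider does as far as this thread shows; it assumes the standard `<!--sphider_noindex-->` / `<!--/sphider_noindex-->` marker pair and strips those blocks before hashing, so changes inside them (event lists, comments) no longer alter the digest.

```python
import hashlib
import re

# Matches one noindex block, including everything between the markers,
# even when it spans several lines (DOTALL) and stops at the first closing
# marker (non-greedy .*?).
NOINDEX_BLOCK = re.compile(
    r"<!--sphider_noindex-->.*?<!--/sphider_noindex-->", re.DOTALL
)


def indexable_md5(html: str) -> str:
    """Hypothetical helper: MD5 of the page with all noindex blocks removed."""
    stripped = NOINDEX_BLOCK.sub("", html)
    return hashlib.md5(stripped.encode("utf-8")).hexdigest()
```

With this, a page whose calendar list changed but whose article did not would produce the same digest as before and be skipped.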

I know my site and my content in detail and know which pages need to be updated and added, and I do not want to do a complete run through thousands of pages every time I add or edit a single page or a few pages. There's no reason to.

I need a way to add a single page and maybe just the first level linked pages from that page - manually and on demand, preferably through a simple form. If it isn't an option, I'll probably need to develop some kind of function myself or hack a bit.
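The "single page plus first-level links" selection could be sketched like this in Python. It only works out *which* URLs to hand to the indexer; writing them into Sphider's database is not shown. The function `pages_within_depth` and the injected `fetch` callable are hypothetical, and passing `fetch` in (rather than doing HTTP inside) keeps the sketch testable without a network.

```python
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href values of all <a> tags in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def pages_within_depth(
    seeds: list[str], fetch: Callable[[str], str], max_depth: int = 1
) -> list[str]:
    """Breadth-first walk from the seed URLs, following links up to max_depth levels.

    max_depth=0 returns only the seeds; max_depth=1 adds the pages they link to.
    """
    seen = set(seeds)
    order = list(seeds)
    frontier = list(seeds)
    for _ in range(max_depth):
        next_frontier = []
        for url in frontier:
            parser = LinkParser()
            parser.feed(fetch(url))
            for href in parser.links:
                link = urljoin(url, href)  # resolve relative links
                if link not in seen:
                    seen.add(link)
                    order.append(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return order
```

A real version would probably also restrict links to the site's own host, which this sketch does not do.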

We'll see.

Martin
Tec
Re: Add or index single page?
May 28, 2015 10:03AM
You may use Sphider-plus, which allows you to index only those URLs placed in a sitemap.xml file. Create a sitemap file, containing only the URLs of your modified pages.

But of course, if you modify 80% of all your pages, re-indexing will run nearly as long as your first index procedure. The same applies to the MD5 check . . .
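For the sitemap approach suggested above, a sitemap.xml listing only the modified URLs can be produced with Python's standard library. The sketch follows the sitemaps.org 0.9 format (a `urlset` of `url`/`loc` entries); `build_sitemap` is a hypothetical helper, and how Sphider-plus is pointed at the file is not covered here.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> str:
    """Return a sitemap.xml document listing only the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & as required by the protocol.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Usage would be something like `open("sitemap.xml", "w", encoding="utf-8").write(build_sitemap(changed_urls))`, where `changed_urls` is your own list of edited pages.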

Tec
Re: Add or index single page?
May 28, 2015 11:45AM
Tec,

I did register and pay for Sphider Plus some time ago and wanted to implement it, but never got that far. Now my password has expired and I would have to buy it again, which I am reluctant to do.

It may be able to do what I want, but I also remember finding it way too complex for my taste and remember the admin interface as somewhat convoluted. Sphider in the free version is just my kind of system: compact, simple, efficient.

I think I'll work on a small mod instead.

Thanks for your input and thanks for the effort you put into both this and Sphider Plus.

Martin