
Sphider-plus version 2.9 released

Tec
Sphider-plus version 2.9 released
November 02, 2012 01:34PM
As it might also be of interest to users of the original Sphider:
The latest release of Sphider-plus is now available at [www.sphider-plus.eu]

Compared with version 2.8 of Sphider-plus, the following items have been added or modified:


New feature:
Support for non-ASCII URLs using 'Internationalized Domain Names' (IDN).
This standard is described in RFC 3490, RFC 3491 and RFC 3492.
If activated, internationalized domain names like 'http://президент.рф/' and 'http://müller.de/'
will be accepted as new sites in the Admin backend as well as in the user's addurl form.
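To illustrate what the IDN conversion involves, here is a minimal Python sketch (Sphider-plus itself is PHP; this is not its code). It assumes Python's built-in "idna" codec, which implements IDNA 2003 (RFC 3490, with Nameprep from RFC 3491 and Punycode from RFC 3492). The function name `idn_url_to_ascii` is hypothetical, and user info in the URL is dropped for brevity.

```python
from urllib.parse import urlsplit, urlunsplit


def idn_url_to_ascii(url):
    """Hypothetical helper: convert the host of a URL to its ASCII (Punycode) form."""
    parts = urlsplit(url)
    host = parts.hostname  # lower-cased host, without port or user info
    if host is None:
        raise ValueError(f"no host in URL: {url!r}")
    # The "idna" codec applies Nameprep and Punycode label by label.
    ascii_host = host.encode("idna").decode("ascii")
    netloc = ascii_host if parts.port is None else f"{ascii_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

For example, 'http://müller.de/' becomes 'http://xn--mller-kva.de/', which is the form a crawler can actually resolve.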

New feature:
Support Punycode URLs like http://xn--90aoqlh7c4a.xn--d1abbgf6aiiy.xn--p1ai/,
which are converted into the readable form http://события.президент.рф/
To be activated in Admin settings.
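The reverse direction (Punycode to readable form) can be sketched the same way in Python, again assuming the built-in "idna" codec rather than anything Sphider-plus actually uses. The helper name `punycode_url_to_unicode` is hypothetical; hosts that are not valid IDNA are left unchanged.

```python
from urllib.parse import urlsplit, urlunsplit


def punycode_url_to_unicode(url):
    """Hypothetical helper: show an xn-- host in its readable Unicode form."""
    parts = urlsplit(url)
    host = parts.hostname
    if host is None:
        return url
    try:
        readable = host.encode("ascii").decode("idna")
    except UnicodeError:
        return url  # not a valid ASCII/IDNA host: keep the original form
    netloc = readable if parts.port is None else f"{readable}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```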

New feature:
Besides the usual HTML elements <element>, also delete from the full text all HTML elements
that are entity-encoded, like &lt; element &gt;
To be activated in Admin settings.
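A rough Python sketch of the idea, assuming a simple regular-expression approach (not necessarily how Sphider-plus does it): real tags like <b> and entity-encoded ones like &lt; b &gt; are both replaced by spaces. The names `strip_tags` and `ESCAPED_TAG_RE` are hypothetical. A known limitation of this heuristic is that plain text such as "a &lt; b and c &gt; d" would also be cut.

```python
import re

# Real tags such as <b> ... </b>
TAG_RE = re.compile(r"<[^>]*>")
# Entity-encoded tags such as &lt; b &gt; or &lt;/b&gt; (hypothetical heuristic)
ESCAPED_TAG_RE = re.compile(r"&lt;\s*/?\s*[A-Za-z][^&]*?&gt;", re.IGNORECASE)


def strip_tags(text, also_escaped=True):
    """Hypothetical helper: remove tags (optionally entity-encoded ones too) from full text."""
    text = TAG_RE.sub(" ", text)
    if also_escaped:
        text = ESCAPED_TAG_RE.sub(" ", text)
    return " ".join(text.split())  # collapse the leftover whitespace
```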

New feature:
Index only parts of a page, defined by <element> . . . </element>
This feature is intended to work with the new HTML5 elements like
section, nav, aside, hgroup, article, header, footer, etc.
If enabled in Admin settings, the elements listed in the file
…/include/common/elements_use.txt will be used to index only the page content between
<element> . . . </element>
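As an illustration, a nesting-aware filter could look like the Python sketch below. It assumes the list file holds one element name per line and uses the standard library's `html.parser`; the class `KeepElements` and function `index_only` are hypothetical, not Sphider-plus code.

```python
from html.parser import HTMLParser


class KeepElements(HTMLParser):
    """Hypothetical parser: collects only the text found inside the listed elements."""

    def __init__(self, elements):
        super().__init__()
        self.elements = {e.strip().lower() for e in elements if e.strip()}
        self.depth = 0  # number of listed elements currently open
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.elements:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.elements and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)


def index_only(html, elements):
    """Return the whitespace-normalized text inside the given elements only."""
    parser = KeepElements(elements)
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())
```

Counting depth instead of matching with a regular expression keeps nested elements (a section inside a section) intact.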

New feature:
Ignore parts of a page, defined by <element> . . . </element>
This feature is intended to work with the new HTML5 elements like
section, nav, aside, hgroup, article, header, footer, etc.
If enabled in Admin settings, the elements listed in the file
…/include/common/elements_not.txt will be used to remove the content between
<element> . . . </element> from the page content.
This is the inverse of 'Index only parts of a page, defined by <element> . . . </element>'.
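The inverse filter can be sketched in Python the same way: text is kept only while none of the listed elements is open. As before, this assumes one element name per list-file line; `DropElements` and `drop_elements` are hypothetical names, not Sphider-plus code.

```python
from html.parser import HTMLParser


class DropElements(HTMLParser):
    """Hypothetical parser: keeps text that is outside all of the listed elements."""

    def __init__(self, elements):
        super().__init__()
        self.elements = {e.strip().lower() for e in elements if e.strip()}
        self.depth = 0  # number of listed elements currently open
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.elements:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.elements and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.parts.append(data)


def drop_elements(html, elements):
    """Return the whitespace-normalized text with the listed elements removed."""
    parser = DropElements(elements)
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())
```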

New feature:
Index only files and documents with defined suffixes:
If activated, all pages of the site will be searched for links,
but only files with the suffixes defined in the docs list will be indexed.
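The suffix check itself is simple; a minimal Python sketch, assuming the docs list is a set of suffixes such as {"pdf", "doc"} (the helper name `has_indexable_suffix` is hypothetical). The query string is ignored and the comparison is case-insensitive.

```python
import posixpath
from urllib.parse import urlsplit


def has_indexable_suffix(url, suffixes):
    """Hypothetical helper: True if the URL path ends with one of the listed suffixes."""
    wanted = {s.lower().lstrip(".") for s in suffixes}
    path = urlsplit(url).path  # query and fragment are not part of the path
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    return ext in wanted
```

Pages that fail this check would still be fetched and scanned for links; they just would not be stored in the index.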

New feature:
1. Perform a WHOIS check for sites awaiting approval in the Admin backend.
2. Perform a WHOIS check for suggested URLs directly in the addurl form,
so that invalid URLs are automatically rejected.
For both checks, either a basic list of WHOIS servers for the generic top-level domains
and some important country codes (supporting 30 suffixes)
or an extended list (supporting 155 suffixes) can be selected.
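For readers curious about the mechanics, a WHOIS query (RFC 3912) is just the domain name plus CRLF sent over TCP port 43, with the reply read until the server closes the connection. The Python sketch below assumes a tiny illustrative server table (not the basic or extended lists shipped with Sphider-plus) and a hypothetical "not registered" heuristic; registry wording varies, so real marker lists would need care.

```python
import socket

# Illustrative subset only; NOT the Sphider-plus server lists.
WHOIS_SERVERS = {
    "com": "whois.verisign-grs.com",
    "net": "whois.verisign-grs.com",
    "de": "whois.denic.de",
    "eu": "whois.eu",
}

# Hypothetical heuristic: phrases some registries use for unregistered names.
NOT_FOUND_MARKERS = ("no match for", "not found", "status: free", "no entries found")


def whois_server_for(domain, servers=WHOIS_SERVERS):
    """Pick a WHOIS server by top-level domain, or None if the suffix is unsupported."""
    tld = domain.rstrip(".").rsplit(".", 1)[-1].lower()
    return servers.get(tld)


def whois_query(domain, server, timeout=10.0):
    """Plain RFC 3912 query: send the name + CRLF on port 43, read until close."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(domain.encode("idna") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def looks_unregistered(response):
    """Heuristic check of a WHOIS reply; wording differs between registries."""
    text = response.lower()
    return any(marker in text for marker in NOT_FOUND_MARKERS)
```

In an addurl form, a URL would be rejected if its suffix has no known server or the reply looks unregistered.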

New option, to be activated in Admin backend:
The crawler can leave the domain during indexing, but only for canonical links.
Only the canonical link will be indexed; links found on that page will be ignored.
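A minimal Python sketch of that rule, assuming the canonical URL comes from a <link rel="canonical"> element in the page head. The names `CanonicalFinder` and `canonical_step` are hypothetical; the function returns the URL to index plus whether its links may be followed (False when the canonical page is on another host).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class CanonicalFinder(HTMLParser):
    """Hypothetical parser: remembers the first <link rel="canonical" href="...">."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        a = dict(attrs)
        rel = (a.get("rel") or "").lower().split()
        if "canonical" in rel and a.get("href"):
            self.href = a["href"]


def canonical_step(page_url, html):
    """Return (canonical_url, follow_links), or None if the page declares no canonical link."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    if finder.href is None:
        return None
    canon = urljoin(page_url, finder.href)
    same_host = urlsplit(canon).hostname == urlsplit(page_url).hostname
    # Off-domain canonical pages are indexed, but their links are not followed.
    return canon, same_host
```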

New feature:
Obey the 'refresh' meta tags in HTML headers.
The crawler now follows the redirection, including delayed redirections.
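The content attribute of a refresh meta tag has the form "5; url=/next.html". A minimal Python parsing sketch, assuming a regular expression is good enough for crawler purposes (the names `REFRESH_RE` and `parse_refresh` are hypothetical). It returns the delay in seconds and the absolute target, or None as target for a plain reload.

```python
import re
from urllib.parse import urljoin

# "<delay>[.fraction] [; or ,] [url=]<target>"  (hypothetical, simplified grammar)
REFRESH_RE = re.compile(
    r"\s*(\d+)(?:\.\d*)?\s*(?:[;,]\s*(?:url\s*=\s*)?(.*))?$",
    re.IGNORECASE | re.DOTALL,
)


def parse_refresh(content, base_url):
    """Return (delay_seconds, absolute_target_or_None), or None if unparsable."""
    m = REFRESH_RE.match(content)
    if not m:
        return None
    delay = int(m.group(1))
    target = (m.group(2) or "").strip().strip("'\"")  # drop optional quotes
    return delay, (urljoin(base_url, target) if target else None)
```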

New option:
Support UTF-16 encoded sites, which will be converted into UTF-8.
To be activated in Admin settings.
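A minimal Python sketch of the conversion, assuming the byte order is taken from the byte-order mark (BOM) when present and otherwise from a caller-supplied fallback (little-endian here, an assumption). The helper name `utf16_to_utf8` is hypothetical.

```python
import codecs


def utf16_to_utf8(data, fallback="utf-16-le"):
    """Hypothetical helper: re-encode UTF-16 bytes as UTF-8, honouring a BOM if present."""
    if data.startswith(codecs.BOM_UTF16_LE):
        text = data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    elif data.startswith(codecs.BOM_UTF16_BE):
        text = data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    else:
        text = data.decode(fallback)  # no BOM: assume the given byte order
    return text.encode("utf-8")
```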

New option:
During indexing, always use the standard Firefox HTTP_USER_AGENT string
and ignore the individually defined Sphider-plus string. To be activated in Admin backend.
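Switching the user agent only means sending a different User-Agent request header. A Python sketch using `urllib.request`; the Firefox string is an example of the format from that period and the crawler string is a placeholder (both are assumptions, not the values Sphider-plus sends), and `build_request` is a hypothetical name.

```python
import urllib.request

# Example Firefox UA string of that era (assumption, not necessarily what Sphider-plus sends).
FIREFOX_UA = "Mozilla/5.0 (Windows NT 6.1; rv:16.0) Gecko/20100101 Firefox/16.0"
# Hypothetical placeholder for the individually defined crawler string.
CRAWLER_UA = "Sphider-plus"


def build_request(url, use_firefox_ua=False):
    """Hypothetical helper: build a request carrying the chosen User-Agent header."""
    ua = FIREFOX_UA if use_firefox_ua else CRAWLER_UA
    return urllib.request.Request(url, headers={"User-Agent": ua})
```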

New feature:
Follow redirections invoked by JavaScript when sent as HTTP content.
Directives like the following will be obeyed:
<SCRIPT language="javascript">window.location="mp.php?mcv=59";</SCRIPT>
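A heuristic Python sketch of how such a directive could be detected, assuming regular expressions over inline <script> blocks (names `SCRIPT_RE`, `LOCATION_RE` and `js_redirect_target` are hypothetical). Only plain assignments to window.location / location.href with a quoted literal are recognized; calls such as location.replace(...) or computed URLs are not.

```python
import re
from urllib.parse import urljoin

SCRIPT_RE = re.compile(r"<script\b[^>]*>(.*?)</script\s*>", re.IGNORECASE | re.DOTALL)
# e.g. window.location="x", location.href='x', document.location = "x"
LOCATION_RE = re.compile(
    r"(?:window\.|document\.|self\.|top\.)?location(?:\.href)?\s*=\s*(['\"])(.*?)\1"
)


def js_redirect_target(html, base_url):
    """Hypothetical helper: absolute URL of the first JavaScript redirect found, or None."""
    for script in SCRIPT_RE.findall(html):
        m = LOCATION_RE.search(script)
        if m:
            return urljoin(base_url, m.group(2))
    return None
```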

New feature:
Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes.
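A minimal Python sketch of redirect following for those four status codes, with a hop limit and loop detection. The fetch callback, `redirect_target` and `follow_redirects` are hypothetical; headers are assumed to be a plain dict with a "Location" key.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307}


def redirect_target(status, headers, current_url):
    """Absolute URL to follow, or None if this response is not a usable redirect."""
    if status not in REDIRECT_CODES:
        return None
    location = headers.get("Location")
    if not location:
        return None
    return urljoin(current_url, location.strip())  # Location may be relative


def follow_redirects(fetch, url, max_hops=5):
    """fetch(url) -> (status, headers) is a hypothetical crawler callback."""
    seen = {url}
    for _ in range(max_hops):
        status, headers = fetch(url)
        nxt = redirect_target(status, headers, url)
        if nxt is None:
            return url  # final URL to index
        if nxt in seen:
            raise RuntimeError(f"redirect loop at {nxt}")
        seen.add(nxt)
        url = nxt
    raise RuntimeError("too many redirects")
```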

New feature:
Separate PDF converters supplied for 32-bit and 64-bit operating systems.

New feature:
Follow links placed in JavaScript files. Links like the following will be detected and followed:
document.write(' <a href="new_12.pdf">All news 2012</a> ');
The complete content of
document.write('this text in all rows');
will also be indexed and stored as keywords in the db.
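A regex-based Python sketch of the idea, assuming document.write / document.writeln calls with a single quoted string literal (escaped quotes and string concatenation are not handled). The names `scan_js`, `WRITE_RE` and `HREF_RE` are hypothetical. It returns absolute links to follow plus the words to store as keywords.

```python
import re
from urllib.parse import urljoin

# Quoted string argument of document.write(...) or document.writeln(...)
WRITE_RE = re.compile(r"document\.write(?:ln)?\s*\(\s*(['\"])(.*?)\1\s*\)", re.DOTALL)
HREF_RE = re.compile(r"""href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]*>")


def scan_js(js_source, base_url):
    """Hypothetical helper: (links, keywords) found in document.write() output."""
    links, words = [], []
    for _quote, fragment in WRITE_RE.findall(js_source):
        links.extend(urljoin(base_url, h) for h in HREF_RE.findall(fragment))
        words.extend(TAG_RE.sub(" ", fragment).split())  # text without tags
    return links, words
```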

New feature:
Now also indexing sites that require the crawler to accept and set a cookie.
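In Python terms, this corresponds to an HTTP client that keeps a cookie jar: cookies set by the server are stored and sent back on later requests, including requests made while following redirects. A minimal sketch with the standard library (`make_cookie_opener` is a hypothetical name, and this is not how the PHP crawler is written).

```python
import http.cookiejar
import urllib.request


def make_cookie_opener():
    """Hypothetical helper: opener that stores server cookies and returns them on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar
```

Usage would be `opener.open(url)` for every request of one site, so the cookie set on the first response is presented on the next one.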

New feature:
To reduce transmission time, the crawler now requests gzip-compressed data transfer
from the remote server for the URL being indexed.
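A Python sketch of both halves: advertising compression via the Accept-Encoding header, and decoding the body according to the Content-Encoding the server answers with. `urllib` does not decompress automatically, hence the explicit step. Helper names are hypothetical; the raw-deflate fallback covers servers that omit the zlib wrapper (an assumption about server behaviour).

```python
import gzip
import urllib.request
import zlib


def gzip_request(url):
    """Hypothetical helper: request that asks the server for compressed transfer."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip, deflate"})


def decode_body(body, content_encoding):
    """Decompress a response body according to its Content-Encoding header."""
    enc = (content_encoding or "").strip().lower()
    if enc in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if enc == "deflate":
        try:
            return zlib.decompress(body)  # zlib-wrapped deflate
        except zlib.error:
            return zlib.decompress(body, -zlib.MAX_WBITS)  # raw deflate
    return body  # identity / not compressed
```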

New option:
To convert the text into UTF-8, use the charset definition supplied via HTTP by the remote
server.
If this option is not activated in Admin settings, the charset will be extracted from the header of the files to be indexed. If none is found, as in PDF documents, the preferred charset will be used.
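The decision order can be sketched in Python as follows, assuming simple regular expressions for the HTTP Content-Type header and the HTML meta tag. Falling back to the meta tag when the option is on but the HTTP header carries no charset is an assumption; `choose_charset` and `to_utf8` are hypothetical names.

```python
import re

HTTP_CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)
META_CHARSET_RE = re.compile(rb"<meta[^>]+charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)


def choose_charset(content_type, body, use_http_charset, preferred="utf-8"):
    """Hypothetical helper: HTTP charset (if enabled), else meta charset, else preferred."""
    if use_http_charset and content_type:
        m = HTTP_CHARSET_RE.search(content_type)
        if m:
            return m.group(1).lower()
    m = META_CHARSET_RE.search(body[:4096])  # only look at the document head
    if m:
        return m.group(1).decode("ascii").lower()
    return preferred  # e.g. PDF documents carry no such header


def to_utf8(body, charset):
    """Re-encode the raw bytes as UTF-8."""
    return body.decode(charset, errors="replace").encode("utf-8")
```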

New option:
Delete duplicate parts of the URL path found in both the indexed page URL and the new links.
Unfortunately, some CMSs seem unable to build correct paths for relative links.
If activated in the Admin backend, these duplicate parts of the path will be deleted from the link URL. This should be activated only if the indexed sites were created by such a CMS.
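One possible heuristic, sketched in Python: collapse any run of path segments that is immediately repeated, so /a/b/a/b/x.html becomes /a/b/x.html. This is an assumption about what "duplicate parts" means, not the Sphider-plus algorithm, and `collapse_repeated_segments` is a hypothetical name. It would also shorten legitimate paths like /2012/2012/, which is exactly why such an option should only be enabled for sites built by the affected CMS.

```python
from urllib.parse import urlsplit, urlunsplit


def collapse_repeated_segments(url):
    """Hypothetical helper: remove immediately repeated runs of path segments."""
    parts = urlsplit(url)
    segs = parts.path.split("/")
    changed = True
    while changed:
        changed = False
        # Try the longest possible repeated run first.
        for size in range(len(segs) // 2, 0, -1):
            for i in range(len(segs) - 2 * size + 1):
                run = segs[i:i + size]
                if run == segs[i + size:i + 2 * size] and any(run):
                    del segs[i + size:i + 2 * size]
                    changed = True
                    break
            if changed:
                break
    return urlunsplit((parts.scheme, parts.netloc, "/".join(segs), parts.query, parts.fragment))
```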

New feature:
Show a summary of the currently active user database at the bottom of the result listing.
If activated in the Admin backend, the counts of sites, categories, page links and keywords
are displayed.

New feature:
Automatically delete invalid URLs from the Admin 'Sites' view.

Improved 'Add site' function in Admin backend:
URLs with and without 'www' are now treated as equal and rejected as duplicate sites.
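The duplicate check can be sketched in Python by comparing a normalized key: the lower-cased host without a leading "www." plus the path without a trailing slash. The key format and the helper names `site_key` / `is_duplicate_site` are assumptions for illustration.

```python
from urllib.parse import urlsplit


def site_key(url):
    """Hypothetical helper: host without 'www.' plus path without trailing slash."""
    parts = urlsplit(url)
    host = (parts.hostname or "").rstrip(".")  # hostname is already lower-cased
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def is_duplicate_site(new_url, existing_urls):
    """True if new_url matches an existing site with or without 'www'."""
    key = site_key(new_url)
    return any(site_key(u) == key for u in existing_urls)
```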

Improved image indexing procedure:
Now also indexing phpBB images that are linked via PHP script files.

New option:
Suppress the file suffix of image file names when indexing.

Improved media indexing procedure:
If the title tag is missing, the alt tag is now used to define the name of the media.
If the alt tag is also missing, the file name will be used as keyword.
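The fallback order (title, then alt, then file name) fits in a few lines of Python. The optional suffix stripping mirrors the 'suppress the file suffix' option above; the helper name `media_name` and its parameters are hypothetical.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def media_name(title, alt, src, strip_suffix=False):
    """Hypothetical helper: name a media file from title, else alt, else its file name."""
    for candidate in (title, alt):
        if candidate and candidate.strip():
            return candidate.strip()
    name = posixpath.basename(unquote(urlsplit(src).path))  # "sun%20set.jpg" -> "sun set.jpg"
    if strip_suffix:
        name = posixpath.splitext(name)[0]
    return name
```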

Improved "banned domain" management:
Now storing the name and suffix of banned domains instead of their URLs.
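Storing "name.suffix" instead of full URLs means a URL is matched by its host. A Python sketch, assuming subdomains of a banned domain are banned too (that detail is an assumption); `is_banned` is a hypothetical name. Matching on a dot boundary avoids banning notspam.example when spam.example is listed.

```python
from urllib.parse import urlsplit


def is_banned(url, banned_domains):
    """Hypothetical helper: banned_domains holds entries like 'example.com', not URLs."""
    host = (urlsplit(url).hostname or "").rstrip(".")
    for domain in banned_domains:
        d = domain.lower().lstrip(".")
        if host == d or host.endswith("." + d):  # exact host or a subdomain of it
            return True
    return False
```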

Improved index procedure:
Now ignoring links that point back to the calling URI (self back-linking).
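Detecting a self back-link comes down to resolving the href against the page URL and comparing normalized forms. The Python sketch below assumes that fragments are ignored and scheme/host case and an empty path are normalized; the names `normalize` and `is_self_link` are hypothetical.

```python
from urllib.parse import urldefrag, urljoin, urlsplit, urlunsplit


def normalize(url):
    """Hypothetical helper: drop the fragment, lower-case scheme/host, use '/' for an empty path."""
    url, _fragment = urldefrag(url)
    p = urlsplit(url)
    return urlunsplit((p.scheme.lower(), p.netloc.lower(), p.path or "/", p.query, ""))


def is_self_link(page_url, href):
    """True if href, resolved against page_url, points back to the page itself."""
    return normalize(urljoin(page_url, href)) == normalize(page_url)
```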

Improved detection of relative links found in the full text.

Improved input protection against SQL injection.
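The standard defence is to pass user input as bound query parameters instead of pasting it into the SQL text. Sphider-plus uses PHP and MySQL; the Python sketch below uses the standard `sqlite3` module purely as a stand-in, and the table `links` with columns `url` and `keyword` is a hypothetical schema.

```python
import sqlite3


def find_links_by_keyword(conn, keyword):
    """Hypothetical query: the keyword is a bound parameter, never part of the SQL text."""
    cur = conn.execute(
        "SELECT url FROM links WHERE keyword = ? ORDER BY url",  # '?' placeholder
        (keyword,),
    )
    return [row[0] for row in cur.fetchall()]
```

With this pattern, input such as x' OR '1'='1 is compared as an ordinary string and simply matches nothing.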

Improved Admin statistics:
Now also providing the IP, country code and country name for:
- Search log
- Most popular searches
- Most popular page links
- Most popular media links

Updated the GeoIP database used to provide the IP, CC and country name for the Admin statistics.
Now also supporting IPv6 addresses.

Support for ppt files on Windows systems temporarily removed, as the converter fails on large PowerPoint documents.

Fixed a bug that prevented category selection unless the "Advanced search form" option was activated.

Fixed a bug that caused invalid URL encoding in the result listing.

Fixed a bug causing the error output "Unknown column 'naame' in field list" during media indexing.

Fixed a bug that caused MySQL warning messages during indexing with some older MySQL versions if the URL to be indexed contained blank characters.

Fixed a bug that caused invalid URL creation for relative links containing a file name and/or query.

Fixed a bug in the option 'Crawler can leave domain'.

Fixed a bug in the option 'Use list of div ids to ignore the div content during index/re-index'.

Fixed a bug in the option 'Enable to decode entity coded sites into standard HTML characters'.

Fixed a bug in the 'addurl' form that prevented input of words containing accents in the 'title' and 'description' fields.

Some other minor bugs fixed.
Re: Sphider-plus version 2.9 released
December 03, 2012 12:19PM
Hi Tec, do you think you could help update the basic Sphider? It would be a nice thing.

-----
[url=http://myfxtips.com]Foreign Exchange Forex Trading Strategies[/url]
Tec
Re: Sphider-plus version 2.9 released
December 04, 2012 10:48PM
Sorry to tell you, but the original Sphider has been unsupported for about 4 years now. Before starting to develop Sphider-plus, I tried to cooperate with Ando. My intention was to continue together with him. But it seems Ando intended to concentrate on other projects, so I started to create Sphider-plus by myself. Meanwhile I have added 284 new features (additional mods, functions, template designs and debugging) to the original Sphider. And now, 4 years later, I do not intend to re-engineer the original Sphider. I have to concentrate on Sphider-plus, which is under continuous development.

Tec
Re: Sphider-plus version 2.9 released
December 29, 2012 08:30AM
Thanks for the reply, and you're right about just working on your Sphider-Plus; it is much better, but it's not free :)

---
[url=http://myfxtips.com]Forex Strategies Blog[/url]
[url=http://dakredit.com]Financijski Portal[/url]
Re: Sphider-plus version 2.9 released
December 30, 2012 10:21AM
What is the difference between sphider-plus and pro?

I read that Sphider-plus lacks good security.

Can someone explain the difference between both packages to me?

thanks in advance,

W//
Tec
Re: Sphider-plus version 2.9 released
December 30, 2012 11:28AM
Sphider Pro is not an official update or upgrade of Sphider-plus. As no details are presented, it seems they just added some old mods like 'Follow sitemap.xml' and 'Erase & Re-index' to the original Sphider.
I published all these mods here in this forum several years ago. Having added them to the original Sphider, he now calls it Sphider Pro. Okay, to be fair: he also implemented some options from Sphider-plus.

After purchasing Sphider-plus about 2 months ago, they were unable to install the scripts on their server due to missing knowledge of server settings and configuration. The help and advice offered in the Sphider-plus forum were not accepted either. Instead, a dispute was opened at PayPal, about 7 hours after downloading the scripts. The PayPal claim was decided in my favour, but that seems to be difficult to accept. Now they try to bother the real developer of Sphider, publishing publicity and polemic here in the forum instead of offering solutions, and also placing the scripts for free download. Poor boy. Looks like a typical reaction of script kiddies. How do they intend to copy and paste in the future without intellectual input?
To quote one of the e-mails sent to me:
". . . Should have give me my money back . . . "

Tec



Edited 1 time(s). Last edit at 01/01/2013 04:40PM by Tec.
Tec
Re: Sphider-plus version 2.9 released
January 01, 2013 10:42PM
<<< Sphider Pro has been running on websites since 2009 >>>
Your website offering Sphider Pro was established on November 29, 2012.
Quite a performance, considering that all your beta testers were busy for more than 4 years. But even that much activity would not justify naming your first publicly available release v.3.1.
To verify this, all details regarding registration, ownership, etc. are available at
http://www.eurid.eu/
Just do a WHOIS lookup for sphiderpro.

<<< As Tec says he published all these mods in here seral years ago yet he states on his website that that he addapted Sphider-plus form the original sphider created in 2008. >>>
Okay, English is not my native language. But as far as I understand the above: congratulations, you finally also understood who really developed the search engine. Ando Saabas developed the original Sphider, and as he no longer supported it, I continued it with his personal consent. It would be no problem for me to present the relevant e-mails written by Ando in 2008 and 2009.

By the way: how would you know whether or not I have ever seen your scripts?

Tec



Edited 1 time(s). Last edit at 01/02/2013 06:44AM by Tec.