Sphider-plus version 3.2013a released

Posted by Tec 
August 05, 2013 03:24PM
This may also be of interest to users of the original Sphider:
The newest development of Sphider-plus is now available at [www.sphider-plus.eu]
Compared with version 2.9, the following items have been added or modified:

- New feature:
Index DOCX files. To be activated in Admin settings.
Implemented as a PHP script, the converter needs no adaptation to the operating system.

- New feature:
Index XLSX files. To be activated in Admin settings.
Implemented as a PHP script, the converter needs no adaptation to the operating system.

- New feature:
Index only preferred sites. Level-dependent re-indexing of only those URLs that are assigned the corresponding level.

- New option:
Admin's 'Sites' table sorted by index priority.

- New feature:
Create a thumbnail of all Internet URLs during index procedure.
Will be presented as part of the text result listing for each link.
To be activated in Admin backend.

- New feature:
Prevent indexing of suspected malware and phishing pages.
To be activated in Admin backend, this feature is supplied by a Google web service
to prevent indexing of pages that contain malware or phishing content.

- New feature:
If the blacklist is met too often, indexing of the affected site is aborted automatically.
The limit is set to a count of 20.

- New option:
Check correct conversion of content into UTF-8.
Will detect invalid charset definitions in the meta tags of the HTML header,
or invalid charset definitions supplied via HTTP by the client server.
If an invalid charset is detected, the index procedure will be aborted for the affected link.
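
A minimal Python sketch of such a check, not the Sphider-plus implementation (which is written in PHP). The function name to_utf8 is hypothetical. It assumes that "invalid" means either an unknown charset name or content that cannot be decoded with the declared charset:

```python
import codecs
from typing import Optional


def to_utf8(raw: bytes, declared_charset: str) -> Optional[bytes]:
    """Convert raw page bytes to UTF-8, or return None if the charset is invalid.

    Hypothetical helper: None signals that indexing of this link should be aborted.
    """
    try:
        # Reject charset names that no codec is registered for.
        codecs.lookup(declared_charset)
    except LookupError:
        return None
    try:
        # Strict decoding fails if the bytes do not match the declared charset.
        return raw.decode(declared_charset).encode("utf-8")
    except UnicodeDecodeError:
        return None
```

A caller would skip the link whenever None is returned, and index the returned UTF-8 bytes otherwise.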

- New feature:
The addurl form now stores only the domain name and TLD, for example 'sphider-plus.eu'.
Thus, www. and any subfolder of the suggested URL will be ignored.
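
The reduction could look roughly like the following Python sketch. The function name reduce_to_domain is hypothetical, and the sketch only strips a leading 'www.' plus scheme, path and query. Whether other subdomains are also removed by Sphider-plus is not stated above, so they are left untouched here:

```python
from urllib.parse import urlsplit


def reduce_to_domain(suggested_url: str) -> str:
    """Return the host of a suggested URL without a leading 'www.'."""
    # urlsplit only recognises the host when a scheme or '//' precedes it.
    if "//" not in suggested_url:
        suggested_url = "//" + suggested_url
    host = urlsplit(suggested_url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return host
```

For example, both 'http://www.sphider-plus.eu/forum/read.php?id=1' and 'www.sphider-plus.eu/docs' are reduced to 'sphider-plus.eu'.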

- New feature:
Ignore the content of style="display:none" in div elements. Something like:
<div style="display:none">ignore_this_content</div>
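
To illustrate the idea, here is a small Python sketch based on the standard html.parser module. It is an assumption about how such filtering can be done, not the code used by Sphider-plus. The class and function names are hypothetical, and only inline style attributes on div elements are considered:

```python
import re
from html.parser import HTMLParser

HIDDEN = re.compile(r"display\s*:\s*none", re.IGNORECASE)


class VisibleText(HTMLParser):
    """Collect text, skipping everything inside hidden div elements."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.div_stack = []  # one flag per open div: True if it is hidden

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            style = dict(attrs).get("style") or ""
            self.div_stack.append(bool(HIDDEN.search(style)))

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()

    def handle_data(self, data):
        # Text is kept only if no enclosing div is hidden.
        if not any(self.div_stack):
            self.chunks.append(data)


def visible_text(html: str) -> str:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())
```

Keeping a stack of flags means that text in a visible div nested inside a hidden one is skipped as well.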

- New feature:
In order to enable immediate query input, auto focus is set to the search form.

- New suggest framework.
The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery

- New feature:
Separate search fields for text and media queries. Consequently also separate suggestions
will be offered. To be activated in Admin 'Settings'.

- New feature:
Restrict the search results by means of up to 5 categories simultaneously.
Import and export of URLs with multiple category definitions assigned to each site.

- New feature:
Now indexing also site URLs containing the https scheme.

- Improved index procedure:
Link URLs with and without 'www' are now treated as equal, and excluded as duplicate pages.
Links that redirect to themselves through HTTP 301/302/307 redirections are intercepted.
Thus, infinite indexing is prevented.
Multiple attempts to self-redirect will force Sphider-plus to abort the index procedure
for the involved site.
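
Both ideas can be sketched in a few lines of Python. This is only an illustration under assumptions: the names same_page and follow_redirects are hypothetical, the redirects are simulated by a dictionary instead of real HTTP requests, and max_hops stands in for a configurable redirection limit:

```python
from urllib.parse import urlsplit


def page_key(url):
    """Comparison key that ignores a leading 'www.' in the host name."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return (parts.scheme.lower(), host, parts.path or "/", parts.query)


def same_page(url_a, url_b):
    return page_key(url_a) == page_key(url_b)


def follow_redirects(start_url, redirects, max_hops=5):
    """Follow simulated 301/302/307 redirects.

    redirects maps a URL to its redirect target. Returns (final_url, None)
    on success or (None, reason) when the chain must be aborted.
    """
    seen = {start_url}
    url = start_url
    hops = 0
    while True:
        target = redirects.get(url)
        if target is None:
            return url, None  # no further redirect: this is the final page
        if target in seen:
            return None, "redirect loop"
        if hops == max_hops:
            return None, "too many redirects"
        seen.add(target)
        url = target
        hops += 1
```

With redirects = {'http://example.com/a': 'http://example.com/b', 'http://example.com/b': 'http://example.com/a'}, the call follow_redirects('http://example.com/a', redirects) stops with the reason 'redirect loop' instead of running forever.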

- New option in Admin 'Settings' menu:
Define count of redirections followed for each link (1-9) while indexing.

- New options in Admin 'Settings' menu:
Follow URL redirections, which are invoked by JavaScript like
<script . . . 'window.location.replace . . . . '
<script . . . var cURL = . . . .'
<script . . . window.location = . . . . AND " + location.host + "
and several other script directives.

- New options in Admin 'Settings' menu:
Follow URL redirections, which are invoked by body tags like
<BODY onLoad = "parent.location = 'home.asp'">
'HTTP-EQUIV= . . refresh . . content= . . .'
and several other tags

- New option in Admin 'Settings' menu:
Obey refresh delay directives, placed in meta tags like
<meta http-equiv="refresh" content="180;url=http://www.moodys.com.ar">
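
The content attribute of such a tag holds the delay in seconds and, optionally, the target URL. A hedged Python sketch of how it can be split (parse_refresh is a hypothetical name, and what Sphider-plus does with the delay afterwards is not covered here):

```python
import re


def parse_refresh(content):
    """Split a meta refresh content value into (delay_seconds, target_url).

    Returns None if the value does not start with a whole number of seconds.
    target_url is None when only a delay is given.
    """
    delay_part, _, rest = content.partition(";")
    try:
        delay = int(delay_part.strip())
    except ValueError:
        return None
    rest = rest.strip()
    # Drop an optional 'url=' prefix, written in any letter case.
    match = re.match(r"url\s*=\s*(.*)", rest, re.IGNORECASE | re.DOTALL)
    if match:
        rest = match.group(1)
    target = rest.strip().strip("'\"") or None
    return delay, target
```

For the tag above, parse_refresh('180;url=http://www.moodys.com.ar') yields a delay of 180 seconds and the target 'http://www.moodys.com.ar'.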

- New option in Admin 'Settings' menu:
Do not index comment parts <!-- this text --> and scripts outside the HTML tags

- New option in Admin 'Settings' menu:
If not already present, add a final slash to the path of all detected links.
If a file name is part of the path, this option will be bypassed.
Also, if the HTTP request for the main URL is only accepted without a slash,
this option will not be applied.
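
The first two rules can be sketched in Python as follows. The function name add_final_slash is hypothetical, and the test for a file name (a dot in the last path segment) is a simplifying assumption. The third rule depends on the server response and is not modelled:

```python
from urllib.parse import urlsplit, urlunsplit


def add_final_slash(url: str) -> str:
    """Append a final slash to the URL path unless it ends in a file name."""
    parts = urlsplit(url)
    path = parts.path
    last_segment = path.rsplit("/", 1)[-1]
    # Assumption: a dot in the last segment indicates a file name.
    if path.endswith("/") or "." in last_segment:
        return url
    return urlunsplit(parts._replace(path=path + "/"))
```

Query strings are preserved, so 'http://example.com/a?b=1' becomes 'http://example.com/a/?b=1', while 'http://example.com/docs/index.html' is returned unchanged.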

- New option in Admin 'Settings' menu:
Convert all link URLs to lower case characters.

- New option in Admin 'Settings' menu:
Convert all link URLs found during indexation into UTF-8
Will convert URLs like
/3v/catalog/%C1%E0%E2%E0%F0%E8%FF+%E8%E7%F0%E0%E7%E5%F6/
into:
/3v/catalog/Бавария+изразец/
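
The example above can be reproduced with Python's standard library. The percent-encoded bytes in this URL are Windows-1251 (Cyrillic). The sketch assumes the source charset is already known, and how Sphider-plus determines it is not described here. The function name url_path_to_text is hypothetical:

```python
from urllib.parse import quote, unquote


def url_path_to_text(path: str, source_charset: str) -> str:
    """Decode percent-encoded bytes of a URL path using the given charset."""
    # errors="strict" raises UnicodeDecodeError instead of silently replacing bytes.
    return unquote(path, encoding=source_charset, errors="strict")


legacy = "/3v/catalog/%C1%E0%E2%E0%F0%E8%FF+%E8%E7%F0%E0%E7%E5%F6/"
decoded = url_path_to_text(legacy, "cp1251")

# Percent-encode again, this time as UTF-8 bytes, for use in an HTTP request.
utf8_encoded = quote(decoded, safe="/+")
```

Here decoded holds the readable path shown above, and utf8_encoded is the same path with UTF-8 percent-encoding.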

- Improved link detection:
Invalid URLs containing duplicate slashes in their path will be ignored.
The following links are followed now:
<script>window.document.location ="/this.path";</script>
<script>window.document.location.href="/this.path";</script>
<script>window.location.replace("/this.path");</script>
<script>"https|http this URL"</script>
<body onload="/this.path">
and several others.

- New option in Admin backend 'Clean' menu:
Truncate all tables in database.

- Improved 'NOHOST' detection during index procedure:
Now trying 5 times to get in contact with the server.
Each attempt is performed by 2 different HTTP requests.

- Improved 'Add site' function in Admin backend.
Now treating URLs with the scheme 'http' and 'https' as equal, and excluding them as duplicate sites.

- Support added for Windows-31J (CP932) charset as extension of Shift JIS.
(CP932 contains standard 7-bit ASCII codes, and Japanese characters are indicated by the high bit set to 1)

- UTF-8 support implemented for media titles, file names and ID-3 tags.

- MySQLi connector implemented between PHP and a MySQL database, written in object-oriented style.

- Bug fixed in option: Do not index the full text.

- Bug fixed for URLs containing CP1252 coded paths.

- Bug fixed in detection of www/non www links. Now preventing duplicate indexing.

- Bug fixed in 'Strip session ids'

- Bug fixed in Korean word segmentation

- Some small bugs killed.