Tec_Protocol_to_Import_a_list_of_urls_in_6_steps

Posted by Anonymous User 
June 18, 2007 10:02PM
This is Tec_Protocol_to_Import_a_list_of_urls_in_6_steps:

http://www.sphider.eu/forum/read.php?2,601,618#msg-618

I tested it with 400 URLs.

(I) In .../admin/admin.php, search for

switch ($f) {

and add the following code as case 40. (If you give it a different number, use that same number in the other change to admin.php, see II.)





case 40:
    print "<b><br><center>Import URLs</center></b><br>\n";

    // Defaults for an imported site; only $spider_depth is written to the database.
    $short_desc = '';
    $title = '';
    $spider_depth = '-1';
    $required = '';
    $disallowed = '';
    $can_leave_domain = '';

    // url.txt is read from the admin directory (see step IV), one URL per line.
    $theFile = file_get_contents('url.txt');
    $lines = explode("\n", $theFile);

    print "<b>Importing :</b><hr><br>&nbsp;&nbsp;&nbsp;&nbsp;";

    foreach ($lines as $url) {
        // Cut to 150 characters, trim whitespace, strip tags and encode special characters.
        $url = cleanup_text(trim(substr($url, 0, 150)));
        if ($url == '') {
            continue; // skip blank lines, e.g. the one after a trailing newline
        }
        print "<br>&nbsp;&nbsp;&nbsp;&nbsp;$url :";

        // Append "/" when the URL has no path component.
        $compurl = parse_url($url);
        if (!isset($compurl['path']) || $compurl['path'] == '') {
            $url = $url."/";
        }

        // Escape the value before using it in SQL.
        $safe_url = mysql_real_escape_string($url);
        $result = mysql_query("select site_ID from ".$mysql_table_prefix."sites where url='$safe_url'");
        echo mysql_error();
        $rows = mysql_num_rows($result);
        if ($rows == 0) {
            mysql_query("INSERT INTO ".$mysql_table_prefix."sites (url,spider_depth) VALUES ('$safe_url', '$spider_depth')");
            echo mysql_error();
        } else {
            echo "<b><br>&nbsp;&nbsp;&nbsp;&nbsp;Attention: Site ' $url '<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;is already in the database. It was not imported a second time.</b><br><br>";
        }
    }
    print "<br><br><hr><b>Import finished !</b><p />";
    break;
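
If you want to try the per-line URL handling from the loop above (cut to 150 characters, trim, add a trailing slash when there is no path) without touching the database, here is a minimal sketch. The function name normalize_import_url is my own and not part of Sphider, and returning null for blank or unparseable lines is an assumption added for the sketch.

```php
<?php
// Hypothetical helper (not part of Sphider): mirrors the per-line URL
// handling of the case 40 code, without any database access.
function normalize_import_url($line, $maxLength = 150)
{
    // Cap the length, then trim whitespace (this also removes a trailing
    // "\r" left over from Windows line endings).
    $url = trim(substr($line, 0, $maxLength));
    if ($url === '') {
        return null; // blank line: nothing to import (assumption of this sketch)
    }

    // parse_url() returns false for seriously malformed URLs.
    $parts = parse_url($url);
    if ($parts === false) {
        return null;
    }

    // Append "/" when the URL has no path, so that "http://example.com"
    // and "http://example.com/" end up as the same site URL.
    if (!isset($parts['path']) || $parts['path'] === '') {
        $url .= '/';
    }
    return $url;
}

// Usage: normalize a few sample lines and print the ones that survive.
$sample = array("http://example.com\r", "", "http://example.org/docs/");
foreach ($sample as $line) {
    $url = normalize_import_url($line);
    if ($url !== null) {
        echo $url, "\n";
    }
}
```

Note that a URL with a query string but no path (something like http://example.com?x=1) would get the slash appended after the query string, exactly as in the import code, so you may want to check such lines by hand.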




(II) Open .../admin/admin.php, search for

function showsites($message) {

and add the following line to the list of selectable items:



<li> <a href='admin.php?f=40'>Import my URL.txt</a></li>




(III) Add the following at the end of commonfuncs.php (in sphider/include):



/** string cleanup_text ([string value [, string preserve [, string allowed_tags]]])

This function uses the PHP function htmlspecialchars() to convert
special HTML characters in the first argument (e.g., &,",',<, and >)
to the equivalent HTML entities. If the optional second argument is empty,
any HTML tags in the first argument will be removed. The optional
third argument lets you specify specific tags to be spared from
this cleaning. The format for the argument is "<tag1><tag2>".
*/

function cleanup_text ($value='', $preserve='', $allowed_tags='')
{
    if (empty($preserve))
    {
        $value = strip_tags($value, $allowed_tags);
    }
    $value = htmlspecialchars($value, ENT_QUOTES);
    return $value;
}
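
To see what cleanup_text does to a line before it is stored, here is a small standalone sketch. The function is repeated so the example runs on its own; the sample strings are made up for illustration.

```php
<?php
// Copy of cleanup_text() from step (III), repeated here so the example
// runs on its own.
function cleanup_text($value = '', $preserve = '', $allowed_tags = '')
{
    if (empty($preserve)) {
        // Remove HTML tags, except those listed in $allowed_tags.
        $value = strip_tags($value, $allowed_tags);
    }
    // Encode &, <, > and both kinds of quotes as HTML entities.
    return htmlspecialchars($value, ENT_QUOTES);
}

// Tags are stripped first, then the remaining special characters are encoded.
$text = cleanup_text('<b>Shop</b> & "News"');

// A URL with a query string: the "&" is encoded as "&amp;".
$url = cleanup_text('http://example.com/?a=1&b=2');

// With a non-empty second argument the tags are kept, but encoded.
$kept = cleanup_text('<i>x</i>', 'keep');

echo $text, "\n", $url, "\n", $kept, "\n";
```

One thing to keep in mind: because the "&" in a query string is encoded, a URL such as http://example.com/?a=1&b=2 is stored with "&amp;" in it. If your list contains URLs with several query parameters, check whether that is what you want in the sites table.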




(IV) In .../admin/, save a file named

url.txt

containing the list of URLs, one URL per line.
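
Before running the import it can help to check what is actually in the list. Below is a minimal sketch that reads such a file, drops blank lines and removes duplicate entries. The function name read_url_list is hypothetical, and the duplicate removal is my own addition; the import code itself only checks for duplicates against the database.

```php
<?php
// Hypothetical helper (not part of Sphider): read a one-URL-per-line file
// and return the non-empty, de-duplicated lines in their original order.
function read_url_list($path)
{
    $raw = file_get_contents($path);
    if ($raw === false) {
        return array(); // file missing or unreadable
    }

    $urls = array();
    // Split on Windows, old Mac and Unix line endings alike.
    foreach (preg_split('/\r\n|\r|\n/', $raw) as $line) {
        $line = trim($line);
        if ($line !== '') {
            $urls[] = $line;
        }
    }
    // Drop exact duplicates and reindex from 0.
    return array_values(array_unique($urls));
}

// Usage: write a small sample list to a temporary file (a stand-in for
// admin/url.txt), read it back, then clean up.
$path = tempnam(sys_get_temp_dir(), 'urls');
file_put_contents($path, "http://example.com\r\n\r\nhttp://example.org/docs/\nhttp://example.com\n");
$urls = read_url_list($path);
unlink($path);

foreach ($urls as $url) {
    echo $url, "\n";
}
```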


(V) Use Sphider to index those URLs.


(VI) Can someone show how to import those URLs into a named category?