Import url list with spider depth and categories.

Posted by Tec 
Tec
June 27, 2007 06:01PM
Already published some time ago, the script now additionally handles spider-depth and categories. If you already use the old script, only case 40 has to be replaced.


In the folder .../admin/ create a new file called url.txt containing your URLs like this, but without the [..]:

[http://www.domain1.com,-1,Info]
[http://www.domain2.de,3,Funny things]
[http://www.domain3.it,2]
[http://www.etc.com]

URL, spider-depth and category must be separated by commas.
If you don't specify a spider-depth, it is automatically set to '-1'.
The category is optional too. If it is not specified, the new site is stored without a category.
Specifying a category but no spider-depth requires: url,,categoryname (programmer was lazy!)
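
The line format above can be tried out separately from Sphider. The following is only a minimal sketch; parse_import_line() is a hypothetical helper name that does not exist in Sphider, and it simply applies the rules just described (comma-separated fields, spider-depth defaulting to '-1', optional category):

```php
<?php
// Hypothetical helper, not part of Sphider: splits one url.txt line
// into URL, spider-depth and category using the rules described above.
function parse_import_line($line)
{
    // pad to three fields so a missing depth or category becomes ''
    $parts = array_pad(explode(',', trim($line)), 3, '');
    $url = trim($parts[0]);
    $depth = trim($parts[1]);
    $category = trim($parts[2]);
    if ($depth === '') {
        $depth = '-1';   // default when no spider-depth is given
    }
    return array('url' => $url, 'spider_depth' => $depth, 'category' => $category);
}

// Try it on the two forms that carry a category
print_r(parse_import_line('http://www.domain2.de,3,Funny things'));
print_r(parse_import_line('http://www.etc.com,,Info'));
```

A line with only a URL comes back with spider_depth '-1' and an empty category, which matches what case 40 stores.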


Open .../admin/admin.php
Search for
function showsites($message) {

Now add the following to the list of selectable items:

<li> <a href='admin.php?f=40'>Import my URL.txt</a></li>


In .../admin/admin.php search for

switch ($f) {


Include the following code as case 40:


case 40:
    print "<b><br><center>Import URL's for localhost Server</center></b><br>\n";

    $parent_num = "0";   // all imported categories become top-level
    // read the list created in .../admin/url.txt, one entry per line
    $theFile = file_get_contents('url.txt');
    $lines = explode("\n", $theFile);

    print "<b>Importing :</b><hr><br>&nbsp;&nbsp;&nbsp;&nbsp;";

    foreach ($lines as $new) {
        $new = cleanup_text(trim(substr($new, 0, 150)));
        if ($new == '') continue;   // skip empty lines

        // pad to three fields so missing depth or category becomes ''
        $new = array_pad(explode(",", $new), 3, '');
        $url = trim($new[0]);
        $spider_depth = trim($new[1]);
        if ($spider_depth == '') $spider_depth = '-1';
        $category = $new[2];
        $category = trim($category);

        print "<br>&nbsp;&nbsp;&nbsp;&nbsp;$url :";
        // clean url: add a trailing slash if the URL has no path
        $compurl = parse_url($url);
        if (empty($compurl['path']))
            $url = $url."/";
        $result = mysql_query("select site_ID from ".$mysql_table_prefix."sites where url='$url'");
        echo mysql_error();
        $rows = mysql_num_rows($result);
        if ($rows == 0) {
            // save new url and spider-depth
            mysql_query("INSERT INTO ".$mysql_table_prefix."sites (url,spider_depth) VALUES ('$url', '$spider_depth')");
            echo mysql_error();

            // handle the category if we do have one
            if ($category != '') {
                $result = mysql_query("select category from ".$mysql_table_prefix."categories where category='$category'");
                echo mysql_error();
                $rows = mysql_num_rows($result);
                if ($rows == 0) {
                    // if new category
                    mysql_query("INSERT INTO ".$mysql_table_prefix."categories (category, parent_num) VALUES ('$category', '$parent_num')");
                    echo mysql_error();
                }

                // get category_id
                $result = mysql_query("select * from ".$mysql_table_prefix."categories where category='$category'");
                echo mysql_error();
                $cat = mysql_fetch_array($result);
                $cat_id = $cat['category_id'];

                // get site_id
                $result = mysql_query("select * from ".$mysql_table_prefix."sites where url='$url'");
                echo mysql_error();
                $sit = mysql_fetch_array($result);
                $site_id = $sit['site_id'];

                // save new site_id and category_id
                mysql_query("INSERT INTO ".$mysql_table_prefix."site_category (site_id, category_id) VALUES ('$site_id', '$cat_id')");
                echo mysql_error();
            }

        } else {
            echo "<b><br>&nbsp;&nbsp;&nbsp;&nbsp;Attention: Site ' $url '<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;is already in database. Currently not imported a second time.</b><br><br>";
        }
    }
    print "<br><br><hr><b>Import finished !</b><p />";
    break;
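
The "clean url" step in case 40 can be looked at in isolation. This is a small sketch only; add_trailing_slash() is a hypothetical name used for illustration, and it does nothing more than the check in the script (append "/" when the URL has no path):

```php
<?php
// Hypothetical helper mirroring the "clean url" step of case 40.
function add_trailing_slash($url)
{
    $parts = parse_url($url);
    // parse_url() returns false for seriously malformed URLs; leave those untouched
    if ($parts === false) {
        return $url;
    }
    // no path component at all: append the root slash
    if (empty($parts['path'])) {
        $url .= '/';
    }
    return $url;
}

echo add_trailing_slash('http://www.domain1.com'), "\n";           // http://www.domain1.com/
echo add_trailing_slash('http://www.domain1.com/page.html'), "\n"; // http://www.domain1.com/page.html
```

Because of this step, http://www.domain1.com and http://www.domain1.com/ are treated as the same site by the "already in database" check.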


At the end of .../include/commonfuncs.php add:

/** string cleanup_text ([string value [, string preserve [, string allowed_tags]]])

This function uses the PHP function htmlspecialchars() to convert
special HTML characters in the first argument (e.g., &,",',<, and >)
to the equivalent HTML entities. If the optional second argument is empty,
any HTML tags in the first argument will be removed. The optional
third argument lets you specify specific tags to be spared from
this cleaning. The format for the argument is "<tag1><tag2>".
*/

function cleanup_text ($value='', $preserve='', $allowed_tags='')
{
    if (empty($preserve))
    {
        $value = strip_tags($value, $allowed_tags);
    }
    $value = htmlspecialchars($value, ENT_QUOTES);
    return $value;
}
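
To see what cleanup_text() does to a line from url.txt, it can be run on its own. The sketch below repeats the function from above (wrapped in function_exists() only so the example does not collide with a copy that is already loaded) and calls it with made-up sample strings:

```php
<?php
// Same function as above; the guard is only there for this stand-alone example.
if (!function_exists('cleanup_text')) {
    function cleanup_text($value = '', $preserve = '', $allowed_tags = '')
    {
        if (empty($preserve)) {
            $value = strip_tags($value, $allowed_tags);
        }
        $value = htmlspecialchars($value, ENT_QUOTES);
        return $value;
    }
}

// Default: tags are removed, special characters become entities
echo cleanup_text('<b>Funny</b> & "odd" things'), "\n";
// prints: Funny &amp; &quot;odd&quot; things

// Second argument set: tags are kept, but still converted to entities
echo cleanup_text('<b>Funny</b>', 1), "\n";
// prints: &lt;b&gt;Funny&lt;/b&gt;
```

Note that a category name containing & or quotes is therefore stored in its entity form.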


That's all for today. The bad news at the end:
This script doesn't support subcategories. All categories you add to your url.txt file are set as top-level. Sphider manages each subcategory relative to the top-level category it is assigned to. From my point of view it would become a bit complicated to handle something like [http://www.domain2.de,3,Funny things,4,3], because the number of the top-level category (4) is assigned by Sphider's admin and is not known to an external user. But of course you are free to enhance the script. Just manage $parent_num . . .
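
One possible direction for such an enhancement, purely as a sketch: name the parent category in url.txt instead of its number, and look the number up. Everything here is an assumption made for illustration: resolve_parent_num() is a hypothetical helper, and the $known_categories array stands in for a query on the categories table.

```php
<?php
// Hypothetical sketch: $known_categories stands in for the categories table
// (category name => category_id); a real version would query the database.
function resolve_parent_num($parent_name, $known_categories)
{
    // no parent given, or parent not found: fall back to top-level ("0"),
    // which is what case 40 always uses
    if ($parent_name === '' || !isset($known_categories[$parent_name])) {
        return '0';
    }
    return (string) $known_categories[$parent_name];
}

$known_categories = array('Info' => 4, 'Funny things' => 7);   // made-up sample data
echo resolve_parent_num('Info', $known_categories), "\n";      // 4
echo resolve_parent_num('', $known_categories), "\n";          // 0
```

The returned value would then take the place of $parent_num in the INSERT for new categories.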

Tec



Edited 1 time(s). Last edit at 06/27/2007 07:33PM by Tec.
Tec
Re: Import url list with spider depth and categories.
June 28, 2007 05:43AM
Okay, yesterday I was lazy. Now, for all of you who want to save one comma (or are too forgetful), here it is:

In case 40 search for:
$category = $new[2];

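After it, add this new line (a non-numeric second field is taken as the category, and spider-depth falls back to '-1'):
if (!is_numeric(trim($spider_depth))) { $category = $spider_depth; $spider_depth = '-1'; }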

Now you may either specify or leave out the spider-depth in url.txt

Tec



Edited 2 time(s). Last edit at 06/28/2007 05:45AM by Tec.