Request: Image and Video Search

Posted by whompster 
October 10, 2007 02:04AM
Hey guys,

Just wondering if anyone has made an image and video search yet?

Thanks
Mike
Re: Request: Image and Video Search
November 12, 2007 09:46PM
How hard would it be to put JPGs and GIFs in their own part of the database and then display them in a gallery format, much like Google?

And the same for video formats?

Or, better yet, let the admin add their own file extensions, so that torrents, music, movies, or anything else could be kept separate from the links?

I would be willing to discuss a price for this mod...
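Displaying stored images "in a gallery format much like Google" mostly means printing a grid of thumbnails that link back to the pages they were found on. A minimal sketch in PHP, assuming each stored row is an array with hypothetical `url` (page), `src` (image) and optional `alt` keys; `render_gallery()` is an illustrative name, not part of Sphider. Every value goes through htmlspecialchars(), since these strings come from crawled pages.

```php
<?php
// Hypothetical helper: print image rows as a simple HTML thumbnail grid.
// Each row: array('url' => page URL, 'src' => image URL, 'alt' => optional text).
function render_gallery(array $rows, $columns = 4)
{
    $html = "<table class=\"gallery\">\n<tr>\n";
    $i = 0;
    foreach ($rows as $row) {
        if ($i > 0 && $i % $columns == 0) {
            $html .= "</tr>\n<tr>\n"; // start a new table row every $columns thumbnails
        }
        $alt = isset($row['alt']) ? $row['alt'] : '';
        // escape everything: these values come from crawled pages
        $html .= '<td><a href="' . htmlspecialchars($row['url'], ENT_QUOTES) . '">'
            . '<img src="' . htmlspecialchars($row['src'], ENT_QUOTES) . '"'
            . ' alt="' . htmlspecialchars($alt, ENT_QUOTES) . '" width="100" /></a></td>' . "\n";
        $i++;
    }
    return $html . "</tr>\n</table>\n";
}
```

Calling `echo render_gallery($rows, 5);` would print five thumbnails per row; the fixed `width="100"` is only a placeholder until real thumbnails are stored.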
Re: Request: Image and Video Search
November 13, 2007 02:46PM
It wouldn't be that hard. You would just need a big database, a way to store the images in it, and a way to view them from it. I remember reading about some ways to do this. Movies are a lot harder, though, and require a lot more space.
I will provide some help if you want.
Re: Request: Image and Video Search
November 13, 2007 05:03PM
A nice way to do it is to create a table holding the critical data (id, name, type, local_path, hash, etc.) and a folder on the server for saving the images or videos (e.g. <id>.jpg).
This saves space and keeps the database fast, and you can also manipulate the files more easily. :)

Things that come to mind:

1. The link between the files and the pages (the same file can be found on many pages); this probably needs another table.

2. If you are planning to save all these files, then you should use a hashing algorithm or something similar to identify them (the same file can have different names), and this can be very resource-consuming...

3. PHP has some limitations, and I don't think you would be able to download a 100 MB stream and manipulate it easily (maybe an external application would help?) :)

4. Is this really needed? Are you trying to reverse-engineer Google or something? :P
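Points 1 and 2, together with the table-plus-folder layout suggested above, can be modelled in a few lines. A minimal in-memory sketch, assuming downloaded files already sit in temporary paths: the hypothetical `FileStore` class keeps one record per distinct SHA-1 content hash (id, name, type, local_path, hash), copies each distinct file once into a folder as `<id>.<ext>`, and keeps a separate list of (page, file id) links. In a real mod these two arrays would be two database tables. hash_file() hashes straight from disk, so a large file does not have to be read into a PHP string first.

```php
<?php
// Hypothetical in-memory model of the two tables:
// $files: one row per distinct content hash; $links: (page URL, file id) pairs.
class FileStore
{
    public $files = array(); // hash => array(id, name, type, local_path, hash)
    public $links = array(); // list of array('page' => ..., 'file_id' => ...)
    private $dir;
    private $next_id = 1;

    public function __construct($dir)
    {
        $this->dir = rtrim($dir, '/');
    }

    // Register a downloaded file found on $page and return its file id.
    public function add($page, $tmp_path, $name, $type)
    {
        $hash = hash_file('sha1', $tmp_path); // same bytes => same hash, whatever the name
        if (!isset($this->files[$hash])) {
            $id = $this->next_id++;
            $ext = strtolower(pathinfo($name, PATHINFO_EXTENSION));
            $local = $this->dir . '/' . $id . ($ext !== '' ? '.' . $ext : '');
            copy($tmp_path, $local); // keep one copy on disk, named <id>.<ext>
            $this->files[$hash] = array('id' => $id, 'name' => $name, 'type' => $type,
                'local_path' => $local, 'hash' => $hash);
        }
        $id = $this->files[$hash]['id'];
        $link = array('page' => $page, 'file_id' => $id);
        if (!in_array($link, $this->links)) {
            $this->links[] = $link; // the same file may be linked from many pages
        }
        return $id;
    }
}
```

Because duplicates are detected by content rather than by file name, the same picture saved as cat.jpg on one site and kitten.jpg on another is stored only once but linked from both pages.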

There are a lot of other things I can think of on the "Why?" side, but everything is possible, so "Why NOT?" :P

If someone makes a start, I would like to help. :)

Keep it fun!

Wireless
node: 2435
website: www.woogle.awmn
voip: 24351

Wired
website: www.iptelephony.gr
voip: 1003@sip.iptelephony.gr
Re: Request: Image and Video Search
November 14, 2007 09:52AM
Hi,
Here are two functions that can be used to extract image links, titles, and alt attributes from HTML source code.
The output from get_images() is an array containing the absolute link to each image file and any alt and title text.
I haven't tested this with indexing yet, though the best place to capture this data is in the clean_file() function in spiderfuncs.php.

Just before:
$file = strip_tags($file);

add:

$images = get_images($file,$url);

Then add the $images array to $data with $data['images'] = $images;
this lets the images be passed out of the clean_file() function.

In index_url(), after the line:
$data = clean_file($file, $url, $url_status['content']);

you can access the images as $data['images'] to process the data.
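In index_url(), the images could then be flattened into one row per image for whatever table is used. A minimal sketch; `image_rows()` is a hypothetical helper, and the column names (url, src, alt, title) are assumptions based on the keys get_images() produces, not an existing Sphider table.

```php
<?php
// Hypothetical helper: flatten $data['images'] (as built by get_images())
// into one row per image for the page at $page_url.
function image_rows($page_url, array $data)
{
    $rows = array();
    if (empty($data['images'])) {
        return $rows; // no images were captured for this page
    }
    foreach ($data['images'] as $img) {
        if (!isset($img['src'])) {
            continue; // defensive: nothing to store without an image URL
        }
        $rows[] = array(
            'url'   => $page_url,
            'src'   => $img['src'],
            'alt'   => isset($img['alt']) ? $img['alt'] : '',
            'title' => isset($img['title']) ? $img['title'] : '',
        );
    }
    return $rows;
}
```

Each returned row could then be written with one INSERT per image, keeping the page URL so search results can link back to the page.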

Pikos's suggestion for a table structure seems very good.

I've tested the functions on a few web pages with the test code at the top, though there are a few limitations. Some websites won't allow you to hotlink to their images (www.sxc.hu/ for example), so you get the image link but no way to display it. Also, if the call is not placed just before strip_tags() in clean_file(), it will extract links from commented-out code as well.
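A different technique from gandalf's regular expressions, swapped in here for illustration, is to let PHP's DOM extension parse the page. Comments never become <img> elements, quoting style and tag case do not matter, and attribute order does not matter either (in get_images(), an alt or title that appears before src is dropped). A minimal sketch, assuming the dom extension is installed; `get_images_dom()` is a hypothetical name, and it returns the raw src values, which would still need to be made absolute, for example with find_true_path().

```php
<?php
// Hypothetical DOM-based variant of get_images(): src/alt/title for each <img>.
function get_images_dom($html)
{
    $images = array();
    if (trim($html) === '') {
        return $images; // loadHTML() refuses empty input
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = trim($img->getAttribute('src')); // '' when the attribute is missing
        if ($src === '' || !preg_match('/\.(gif|jpe?g|png)(\?|#|$)/i', $src)) {
            continue; // keep only gif/jpeg/jpg/png, as the original does
        }
        $entry = array('src' => $src);
        foreach (array('alt', 'title') as $attr) {
            if ($img->hasAttribute($attr) && trim($img->getAttribute($attr)) !== '') {
                $entry[$attr] = trim($img->getAttribute($attr));
            }
        }
        $images[] = $entry;
    }
    return $images;
}
```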

Anyway, it's the bare bones of an interesting mod, and hopefully someone will take it a bit further.

Hope this helps.

-------------------------------------------------------------
<?php
// save this file and open in your browser to see the output.

$url = "http://stlcomics.com/cover_gallery/index.php";
$file = file_get_contents($url);
$images = get_images($file,$url);
echo '<pre>';
print_r($images);
echo '</pre>';
foreach($images as $key=>$value){
echo "<img src=\"{$value['src']}\"/><br />
";
}

//delete from here upwards.

/*
strips out image tags from html with src/alt/title attributes
*/

function get_images($file,$url){

//init

$regs=array();
$images=array();

//match image tags

preg_match_all('/<\s?img(.*?)>/i',$file,$regs);

//loop through

foreach($regs[0] as $key=>$value){
$items = explode('=',$value);
$count = count($items);
for($i=0;$i<$count-1;$i++){ // stop one short: each pass also reads $items[$i+1]

//replace any bits after the data we need with nothing

$item = str_replace(strrchr($items[$i+1],' '),'',$items[$i+1]);

//get rid of " and '

$item = trim(preg_replace('/("|\')/','',$item));

//if src

if(preg_match('/src/i',$items[$i])){
if(trim($item)!=''){

//make sure image

if(preg_match('/((.*)(\.gif|\.jpeg|\.jpg|\.png))(.*)/',$item,$reg)){
$images[$key]['src']=find_true_path($url,$reg[1]);
}
}else{
continue;
}
}

//if alt

elseif(preg_match('/alt/i',$items[$i])){

if($item!=''){
if(isset($images[$key]['src'])){
$images[$key]['alt']=$item;
}else{
continue;
}
}else{
continue;
}
}

//if title

elseif(preg_match('/title/i',$items[$i])){
if($item!=''){
if(isset($images[$key]['src'])){
$images[$key]['title']=preg_replace('/[\"\']/','',$item);
}else{
continue;
}
}else{
continue;
}
}else{
continue;
}
}
}
return($images);
}
/*
sorts out a relative url with ./ and ../ s and prefixes
the main url to the start of it giving the correct absolute
url to the image file.
*/
function find_true_path($url,$image_location){

//if absolute already

if(preg_match('/http/',$image_location)){
return($image_location);
}else{
$true_location='';

//if start of $image_location is ./ then remove it
if(substr($image_location,0,2)=='./'){
$image_location=substr($image_location,2);
}

//if start of $image_location is / then remove it

elseif(substr($image_location,0,1)=='/'){
$image_location=substr($image_location,1);
}else{
$image_location=$image_location;
}

//how far back to go

$back_folder_count = substr_count($image_location,'../');

//reset url to the last folder

$url = substr($url,0,strrpos($url,'/'));
for($i=0;$i<$back_folder_count;$i++){

//if last char is /

if(strrpos($url,'/')==strlen($url)-1){

//remove it and chop off the last folder, up to the next-to-last /

$url = substr($url,0,-1);
$url = substr($url,0,strrpos($url,'/'));
}else{

//chop off the last folder

$url = substr($url,0,strrpos($url,'/'));
}
}

//add / to the end

if(substr($url,-1)!='/'){
$url = $url.'/';
}

//replace ../ with nothing in $image_location

$image_location = preg_replace('/(\.\.\/)/','',$image_location);

// stick them together

$true_location = $url.$image_location;
return($true_location);
}
}
?>



Edited 3 time(s). Last edit at 11/14/2007 10:06AM by gandalf.
Re: Request: Image and Video Search
November 14, 2007 08:56PM
Awesome! This gives me a lot to play with at work, so I can look like I'm making the company big bucks! lol

They just see code on the screen and think, "Man, that guy is dedicated!" Wooo! Hehe

Thanks for the input and if anyone has any more ideas or code bits that would be cool!
Re: Request: Image and Video Search
November 15, 2007 03:01AM
I updated gandalf's code so that it also skips images that are unreachable (404) or too small:

<?php
// save this file and open in your browser to see the output.

require("spiderfuncs.php");
//$url = "http://stlcomics.com/cover_gallery/index.php";
$url = "http://www.darkharbor.com/noriko/html/genma_panda.html";
$file = file_get_contents($url);
$images = get_images($file,$url);
echo '<pre>';
print_r($images);
echo '</pre>';
foreach($images as $key=>$value){
echo "<img src=\"{$value['src']}\"/><br />
";
}

//delete from here upwards.

/*
strips out image tags from html with src/alt/title attributes
*/

function get_images($file,$url){

//init

$regs=array();
$images=array();

//match image tags

preg_match_all('/<\s?img(.*?)>/i',$file,$regs);

//loop through

foreach($regs[0] as $key=>$value){
$items = explode('=',$value);
$count = count($items);
for($i=0;$i<$count-1;$i++){ // stop one short: each pass also reads $items[$i+1]

//replace any bits after the data we need with nothing

$item = str_replace(strrchr($items[$i+1],' '),'',$items[$i+1]);

//get rid of " and '

$item = trim(preg_replace('/("|\')/','',$item));

//if src

if(preg_match('/src/i',$items[$i])){
if(trim($item)!=''){

//make sure image

if(preg_match('/((.*)(\.gif|\.jpeg|\.jpg|\.png))(.*)/',$item,$reg)){
$images[$key]['src']=find_true_path($url,$reg[1]);
$newurl = find_true_path($url,$reg[1]);
// only continue if the image is reachable (not a 404 etc.)
if (image_status($newurl) <> "Unreachable"){
// get the image dimensions
list($width, $height, $type, $attr) = getimagesize($images[$key]['src']);
$totalsize = $width + $height;
// if image is possible to get size and > 0
// change $totalsize > 4 to whatever you want the minimum total size to be
if (($width > 0 && $height > 0) && ($totalsize > 4)){
$images[$key]['width']=$width;
$images[$key]['height']=$height;
} else {
// too small: 2103 is a made-up status code meaning "too small"
$images[$key]['status']=2103;
continue;
}
} else {
// 404 error
$images[$key]['status'] = 404;
continue;
}


}
}else{
continue;
}
}

//if alt and not too small or 404

elseif(preg_match('/alt/i',$items[$i]) && !isset($images[$key]['status'])){

if($item!=''){
if(isset($images[$key]['src'])){
$images[$key]['alt']=$item;
}else{
continue;
}
}else{
continue;
}
}

//if title and not too small or 404

elseif(preg_match('/title/i',$items[$i]) && !isset($images[$key]['status'])){
if($item!=''){
if(isset($images[$key]['src'])){
$images[$key]['title']=preg_replace('/[\"\']/','',$item);
}else{
continue;
}
}else{
continue;
}


}else{
continue;
}
}
}

// drop images flagged as too small (2103) or unreachable (404)
// if you want to include 404 images comment out the following loop
// (foreach also checks the [0] key, which the old each() loop skipped)
foreach ($images as $imgf => $img) {
if (isset($img['status']) && ($img['status'] == 2103 || $img['status'] == 404)) {
unset($images[$imgf]);
}
}

return($images);
}

// get image status (almost same as url_status)
function image_status($url2) {
global $user_agent;
$urlparts1 = parse_url($url2);
$path = $urlparts1['path'];
$host = $urlparts1['host'];
if (isset($urlparts1['query']))
$path .= "?".$urlparts1['query'];

if (isset ($urlparts1['port'])) {
$port = (int) $urlparts1['port'];
} else
if ($urlparts1['scheme'] == "http") {
$port = 80;
} else
if ($urlparts1['scheme'] == "https") {
$port = 443;
}

if ($port == 80) {
$portq = "";
} else {
$portq = ":$port";
}

$all = "*/*"; //just to prevent "comment effect" in get accept
$request = "HEAD $path HTTP/1.1\r\nHost: $host$portq\r\nAccept: $all\r\nUser-Agent: $user_agent\r\n\r\n";

if (substr($url2, 0, 5) == "https") {
$target = "ssl://".$host;
} else {
$target = $host;
}

$fsocket_timeout = 30;
$errno = 0;
$errstr = "";
$fp = fsockopen($target, $port, $errno, $errstr, $fsocket_timeout);
print $errstr;
$linkstate = "ok";
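if (!$fp) {
// could not connect to the host at all, so treat the image as unreachable
$linkstate = "Unreachable";
} else {
stream_set_timeout($fp, 30);
fputs($fp, $request);
$answer = fgets($fp, 4096);
fclose($fp);
$regs1 = Array ();
// ereg() no longer exists in current PHP, so read the status line with preg_match()
if (preg_match("#HTTP/[0-9.]+ (([0-9])[0-9]{2})#", $answer, $regs1)) {
$httpcode = $regs1[2];
$full_httpcode = $regs1[1];
// anything other than 2xx or 3xx counts as unreachable
if ($httpcode <> 2 && $httpcode <> 3) {
$linkstate = "Unreachable";
}
}
}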
return $linkstate;
}

/*
sorts out a relative url with ./ and ../ s and prefixes
the main url to the start of it giving the correct absolute
url to the image file.
*/

function find_true_path($url,$image_location){

//if absolute already

if(preg_match('/http/',$image_location)){
return($image_location);
}else{
$true_location='';

//if start of $image_location is ./ then remove it
if(substr($image_location,0,2)=='./'){
$image_location=substr($image_location,2);
}

//if start of $image_location is / then remove it

elseif(substr($image_location,0,1)=='/'){
$image_location=substr($image_location,1);
}else{
$image_location=$image_location;
}

//how far back to go

$back_folder_count = substr_count($image_location,'../');

//reset url to the last folder

$url = substr($url,0,strrpos($url,'/'));
for($i=0;$i<$back_folder_count;$i++){

//if last char is /

if(strrpos($url,'/')==strlen($url)-1){

//remove it and chop off the last folder, up to the next-to-last /

$url = substr($url,0,-1);
$url = substr($url,0,strrpos($url,'/'));
}else{

//chop off the last folder

$url = substr($url,0,strrpos($url,'/'));
}
}

//add / to the end

if(substr($url,-1)!='/'){
$url = $url.'/';
}

//replace ../ with nothing in $image_location

$image_location = preg_replace('/(\.\.\/)/','',$image_location);

// stick them together

$true_location = $url.$image_location;
return($true_location);
}
}
?>



Edited 2 time(s). Last edit at 11/15/2007 03:16AM by fastest963.
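The check inside image_status() only reads the first line of the HEAD response. Pulling that logic into a small pure function makes it testable without a network connection. A minimal sketch; `parse_status_code()` and `status_line_reachable()` are hypothetical names. They apply the same rule as image_status() (2xx and 3xx are reachable), except that a missing or garbled status line is treated as unreachable instead of "ok".

```php
<?php
// Hypothetical helper: parse a status line such as "HTTP/1.1 404 Not Found".
// Returns the three-digit status code, or null when the line is not a status line.
function parse_status_code($line)
{
    if (!is_string($line) || !preg_match('#^HTTP/[0-9.]+\s+([0-9]{3})\b#', $line, $m)) {
        return null;
    }
    return (int) $m[1];
}

// 2xx and 3xx count as reachable; anything else, including no answer, does not.
function status_line_reachable($line)
{
    $code = parse_status_code($line);
    return $code !== null && $code >= 200 && $code < 400;
}
```

Inside image_status(), the result of `fgets($fp, 4096)` could be passed straight to status_line_reachable().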
Re: Request: Image and Video Search
November 18, 2007 03:26AM
I got the following error when trying to access the image search page:

Parse error: syntax error, unexpected T_VARIABLE in /home/.westmont/jeremy90/*****.com/admin/spiderfuncs.php on line 598
Re: Request: Image and Video Search
November 19, 2007 02:35PM
What is on line 598?
Check for a missing ; right before that line; that is probably the problem. Are you using my code or gandalf's code?
Re: Request: Image and Video Search
November 22, 2007 07:30PM
How do I find out which line is line 598?
Re: Request: Image and Video Search
November 23, 2007 04:47AM
I'm sorry, I just don't see any errors. Can you try reapplying the mod?
You also never said which mod you used, the first or the second.
Re: Request: Image and Video Search
December 13, 2007 05:49AM
Yes, fastest963 and I are already indexing images into a large database, but searching the database is still being developed.
Re: Request: Image and Video Search
January 04, 2008 07:33AM
I don't know how Sphider saves entries to the database (so I don't know how this would integrate into Sphider), but basically if a url ended in .jpg, .gif, .png, it would be archived as an image. If it ended in .mpg, .avi, .wmv, it would be archived as a video.
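That rule can be sketched as a lookup table from extension to category, using the extensions listed above. Because the map is a plain array, it could also come from an admin setting, as suggested earlier in the thread. `classify_url()` and the category names are hypothetical. The extension is taken from the URL path only, so a query string such as ?autoplay=1 does not hide it, and a .jpg inside a query string does not count.

```php
<?php
// Hypothetical extension => category map; an admin setting could supply this instead.
$media_types = array(
    'jpg' => 'image', 'jpeg' => 'image', 'gif' => 'image', 'png' => 'image',
    'mpg' => 'video', 'avi' => 'video', 'wmv' => 'video',
);

// Return the category for a URL, or 'page' when its extension is not in the map.
function classify_url($url, array $types)
{
    $path = parse_url($url, PHP_URL_PATH); // drops ?query and #fragment
    if (!is_string($path) || $path === '') {
        return 'page';
    }
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return isset($types[$ext]) ? $types[$ext] : 'page';
}
```

Adding a new type, for example `$media_types['mp3'] = 'music';`, needs no code change.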
Re: Request: Image and Video Search
February 11, 2008 03:04PM
How is this coming along?

When it's finished, let me know :)
Re: Request: Image and Video Search
April 03, 2008 10:00PM
So, I have updated one of the scripts from this thread.
During indexing, this one:
Scales each image (keeping its proportions)
Converts it to PNG (about 4 KB!)
Records it in a database

Table like this:
links_image:
url,image
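Scaling "with proportions" means multiplying both sides by one factor, chosen so that the longer side fits the limit (the script uses 60 px). A minimal sketch; `fit_within()` is a hypothetical helper, and it also never enlarges images that are already small enough.

```php
<?php
// Hypothetical helper: fit a $width x $height image inside a $max_side square,
// keeping the aspect ratio and never enlarging small images.
function fit_within($width, $height, $max_side)
{
    if ($width <= 0 || $height <= 0) {
        return array(0, 0); // getimagesize() failed or the image is empty
    }
    $scale = min(1, $max_side / max($width, $height)); // one factor for both sides
    return array(
        max(1, (int) round($width * $scale)),
        max(1, (int) round($height * $scale)),
    );
}
```

For example, fit_within(600, 300, 60) gives array(60, 30), and those two values can be passed to imagecreatetruecolor() and imagecopyresampled().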

Good luck!
<?php
// Scans a page for <img> tags, saves a scaled PNG thumbnail of each new image
// in uploads/ and records (page url, image url) in the links_image table.
function get_image_data($file,$host,$url) {

include("../settings/database.php");
$chunklist = explode(">", $file);

foreach ($chunklist as $imgs) {

if (stristr($imgs, "<img")) {

// only look at common image formats
if (stristr($imgs, "jpg") || stristr($imgs, "jpeg") || stristr($imgs, "gif") || stristr($imgs, "png") || stristr($imgs, "tiff")) {

if (preg_match("/src *= *[\"']?([^<>'\"]+)[\"']?/i", $imgs, $res)) {

$image = $res[1];

// make a relative src absolute on the page's host
if (substr($image, 0, 4) != 'http') {
$image = str_replace("http://", "", $image);
$image = str_replace($host, "", $image);

if (substr($image, 0, 1) == '/') {
$image = "http://".$host.$image;
}else{
$image = "http://".$host."/".$image;
}
}

// skip images that are already in the table
$safe_image = mysql_real_escape_string($image);
$result = mysql_query("select * from links_image where image = '$safe_image'");
$aantal = mysql_num_rows($result);

if($aantal == 0){

// extension from the URL path only, so ?query strings are ignored
$extension = strtolower(pathinfo(parse_url($image, PHP_URL_PATH), PATHINFO_EXTENSION));

if($extension == "jpg" or $extension == "jpeg"){
$im = imagecreatefromjpeg($image);
}
else if($extension == "png"){
$im = imagecreatefrompng($image);
}
else if($extension == "gif"){
$im = imagecreatefromgif($image);
}
else{
continue; // GD cannot read this format (e.g. tiff), so skip it
}
if(!$im){
continue; // download or decoding failed
}

$width = imagesx($im);
$height = imagesy($im);

// scale so the longer side is at most 60 px, keeping the proportions
$max_side = 60;
$scale = min(1, $max_side / max($width, $height));
$new_width = max(1, (int) round($width * $scale));
$new_height = max(1, (int) round($height * $scale));

$image_p = imagecreatetruecolor($new_width, $new_height); // empty canvas of the thumbnail size
imagecopyresampled($image_p, $im, 0, 0, 0, 0, $new_width, $new_height, $width, $height); // resize into it

$num = rand(1,99999); // random number between 1 and 99999 for the file name
$output = "uploads/".$num.".png";
echo "$image --- $extension<br>";

imagepng($image_p,$output); // write the thumbnail as a .png file

imagedestroy($im); // free the GD images
imagedestroy($image_p);

$safe_url = mysql_real_escape_string($url);
mysql_query("insert into links_image (url, image) values ('$safe_url','$safe_image')");
echo mysql_error();
}
}
}
}
}
}
?>
Re: Request: Image and Video Search
May 02, 2008 01:37PM
Good script. I have several questions...

I get this error:
Parse error: syntax error, unexpected $end in /home/xblogzo/public_html/admin/tsearch.php on line 111

Should the script be executed from admin? Is that correct?

After indexing the images, are the results shown in the main search, or with some other query?
Re: Request: Image and Video Search
May 09, 2008 01:54PM
I have the same issue!
Any suggestions?
jb
Re: Request: Image and Video Search
May 28, 2008 02:10PM
Hi,
I'm way too lazy, so I used some of your functions. I discovered that your function find_true_path($url,$image_location) will not work for a URL like:
url = http://domain.com/dir1/index.htm
image_location= /dir2/pic.jpg
your script will result in http://domain.com/dir1/dir2/pic.jpg

I did a quick and dirty fix by adding one line. You could look into parse_url() to do it properly, but you can use this as well.
So instead of:
//if start of $image_location is / then remove it

elseif(substr($image_location,0,1)=='/'){
$image_location=substr($image_location,1);
}else{
$image_location=$image_location;
}

use this:

//if start of $image_location is / then remove it

elseif(substr($image_location,0,1)=='/'){
$image_location=substr($image_location,1);

$url =substr($url,0,(strpos($url,'/',10)+1));


}else{
$image_location=$image_location;
}
jb
Re: Request: Image and Video Search
June 03, 2008 11:14AM
Hi,
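There is another bug in your find_true_path function.
It doesn't take into account a URL like http://domain.com?var=/bla/bla.htm

The easiest fix is to strip the query string from $url at the start of the function (checking for the ? first, because substr($url,0,strrpos($url,'?')) would empty any URL that has no query string):

$q = strpos($url,'?');
if($q !== false){ $url = substr($url,0,$q); }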
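Both bugs come from treating the page URL as a plain string. A sturdier approach, along the lines of jb's parse_url() hint, is to split the page URL with parse_url() and resolve the image reference against it, roughly following the reference-resolution rules of RFC 3986 (section 5). This is a simplified sketch, not a complete implementation: `resolve_url()` is a hypothetical name, user:password parts are not supported, and trailing `.`/`..` segments are not normalised exactly as the RFC requires.

```php
<?php
// Hypothetical replacement for find_true_path(): resolve $ref against the page URL $base.
// Simplified from RFC 3986 section 5; user:password parts are not supported.
function resolve_url($base, $ref)
{
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $ref)) {
        return $ref; // already absolute (http:, https:, ...)
    }
    $b = parse_url($base);
    if ($b === false || !isset($b['host'])) {
        return $ref; // base is not an absolute URL, nothing to resolve against
    }
    $scheme = isset($b['scheme']) ? $b['scheme'] : 'http';
    if (substr($ref, 0, 2) === '//') {
        return $scheme . ':' . $ref; // protocol-relative: //cdn.example.com/x.jpg
    }
    $origin = $scheme . '://' . $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');
    // parse_url() has already split off the page's ?query and #fragment
    $base_path = isset($b['path']) ? $b['path'] : '/';

    if ($ref !== '' && $ref[0] === '/') {
        $path = $ref; // root-relative: /dir2/pic.jpg
    } else {
        // relative: append to the directory part of the base path
        $path = substr($base_path, 0, strrpos($base_path, '/') + 1) . $ref;
    }

    // keep the image's own query string aside while cleaning up the path
    $query = '';
    $q = strpos($path, '?');
    if ($q !== false) {
        $query = substr($path, $q);
        $path = substr($path, 0, $q);
    }

    // drop "." segments; ".." removes the previous segment but never goes above the root
    $out = array();
    foreach (explode('/', $path) as $seg) {
        if ($seg === '..') {
            if (count($out) > 1) {
                array_pop($out);
            }
        } elseif ($seg !== '.') {
            $out[] = $seg;
        }
    }
    return $origin . implode('/', $out) . $query;
}
```

For example, resolve_url('http://domain.com/dir1/index.htm', '/dir2/pic.jpg') returns http://domain.com/dir2/pic.jpg, the case jb reported, and a page URL such as http://domain.com?var=/bla/bla.htm no longer confuses the directory calculation.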
Re: Request: Image and Video Search
June 08, 2008 12:59PM
Hi,
I have installed Sphider for the first time. It works great, really perfect; I'm really impressed.

Right, what I am looking for is to get this image search to work! Ha-ha. I have copied the version that writes to the table, so in theory, when you index, it should store the images in the links_image table.

However, I have now realised that this code is just a function definition...
How on earth do I get it to run? It would be great if someone could help.

What I would like is for the admin to index a site and then have the images from that site put into the table, like the code above does.

I hope someone somewhere can help.

Thanks, Nathan