Greenguy's Board


Old 2006-10-13, 10:03 AM   #1
oast
With $10,000, we'd be millionaires! We could buy all kinds of useful things like ... love!
 
Join Date: May 2004
Location: UK
Posts: 316
Getting the Hostname in PHP

I've been searching all day, trying different functions, regular expressions, etc., but nothing I've tried seems to work fully,
so I've decided to let someone else have a go...

What I want to do is check for uniqueness of the domain of a submitted site. I do not even want to allow subdomains.

So far I have this:
PHP Code:
$url_in_db = 'http://www.domain.tld/thispage.html';
$submitted_url = 'http://www.domain.tld/anotherpage.html';

$parsedurl = parse_url($submitted_url);
$get_the_hostname = $parsedurl["host"];

if (strstr($url_in_db, $get_the_hostname))
    echo "match found";
else
    echo "no match";
That works, but it doesn't do exactly what I want it to do, because although it finds a match if the 'www.' is left off, it doesn't find a match on a subdomain entry like 'http://web.domain.tld/thispage.html'

There is an example at php.net to get the hostname:
PHP Code:
preg_match('@^(?:http://)?([^/]+)@i', "http://www.domain.tld/index.html", $matches);
$host = $matches[1];
preg_match('/[^.]+\.[^.]+$/', $host, $matches);
This doesn't work for the '.co.uk', '.com.au', etc TLDs as it returns the last two elements of the domain, nor would it match any 'https://' URLs
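Both limits can be worked around in a few lines. The sketch below is only an illustration, not the php.net example: it lets parse_url() deal with the scheme (so 'https://' works too) and keeps three labels instead of two when the host ends in a known two-part suffix. The function name registrable_domain() and the short $two_part_suffixes list are assumptions made up for this example; a real list would need many more entries.

```php
<?php
// Hypothetical helper: returns the registrable domain of a URL, or null.
function registrable_domain(string $url): ?string
{
    // Illustrative, incomplete list of two-part suffixes (assumption).
    $two_part_suffixes = ['co.uk', 'org.uk', 'com.au', 'net.au', 'co.nz'];

    // parse_url() copes with http:// and https:// alike.
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }

    $labels = explode('.', strtolower($host));
    if (count($labels) < 2) {
        return null; // e.g. "localhost"
    }

    // Keep three labels when the last two form a known suffix, else two.
    $last_two = implode('.', array_slice($labels, -2));
    $keep = in_array($last_two, $two_part_suffixes, true) ? 3 : 2;
    if (count($labels) < $keep) {
        return null; // the host is a bare suffix such as "co.uk"
    }

    return implode('.', array_slice($labels, -$keep));
}

echo registrable_domain('https://web.domain.co.uk/thispage.html'), "\n"; // domain.co.uk
echo registrable_domain('http://www.domain.tld/index.html'), "\n";       // domain.tld
```

Hosts whose suffix is missing from the list fall back to the last two labels, so the list decides how well this works.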

So it is over to the PHP gurus at & |Jim

There is no rush on this, but if I could have a solution by Thursday 12th Oct, that would be great

Note - As you may have guessed, the '$url_in_db' is stored in a MySQL table, in case a function exists in a query expression that I may have missed.
__________________
Playboy Webmasters - The name says it all! $35 per signup or 60% revshare.
Old 2006-10-14, 06:08 PM   #2
matt
Trying is the first step towards failure
 
Join Date: Apr 2003
Location: Australia
Posts: 123
Here's a really ugly way to do it... I'm sure there's a better one but this should work. Something like:
PHP Code:
$tld = array('com','net','org','co'); // might need to add more
$bits = explode('.', $host);

if ($bits[0] == 'www')
    array_shift($bits); // the www exists, but it's irrelevant so shift it off the start

if (count($bits) > 4)
{
    //some crazy domain action with at least 2 subdomain levels, i'd give them the shaft... but you can reuse what's below for more subdomain levels if ya want to.
}
elseif (count($bits) == 4)
{
    if (array_search($bits[2], $tld) !== false)
    {
        // third bit is a tld, so this is a domain.com.au with a subdomain
        $domain = $bits[1] .'.'. $bits[2] .'.'. $bits[3];
    }
    else
    {
        //some crazy domain action with at least 2 subdomain levels or it's an ip, i'd give them the shaft
    }
}
elseif (count($bits) == 2)
{
    //can only have the name + tld
    $domain = $bits[0] .'.'. $bits[1];
}
elseif (array_search($bits[1], $tld) !== false)
{
    //there are three parts, second one is a tld, so this is a valid domain
    $domain = $bits[0] .'.'. $bits[1] .'.'. $bits[2];
}
else
{
    //there are three parts, but no tld as the second one, must be using a subdomain.
    $domain = $bits[1] .'.'. $bits[2];
}

Hope that helps
__________________
Link List Land - Linklist Creation, Design and Implementation.
Old 2006-10-14, 06:18 PM   #3
matt
I just realised that it might be easier for you to use regex after parse_url... Only just woken up, so the brain's only half fired up... But it would still be pretty much doing what I did in PHP, just with regex... Depends how neat you like your code!
Old 2006-10-14, 08:01 PM   #4
oast
Thanks matt

I'll put the code to some in-house testing and let you know how I get on.

Steve
Old 2006-10-15, 06:19 AM   #5
Mateusz
Screw you, guys. I'm going home.
 
Join Date: Mar 2004
Location: Gliwice, Poland
Posts: 996
Try this - this function can be useful for blacklist and recip checking as well
PHP Code:
// eregi() was removed in PHP 7; for plain-text URLs, stripos($old_url, $new_url) !== false is a close substitute
if (eregi($new_url, $old_url)) {
    // do something
} else {
    // do something else
}
Please note there will be a problem if someone submitted
http://mydomain.com/freesite1/freesite2/freesite3/
and
http://mydomain.com/freesite1/
was already in the DB

I see no point in creating such directory structure for free sites and I guess nobody does

Last edited by Mateusz; 2006-10-15 at 06:27 AM..
Old 2006-10-15, 07:38 AM   #6
oast
Quote:
Originally Posted by Mateusz View Post
Please note there will be a problem if someone submitted
http://mydomain.com/freesite1/freesite2/freesite3/
and
http://mydomain.com/freesite1/
was already in DB
That is the sort of thing I am trying to prevent. It is not a freesite list, it is a directory listing domains (websites) not pages.

As far as I can see, your function will not do what I want it to.
I.e. if http://mydomain.com/freesite1/ was in the DB and they then submitted http://www.mydomain.com/freesite1/, http://web.mydomain.com/freesite1/, http://sub.mydomain.com/freesite1/ and http://another.mydomain.com/freesite1/, it wouldn't catch them.
Old 2006-10-15, 08:45 AM   #7
oast
Matt, I just found a 'problem' with your code.

Let's say domain 'ThisWebsite.com' was listed. If, later, someone wanted to list 'EBSite.com' it would catch it as already listed... at least in my present MySQL query of "SELECT * FROM table WHERE url LIKE '%$url%'" or even "SELECT * FROM table WHERE url RLIKE '^http://([.a-z0-9]+)?$url'"

I think I'm going to have to re-write the database, and add a 'domain' field for the extracted domain; and do an exact match on that.
Oh well, at least there are only 4,569 domains listed.
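The false positive is easy to reproduce outside MySQL. A rough sketch, assuming the extracted domains end up in their own column; the function names and the sample list are made up for illustration:

```php
<?php
// Substring test, roughly what LIKE '%$url%' does: prone to false positives.
function is_listed_substring(array $listed, string $domain): bool
{
    foreach ($listed as $existing) {
        if (stripos($existing, $domain) !== false) {
            return true;
        }
    }
    return false;
}

// Exact test, what a dedicated 'domain' column allows (WHERE domain = ?).
function is_listed_exact(array $listed, string $domain): bool
{
    return in_array(strtolower($domain), array_map('strtolower', $listed), true);
}

$listed = ['thiswebsite.com'];

var_dump(is_listed_substring($listed, 'EBSite.com'));  // bool(true), a false positive
var_dump(is_listed_exact($listed, 'EBSite.com'));      // bool(false)
var_dump(is_listed_exact($listed, 'ThisWebsite.com')); // bool(true)
```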
Old 2006-10-15, 09:54 AM   #8
Mateusz
Yeah, I must admit I misunderstood what exactly you are looking for.. The fact you are 'comparing' domain names only changes a lot.

Well, at least you could filter all the domains you have in the database, and for the ones that have 1 dot in the host name (no subdomains and no .com.au-like domains) run the script - #1 at http://pl.php.net/manual/pl/function.parse-url.php

Maybe that's not much, but I guess it would save you a shitload of work anyway; the rest would have to be done by hand...
Old 2006-10-15, 04:45 PM   #9
matt
Yeah, you'll definitely have to change the db if you're trying to search on it, but that will be really easy. Add a new field, SELECT all the urls, run the code on each of them and UPDATE.
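That backfill could look roughly like the sketch below. Everything named here is an assumption for illustration: the 'sites' table, its 'id', 'url' and 'domain' columns, and the $extract callback, which stands in for whichever domain-extraction code you settle on. The function only pairs each row id with its extracted domain; the commented lines show where a PDO SELECT and a prepared UPDATE would go.

```php
<?php
// Builds [id => domain] for every row whose domain could be extracted.
// $rows: list of ['id' => int, 'url' => string]; $extract: callable(string): ?string
function plan_domain_backfill(array $rows, callable $extract): array
{
    $plan = [];
    foreach ($rows as $row) {
        $domain = $extract($row['url']);
        if ($domain !== null && $domain !== '') {
            $plan[$row['id']] = strtolower($domain);
        }
        // Rows that yield nothing are skipped and can be fixed by hand.
    }
    return $plan;
}

// Stand-in extractor for the demo: the host minus a leading "www." only.
$extract = function (string $url): ?string {
    $host = parse_url($url, PHP_URL_HOST);
    return is_string($host) ? preg_replace('/^www\./i', '', $host) : null;
};

// With a real database (hypothetical schema):
//   $rows = $pdo->query('SELECT id, url FROM sites')->fetchAll(PDO::FETCH_ASSOC);
//   $stmt = $pdo->prepare('UPDATE sites SET domain = ? WHERE id = ?');
//   foreach (plan_domain_backfill($rows, $extract) as $id => $domain) {
//       $stmt->execute([$domain, $id]);
//   }
$rows = [
    ['id' => 1, 'url' => 'http://www.Domain.tld/thispage.html'],
    ['id' => 2, 'url' => 'not a url'],
];
print_r(plan_domain_backfill($rows, $extract)); // only id 1 gets a domain
```

Keeping the extraction behind a callback means the same loop works whichever of the functions posted here ends up being used.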
Old 2006-10-16, 05:11 AM   #10
oast
Quote:
Originally Posted by matt View Post
Yeah, you'll definitely have to change the db if you're trying to search on it, but that will be really easy. Add a new field, SELECT all the urls, run the code on each of them and UPDATE.
That is the option I will be using.

Thanks for your input guys
Old 2006-10-15, 04:49 PM   #11
matt
Alternatively you could make sure www. exists for every url (add it yourself where it's not already there) and there's a trailing slash on all domains (so http://www.domain.com/ instead of http://www.domain.com), and search on ".EBSite.com/".
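A rough sketch of that normalisation, assuming plain http(s) URLs without ports or credentials. normalise_site_url() and domain_is_listed() are hypothetical names, and only the scheme and host are kept because the directory lists whole sites, not pages:

```php
<?php
// Forces a leading "www." and a trailing slash: http://www.domain.com/
function normalise_site_url(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false || empty($parts['host'])) {
        return null;
    }
    $host = strtolower($parts['host']);
    if (strpos($host, 'www.') !== 0) {
        $host = 'www.' . $host;
    }
    $scheme = strtolower($parts['scheme'] ?? 'http');
    return $scheme . '://' . $host . '/';
}

// Searches for ".domain/" so the dot and the slash act as boundaries.
function domain_is_listed(array $normalised_urls, string $domain): bool
{
    $needle = '.' . strtolower($domain) . '/';
    foreach ($normalised_urls as $url) {
        if (strpos($url, $needle) !== false) {
            return true;
        }
    }
    return false;
}

$stored = [normalise_site_url('http://ThisWebsite.com')];
echo $stored[0], "\n";                                  // http://www.thiswebsite.com/
var_dump(domain_is_listed($stored, 'EBSite.com'));      // bool(false)
var_dump(domain_is_listed($stored, 'ThisWebsite.com')); // bool(true)
```

Because the needle starts with a dot, a stored subdomain URL such as http://www.web.domain.com/ still matches a search for domain.com, so subdomains of a listed domain get caught as well.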

Last edited by matt; 2006-10-15 at 04:51 PM..
Old 2006-10-18, 02:43 AM   #12
QuickDraw
Heh Heh Heh! Lisa! Vampires are make believe, just like elves and gremlins and eskimos!
 
Join Date: Jan 2006
Posts: 72
Here's a function I use to get only the domain from a URL.. I think that's what you're after? Just pass it a url and it will return the domain.

Code:
function getdomain($url)
{
    // patterns we need to match
    $pattern_hostname = '/^(http:\/\/)?([^\/]+)/i';
    $pattern_domain = '/[^\.\/]+\.[^\.\/]+(\.[^\.\/]{2})?$/';
    // extract hostname from URL string
    @preg_match($pattern_hostname, $url, $matches);
     $hostname = $matches[2];
    // extract sld.tld(.cctld if exists)
    @preg_match($pattern_domain, $hostname, $matches);
    $domain = $matches[0];
    return $domain;
}
The only thing is.. it can't differentiate between a domain and an IP - if you pass it an IP, it gives you the last 2 octets as the domain. It would be easy enough to validate that before or after you use the function though.

Hope this helps..
QD

Last edited by QuickDraw; 2006-10-18 at 02:55 AM..
Old 2006-10-18, 05:37 AM   #13
kitty_kate
Well you know boys, a nuclear reactor is a lot like women. You just have to read the manual and press the right button
 
Join Date: Jun 2006
Location: The Colloseum
Posts: 155
You could check first if it's an IP, and if it's not, then do your stuff with the domain name.
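Something like this would do for the IP check. It is a minimal sketch using PHP's filter_var() with FILTER_VALIDATE_IP; the function name host_is_ip() is made up for the example:

```php
<?php
// True when the URL's host is an IPv4 or IPv6 literal rather than a name.
function host_is_ip(string $url): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false;
    }
    // IPv6 literals are written in brackets inside URLs, so strip any brackets.
    $host = trim($host, '[]');
    return filter_var($host, FILTER_VALIDATE_IP) !== false;
}

var_dump(host_is_ip('http://192.168.1.10/index.html')); // bool(true)
var_dump(host_is_ip('http://www.domain.tld/'));         // bool(false)
```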
__________________
My Asian Kitty - My Free Sites
Old 2006-10-18, 08:42 AM   #14
oast
QuickDraw, kitty_kate:
My submission rules do not permit IP address URLs, so that is not an issue.

QuickDraw:
Thanks for the function. Yes, it appears to do what I want it to... in that it gets the domain name... but it still has the issue I mentioned in post 7, where it matches similarly named, longer domains.
  • 'http://www.greenguysboard.com' is listed in my database.
  • Someone comes along and submits the URL 'http://links.guyandjim.com'
  • The functions suggested (and the one I was using before) all (falsely) alert the submitter "a site from 'guyandjim.com' has already been added to our lists"
I am going to have to get the domain for each site I already have, update the database with a new 'domain' field, and then do exact matching on that field for future submissions.
Old 2006-10-18, 09:06 AM   #15
Halfdeck
You can now put whatever you want in this space :)
 
Join Date: Oct 2004
Location: New Haven, CT
Posts: 985
Code:
function base_domain($url) { // parse out subdomains
		
	$url = parse_url($url);
	$host = $url['host'];
		
	// anchored with $ so hosts with several subdomain levels still yield the last two labels
	$pattern = "/([\-a-z0-9]*\.)?([\-a-z0-9]*)\.([a-z0-9]*)$/i";
	preg_match($pattern, $host, $matches);
	$host = $matches[2] ."." .$matches[3];

	return $host;
}
Gotta be modified for stuff like .co.uk I guess.
__________________
Success is going from failure to failure without a loss of enthusiasm.

Last edited by Halfdeck; 2006-10-18 at 09:08 AM..
Old 2006-10-18, 12:07 PM   #16
oast
Quote:
Originally Posted by Halfdeck View Post
Gotta be modified for stuff like .co.uk I guess.
Yes, Halfdeck. Also does the same as the others that have been mentioned. Thanks anyway.
© Greenguy Marketing Inc