Greenguy's Board


oast 2006-10-13 10:03 AM

Getting the Hostname in PHP
 
I've been searching all day, trying different functions, regular expressions, etc., but nothing I've tried seems to work fully, |banghead|
so I've decided to let someone else have a go...

What I want to do is check for uniqueness of the domain of a submitted site. I do not even want to allow subdomains.

So far I have this:
PHP Code:

$url_in_db = 'http://www.domain.tld/thispage.html';
$submitted_url = 'http://www.domain.tld/anotherpage.html';

$parsedurl = parse_url($submitted_url);
$get_the_hostname = $parsedurl["host"];

if (strstr($url_in_db, $get_the_hostname))
    echo "match found";
else
    echo "no match";

That works, but it doesn't do exactly what I want it to do, because although it finds a match if the 'www.' is left off, it doesn't find a match on a subdomain entry like 'http://web.domain.tld/thispage.html'

There is an example at php.net to get the hostname:
PHP Code:

preg_match('@^(?:http://)?([^/]+)@i', "http://www.domain.tld/index.html", $matches);
$host = $matches[1];
preg_match('/[^.]+\.[^.]+$/', $host, $matches);

This doesn't work for '.co.uk', '.com.au', etc. TLDs, as it returns only the last two elements of the hostname (so 'co.uk' rather than 'domain.co.uk'), nor would it match any 'https://' URLs.

So it is over to the PHP gurus at |greenguy| & |Jim|

There is no rush on this, but if I could have a solution by Thursday 12th Oct, that would be great :D

Note - As you may have guessed, the '$url_in_db' is stored in a MySQL table, in case a function exists in a query expression that I may have missed.
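The two limitations described above can be reproduced in a few lines. This is only a sketch: `last_two_labels()` is a hypothetical helper that mimics what the php.net pattern extracts, and `parse_url()` is used for the host because it does not care whether the scheme is http or https.

```php
<?php
// Hypothetical helper: returns the last two dot-separated labels of a host,
// which is what the pattern /[^.]+\.[^.]+$/ effectively extracts.
function last_two_labels($host)
{
    $labels = explode('.', strtolower($host));
    return implode('.', array_slice($labels, -2));
}

// parse_url() is scheme-agnostic, so https:// URLs need no special casing.
$host = parse_url('https://www.domain.co.uk/index.html', PHP_URL_HOST);

echo $host, "\n";                             // www.domain.co.uk
echo last_two_labels($host), "\n";            // co.uk (the suffix, not the domain)
echo last_two_labels('web.domain.tld'), "\n"; // domain.tld
```

So the "last two labels" rule is fine for plain '.tld' hosts, subdomains included, but needs extra knowledge about two-part suffixes before it can be trusted for uniqueness checks.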

matt 2006-10-14 06:08 PM

Here's a really ugly way to do it... I'm sure there's a better one but this should work. Something like:
PHP Code:

$tld = array('com', 'net', 'org', 'co'); // might need to add more
$bits = explode('.', $host);

if ($bits[0] == 'www')
    array_shift($bits); // the www exists, but it's irrelevant so shift it off the start

if (count($bits) > 4)
{
    //some crazy domain action with at least 2 subdomain levels, i'd give them the shaft... but you can reuse what's below for more subdomain levels if ya want to.
}
elseif (count($bits) == 4)
{
    if (array_search($bits[2], $tld) !== false)
    {
        // third bit is a tld, so this is a domain.com.au with a subdomain
        $domain = $bits[1] . '.' . $bits[2] . '.' . $bits[3];
    }
    else
    {
        //some crazy domain action with at least 2 subdomain levels or it's an ip, i'd give them the shaft
    }
}
elseif (count($bits) == 2)
{
    //can only have the name + tld
    $domain = $bits[0] . '.' . $bits[1];
}
elseif (array_search($bits[1], $tld) !== false)
{
    //there are three parts, second one is a tld, so this is a valid domain
    $domain = $bits[0] . '.' . $bits[1] . '.' . $bits[2];
}
else
{
    //there are three parts, but no tld as the second one, must be using a subdomain.
    $domain = $bits[1] . '.' . $bits[2];
}

Hope that helps :)
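The same label-counting idea can be written more compactly by checking the last two labels against a list of two-part suffixes. This is a minimal sketch, not matt's code: `registered_domain()` and the `$suffixes` list are hypothetical, and the list is deliberately incomplete (a maintained source such as the Public Suffix List would be needed for real coverage).

```php
<?php
// Hypothetical helper, not from the thread: reduce a host to its registrable
// domain using a hand-maintained list of two-label suffixes.
function registered_domain($host, array $two_label_suffixes)
{
    $labels = explode('.', strtolower(rtrim($host, '.')));
    $count  = count($labels);

    if ($count < 2) {
        return null; // bare names such as "localhost" are rejected
    }

    // Keep three labels when the last two form a known suffix (e.g. co.uk).
    $last_two = $labels[$count - 2] . '.' . $labels[$count - 1];
    $keep     = in_array($last_two, $two_label_suffixes, true) ? 3 : 2;

    if ($count < $keep) {
        return null; // the host is only a suffix, e.g. "co.uk"
    }

    return implode('.', array_slice($labels, -$keep));
}

$suffixes = array('co.uk', 'org.uk', 'com.au', 'net.au'); // assumed, incomplete

echo registered_domain('www.domain.com', $suffixes), "\n";    // domain.com
echo registered_domain('web.domain.co.uk', $suffixes), "\n";  // domain.co.uk
echo registered_domain('a.b.domain.com.au', $suffixes), "\n"; // domain.com.au
```

Because the function always keeps a fixed number of trailing labels, it copes with any depth of subdomain without a separate branch per label count.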

matt 2006-10-14 06:18 PM

I just realised that it might be easier for you to use regex after parse_url... Only just woken up, so the brain's only half fired up.. :) But it would still be pretty much doing what I did in PHP, just with regex... Depends how neat you like your code!

oast 2006-10-14 08:01 PM

Thanks matt

I'll put the code to some in-house testing and let you know how I get on.

Steve

Mateusz 2006-10-15 06:19 AM

Try this - this function can be useful for blacklist and recip checking as well
PHP Code:

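// Note: eregi() was removed in PHP 7.0; preg_match() with the /i flag is its replacement.
if (eregi($new_url, $old_url)) {
    // do something
} else {
    // do something else
}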

Please note there will be a problem if someone submitted
http://mydomain.com/freesite1/freesite2/freesite3/
and
http://mydomain.com/freesite1/
was already in DB

I see no point in creating such directory structure for free sites and I guess nobody does

oast 2006-10-15 07:38 AM

Quote:

Originally Posted by Mateusz (Post 306322)
Please note there will be a problem if someone submitted
http://mydomain.com/freesite1/freesite2/freesite3/
and
http://mydomain.com/freesite1/
was already in DB

That is the sort of thing I am trying to prevent. It is not a freesite list, it is a directory listing domains (websites) not pages.

As far as I can see, your function will not do what I want it to.
I.e. if http://mydomain.com/freesite1/ was in the DB, and they then submitted http://www.mydomain.com/freesite1/, http://web.mydomain.com/freesite1/, http://sub.mydomain.com/freesite1/ and http://another.mydomain.com/freesite1/, it wouldn't catch them.

oast 2006-10-15 08:45 AM

Matt, I just found a 'problem' with your code.

Let's say domain 'ThisWebsite.com' was listed. If, later, someone wanted to list 'EBSite.com' it would catch it as already listed... at least in my present MySQL query of "SELECT * FROM table WHERE url LIKE '%$url%'" or even "SELECT * FROM table WHERE url RLIKE '^http://([.a-z0-9]+)?$url'"

I think I'm going to have to re-write the database, and add a 'domain' field for the extracted domain; and do an exact match on that.
Oh well, at least there are only 4,569 domains listed. :(
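The false positive and the exact-match fix can be shown without a database. In this sketch the `$listed` array stands in for a hypothetical 'domain' column, `strpos()` plays the role of `LIKE '%...%'`, and `in_array()` plays the role of `WHERE domain = '...'`.

```php
<?php
// Domains assumed to be already stored in a dedicated 'domain' column.
$listed = array('thiswebsite.com', 'greenguysboard.com');

$submitted = 'ebsite.com';

// Substring test, comparable to SQL LIKE '%...%': reports a false positive
// because "thiswebsite.com" contains "ebsite.com".
$substring_hit = false;
foreach ($listed as $domain) {
    if (strpos($domain, $submitted) !== false) {
        $substring_hit = true;
    }
}

// Exact test, comparable to SQL WHERE domain = '...'.
$exact_hit = in_array($submitted, $listed, true);

var_dump($substring_hit); // bool(true)
var_dump($exact_hit);     // bool(false)
```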

Mateusz 2006-10-15 09:54 AM

Yeah, I must admit I misunderstood what exactly you are looking for.. The fact you are 'comparing' domain names only changes a lot.

Well, at least you could filter all the domains you have in the database, and for the ones that have just one dot in the host name (no subdomains and no .com.au-like domains) run the script - #1 at http://pl.php.net/manual/pl/function.parse-url.php

Maybe that's not much, but I guess it would save you a shitload of work anyway; the rest would have to be done by hand...

matt 2006-10-15 04:45 PM

Yeah, you'll definitely have to change the db if you're trying to search on it, but that will be really easy. Add a new field, SELECT all the urls, run the code on each of them and UPDATE. :)
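A rough sketch of that backfill, under loud assumptions: the table name `sites`, its `id`, `url` and `domain` columns, and the in-memory SQLite database are all made up for illustration (the thread's real store is MySQL), and `naive_domain()` is only a placeholder for whichever extraction routine ends up being used.

```php
<?php
// Sketch only: table and column names are assumptions for illustration.
function backfill_domains(PDO $db, $extract)
{
    $rows   = $db->query('SELECT id, url FROM sites')->fetchAll(PDO::FETCH_ASSOC);
    $update = $db->prepare('UPDATE sites SET domain = :domain WHERE id = :id');

    foreach ($rows as $row) {
        $update->execute(array(
            ':domain' => call_user_func($extract, $row['url']),
            ':id'     => $row['id'],
        ));
    }
    return count($rows); // number of rows processed
}

// Naive placeholder extractor: lower-cased host minus a leading "www.".
function naive_domain($url)
{
    $host = strtolower((string) parse_url($url, PHP_URL_HOST));
    return preg_replace('/^www\./', '', $host);
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE sites (id INTEGER PRIMARY KEY, url TEXT, domain TEXT)');
$db->exec("INSERT INTO sites (url) VALUES ('http://www.ThisWebsite.com/page.html')");
$db->exec("INSERT INTO sites (url) VALUES ('http://ebsite.com/')");

$updated = backfill_domains($db, 'naive_domain');
$domains = $db->query('SELECT domain FROM sites ORDER BY id')->fetchAll(PDO::FETCH_COLUMN);
```

Passing the extractor in as a callback keeps the SELECT/UPDATE loop independent of how the domain is derived, so the loop does not need to change if the extraction rules are refined later.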

matt 2006-10-15 04:49 PM

Alternatively you could make sure www. exists for every url (add it yourself where it's not already there) and there's a trailing slash on all domains (so http://www.domain.com/ instead of http://www.domain.com), and search on ".EBSite.com/".
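A sketch of how that alternative might look. `normalise_site_url()` is a hypothetical helper, and it goes slightly further than the suggestion above by assuming it is acceptable to drop the path and force the scheme to http, so that every stored value has the shape http://www.host/.

```php
<?php
// Hypothetical helper: reduce a URL to "http://www.<host>/", adding the
// "www." prefix when it is missing (assumption: path and scheme are dropped).
function normalise_site_url($url)
{
    $host = strtolower((string) parse_url($url, PHP_URL_HOST));
    if (strpos($host, 'www.') !== 0) {
        $host = 'www.' . $host;
    }
    return 'http://' . $host . '/';
}

// Search needle: leading dot and trailing slash pin both ends of the domain.
$needle = '.' . strtolower('EBSite.com') . '/';

$longer  = normalise_site_url('http://ThisWebsite.com');
$same    = normalise_site_url('http://ebsite.com/page.html');
$subsite = normalise_site_url('http://web.ebsite.com/');

var_dump(strpos($longer, $needle) !== false);  // bool(false)
var_dump(strpos($same, $needle) !== false);    // bool(true)
var_dump(strpos($subsite, $needle) !== false); // bool(true)
```

The leading dot is what stops 'thiswebsite.com' from matching, while subdomains of the listed domain are still caught.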

oast 2006-10-16 05:11 AM

Quote:

Originally Posted by matt (Post 306427)
Yeah, you'll definitely have to change the db if you're trying to search on it, but that will be really easy. Add a new field, SELECT all the urls, run the code on each of them and UPDATE. :)

That is the option I will be using.

Thanks for your input guys

QuickDraw 2006-10-18 02:43 AM

Here's a function I use to get only the domain from a URL.. I think that's what you're after? Just pass it a url and it will return the domain.

Code:

function getdomain($url)
{
    // patterns we need to match
    $pattern_hostname = '/^(http:\/\/)?([^\/]+)/i';
    $pattern_domain = '/[^\.\/]+\.[^\.\/]+(\.[^\.\/]{2})?$/';
    // extract hostname from URL string
    @preg_match($pattern_hostname, $url, $matches);
    $hostname = $matches[2];
    // extract sld.tld(.cctld if exists)
    @preg_match($pattern_domain, $hostname, $matches);
    $domain = $matches[0];
    return $domain;
}

The only thing is.. it can't differentiate between a domain and an IP - if you pass it an IP, it gives you the last 2 octets as the domain. It would be easy enough to validate that before or after you use the function though.

Hope this helps..
QD

kitty_kate 2006-10-18 05:37 AM

You could check first if it's an IP, and if it's not, then do your stuff with the domain name.
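One way to do that check, as a sketch: `host_is_ip()` is a hypothetical helper built on `filter_var()` with `FILTER_VALIDATE_IP`, which is available from PHP 5.2 onwards.

```php
<?php
// Returns true when the URL's host is a literal IP address.
function host_is_ip($url)
{
    $host = (string) parse_url($url, PHP_URL_HOST);
    // Strip the square brackets that surround IPv6 literals in URLs.
    $host = trim($host, '[]');
    return filter_var($host, FILTER_VALIDATE_IP) !== false;
}

var_dump(host_is_ip('http://192.168.1.10/index.html')); // bool(true)
var_dump(host_is_ip('http://www.domain.tld/'));         // bool(false)
```

Running this before any domain extraction means the regex-based functions never see an IP address in the first place.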

oast 2006-10-18 08:42 AM

QuickDraw, kitty_kate:
My submission rules do not permit IP address URLs, so that is not an issue.

QuickDraw:
Thanks for the function. Yes, it appears to do what I want it to... in that it gets the domain name... but it still has the issue I mentioned in post 7, where it matches similarly named, longer domains.
  • 'http://www.greenguysboard.com' is listed in my database.
  • Someone comes along and submits the URL 'http://links.guyandjim.com'
  • The functions suggested (and the one I was using before) all (falsely) alert the submitter "a site from 'guyandjim.com' has already been added to our lists"
I am going to have to get the domain for each site I already have, update the database with a new 'domain' field, and then do exact matching on that field for future submissions.

Halfdeck 2006-10-18 09:06 AM

Code:

function base_domain($url) { // parse out subdomains
               
        $url = parse_url($url);
        $host = $url['host'];
               
        $pattern = "/([\-a-z0-9]*\.)?([\-a-z0-9]*)\.([a-z0-9]*)/i";
        preg_match($pattern, $host, $matches);
        $host = $matches[2] ."." .$matches[3];

        return $host;
}

Gotta be modified for stuff like .co.uk I guess.

oast 2006-10-18 12:07 PM

Quote:

Originally Posted by Halfdeck (Post 306965)
Gotta be modified for stuff like .co.uk I guess.

Yes, Halfdeck. Also does the same as the others that have been mentioned. Thanks anyway.


All times are GMT -4.
