197

I want to use PHP to check whether the string stored in the $myoutput variable contains valid link syntax or is just normal text. The function or solution I'm looking for should recognize all link formats, including ones with GET parameters.

A solution suggested on many sites, actually requesting the URL (using cURL or the file_get_contents() function), is not possible in my case, and I would like to avoid it.

I thought about regular expressions or another solution.

2
  • Using cURL or fetching the URL's HTTP contents may be slow. If you want something faster and almost as reliable, consider calling gethostbyname() on the hostname. If it resolves to an IP, then it probably has a website. Of course this depends on your needs.
    – TravisO
    Commented Jan 13, 2010 at 18:28
  • 1
    I would be interested in the use case for this. Commented Jun 26, 2021 at 6:47

13 Answers

410

You can use a native Filter Validator

filter_var($url, FILTER_VALIDATE_URL);

Validates value as URL (according to http://www.faqs.org/rfcs/rfc2396), optionally with required components. Beware a valid URL may not specify the HTTP protocol http:// so further validation may be required to determine the URL uses an expected protocol, e.g. ssh:// or mailto:. Note that the function will only find ASCII URLs to be valid; internationalized domain names (containing non-ASCII characters) will fail.

Example:

if (filter_var($url, FILTER_VALIDATE_URL) === FALSE) {
    die('Not a valid URL');
}
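
If only web links should pass, the scheme can be checked separately after the filter has accepted the syntax. The following is a minimal sketch; the helper name is_http_url() is an assumption made for illustration and is not part of PHP.

```php
<?php
// Hypothetical helper: accepts only syntactically valid http:// or https:// URLs.
function is_http_url(string $url): bool
{
    // First make sure the string is a syntactically valid URL at all
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // parse_url() returns the scheme as a string, or null when it is absent
    $scheme = parse_url($url, PHP_URL_SCHEME);

    return is_string($scheme)
        && in_array(strtolower($scheme), ['http', 'https'], true);
}

var_dump(is_http_url('https://example.com/page?id=1&sort=asc')); // bool(true)
var_dump(is_http_url('ftp://example.com/file.txt'));             // bool(false)
var_dump(is_http_url('just some text'));                         // bool(false)
```

The list of allowed schemes is a design choice: extend the array if, for example, ftp:// links should also count as links in your application.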
18
  • 20
    Be aware that FILTER_VALIDATE_URL will not validate the protocol of a url. So ssh://, ftp:// etc will pass.
    – Seph
    Commented May 10, 2014 at 14:03
  • 6
    @SephVelut expected behavior since these are valid URLs.
    – Gordon
    Commented May 10, 2014 at 16:10
  • 1
    @Gordon Still important to point out the caveat. http/https is blankly seen as the de-facto aspect of url's.
    – Seph
    Commented May 11, 2014 at 1:39
  • 6
    @JoshHabdas, I think you're missing the point. The PHP code does exactly what it claims to do. But it can't read your mind. There's a huge difference between invalid and unwanted. Unwanted is very subjective, which is why it's left to the programmer to work out that detail. You might also note the code validates the URL, but doesn't prove it exists. It's not PHP's fault that a user mistyped "amazon" as "amozon", which would validate, but is still unwanted.
    – JBH
    Commented Mar 22, 2018 at 19:15
  • 3
    @Jeffz ttps://www.youtube.com is a syntactically valid URL. Mind the quote in the answer.
    – Gordon
    Commented May 19, 2020 at 15:57
38

Here is the best tutorial I found:

http://www.w3schools.com/php/filter_validate_url.asp

<?php
$url = "http://www.qbaki.com";

// Remove all illegal characters from a url
$url = filter_var($url, FILTER_SANITIZE_URL);

// Validate url
if (filter_var($url, FILTER_VALIDATE_URL) !== false) {
    echo("$url is a valid URL");
} else {
    echo("$url is not a valid URL");
}
?>

Possible flags:

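FILTER_FLAG_SCHEME_REQUIRED - URL must be RFC compliant (like http://example); deprecated in PHP 7.3 and removed in PHP 8.0, because a scheme is always required
FILTER_FLAG_HOST_REQUIRED - URL must include host name (like http://www.example.com); deprecated in PHP 7.3 and removed in PHP 8.0, because a host is always required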
FILTER_FLAG_PATH_REQUIRED - URL must have a path after the domain name (like www.example.com/example1/)
FILTER_FLAG_QUERY_REQUIRED - URL must have a query string (like "example.php?name=Peter&age=37")
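
As a rough illustration of the two flags that remain available in current PHP versions, the sketch below passes them as the third argument of filter_var(). The wrapper name url_passes() is hypothetical and only keeps the example short; note that several flags are combined with the bitwise | operator, not with ||.

```php
<?php
// Hypothetical wrapper: true when $url validates with the given filter flags.
function url_passes(string $url, int $flags = 0): bool
{
    return filter_var($url, FILTER_VALIDATE_URL, $flags) !== false;
}

// FILTER_FLAG_PATH_REQUIRED: a path must follow the host
var_dump(url_passes('http://www.example.com', FILTER_FLAG_PATH_REQUIRED));           // bool(false)
var_dump(url_passes('http://www.example.com/example1/', FILTER_FLAG_PATH_REQUIRED)); // bool(true)

// FILTER_FLAG_QUERY_REQUIRED: a query string must be present
var_dump(url_passes('http://www.example.com/example.php', FILTER_FLAG_QUERY_REQUIRED)); // bool(false)
var_dump(url_passes('http://www.example.com/example.php?name=Peter&age=37', FILTER_FLAG_QUERY_REQUIRED)); // bool(true)

// Both requirements at once: combine the flags with bitwise OR
$both = FILTER_FLAG_PATH_REQUIRED | FILTER_FLAG_QUERY_REQUIRED;
var_dump(url_passes('http://www.example.com/example.php?name=Peter', $both)); // bool(true)
```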
4
  • @ErichGarcía this code doesn't check that it's a valid HTTP/S URL like the OP asks. It will pass things like ssh://, ftp:// etc.; it only checks whether the string is a syntactically valid URL according to RFC 2396
    – twigg
    Commented May 20, 2019 at 19:31
  • Do not use FILTER_VALIDATE_URL. It is messy and unreliable. E.g. it validates ttps://www.youtube.com as valid
    – Jeffz
    Commented May 17, 2020 at 13:23
  • 1
    The very necessary filter flags were removed as of PHP 8
    – Hobbamok
    Commented Nov 10, 2022 at 22:31
  • @DomenicoDeFelice, filter_var returns "the filtered data", which is a string in this case, or a boolean false if the filter condition is not met. It won't return a boolean true value. So checking filter_var(...) === true just won't work.
    – scott8035
    Commented Oct 4, 2023 at 11:54
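
The return-value behaviour described in the comment above can be seen in a short sketch (the example URLs are placeholders): filter_var() hands back the input string on success and boolean false on failure, so the result should be compared against false rather than true.

```php
<?php
$valid   = filter_var('http://www.example.com/', FILTER_VALIDATE_URL);
$invalid = filter_var('not a url', FILTER_VALIDATE_URL);

var_dump($valid);   // the URL string itself, not bool(true)
var_dump($invalid); // bool(false)

// Reliable check: compare against false
var_dump($valid !== false); // bool(true)

// Unreliable check: a string is never identical to true
var_dump($valid === true);  // bool(false)
```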
25

Using filter_var() will fail for URLs with non-ASCII characters, e.g. http://pt.wikipedia.org/wiki/Guimarães. The following function encodes all non-ASCII characters in the path (giving e.g. http://pt.wikipedia.org/wiki/Guimar%C3%A3es) before calling filter_var().

Hope this helps someone.

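<?php

function validate_url($url) {
    // parse_url() returns null when there is no path and false on a malformed URL
    $path = parse_url($url, PHP_URL_PATH);

    if (is_string($path) && $path !== '') {
        // Percent-encode each path segment so non-ASCII characters pass the filter
        $encoded_path = array_map('urlencode', explode('/', $path));
        $url = str_replace($path, implode('/', $encoded_path), $url);
    }

    return filter_var($url, FILTER_VALIDATE_URL) !== false;
}

// example
if (!validate_url("http://somedomain.com/some/path/file1.jpg")) {
    echo "NOT A URL";
}
else {
    echo "IS A URL";
}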
6
  • This is it. Finally someone came back in 2017
    – Kyle KIM
    Commented Apr 24, 2018 at 14:51
  • Works for me (the others do not BTW) :)
    – Jono
    Commented Aug 18, 2018 at 13:07
  • This is the ONLY solution that worked for me. Thanks!
    – Silas
    Commented Dec 6, 2019 at 17:47
  • This is not a check which will get 100% correct results! This will only handle non-ASCII characters in the path, not in the domain part of the URL. Nowadays, you can also use other Unicode characters in the domain, which will be converted to Punycode (see en.wikipedia.org/wiki/Punycode), e.g. "guimarães.org". So if you regard URLs that have not been converted to Punycode as valid, your check will fail on these. Even if you handle this in the check, there is still the question of e.g. "ttps://mydomain.org" being falsely interpreted as valid! (as pointed out in other answers) Commented Jun 26, 2021 at 6:40
  • Not necessary anymore (at least for my PHP 7.4 installation)
    – rabudde
    Commented Mar 24, 2022 at 12:17
11
function is_url($uri) {
    // Returns the URI itself when it looks like an http(s) URL, false otherwise
    if (preg_match('/^(http|https):\\/\\/[a-z0-9_]+([\\-\\.]{1}[a-z_0-9]+)*\\.[_a-z]{2,5}'.'((:[0-9]{1,5})?\\/.*)?$/i', $uri)) {
        return $uri;
    }
    else {
        return false;
    }
}
2
7

Actually, filter_var($url, FILTER_VALIDATE_URL); doesn't work very well. It accepts real URLs, but it only checks the syntax, so if you type something like "http://weirtgcyaurbatc", it will still say it's valid.
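
One way to reject such strings without any network access is to add a plausibility check on the host after the syntax check. This is only a sketch under the assumption that links must point at a dotted host name; the helper name looks_like_public_url() is hypothetical, and the rule deliberately rejects hosts such as localhost or bracketed IPv6 addresses.

```php
<?php
// Hypothetical helper: valid URL syntax plus a host made of dot-separated labels.
function looks_like_public_url(string $url): bool
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // parse_url() returns the host as a string, or null when it is absent
    $host = parse_url($url, PHP_URL_HOST);

    // Require at least two non-empty labels separated by dots, e.g. "example.com"
    return is_string($host) && preg_match('/^[^.]+(\.[^.]+)+$/', $host) === 1;
}

var_dump(looks_like_public_url('http://weirtgcyaurbatc'));     // bool(false)
var_dump(looks_like_public_url('http://www.example.com/a?b')); // bool(true)
```

This still says nothing about whether the host exists; it only filters out single-label hosts that pass the syntax check.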

2
6

Personally, I would use a regular expression here. The code below worked perfectly for me.

$baseUrl     = url('/'); // for my case https://www.xrepeater.com
$posted_url  = "home";
// Test with one by one
/*$posted_url  = "/home";
$posted_url  = "xrepeater.com";
$posted_url  = "www.xrepeater.com";
$posted_url  = "http://www.xrepeater.com";
$posted_url  = "https://www.xrepeater.com";
$posted_url  = "https://xrepeater.com/services";
$posted_url  = "xrepeater.dev/home/test";
$posted_url  = "home/test";*/

$regularExpression  = "((https?|ftp)\:\/\/)?"; // SCHEME Check
$regularExpression .= "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?"; // User and Pass Check
$regularExpression .= "([a-z0-9-.]*)\.([a-z]{2,3})"; // Host or IP Check
$regularExpression .= "(\:[0-9]{2,5})?"; // Port Check
$regularExpression .= "(\/([a-z0-9+\$_-]\.?)+)*\/?"; // Path Check
$regularExpression .= "(\?[a-z+&\$_.-][a-z0-9;:@&%=+\/\$_.-]*)?"; // GET Query String Check
$regularExpression .= "(#[a-z_.-][a-z0-9+\$_.-]*)?"; // Anchor Check

if (preg_match("/^$regularExpression$/i", $posted_url)) {
    // Group the alternation so that ^ and :// apply to every scheme
    if (preg_match("@^(https?|ftp)://@i", $posted_url)) {
        $final_url = preg_replace("@(http://)+@i", 'http://', $posted_url);
        // return "*** - ***Match : ".$final_url;
    }
    else {
        $final_url = 'http://'.$posted_url;
        // return "*** / ***Match : ".$final_url;
    }
}
else {
    if (substr($posted_url, 0, 1) === '/') {
        // return "*** / ***Not Match :".$final_url."<br>".$baseUrl.$posted_url;
        $final_url = $baseUrl.$posted_url;
    }
    else {
        // return "*** - ***Not Match :".$posted_url."<br>".$baseUrl."/".$posted_url;
        $final_url = $baseUrl."/".$posted_url;
    }
}
1
  • 3
    This is the best answer for validating website URLs. With a few changes this works perfectly. Thanks Commented Nov 27, 2019 at 15:37
4

Given issues with filter_var() needing http://, I use:

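// parse_url() returns false for malformed input, so ask for the scheme component directly
$is_url = filter_var($filename, FILTER_VALIDATE_URL) !== false || is_string(parse_url($filename, PHP_URL_SCHEME));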

3
  • 2
    Do not use FILTER_VALIDATE_URL. It is messy and unreliable. E.g. it validates ttps://www.youtube.com as valid
    – Jeffz
    Commented May 17, 2020 at 13:24
  • 1
    @Jeffz FILTER_VALIDATE_URL does validate urls. A scheme is not limited to http or https only, these are all valid schemes ftp, mailto, file, data and irc. They are registered with IANA but also non registered schemes can be used. So as per URI definition ttps is a valid scheme Commented May 2, 2022 at 23:38
  • @MarinaDunst Yeah but kkdjf://www.youtube.com is valid too according to FILTER_VALIDATE_URL. It's definitely unreliable. Commented Feb 11, 2023 at 23:17
3

You can use this function, but it will return false if the host name cannot be resolved, for example when the website is offline.

function isValidUrl($url) {
    $url = parse_url($url);
    if (!isset($url["host"])) return false;
    // gethostbyname() returns the unmodified host name when the lookup fails
    return !(gethostbyname($url["host"]) == $url["host"]);
}
3

Another way to check whether a given URL is valid is to try to access it. The function below fetches the headers from the given URL, which ensures that the URL is valid AND that the web server is alive:

function is_url($url) {
    // Check if URL is empty
    if (empty($url)) {
        return false;
    }
    // get_headers() returns false on failure, which in_array() cannot search
    $response = get_headers($url);
    if ($response === false) {
        return false;
    }
    return in_array("HTTP/1.1 200 OK", $response, true);
/* Sample headers:
Array
(
    [0] => HTTP/1.1 200 OK
    [Date] => Sat, 29 May 2004 12:28:14 GMT
    [Server] => Apache/1.3.27 (Unix)  (Red-Hat/Linux)
    [Last-Modified] => Wed, 08 Jan 2003 23:11:55 GMT
    [ETag] => "3f80f-1b6-3e1cb03b"
    [Accept-Ranges] => bytes
    [Content-Length] => 438
    [Connection] => close
    [Content-Type] => text/html
)*/
}
2
  • Nice idea. This will fail if the server is using HTTP/1.0 or HTTP/2.0, or returns a redirect. Commented Feb 23, 2017 at 8:16
  • Yes, it is a starting point, further improvements can be done easily. Commented Feb 23, 2017 at 8:31
1

Came across this article from 2012. It takes into account variables that may or may not be just plain URLs.

The author of the article, David Müeller, provides this function that he says, "...could be worth wile [sic]," along with some examples of filter_var and its shortcomings.

/**
 * Modified version of `filter_var`.
 *
 * @param  mixed $url Could be a URL or possibly much more.
 * @return bool
 */
function validate_url( $url ) {
    $url = trim( $url );

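    return (
        ( strpos( $url, 'http://' ) === 0 || strpos( $url, 'https://' ) === 0 ) &&
        // No flags are passed: FILTER_VALIDATE_URL always requires a scheme and a host,
        // and FILTER_FLAG_SCHEME_REQUIRED / FILTER_FLAG_HOST_REQUIRED were removed in PHP 8.0
        filter_var( $url, FILTER_VALIDATE_URL ) !== false
    );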
}
2
  • Works better than simple filter_var, but also validates youtube, which basically is a valid url, but a local one (without tld)
    – NemoXP
    Commented Dec 15, 2020 at 8:48
  • 5
    FILTER_FLAG_ will now be removed in PHP 8.0, so this seems to be no longer an option.
    – Andreas
    Commented Jun 16, 2021 at 9:01
0
public function testing($Url=''){
    $ch = curl_init($Url);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $data = curl_exec($ch);
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return ($httpcode >= 200 && $httpcode < 300) ? true : false;
}
2
  • 1
    Please add some explanation to your answer such that others can learn from it. Where does $this->output come from?
    – Nico Haase
    Commented Sep 15, 2020 at 14:34
  • have made it more clear now
    – katulamu
    Commented Jan 16, 2022 at 8:27
0

Here are three separate functions I wrote for this case. I hope they are useful:

/**
 * Check if the string is a relative or absolute URL
 * @param null|string $url The url string
 * @return bool
 */
function isUrl(string|null $url):bool{
    return (!empty($url)) && preg_match("/^(\w+\:[\/]*)?(\/?[^\/\{\}\|^\[\]\"`\r\n\t\f]){1,}$/",$url);
}
/**
 * Check if the string is only a relative URL
 * @param null|string $url The url string
 * @return bool
 */
function isRelativeUrl(string|null $url):bool{
    return (!empty($url)) && preg_match("/^(\/?[^\/\{\}\|\^\[\]\"\`\r\n\t\f]){1,}$/",$url);
}
/**
 * Check if the string is only an absolute URL
 * @param null|string $url The url string
 * @return bool
 */
function isAbsoluteUrl(string|null $url):bool{
    return (!empty($url)) && preg_match("/^\w+\:\/*(\/?[^\/\{\}\|^\[\]\"\`\r\n\t\f]){1,}$/",$url);
}

Enjoy...

-2

If anyone is interested in using cURL for validation, you can use the following code.

<?php
// Intended to be a method of a class, hence the public modifier
public function validationUrl($Url){
    if ($Url == NULL){
        return false;
    }
    $ch = curl_init($Url);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $data = curl_exec($ch);
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return ($httpcode >= 200 && $httpcode < 300) ? true : false;
}
