Yet another regular expression to parse a URL
I needed an expression that separates the directory and file parts of a URL, so here it is:
^(https?)://([.0-9a-zA-Z-]+)(/?.*?)([^/]*)$
It captures four parts of a URL, laid out as protocol://domain/directory/fileWithParams
It can be used in Perl this way:
my $url = "http://example.com/directory/file?parameters";
my ($proto, $domain, $dir, $file) = ($url =~ m{^(https?)://([.0-9a-zA-Z-]+)(/?.*?)([^/]*)$});
print "$proto|$domain|$dir|$file\n";
This prints: http|example.com|/directory/|file?parameters
Here is the same thing in C++ using Boost.Regex:
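#include <boost/regex.hpp>
#include <iostream>
#include <string>

int main() {
    std::string url = "http://example.com/directory/file?parameters";
    // regex_match must match the whole string, so the ^...$ anchors are not needed
    boost::regex expr("(https?)://([.0-9a-zA-Z-]+)(/?.*?)([^/]*)");
    boost::smatch match;
    if (boost::regex_match(url, match, expr)) {
        std::string proto = match[1], domain = match[2], dir = match[3], file = match[4];
        std::cout << proto << '|' << domain << '|' << dir << '|' << file << '\n';
    }
}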