Email address format checking: regular expressions and code.
Check it out and see if you can find a fault.
http://SimonSlick.com/VEAF/ValidateEmailAddressFormat.html
preg_match
(PHP 4, PHP 5)
preg_match — Perform a regular expression match
Description
int preg_match ( string $pattern, string $subject [, array &$matches [, int $flags [, int $offset]]] )
Searches subject for a match to the regular expression given in pattern.
Parameters
- pattern
The pattern to search for, as a string.
- subject
The input string.
- matches
If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
- flags
flags can be the following flag:
- PREG_OFFSET_CAPTURE
- If this flag is passed, for every occurring match the appendant string offset will also be returned. Note that this changes the value of matches into an array where every element is an array consisting of the matched string at offset 0 and its byte offset into subject at offset 1.
- offset
Normally, the search starts from the beginning of the subject string. The optional parameter offset can be used to specify the alternate place from which to start the search.
Note: Using offset is not equivalent to passing substr($subject, $offset) to preg_match() in place of the subject string, because pattern can contain assertions such as ^, $ or (?<=x). Compare:
<?php
$subject = "abcdef";
$pattern = '/^def/';
preg_match($pattern, $subject, $matches, PREG_OFFSET_CAPTURE, 3);
print_r($matches);
?>
The above example will output:
Array ( )
while this example
<?php
$subject = "abcdef";
$pattern = '/^def/';
preg_match($pattern, substr($subject,3), $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
?>
will produce
Array ( [0] => Array ( [0] => def [1] => 0 ) )
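The same difference shows up with a lookbehind such as (?<=x). A minimal sketch (the pattern is just an illustration): with offset, the lookbehind can still see the character in front of the start position; after substr() that character no longer exists.

```php
<?php
$subject = "abcdef";
$pattern = '/(?<=c)def/'; // "def" only when it is preceded by "c"

// offset = 3: matching starts at "d", but the lookbehind can still look back at "c"
$withOffset = preg_match($pattern, $subject, $m1, 0, 3);

// substr() removes "abc" entirely, so there is no "c" left for the lookbehind
$withSubstr = preg_match($pattern, substr($subject, 3), $m2);

var_dump($withOffset, $withSubstr); // int(1) int(0)
```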
Return Values
preg_match() returns the number of times pattern matches. That will be either 0 times (no match) or 1 time because preg_match() will stop searching after the first match. preg_match_all() on the contrary will continue until it reaches the end of subject. preg_match() returns FALSE if an error occurred.
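Because both 0 (no match) and FALSE (error) are falsy, a plain if () cannot tell them apart. A small sketch that uses === instead; match_status() is a hypothetical helper name, and the @ operator is only there to keep the deliberately invalid pattern in the demo from printing a warning.

```php
<?php
// Map preg_match()'s three possible results (1, 0, FALSE) to a readable status.
function match_status($pattern, $subject) {
    // @ suppresses the warning PHP emits for an invalid pattern; FALSE is still returned
    $result = @preg_match($pattern, $subject);
    if ($result === false) {
        return 'error';
    }
    return $result === 1 ? 'match' : 'no match';
}

echo match_status('/php/i', 'PHP is fun'), "\n"; // match
echo match_status('/perl/', 'PHP is fun'), "\n"; // no match
echo match_status('/[php/', 'PHP is fun'), "\n"; // error (unterminated character class)
```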
Changelog
| Version | Description |
|---|---|
| 4.3.3 | The offset parameter was added |
| 4.3.0 | The PREG_OFFSET_CAPTURE flag was added |
| 4.3.0 | The flags parameter was added |
Examples
Example 1444. Find the string of text "php"
<?php
// The "i" after the pattern delimiter indicates a case-insensitive search
if (preg_match("/php/i", "PHP is the web scripting language of choice.")) {
echo "A match was found.";
} else {
echo "A match was not found.";
}
?>
Example 1445. Find the word "web"
<?php
/* The \b in the pattern indicates a word boundary, so only the distinct
* word "web" is matched, and not a word partial like "webbing" or "cobweb" */
if (preg_match("/\bweb\b/i", "PHP is the web scripting language of choice.")) {
echo "A match was found.";
} else {
echo "A match was not found.";
}
if (preg_match("/\bweb\b/i", "PHP is the website scripting language of choice.")) {
echo "A match was found.";
} else {
echo "A match was not found.";
}
?>
Example 1446. Getting the domain name out of a URL
<?php
// get host name from URL
preg_match('@^(?:http://)?([^/]+)@i',
"http://www.php.net/index.html", $matches);
$host = $matches[1];
// get last two segments of host name
preg_match('/[^.]+\.[^.]+$/', $host, $matches);
echo "domain name is: {$matches[0]}\n";
?>
The above example will output:
domain name is: php.net
Notes
Do not use preg_match() if you only want to check if one string is contained in another string. Use strpos() or strstr() instead as they will be faster.
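A sketch of the strpos() alternative for a simple "contains" test. The helper name contains() is hypothetical; the important part is comparing with !== false, because strpos() returns 0 when the needle sits at the very start. On PHP 8.0 and later, the built-in str_contains() does the same job.

```php
<?php
// strpos() returns the byte position of the first occurrence, or FALSE when absent.
// Compare with !== false: a match at position 0 is also falsy.
function contains($haystack, $needle) {
    return strpos($haystack, $needle) !== false;
}

var_dump(contains("PHP is the web scripting language", "PHP"));  // bool(true), found at position 0
var_dump(contains("PHP is the web scripting language", "perl")); // bool(false)
```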
See Also
| preg_match_all() |
| preg_replace() |
| preg_split() |
preg_match
29-Aug-2007 11:27
25-Aug-2007 11:17
regex for validating emails, from Perl's RFC2822 package:
http://en.wikipedia.org/wiki/Talk:E-mail_address
23-Aug-2007 01:51
I wrote a regular expression for e-mail addresses:
#[a-zA-Z]+[0-9a-zA-Z._-]*@([a-z]+[0-9a-zA-Z._-]*){1,8}\.[a-z]{2,5}#
It accepts:
- Standard addresses
- Addresses such as g.d.gd.v.pyp.ks-t.rv.ua@kkk.ddd.rs.tw
21-Aug-2007 12:41
A very good description with examples of how to build a RegEx matching mail addresses can be found here:
http://www.regular-expressions.info/email.html
01-Aug-2007 06:06
>>what about .mil, .golf,.tv etc etc
ICANN does not list a .golf TLD.
A complete list of top-level domains is published by IANA here:
http://data.iana.org/TLD/tlds-alpha-by-domain.txt
I also found this article about verifying email addresses:
http://www.regular-expressions.info/email.html
26-Jul-2007 04:47
Maybe it will sound obvious, but I've encountered this a few times...
If you are using preg_match() to validate user input, remember to include ^ and $ in your regex, or to take the input from $matches[0] after a successful match, i.e.
preg_match('/[0-9]+/', '123 UNION SELECT ... --') returns 1 (a match), but if you then use that input in an SQL statement, the injected code will probably be executed (if you don't escape the user argument). Note that $matches[0] == '123', so that value can be used as a valid input.
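A short sketch of both points, using the input from the note. It also shows a related detail: by default $ also matches just before a trailing newline, so "123\n" passes /^[0-9]+$/; the D modifier (or \z in place of $) closes that gap.

```php
<?php
$input = "123 UNION SELECT ... --";

// Unanchored: the "123" prefix is enough for a match, so the input looks valid
$loose = preg_match('/[0-9]+/', $input);

// Anchored: the entire input must consist of digits
$strict = preg_match('/^[0-9]+$/', $input);

// By default $ also matches before a final newline; the D modifier prevents that
$newlineDefault = preg_match('/^[0-9]+$/', "123\n");
$newlineStrict  = preg_match('/^[0-9]+$/D', "123\n");

var_dump($loose, $strict, $newlineDefault, $newlineStrict); // int(1) int(0) int(1) int(0)
```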
24-Jul-2007 01:44
Match and replace for arrays. Useful for parsing an entire $_POST array.
Only array_preg_match examples:
<?php
function array_preg_match(array $patterns, array $subjects, &$errors = array()) {
    $errors = array();
    foreach ($patterns as $k => $v) {
        // A missing key is checked as an empty string instead of raising a notice
        $subject = isset($subjects[$k]) ? $subjects[$k] : '';
        if (!preg_match($v, $subject)) {
            $errors[$k] = TRUE;
        }
    }
    return count($errors) == 0;
}
function array_preg_replace(array $patterns, array $replacements, array $subject) {
    $r = array();
    foreach ($patterns as $k => $v) {
        $r[$k] = preg_replace($v, $replacements[$k], $subject[$k]);
    }
    // Keys without a pattern are passed through unchanged
    return $r + $subject;
}
$arr1 = array('name' => 'Alexandre', 'phone' => '44559999');
$arr2 = array('name' => '', 'phone' => '44559999c');
array_preg_match(array(
    'name'  => '#.+#',    // Not empty
    'phone' => '#^\d*$#'  // Only digits, optional
), $arr1, $match_errors);
print_r($match_errors); // Empty, it is ok.
array_preg_match(array(
    'name'  => '#.+#',    // Not empty
    'phone' => '#^\d*$#'  // Only digits, optional
), $arr2, $match_errors);
print_r($match_errors); // Two indexes, name and phone, both not ok.
?>
23-Jul-2007 03:22
Never try to verify an email address with some random regex you just invented sitting on the toilet seat. It will not work properly. The proper regex for email validation is something along the lines of
"([-!#$%&'*+/=?_`{|}~a-z0-9^]+(\.[-!#$%&'*+/=?_`{|}~a-z0-9^]+)*|"([\x0b\x0c\x21\x01-\x08\x0e-\x1f\x23-\x5b\x5d-\x7f]|\\[\x0b\x0c\x01-\x09\x0e-\x7f])*")@(([a-z0-9]([-a-z0-9]*[a-z0-9])?\.)+[a-z0-9]([-a-z0-9]*[a-z0-9]){1,1}|\[((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3,3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[-a-z0-9]*[a-z0-9]:([\x0b\x0c\x01-\x08\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x0b\x0c\x01-\x09\x0e-\x7f])+)\])".
However, you shouldn't even try that regex. If you do not understand what that regexp does, then please do not try to write one yourself. If you need a _truly_ _valid_ e-mail address, no regexp is going to help you: just send a verification message to the user-supplied address with a link or code the user can paste to verify the address. If you still WISH, against my recommendation, to use some validating regexp, then *please* just make it warn loudly that the address may be invalid; do not write code that throws a fatal error outright. I am quite fed up with sites that do not accept my .name e-mail address, or other valid, working forms for that matter.
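A rough sketch of the verification-message approach, assuming PHP 7 or later (for random_bytes()) and leaving out storage and mailing. make_verification_token(), verify_token() and the example.com URL are hypothetical placeholders, not part of any library.

```php
<?php
// Hypothetical helper: 16 random bytes rendered as 32 hexadecimal characters.
function make_verification_token() {
    return bin2hex(random_bytes(16));
}

// Hypothetical helper: constant-time comparison, so the check does not leak
// how many leading characters of the token were correct.
function verify_token($storedToken, $suppliedToken) {
    return hash_equals($storedToken, $suppliedToken);
}

$token = make_verification_token();
// Store $token next to the pending address (not shown), then mail a link such as:
$link = 'https://example.com/verify.php?token=' . urlencode($token); // placeholder URL
```

Only when the user follows the link (or pastes the code) and verify_token() succeeds is the address treated as confirmed.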
16-Jul-2007 04:09
>>..'com|org|net|gov|biz|info|name|aero|biz|info|jobs|'
.'museum)
What about .mil, .golf, .tv, etc.?
The point of the code should be that you don't have to keep going back and updating it.
10-Jul-2007 01:33
I just started using PHP and this section doesn't clarify whether or not you must use "/" as your regular expression delimiters.
I want to clarify that you can use almost any character as your delimiter: any non-alphanumeric, non-backslash, non-whitespace character. The delimiter is automatically the first character of your regular expression string. This makes it a bit easier if you are looking for things that might contain a forward slash. For example:
preg_match('#</b>#', $string);
Instead of:
preg_match('/<\/b>/', $string);
Or:
preg_match('@/my/dir/name/@', $string);
Instead of:
preg_match('/\/my\/dir\/name\//', $string);
This can greatly boost readability. It is not quite as flexible as in Perl (you can't use control characters or \n, which can really come in handy when you aren't quite sure what characters might be in your regular expression), but switching to another delimiter can make your code a bit easier to read.
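If the slashes come from a variable (for example a directory name only known at run time), escaping them by hand is not an option. A small sketch using preg_quote(), whose optional second parameter also escapes the chosen delimiter; $dir is just an example value.

```php
<?php
// preg_quote() escapes regex metacharacters in a literal string; its second
// argument additionally escapes the delimiter you intend to wrap it in.
$dir = '/my/dir/name/';
$pattern = '/' . preg_quote($dir, '/') . '/'; // same pattern as '/\/my\/dir\/name\//'

var_dump(preg_match($pattern, '/var/www/my/dir/name/index.php')); // int(1)
```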
29-Jun-2007 06:36
function checkEmailAddress($mail)
{
    $regex = '/\A(?:[a-z0-9!#$%&\'*+\/=?^_`{|}~-]+'
        .'(?:\.[a-z0-9!#$%&\'*+\/=?^_`{|}~-]+)*@'
        .'(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[a-z]{2}|'
        .'com|org|net|gov|biz|info|name|aero|biz|info|jobs|'
        .'museum)\b)\z/i';
    // \z (unlike \Z) does not accept a trailing newline after the address.
    // preg_match() returns 1, 0 or FALSE; reduce that to a plain boolean.
    return preg_match($regex, $mail) === 1;
}
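Following the objection that a hard-coded TLD list has to be updated continually, one option is to accept any alphabetic top-level label. This is only a sketch: checkEmailAddressAnyTld() is a hypothetical name, the local part is the same as in checkEmailAddress() above, and the limit of 2 to 63 letters is an assumption (63 is the maximum DNS label length). Internationalized TLDs in their xn-- form are still rejected.

```php
<?php
// Hypothetical variant: any alphabetic TLD of 2-63 letters is accepted instead
// of a fixed list, so newly delegated TLDs need no code change.
function checkEmailAddressAnyTld($mail) {
    $regex = '/\A[a-z0-9!#$%&\'*+\/=?^_`{|}~-]+'
        . '(?:\.[a-z0-9!#$%&\'*+\/=?^_`{|}~-]+)*@'
        . '(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,63}\z/i';
    return preg_match($regex, $mail) === 1;
}

var_dump(checkEmailAddressAnyTld('user@army.mil'));     // bool(true)
var_dump(checkEmailAddressAnyTld('user@example.golf')); // bool(true)
var_dump(checkEmailAddressAnyTld('user@example'));      // bool(false), no TLD
```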
11-Jun-2007 08:55
I'm not happy with any e-mail address pattern that I have seen.
The following addresses are wrong:
email1..@myserver.com
email1.-@myserver.com
email1._@myserver.com
email1@2sub.myserver.com
email1@sub.sub.2sub.myserver.com
So, this is my pattern:
$pat = "/^[a-z]+[a-z0-9]*[._-]?[a-z0-9]+@([a-z]+[a-z0-9]*[.-]?[a-z]+[a-z0-9]*[a-z0-9]+){1,4}\.[a-z]{2,4}$/";
Best Regards, Elier
http://www.faqs.org/rfcs/rfc1035.html
RFC 1035 - Domain names - implementation and specification
06-Dec-2006 06:19
This is a function to convert byte offsets into (UTF-8) character offsets (this is regardless of whether you use the /u modifier):
<?php
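function mb_preg_match($ps_pattern, $ps_subject, &$pa_matches, $pn_flags = 0, $pn_offset = 0, $ps_encoding = NULL) {
    // WARNING! - All this function does is to correct offsets, nothing else.
    if (is_null($ps_encoding)) {
        $ps_encoding = mb_internal_encoding();
    }
    // Convert the character offset into the byte offset that preg_match() expects
    $pn_offset = strlen(mb_substr($ps_subject, 0, $pn_offset, $ps_encoding));
    $ret = preg_match($ps_pattern, $ps_subject, $pa_matches, $pn_flags, $pn_offset);
    if ($ret && ($pn_flags & PREG_OFFSET_CAPTURE)) {
        foreach ($pa_matches as &$ha_subpattern) {
            // Convert each byte offset back into a character offset;
            // -1 marks a group that did not participate and is left alone
            if ($ha_subpattern[1] >= 0) {
                $ha_subpattern[1] = mb_strlen(substr($ps_subject, 0, $ha_subpattern[1]), $ps_encoding);
            }
        }
        unset($ha_subpattern); // break the reference left over from foreach
    }
    return $ret;
}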
?>
20-Sep-2006 02:37
If you want the Perl-equivalent regexp match variables
$`, $& and $'
(before the match, the match itself, after the match),
here's one way to do it:
echo preg_match("/(.*?)(and)(.*)/", "this and that", $matches);
print_r($matches);
$before = $matches[1]; // Perl's $`
$match  = $matches[2]; // Perl's $&
$after  = $matches[3]; // Perl's $'
Notice the greedy (.*) at the end; a lazy (.*?) there would match an empty string, so the end would not be captured.
Note that if you only need $&, simply use $matches[0].
Here's another way, which is a bit simpler to remember:
echo preg_match("/^(.*?)(and)(.*?)$/", "this and that",$matches);
print_r($matches);
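For arbitrary patterns it can be easier not to wrap the whole expression in extra groups. A sketch, assuming byte-oriented strings: PREG_OFFSET_CAPTURE reports where the whole match starts, and substr() cuts out the text before and after it. perl_match_vars() is a hypothetical helper name.

```php
<?php
// Emulate Perl's $` (prematch), $& (match) and $' (postmatch) for any pattern,
// using the byte offset that PREG_OFFSET_CAPTURE reports for the whole match.
function perl_match_vars($pattern, $subject) {
    if (preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE) !== 1) {
        return null; // no match (or an error)
    }
    list($text, $offset) = $m[0]; // matched text and its byte offset
    return array(
        'prematch'  => (string) substr($subject, 0, $offset),
        'match'     => $text,
        'postmatch' => (string) substr($subject, $offset + strlen($text)),
    );
}

print_r(perl_match_vars('/and/', 'this and that'));
```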
17-Aug-2006 12:27
Concerning the German umlauts (and other language-specific chars such as accented letters): if you use Unicode (UTF-8), you can match them easily with the Unicode character property \pL (match any Unicode letter) and the "u" modifier, so e.g.
<?php preg_match("/[\w\pL]/u",$var); ?>
finds a match as soon as $var contains a word character, umlaut or not (add a + after the character class to match whole words). Took me a while to figure this out, so maybe this comment will save the day for someone else :-)
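To pull out whole words rather than test for a single letter, add a + quantifier and use preg_match_all(). A minimal sketch; the sample string is arbitrary, and the \u{...} escapes (PHP 7+) just keep the source file ASCII-only.

```php
<?php
// Extract every run of Unicode letters (\pL) from a UTF-8 string.
// The u modifier makes PCRE treat both pattern and subject as UTF-8.
$text = "Gr\u{fc}\u{df}e aus K\u{f6}ln"; // "Grüße aus Köln"
preg_match_all('/\pL+/u', $text, $matches);
print_r($matches[0]);
```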
05-Jul-2006 11:48
This function (for PHP 4.3.3+, the version that added the offset parameter) uses preg_match() to return the position of a regex match (like strpos(), but using a regex pattern instead):
function preg_pos($sPattern, $sSubject, &$FoundString, $iOffset = 0) {
$FoundString = NULL;
if (preg_match($sPattern, $sSubject, $aMatches, PREG_OFFSET_CAPTURE, $iOffset) > 0) {
$FoundString = $aMatches[0][0];
return $aMatches[0][1];
}
else {
return FALSE;
}
}
It also returns the actual string found using the pattern, via $FoundString.
13-Feb-2006 10:25
How to verify a Canadian postal code!
if (!preg_match("/^[a-z]\d[a-z] ?\d[a-z]\d$/i" , $postalcode))
{
echo "Your postal code has an incorrect format.";
}
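The pattern above accepts any letter. If stricter checking is wanted, the sketch below also excludes D, F, I, O, Q and U everywhere, and W and Z in the first position, which is my understanding of Canada Post's rules; please verify them against Canada Post before relying on this. is_canadian_postal_code() is a hypothetical helper name.

```php
<?php
// Assumed rules: D, F, I, O, Q, U never appear; W and Z are not used as the first letter.
function is_canadian_postal_code($postalcode) {
    $pattern = '/^[ABCEGHJ-NPRSTVXY]\d[ABCEGHJ-NPRSTV-Z] ?\d[ABCEGHJ-NPRSTV-Z]\d$/i';
    return preg_match($pattern, $postalcode) === 1;
}

var_dump(is_canadian_postal_code('K1A 0B1')); // bool(true)
var_dump(is_canadian_postal_code('D1A 0B1')); // bool(false), D is not used
```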
29-Jan-2006 10:17
The assertion \G is especially useful together with preg_match()'s offset parameter: \G matches only if the current position in subject is the same as the one specified by offset. It is comparable to the ^ assertion, but whereas ^ matches at position 0, \G matches at position offset.
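A small sketch comparing \G and ^ with the offset parameter, reusing the "abcdef" subject from the manual example above.

```php
<?php
$subject = "abcdef";
// \G anchors the match at the offset; ^ still means position 0 of the whole subject.
$g3     = preg_match('/\Gdef/', $subject, $m, 0, 3); // 1: "def" starts exactly at offset 3
$caret3 = preg_match('/^def/', $subject, $m, 0, 3);  // 0: ^ never matches at offset 3
$g2     = preg_match('/\Gdef/', $subject, $m, 0, 2); // 0: offset 2 is "c", and \G forbids skipping ahead

var_dump($g3, $caret3, $g2);
```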
26-Jan-2006 03:18
Intending to use preg_match to check whether an email address is in a valid format? The following page contains some very useful information about possible formats of email addresses, some of which may surprise you: http://en.wikipedia.org/wiki/E-mail_address
27-Dec-2005 08:27
Here's a pattern for matching US phone numbers in the following formats:
###-###-####
(###) ###-####
##########
It restricts the area codes to >= 200 and exchanges to >= 100, since values below these are invalid.
<?php
$pattern = "/(\([2-9]\d{2}\)\s?|[2-9]\d{2}-|[2-9]\d{2})"
. "[1-9]\d{2}"
. "-?\d{4}/";
?>
26-Oct-2005 10:37
Test for a valid US phone number, and get it back formatted at the same time:
function getUSPhone($var) {
$US_PHONE_PREG ="/^(?:\+?1[\-\s]?)?(\(\d{3}\)|\d{3})[\-\s\.]?"; //area code
$US_PHONE_PREG.="(\d{3})[\-\.]?(\d{4})"; // seven digits
$US_PHONE_PREG.="(?:\s?x|\s|\s?ext(?:\.|\s)?)?(\d*)?$/"; // any extension
if (!preg_match($US_PHONE_PREG,$var,$match)) {
return false;
} else {
$tmp = "+1 ";
if (substr($match[1],0,1) == "(") {
$tmp.=$match[1];
} else {
$tmp.="(".$match[1].")";
}
$tmp.=" ".$match[2]."-".$match[3];
if ($match[4] <> '') $tmp.=" x".$match[4];
return $tmp;
}
}
usage:
$phone = $_REQUEST["phone"];
if (!($phone = getUSPhone($phone))) {
//error gracefully :)
}
22-Sep-2005 11:34
To check a Romanian landline phone number, and to return "Bucharest", "Proper" or "Unknown", I've used this function:
<?php
function verify_destination($destination) {
    // Anything that is not a 10-digit Bucharest or provincial number is "Unknown"
    $destination_match = "Unknown";
    if (strlen($destination) == 10) {
        if (preg_match("/^021[2-7][0-9]{6}$/", $destination)) {
            $destination_match = "Bucharest";
        } elseif (preg_match("/^02[3-6][0-9][1-7][0-9]{5}$/", $destination)) {
            $destination_match = "Proper";
        }
    }
    return $destination_match;
}
?>
26-Jul-2005 09:38
Watch out when using C-style comments around a preg_match() call, or any preg_* call for that matter. In certain situations (like the example below) the result will not be as expected. This one is of course easy to catch, but worth noting.
/*
we will comment out this section
if (preg_match ("/anything.*/", $var)) {
code here;
}
*/
This happens because PHP does not look for string literals inside a comment: the first */ it finds ends the comment. Here the asterisk (*) and the ending delimiter (/) of the pattern form that */, so the rest of your (supposedly commented-out) code is interpreted as PHP.
04-Jul-2005 09:03
Do not forget that PCRE shares many features with Perl.
One that is often neglected is named subpatterns, (?P<name>...), which return the matches as an associative array (like a Perl hash) alongside the numbered entries.
For example, here's a code snippet that will parse a subset of the XML Schema 'duration' datatype:
<?php
$duration_tag = 'PT2M37.5S'; // 2 minutes and 37.5 seconds
// drop the fractional-seconds part
preg_match(
'#^PT(?:(?P<minutes>\d+)M)?(?P<seconds>\d+)(?:\.\d+)?S$#',
$duration_tag,
$matches);
print_r($matches);
?>
Here is the corresponding output:
Array
(
[0] => PT2M37.5S
[minutes] => 2
[1] => 2
[seconds] => 37
[2] => 37
)
14-Mar-2005 03:57
The ExtractString function does not have a real error, but it misbehaves in some cases. What if it is called like this:
ExtractString($row, 'action="', '"');
It would find 'action="' correctly, but perhaps not the first " after the $start-string. If $row consists of
<form method="post" action="script.php">
strpos($str_lower, $end) would return the first " in the method-attribute. So I made some modifications and it seems to work fine.
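function ExtractString($str, $start, $end)
{
    // Search case-insensitively: lower-case the delimiters as well as the haystack
    $str_low = strtolower($str);
    $start = strtolower($start);
    $end = strtolower($end);
    $pos_start = strpos($str_low, $start);
    if ($pos_start === false) {
        return false;
    }
    // Look for $end only after the end of $start, not from the beginning of the string
    $pos1 = $pos_start + strlen($start);
    $pos_end = strpos($str_low, $end, $pos1);
    if ($pos_end === false) {
        return false;
    }
    return substr($str, $pos1, $pos_end - $pos1);
}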
11-Feb-2005 10:03
Pointing to the post of "internet at sourcelibre dot com": for e.g. German umlauts ("Umlaute"), instead of a Perl-style regexp like
<?php
$bolMatch = preg_match("/^[a-zA-Z]+$/", $strData);
?>
use the setlocale() function and the POSIX class [[:alpha:]] like
<?php
setlocale (LC_ALL, 'de_DE');
$bolMatch = preg_match("/^[[:alpha:]]+$/", $strData);
?>
This works for the special characters of whichever locale you set, provided that locale is installed and the data uses its single-byte encoding (such as ISO-8859-1).
Remember that since umlaut domains have been released, it's almost mandatory to change your regexps so that forms asking for e-mail and web addresses also accept those domains.
Life can be so easy when you read the manual ;-)
13-Jan-2005 05:11
Note that the PREG_OFFSET_CAPTURE flag, as far as I've tested, returns the offset in bytes not characters, which may not be what you're expecting if you're using the /u pattern modifier to make the regex UTF-8 aware (i.e. multibyte characters will result in a greater offset than you expect)
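A minimal sketch of the byte/character difference and one way to convert it, assuming UTF-8 input. Counting the characters before the match with preg_match_all('/./us', ...) avoids depending on the mbstring extension; the sample string is arbitrary.

```php
<?php
$s = "na\u{ef}ve caf\u{e9}"; // "naïve café" in UTF-8 (\u{...} escapes need PHP 7+)
preg_match('/caf\x{e9}/u', $s, $m, PREG_OFFSET_CAPTURE);
$byteOffset = $m[0][1]; // "ï" occupies two bytes, so this is one more than the character count

// Convert to a character offset by counting the UTF-8 characters before the match
$charOffset = preg_match_all('/./us', substr($s, 0, $byteOffset));

var_dump($byteOffset, $charOffset); // int(7) int(6)
```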
This is a constant that helps in getting a valid phone number that does not need to be in a particular format. It matches many variations of the following US phone formats:
(Xxx) Xxx-Xxxx
(Xxx) Xxx Xxxx
Xxx Xxx Xxxx
Xxx-Xxx-Xxxx
XxxXxxXxxx
Xxx.Xxx.Xxxx
define( "REGEXP_PHONE", "/^\(?[2-9][0-9]{2}\)?[. -]?[2-9][0-9]{2}[. -]?[0-9]{4}$/" );
06-Jul-2004 01:53
To regex a North American phone number you can assume NxxNxxXXXX, where N = 2 through 9 and x = 0 through 9. North American numbers cannot start with a 0 or a 1 in either the area code or the office code. So, adapted from the other phone number regex here, you would get:
/^[2-9][0-9]{2}[-][2-9][0-9]{2}[-][0-9]{4}$/
A very simple phone number validation function.
Returns the phone number if it is in the xxx-xxx-xxxx format, x being 0-9.
Returns false if digits are missing or improper characters are included.
<?php
function VALIDATE_USPHONE($phonenumber)
{
    // Keep the pattern on one line: a line break inside the string would become part of the pattern
    if (preg_match("/^[0-9]{3}-[0-9]{3}-[0-9]{4}$/", $phonenumber)) {
        return $phonenumber;
    } else {
        return false;
    }
}
?>
02-Feb-2004 06:30
<?php // some may find this useful... :)
$iptables = file ('/proc/net/ip_conntrack');
$services = file ('/etc/services');
$GREP = '!([a-z]+) ' .// [1] protocol
'\\s*([^ ]+) ' .// [2] protocol in decimal
'([^ ]+) ' .// [3] time-to-live
'?([A-Z_]|[^ ]+)?'.// [4] state
' src=(.*?) ' .// [5] source address
'dst=(.*?) ' .// [6] destination address
'sport=(\\d{1,5}) '.// [7] source port
'dport=(\\d{1,5}) '.// [8] destination port
'src=(.*?) ' .// [9] reversed source
'dst=(.*?) ' .//[10] reversed destination
'sport=(\\d{1,5}) './/[11] reversed source port
'dport=(\\d{1,5}) './/[12] reversed destination port
'\\[([^]]+)\\] ' .//[13] status
'use=([0-9]+)!'; //[14] use
$ports = array();
foreach ($services as $s) {
    if (preg_match("/^([a-zA-Z-]+)\\s*([0-9]{1,5})\\//", $s, $x)) {
        $ports[$x[2]] = $x[1];
    }
}
// Visit each conntrack line exactly once (no index past the end of the array)
foreach ($iptables as $line) {
    if (preg_match($GREP, $line, $x)) {
        // translate known ports... . .
        $x[7] = array_key_exists($x[7], $ports) ? $ports[$x[7]] : $x[7];
        $x[8] = array_key_exists($x[8], $ports) ? $ports[$x[8]] : $x[8];
        print_r($x);
    } // on a nice sortable-table... bon appétit!
}
?>
17-Jan-2004 11:31
As I did not find any working IPv6 regexp, I just created one. Here it is:
$pattern1 = '([A-Fa-f0-9]{1,4}:){7}[A-Fa-f0-9]{1,4}';
$pattern2 = '[A-Fa-f0-9]{1,4}::([A-Fa-f0-9]{1,4}:){0,5}[A-Fa-f0-9]{1,4}';
$pattern3 = '([A-Fa-f0-9]{1,4}:){2}:([A-Fa-f0-9]{1,4}:){0,4}[A-Fa-f0-9]{1,4}';
$pattern4 = '([A-Fa-f0-9]{1,4}:){3}:([A-Fa-f0-9]{1,4}:){0,3}[A-Fa-f0-9]{1,4}';
$pattern5 = '([A-Fa-f0-9]{1,4}:){4}:([A-Fa-f0-9]{1,4}:){0,2}[A-Fa-f0-9]{1,4}';
$pattern6 = '([A-Fa-f0-9]{1,4}:){5}:([A-Fa-f0-9]{1,4}:){0,1}[A-Fa-f0-9]{1,4}';
$pattern7 = '([A-Fa-f0-9]{1,4}:){6}:[A-Fa-f0-9]{1,4}';
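Patterns 1 to 7 represent different cases. $full is the complete pattern; it covers addresses written out in full, or with a single "::" that has at least one group on each side. Forms with a leading or trailing "::" (such as ::1 or fe80::) and IPv4-embedded addresses are not matched.
$full = "/^($pattern1)$|^($pattern2)$|^($pattern3)$|^($pattern4)$|^($pattern5)$|^($pattern6)$|^($pattern7)$/";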
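Since PHP 5.2 there is also filter_var() with FILTER_VALIDATE_IP and FILTER_FLAG_IPV6, which also accepts compressed forms such as ::1 that the seven patterns do not cover. A sketch; is_ipv6() is a hypothetical wrapper name.

```php
<?php
// filter_var() returns the address on success and FALSE otherwise.
function is_ipv6($address) {
    return filter_var($address, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) !== false;
}

var_dump(is_ipv6('2001:db8::1')); // bool(true)
var_dump(is_ipv6('::1'));         // bool(true)
var_dump(is_ipv6('192.0.2.1'));   // bool(false), IPv4 only
```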
23-Nov-2003 01:23
A web server log record can be parsed as follows:
$line_in = '209.6.145.47 - - [22/Nov/2003:19:02:30 -0500] "GET /dir/doc.htm HTTP/1.0" 200 6776 "http://search.yahoo.com/search?p=key+words=UTF-8" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"';
if (preg_match('!^([^ ]+) ([^ ]+) ([^ ]+) \[([^\]]+)\] "([^ ]+) ([^ ]+) ([^/]+)/([^"]+)" ([^ ]+) ([^ ]+) ([^ ]+) (.+)!',
$line_in,
$elements))
{
print_r($elements);
}
Array
(
[0] => 209.6.145.47 - - [22/Nov/2003:19:02:30 -0500] "GET /dir/doc.htm HTTP/1.0" 200 6776 "http://search.yahoo.com/search?p=key+words=UTF-8" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
[1] => 209.6.145.47
[2] => -
[3] => -
[4] => 22/Nov/2003:19:02:30 -0500
[5] => GET
[6] => /dir/doc.htm
[7] => HTTP
[8] => 1.0
[9] => 200
[10] => 6776
[11] => "http://search.yahoo.com/search?p=key+words=UTF-8"
[12] => "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
)
Notes:
1) For the referer field ($elements[11]), I intentionally capture the double quotes (") and don't use them as delimiters, because double quotes sometimes appear inside a referer URL, as %22 or \". Both have to be handled correctly, so I strip off the surrounding double quotes in a second step.
2) The URLs should be further parsed using parse_url(), which is quicker and more reliable than preg_match.
3) I assume the requested protocol (HTTP/1.1) always has a slash character in the middle, which might not always be the case, but I'll take the risk.
4) The agent field ($elements[12]) is the most unstructured field, so I make no assumptions about its format. If the record is truncated, the agent field will not be delimited properly with a quote at the end, so both cases must be handled.
5) A hyphen (- or "-") means a field has no value. It is necessary to convert these to an appropriate value (such as an empty string, null, or 0).
6) Finally, there should be appropriate code to handle malformed web log entries, which are common due to junk data. I never assume I've seen all cases.
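A sketch of the post-processing described in notes 1, 2, 4 and 5, assuming $elements is the array filled by the preg_match() call above. normalize_log_fields() and the 'referer_parts' key are hypothetical names; malformed entries (note 6) still need their own handling.

```php
<?php
function normalize_log_fields(array $elements) {
    $fields = $elements;
    // Notes 1 and 4: strip the double quotes around referer and agent in a second
    // step; trim() also copes with a truncated agent that lacks its closing quote.
    foreach (array(11, 12) as $i) {
        if (isset($fields[$i])) {
            $fields[$i] = trim($fields[$i], '"');
        }
    }
    // Note 5: a hyphen means "no value"; map it to NULL.
    foreach ($fields as $i => $value) {
        if ($value === '-') {
            $fields[$i] = null;
        }
    }
    // Note 2: leave URL parsing to parse_url() (it returns FALSE for badly malformed URLs).
    $fields['referer_parts'] = isset($fields[11]) ? parse_url($fields[11]) : null;
    return $fields;
}
```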
11-Nov-2003 12:29
Backreferences (à la preg_replace) work within the pattern itself if you use the backslash syntax. Consider:
<?php
if (preg_match("/([0-9])(.*?)(\\1)/", "01231234", $match))
{
print_r($match);
}
?>
Result: Array ( [0] => 1231 [1] => 1 [2] => 23 [3] => 1 )
This is alluded to in the description of preg_match_all, but worth reiterating here.
31-Mar-2003 05:56
If you want to match all Scandinavian characters (æ, ø, å, ö, ä and their upper-case forms Æ, Ø, Å, Ö, Ä) in addition to those matched by \w, you might want to use this regexp:
/^[\w\xe6\xc6\xf8\xd8\xe5\xc5\xf6\xd6\xe4\xc4]+$/
Remember that \w respects the current locale used in PCRE's character tables.
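The \xNN escapes above are ISO-8859-1 byte values, so that pattern only works on Latin-1 input. A sketch of the same character list for UTF-8 input, using the u modifier and \x{NN} code points; the test word is arbitrary.

```php
<?php
// Same letters as above, written as Unicode code points for UTF-8 subjects.
$pattern = '/^[\w\x{e6}\x{c6}\x{f8}\x{d8}\x{e5}\x{c5}\x{f6}\x{d6}\x{e4}\x{c4}]+$/u';
var_dump(preg_match($pattern, "bl\u{e5}b\u{e6}r")); // "blåbær" -> int(1)
```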
