Regex Checking of Foreign Characters in PHP

Regular expressions are great at checking names for invalid characters, but you need to keep our international friends, and their fancy letters, in mind when doing so. The basic A-Za-z regex pattern won't allow things like the é in café. Don't take my word for it, see for yourself:

$input = 'café';
$pattern = '/[^A-Za-z ]+/'; // matches anything that isn't an ASCII letter or a space
$r = preg_match($pattern, $input) ? 'invalid' : 'valid';
// $r: invalid
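
Spelling the range out as A-Za-z matters. The shorthand A-z runs from 0x41 to 0x7A in ASCII, so it also admits the six characters that sit between Z and a: [ \ ] ^ _ and the backtick. The quick check below shows an underscore slipping through. The helper name check() is hypothetical and exists only for this demo.

```php
<?php
// Hypothetical demo helper: 'invalid' if the pattern finds any disallowed
// character in $input, 'valid' otherwise.
function check(string $pattern, string $input): string
{
    return preg_match($pattern, $input) ? 'invalid' : 'valid';
}

// A-z covers 0x41..0x7A, which includes [ \ ] ^ _ and ` between Z and a.
echo check('/[^A-z ]+/', 'a_b'), "\n";    // valid: the underscore is inside A-z
echo check('/[^A-Za-z ]+/', 'a_b'), "\n"; // invalid: the underscore is rejected
```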

The good news is that there's a really easy way around this. Simply add \p{L}, which matches any character with the Unicode "letter" property (for more on why, see PHP's Unicode character properties), to your character class, and add the u pattern modifier so that both the pattern and the input string are treated as UTF-8. See this in action here:

$input = 'café';
$pattern = '/[^A-Za-z \p{L}]+/u'; // \p{L} already covers A-Z and a-z, so the ASCII ranges are redundant but harmless
$r = preg_match($pattern, $input) ? 'invalid' : 'valid';
// $r: valid
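
One caveat, assuming your input might arrive in decomposed form (Unicode NFD): an é can also be encoded as a plain e followed by the combining acute accent U+0301. That accent belongs to the mark category, \p{M}, not to \p{L}, so a letters-only class rejects it. A minimal sketch comparing the two classes (verdict() is a hypothetical helper for this example):

```php
<?php
// Hypothetical demo helper: 'invalid' if the pattern finds a disallowed character.
function verdict(string $pattern, string $input): string
{
    return preg_match($pattern, $input) ? 'invalid' : 'valid';
}

$composed   = "caf\u{00E9}";  // é as a single code point, U+00E9
$decomposed = "cafe\u{0301}"; // e followed by the combining acute accent U+0301

$lettersOnly     = '/[^A-Za-z \p{L}]+/u';
$lettersAndMarks = '/[^A-Za-z \p{L}\p{M}]+/u';

echo verdict($lettersOnly, $composed), "\n";       // valid
echo verdict($lettersOnly, $decomposed), "\n";     // invalid: U+0301 is \p{M}, not \p{L}
echo verdict($lettersAndMarks, $decomposed), "\n"; // valid
```

If the intl extension is available, normalizing the input first with Normalizer::normalize($input, Normalizer::FORM_C) is another option; adding \p{M} to the class simply avoids depending on that extension.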

Just to be sure we didn't screw anything up, let's test with two more inputs. The first, diner, should still be evaluated as valid, and the second, $400, should still be invalid. Let's make sure that's the case:

$pattern = '/[^A-Za-z \p{L}]+/u';
$input = 'diner';
$r = preg_match($pattern, $input) ? 'invalid' : 'valid';
// $r: valid
$input = '$400'; // single quotes, so PHP doesn't try to interpolate $400
$r = preg_match($pattern, $input) ? 'invalid' : 'valid';
// $r: invalid

Both of those worked as expected, so we're all set. That was easy, right?
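
If you end up running this check in several places, it may be worth wrapping it in a function. The sketch below uses a hypothetical name, isValidName(), and makes one deliberate change: preg_match() returns false (not 0) when it hits an error, such as a subject that isn't valid UTF-8 under the u modifier. In the checks above, false is treated the same as 0, so malformed input would be reported as valid. This version only accepts a result of exactly 0. It also assumes the source file is saved as UTF-8.

```php
<?php
/**
 * Hypothetical helper: true only if $input consists entirely of letters
 * from any script (\p{L}), combining marks (\p{M}) and spaces.
 * \p{L} already includes A-Z and a-z, so no separate ASCII range is needed.
 */
function isValidName(string $input): bool
{
    // preg_match() returns 1 if a disallowed character is found, 0 if none
    // is found, and false on error (e.g. malformed UTF-8 under /u).
    // Only a clean 0 counts as valid, so errors fail closed.
    return preg_match('/[^\p{L}\p{M} ]+/u', $input) === 0;
}

$tests = [
    'café'      => 'café',
    'diner'     => 'diner',
    '$400'      => '$400',
    'bad UTF-8' => "\xC3\x28", // 0xC3 expects a continuation byte; 0x28 isn't one
];

foreach ($tests as $label => $input) {
    echo $label, ': ', isValidName($input) ? 'valid' : 'invalid', "\n";
}
```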


This post was published on November 4th, 2016 by Robert James Reese in PHP. Before using any of the code or other content in this post, you must read and agree to our terms of use.