Marius van Witzenburg We fight for our survival, we fight!

2 Jul 2011

Some useful information about PHP RegEx

Posted by mariusvw

This information can be handy with the following PHP functions:
* preg_filter
* preg_grep
* preg_last_error
* preg_match_all
* preg_match
* preg_quote
* preg_replace_callback
* preg_replace
* preg_split

What is a regex?

At its most basic level, a regex is a way of matching patterns within a string. In PHP the most commonly used flavour is PCRE, or "Perl Compatible Regular Expressions". Here we will try to decipher the seemingly meaningless hieroglyphics and set you on your way to a powerful tool for use in your applications. Do not try to understand all this in a single sitting. Instead, take in a little and come back as you grasp the various concepts.

Where do I begin?

At the beginning.

Let's create a string.

<?php
// create a string
$string = 'abcdefghijklmnopqrstuvwxyz0123456789';
 
// echo our string
echo $string;
?>

If we simply wanted to see if the pattern 'abc' was within our larger string we could easily do something like this:

<?php
// create a string
$string = 'abcdefghijklmnopqrstuvwxyz0123456789';
 
echo preg_match("/abc/", $string);
?>

The code above will echo '1', because it has found the pattern 'abc' in the larger string. [[http://www.php.net/preg-match|preg_match()]] returns 0 if there is no match and 1 if there is one (or false if an error occurs). It stops searching after the first match, so it never returns more than 1. Of course, you would not do this in a real-world situation, as PHP has functions such as [[http://www.php.net/strpos|strpos()]] and [[http://www.php.net/strstr|strstr()]] that find a fixed substring much faster.
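As a rough sketch of the point above, the snippet below puts preg_match() and strpos() side by side for a plain substring check. The variable names are only illustrative.

```php
<?php
// Illustrative only: compare preg_match() with strpos() for a literal substring.
$string = 'abcdefghijklmnopqrstuvwxyz0123456789';

// preg_match() returns 1 on a match, 0 on no match, and false on error.
$found   = preg_match('/abc/', $string);
$missing = preg_match('/xyz123/', $string);

// strpos() returns the offset of the first occurrence, or false if absent.
// Compare with !== false, because offset 0 is a valid (but falsy) result.
$offset   = strpos($string, 'abc');
$contains = ($offset !== false);

var_dump($found, $missing, $offset, $contains);
```

Note that 'abc' sits at offset 0 here, which is exactly the case where a loose check like if (strpos(...)) would wrongly report "not found".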

Match beginning of a string

Now we wish to see if the string begins with abc. The regex character for beginning is the caret ^. To see if our string begins with abc, we use it like this:

<?php
// create a string
$string = 'abcdefghijklmnopqrstuvwxyz0123456789';
 
// try to match the beginning of the string
if(preg_match("/^abc/", $string)) {
    // if it matches we echo this line
    echo 'The string begins with abc';
} else {
    // if no match is found echo this line
    echo 'No match found';
}
?>

From the code above we see that it echoes the line ''The string begins with abc''. The forward slashes are delimiters that hold our regex pattern, and the quotation marks wrap the whole thing up as a PHP string. The caret (^) anchors the pattern to the beginning of the string; it does not match any characters itself, and it says nothing about whatever comes after 'abc'.

What if I want case insensitive?

If you used the above code to find the pattern ABC like this:

if(preg_match("/^ABC/", $string))

the script would have returned the message:

No match found

This is because the search is case sensitive: the pattern 'abc' is not the same as 'ABC'. To match both 'abc' and 'ABC' we need a modifier that makes the search case-insensitive. PHP regex, like most regex flavours, uses 'i' for this. So now our script might look like this:

<?php
// create a string
$string = 'abcdefghijklmnopqrstuvwxyz0123456789';
 
// try to match our pattern
if(preg_match("/^ABC/i", $string)) {
    // echo this if it matches
    echo 'The string begins with abc';
} else {
    // if no match is found echo this line
    echo 'No match found';
}
?>

Now the script will find the pattern abc. It would also match any other combination of upper and lower case, such as ABC, Abc, aBc, and so on.

More on [[http://php.net/manual/en/reference.pcre.pattern.modifiers.php|modifiers]] later.

How do I find a pattern at the end of a string?

This is done in much the same way as finding a pattern at the beginning of a string. A common mistake is to use the $ character to match the end of a string. That is not quite right, because $ also matches just before a trailing newline; use \z when you need the very end. Consider this:

preg_match("/^foo$/", "foo\n")

This will return 1, because $ behaves like \Z, which in turn is like (?=\z|\n\z). So when a trailing newline is not wanted, $ should not be used. Also, with the /m modifier $ matches at the end of every line, whereas \z still matches only at the very end of the string. Let's make a small change to the code from above: we remove the caret (^) from the beginning of the pattern and put \z at the end. We keep the case-insensitive modifier to match any case.

<?php
// create a string
$string = 'abcdefghijklmnopqrstuvwxyz0123456789';
 
// try to match our pattern at the end of the string
if(preg_match("/89\z/i", $string)) {
    // if our pattern matches we echo this 
    echo 'The string ends with 89';
} else {
    // if no match is found we echo this line
    echo 'No match found';
}
?>

The script now will show the line
''The string ends with 89''
because we have matched the end of the string with the pattern 89. Pretty easy stuff so far.
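To make the difference between $ and \z concrete, here is a small sketch using the "foo\n" subject from earlier. The variable names are made up for illustration; the D modifier (PCRE_DOLLAR_ENDONLY) is a standard PCRE modifier in PHP.

```php
<?php
// A subject with a trailing newline, e.g. a line read from a file.
$subject = "foo\n";

// $ also matches just before a final newline, so this succeeds.
$dollar = preg_match('/^foo$/', $subject);

// \z matches only at the very end of the subject, so this fails.
$strict = preg_match('/^foo\z/', $subject);

// The D modifier makes $ match only at the very end, like \z.
$dollarEndOnly = preg_match('/^foo$/D', $subject);

var_dump($dollar, $strict, $dollarEndOnly);
```

This matters mostly for validation: a pattern ending in $ will accept input that carries one extra newline at the end.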

Meta characters

During our first look at regex we did some simple pattern matching. We also introduced the caret (^) and the dollar ($). These characters have special meaning. As we saw, the caret (^) matches the beginning of a string and the dollar ($) matches the end of a string (or just before a final newline). These characters, along with others, are called Meta characters. Here is a list of the Meta characters used for regex:
* . (Full stop)
* ^ (Caret)
* $ (Dollar)
* * (Asterisk)
* + (Plus)
* ? (Question mark)
* { (Opening curly brace)
* } (Closing curly brace)
* [ (Opening square bracket)
* ] (Closing square bracket)
* \ (Backslash)
* | (Pipe)
* ( (Opening parenthesis)
* ) (Closing parenthesis)
We will look at each of these during this tutorial, but it is important that you know what they are. If you wish to search a string that contains one of these characters, e.g. "1+1", then you need to escape the meta character with a backslash like this:

<?php
// create a string
$string = '1+1=2';
 
// try to match our pattern
if(preg_match("/^1\+1/i", $string)) {
    // if the pattern matches we echo this
    echo 'The string begins with 1+1';
} else {
    // if no match is found we echo this line
    echo 'No match found';
}
?>

From the code above you will see the script print:
''The string begins with 1+1''
because it found the pattern 1+1 and ignored or escaped the special meaning of the + symbol. If you were to not escape the meta character and use the regex

preg_match("/^1+1/i", $string)

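you would not find a match, because the unescaped + means "one or more of the preceding 1", so the pattern looks for a run of 1s followed by another 1 and never for a literal plus sign.
If you are looking for a literal backslash, you need to escape that too. The backslash is also the escape character inside a PHP string literal, so it has to be escaped twice, once for PHP and once for the regex, like this: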

\\\\
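The following sketch shows the four-backslash form inside a real pattern, and also preg_quote() from the function list at the top, which does the escaping for you. The sample path and variable names are assumptions made for the example.

```php
<?php
// A Windows-style path containing one literal backslash.
// In a single-quoted PHP string, '\\' is one backslash character.
$path = 'C:\\temp';

// Four backslashes in the PHP literal become two in the regex,
// which PCRE reads as one escaped (literal) backslash.
$hasBackslash = preg_match('/\\\\/', $path);

// preg_quote() escapes meta characters for you; the second argument
// also escapes the delimiter if it appears in the input.
$quoted = preg_quote('1+1=2', '/');
$matchesLiteral = preg_match('/^' . $quoted . '/', '1+1=2');

var_dump($hasBackslash, $quoted, $matchesLiteral);
```

Using preg_quote() is usually the safer choice when the text to match comes from a variable rather than being typed into the pattern by hand.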

What do the other Meta characters do?

We have already seen the caret **^** and the dollar **$** in action, so now let's look at the others, beginning with the square brackets [ ]. These Meta characters are used for specifying a character class.

A what?

A Character Class. This is just a set of characters you wish to match. They can be listed individually like:

[abcdef]

or as a range separated by a **-** symbol like:

[a-f]

For example, the pattern below matches 'bag', 'bog', 'big' or 'bug', so it finds our string 'big':

<?php
// create a string
$string = 'big';
 
// Search for a match
echo preg_match("/b[aoiu]g/", $string, $matches);
?>
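As a further sketch, the snippet below contrasts a listed class with a range, and shows what ends up in the third argument of preg_match(). The word list is invented for the example, and preg_grep() is used only as a convenient way to test several strings at once.

```php
<?php
$words = ['bag', 'beg', 'big', 'bog', 'bug'];

// [aoiu] lists the allowed middle letters one by one; 'e' is not among them.
$listed = preg_grep('/^b[aoiu]g$/', $words);

// [a-i] is a range covering a through i, so it accepts 'bag', 'beg' and 'big'.
$ranged = preg_grep('/^b[a-i]g$/', $words);

// The third argument of preg_match() receives the matched text.
preg_match('/b[aoiu]g/', 'a big dog', $matches);

print_r(array_values($listed));  // bag, big, bog, bug
print_r(array_values($ranged));  // bag, beg, big
print_r($matches);               // $matches[0] is 'big'
```

preg_grep() keeps the original array keys, which is why array_values() is applied before printing.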

Source: http://www.phpro.org/tutorials/Introduction-to-PHP-Regex.html

14 Jun 2011

How to replace content in columns with a simple MySQL query

Posted by mariusvw

Searching and replacing content in MySQL is actually quite simple: you just have to know MySQL's REPLACE function :-)

Here is a simple example that does the trick:

UPDATE `table` SET `field` = REPLACE(`field`, 'search_for', 'replace_with');
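If you run this kind of query from PHP, something along these lines might do. This is only a sketch under assumptions: it uses an in-memory SQLite database through PDO so that it runs without a MySQL server (SQLite's REPLACE function behaves the same way for this simple case), and the table name posts, the column body and the example.com hostnames are all made up.

```php
<?php
// Hypothetical table and column names; SQLite in memory stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)');
$pdo->exec("INSERT INTO posts (body) VALUES ('visit http://old.example.com today')");
$pdo->exec("INSERT INTO posts (body) VALUES ('nothing to change here')");

// Bind the search and replacement strings instead of pasting them into the SQL.
$stmt = $pdo->prepare('UPDATE posts SET body = REPLACE(body, :search, :replace)');
$stmt->execute([':search' => 'old.example.com', ':replace' => 'new.example.com']);

// Read the rows back to check the result.
$bodies = $pdo->query('SELECT body FROM posts ORDER BY id')->fetchAll(PDO::FETCH_COLUMN);
print_r($bodies);
```

For a real MySQL server you would swap the DSN for a mysql: one with your own credentials; the UPDATE statement itself stays the same.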

Simple huh? ;-)

25 Aug 2010

How to fix inconsistent line ending (EOL) style with find and Perl

Posted by mariusvw

If you work with Subversion you might get this error when you have files that have been edited on different operating systems like Windows, Linux, FreeBSD or Mac OS X.

Well, the fix is quite simple: you replace the wrong line endings with the right ones, depending on which style you want. In my situation I want Unix-style line endings.

Replace in PHP/JavaScript files:

find ./ -name '*.php' -type f -exec perl -i -wpe 's/\r\n/\n/g' '{}' \;
find ./ -name '*.php' -type f -exec perl -i -wpe 's/\r/\n/g' '{}' \;
find ./ -name '*.js' -type f -exec perl -i -wpe 's/\r\n/\n/g' '{}' \;
find ./ -name '*.js' -type f -exec perl -i -wpe 's/\r/\n/g' '{}' \;

In case you want to replace them in multiple file types you can adjust the command. In this example we want to replace in the following file types:

  1. asp
  2. cfm
  3. css
  4. html
  5. js
  6. php
  7. pl
  8. txt

Use the following commands:

find ./ \( -name '*.asp' -or -name '*.cfm' -or -name '*.css' -or -name '*.html' -or -name '*.js' -or -name '*.php' -or -name '*.pl' -or -name '*.txt' \) -type f -exec perl -i -wpe 's/\r\n/\n/g' '{}' \;
find ./ \( -name '*.asp' -or -name '*.cfm' -or -name '*.css' -or -name '*.html' -or -name '*.js' -or -name '*.php' -or -name '*.pl' -or -name '*.txt' \) -type f -exec perl -i -wpe 's/\r/\n/g' '{}' \;

Keep in mind that you use these commands at your own risk. I'm not responsible if you lose your work ;-)

Now you should be able to commit your files again :-)
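If you would rather do the conversion from PHP, the same substitution can be sketched with preg_replace(). The helper name normalizeEol is made up for this example; it is not part of PHP.

```php
<?php
// Hypothetical helper: convert Windows (\r\n) and old Mac (\r) endings to Unix (\n).
function normalizeEol(string $text): string
{
    // \r\n? matches a carriage return with an optional line feed after it,
    // so both styles are handled in a single pass.
    return preg_replace('/\r\n?/', "\n", $text);
}

$mixed = "one\r\ntwo\rthree\nfour";
$clean = normalizeEol($mixed);

var_dump($clean === "one\ntwo\nthree\nfour"); // bool(true)
```

To apply it to a file you could read the contents with file_get_contents(), pass them through the helper and write them back with file_put_contents(). As with the commands above, try it on a copy first.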