Monday, December 24, 2012

Using Regular Expressions with JavaScript

Regular expressions are very powerful when you need to search for or replace a certain pattern in a string.
In JavaScript there are two ways to create a RegExp object.

1. Using RegExp constructor
var pattern = new RegExp(expression);
or
var pattern = new RegExp(expression, modifiers);

Example
var pattern = new RegExp("[a-z]+");//case sensitive search - No modifiers
var pattern = new RegExp("[a-z]+", "i");//case insensitive search - With modifier "i"

2. Using a regular expression literal (the simpler way)
var pattern = /expression/modifiers;

Example
var pattern = /[a-z]+/;//case sensitive search - No modifiers
var pattern = /[a-z]+/i;//case insensitive search - With modifier "i"

✔  The RegExp constructor and the regular expression literal give you the same result. The advantage of the constructor is that you can pass a variable as its argument, so you can build or change the pattern at runtime.

var exp = "[a-z]+";
var pattern = new RegExp(exp);
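
One thing to watch when the pattern comes from a string: backslashes must be doubled inside the string, and text that should be matched literally needs its special characters escaped first. Here is a minimal sketch. The helper name escapeRegExp and the sample input are my own assumptions for illustration, not built-in functions.

```javascript
// Hypothetical helper: puts a backslash before every character that has a
// special meaning inside a pattern, so the text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Inside a string the backslash itself must be escaped: "\\d+" is the pattern \d+
var digits = new RegExp("\\d+");

// Assumed to arrive at runtime, for example from a text box
var userInput = "1+1";
var literalSearch = new RegExp(escapeRegExp(userInput));
var unescapedSearch = new RegExp(userInput);

console.log(digits.test("abc123"));          // true
console.log(literalSearch.test("1+1=2"));    // true
console.log(unescapedSearch.test("1+1=2"));  // false, "1+" means "one or more 1s"
```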

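✔  When the pattern is fixed, a regular expression literal is usually slightly faster than the RegExp constructor, because the literal is compiled when the script is loaded while the constructor compiles its pattern at runtime.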

The following methods can be used with regular expressions.

1. RegExp.test("text")
The RegExp.test() method searches the text for the pattern and returns true if a match is found, and false otherwise.

Example
var matched = new RegExp("[a-z]+").test("abcDefg"); //true
or
var matched = /[a-z]+/.test("abcDefg"); //true
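
Note that test() is satisfied by a match anywhere in the text. If you want to check that the whole text fits the pattern, anchor it with ^ and $. A small sketch (the variable names are just for illustration):

```javascript
// A match anywhere in the text is enough for test()
var partial = /[a-z]+/.test("abcDefg");            // true, "abc" matches

// ^ and $ force the pattern to cover the whole text
var whole = /^[a-z]+$/.test("abcDefg");            // false, "D" is not in [a-z]

// With the "i" modifier the upper case "D" is accepted as well
var wholeIgnoreCase = /^[a-z]+$/i.test("abcDefg"); // true

console.log(partial, whole, wholeIgnoreCase);
```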

2. "text".match(pattern)
Without the modifier "g" this method returns an array that holds only the first match. If nothing matches it returns null.
var result = "abcDefg".match(new RegExp("[a-z]+"));  //[abc]
or
var result = "abcDefg".match(/[a-z]+/);  //[abc]

You may have noticed that the above method gives you only one result ("abc") although there are
two matches ("abc" and "efg").
In order to find all matches you should use the "global" modifier "g".

var result = "abcDefg".match(new RegExp("[a-z]+", "g"));//[abc, efg]
or
var result = "abcDefg".match(/[a-z]+/g);  //[abc, efg]

You can also capture groups (the parts of the pattern inside parentheses) like this:
var arr = "abcDDDefg".match(/[a-z](D+)/); //[cDDD,DDD] 
alert(arr[0]); //whole match - cDDD 
alert(arr[1]); //First group - DDD
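
Two details of match() are easy to miss: without "g" the returned array also carries the position of the match in its index property, and with "g" the captured groups are dropped and only the whole matches are returned. A short sketch with made-up sample strings:

```javascript
// Without "g": whole match, then the groups, plus an index property
var single = "abcDDDefg".match(/[a-z](D+)/);
console.log(single[0], single[1], single.index); // cDDD DDD 2

// With "g": every whole match, but no groups
var all = "aDbDD".match(/[a-z](D+)/g);
console.log(all); // [ 'aD', 'bDD' ]

// No match at all gives null, not an empty array
var none = "abc".match(/\d+/);
console.log(none); // null
```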


3. "text".search(pattern)
This method returns the index of the first character of the first match, or -1 if no match is found.
Example
var result = "123abcDe".search(/abc/); //3
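
A quick sketch of both outcomes, since the -1 case is the one you normally have to check for (the variable names are only for illustration):

```javascript
var text = "123abcDe";

var firstUpper = text.search(/[A-Z]/); // 6, the position of "D"
var missing = text.search(/xyz/);      // -1, the pattern does not occur

if (missing === -1) {
  console.log("xyz was not found");
}
console.log(firstUpper); // 6
```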


4. "text".replace(pattern, "new text")
Finds the pattern, replaces the match with the new text and returns the modified string. The original string is not changed.
Example
var result = "abcDefg".replace(/[a-z]/, "x"); //xbcDefg
If you want to replace all the matches, use the modifier "g".
var result = "abcDefg".replace(/[a-z]/g, "x"); //xxxDxxx
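
The replacement does not have to be fixed text. As a sketch: the special sequence $& inside the replacement string stands for the matched text, and you can also pass a function whose return value is used as the replacement.

```javascript
// "$&" inserts the text that was matched
var bracketed = "abcDefg".replace(/[a-z]+/g, "[$&]");
console.log(bracketed); // [abc]D[efg]

// A function receives each match and returns its replacement
var upper = "abcDefg".replace(/[a-z]+/g, function (match) {
  return match.toUpperCase();
});
console.log(upper); // ABCDEFG
```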


5. "text".split(pattern)
Splits the text at every place where the pattern matches and returns an array containing the pieces.
Example
var arr = "regular expressions are smart".split(/\s/);//[regular,expressions,are,smart]
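
A few variations that are worth trying out, as a sketch with made-up sample strings: \s splits at every single whitespace character, so repeated spaces produce empty pieces, while \s+ treats a run of whitespace as one separator. A capturing group in the pattern keeps the separators in the result, and an optional second argument limits the number of pieces.

```javascript
var single = "a  b".split(/\s/);                  // two spaces between a and b
console.log(single); // [ 'a', '', 'b' ]

var collapsed = "regular   expressions are smart".split(/\s+/);
console.log(collapsed); // [ 'regular', 'expressions', 'are', 'smart' ]

var withSeparators = "a1b2c".split(/(\d)/);       // group keeps the digits
console.log(withSeparators); // [ 'a', '1', 'b', '2', 'c' ]

var limited = "a b c d".split(/\s/, 2);           // at most two pieces
console.log(limited); // [ 'a', 'b' ]
```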


6. RegExp.exec("text")
The exec() method is a little more complicated than the others. That is why I am discussing it here, after
the others.
If a match is found, this method returns an array which has the matched text as its first item. The remaining items hold the text captured by each capturing group, one item per group. If no match is found it returns null.
Not only that, when the pattern has the modifier "g", the exec() method also updates the lastIndex property of the regular expression object each time it is called.
This may sound a little vague. You will understand it properly after looking at the following examples.

Example
var pattern = /a(b+)(c)/i;
var text = "aabBBcdeabcde";
var result1 = pattern.exec(text);
var result2 = pattern.exec(text);

alert(result1);//abBBc,bBB,c
alert(result2);//abBBc,bBB,c

Here I have called the exec() method twice on the same RegExp object. Both calls have given the same result.
Observe the array it has returned.

First element  : This is the first whole match it found.
Second element : This is the first match it found for the group "(b+)"
Third element  : This is the first match it found for the group "(c)"

Obviously there is another match ("abc") further along in the text, and it has been ignored.

In order to retrieve the ignored match, I add the modifier "g" to the RegExp.

var pattern = /a(b+)(c)/ig;
var text = "aabBBcdeabcde";
var result1 = pattern.exec(text);
var result2 = pattern.exec(text);

alert(result1);//abBBc,bBB,c
alert(result2);//abc,b,c

See the results. Now result1 and result2 are not the same. result1 has not changed, but result2 holds
a different match.
The elements of the array "result2" can be described as below.

First element  : This is the second whole match found.
Second element : This is the second match found for the group "(b+)"
Third element  : This is the second match found for the group "(c)"

Now you can understand something important about the exec() method. When you call the same method twice
on a pattern with the modifier "g", it gives you two different results. The first call returns the first match and
the second call returns the second match, because each call continues searching from where the previous one stopped.
This makes the exec() method with the modifier "g" ideal for iterations.

var pattern = /a(b+)(c)/ig;
var text = "aabBBcdeabcde";
var result;//declared here so it does not become an implicit global variable
while ((result = pattern.exec(text)) !== null) {
   alert(result);
}
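
To see what drives the iteration, here is a sketch of the same loop that records the position of each match and the lastIndex property of the pattern after every call. The array name found is my own choice for the example.

```javascript
var pattern = /a(b+)(c)/ig;
var text = "aabBBcdeabcde";
var found = [];
var result;

// Each call starts searching at pattern.lastIndex and moves it past the match
while ((result = pattern.exec(text)) !== null) {
  found.push(result[0] + " at " + result.index + ", lastIndex " + pattern.lastIndex);
}

console.log(found);
// [ 'abBBc at 1, lastIndex 6', 'abc at 8, lastIndex 11' ]

// When exec() finally returns null, lastIndex is reset to 0
console.log(pattern.lastIndex); // 0
```

Because the position is stored on the RegExp object itself, the loop only ends if the pattern has the modifier "g". Without it every call returns the first match again and the loop never stops.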

Those are the methods that you will meet when working with regular expressions in JavaScript.

Another common requirement when working with strings and regular expressions is to replace the matched groups with different text.

For example, let's assume that you have the following text, which contains the name and the age of
a student combined.

var text = "Tom15";
Assume you want to display this information like this:
    "The age of Tom is 15."

This is simple with regular expressions.

var text = "Tom15";
var pattern = /([a-z]+)(\d+)/ig;
var sentence = "The age of $1 is $2.";
var result = text.replace(pattern, sentence);
alert(result);//The age of Tom is 15.

Yes, it is that simple.
Note that $1 and $2 in the replacement text represent the first and the second captured group respectively.
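
The same idea extends to several records at once and to replacements that need some calculation. As a sketch with a made-up input string: with the modifier "g" every record is rewritten, and when a function is passed as the replacement it receives the whole match followed by the captured groups.

```javascript
var text = "Tom15,Ann22";
var pattern = /([a-z]+)(\d+)/ig;

// $1 and $2 can be used in any order
var swapped = text.replace(pattern, "$2:$1");
console.log(swapped); // 15:Tom,22:Ann

// The function gets the whole match, then group 1, then group 2
var nextYear = text.replace(pattern, function (match, name, age) {
  return name + " turns " + (Number(age) + 1);
});
console.log(nextYear); // Tom turns 16,Ann turns 23
```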
