
This is a repost of an article that I liked very much.


How to use regular expressions in JavaScript?

After having examined the theoretical basis and syntax of regular expressions, it is important to know how to put this knowledge to practical use. In fact, although the rules are common, every programming language has its own routines: in this article we will deal with the implementation of regex in JavaScript and the main methods for working with them.

Regular expressions in JavaScript are written as literals: a pattern delimited by forward slashes, optionally followed by some special characters, called flags, which modify how the search is carried out. We might thus have a variable holding such a literal:

var regex = /regexPattern/flags;

The pattern can be any regular expression, while the flags can be any combination of the three characters g, i and m:

  • g (global): the pattern is searched for in the entire string, instead of stopping after the first occurrence found.
  • i (ignore case): the search makes no distinction between capital and small letters.
  • m (multiline): the anchors ^ and $ match at the start and end of every line of the text, not only at the start and end of the whole string.
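The effect of the three flags can be seen in a minimal sketch like the following. It relies on the standard String method match, which this article does not cover, and the sample text and variable names are only illustrative assumptions:

```javascript
// Sample text on two lines (illustrative only).
var text = "Home sweet home\nhome again";

// No flags: case-sensitive, and only the first match is reported.
var plainMatches = text.match(/home/);          // "home" at index 11
// g: every match in the string is collected.
var globalMatches = text.match(/home/g);        // two matches
// g + i: the capitalised "Home" is matched as well.
var insensitiveMatches = text.match(/home/gi);  // three matches
// g + m: ^ also matches at the start of the second line.
var lineStartMatches = text.match(/^home/gm);   // one match
// Without m, ^ only matches at the very start of the string ("Home").
var withoutM = text.match(/^home/g);            // null
```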

A first simple example of a regular expression could be that of the search for a string:

var regex = /Your Inspiration Web/i;

This finds the string “Your Inspiration Web” in a hypothetical text, regardless of any differences between capital and small letters. Or we can reuse the regular expression used to verify a date:

// search for a date; a regex literal is not wrapped in quotes
var regex = /\d{2}-\d{2}-\d{4}/g;

However, the JavaScript implementation of regular expressions has some peculiarities: that of the ‘full stop’ metacharacter, which stands for “any character”, is perhaps the most relevant. In fact, the full stop does not match the newline character, and setting the m flag does not change this. As a remedy, a character class combined with its negation, often in the form “[\s\S]”, can be used. But what does it mean? By checking the table in the previous article you can see that the class ‘\s’ refers to “any whitespace character”. By adding ‘\S’, which stands for “any character that is not whitespace”, we are in practice saying “take all whitespace characters and all characters that are not whitespace”; in a few words, “all characters”.
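A small sketch of the difference, assuming a two-line sample text (the variable names are illustrative, and console.log is used in place of alert so that the snippet also runs outside a browser):

```javascript
// Sample text containing a line break (illustrative only).
var text = "first line\nsecond line";

// The full stop does not match the line break,
// so this pattern cannot span both lines.
var dotRegex = /first.*second/;
// [\s\S] matches any character, line breaks included.
var anyCharRegex = /first[\s\S]*second/;

var dotResult = dotRegex.test(text);
var anyCharResult = anyCharRegex.test(text);
console.log(dotResult);     // false
console.log(anyCharResult); // true
```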

The behavior of the full stop is the only difference worth mentioning for a ‘normal’ use of regex in JavaScript (there are others). A question that arises at this point is: how do we test our regular expressions in JavaScript?

The RegExp Object

In JavaScript all regular expressions (including the examples above) are in fact instances of the RegExp object. Knowing object-oriented programming is not required here: it is enough to bear in mind that an object is a structure (like an array) which has properties (the attributes) and behaviors (the methods). Methods and attributes belong to the object they refer to, and the syntax to access them is the following:

/* this is called "dot notation", because of the full stop;
 * object, attribute and method are generic placeholder names */
object.attribute;
object.method(parameter);

The relevant methods of a RegExp object are two: test and exec. The first, as its name suggests, tests a regular expression against a string, checking that the string contains at least one match. The second, instead, returns information about the match found in the string, if any.

As for the attributes, two are worth mentioning: lastIndex, which contains the position from which the regex will start its next search (it is used when the g flag is set), and source, which contains the text of the regular expression itself.
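As a rough illustration of these two attributes, the sketch below reads them before and after a search. The pattern and string are arbitrary examples, and the g flag is set because lastIndex only advances for global searches:

```javascript
// Arbitrary pattern and text, chosen only for illustration.
var regex = /o/g;
var str = "foo boo";

var sourceText = regex.source;     // the pattern as text: "o"
var before = regex.lastIndex;      // 0: nothing has been searched yet
var found = regex.test(str);       // matches the "o" at position 1
var after = regex.lastIndex;       // 2: the next search starts here

console.log(sourceText, before, found, after);
```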

Before trying out these functions, let’s create an HTML page for the tests:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="it" lang="it">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  <title>Regular expressions in JavaScript | Your Inspiration Web</title>
</head>
<body>
<h1>Test page for the regular expressions </h1>
<script type="text/javascript">
/* Here go our tests */
</script>
</body>
</html>

We will write our JavaScript code inside the “script” tag at the bottom of the page.

The test() method

The test method accepts as its parameter the string to which the regular expression is applied, and returns a boolean: true if the string contains a match for the regular expression, false otherwise.

As a first example of the test method, let’s search a string for matches of the regex /HoMe/i:

/* Create a regular expression that does not distinguish between capital and small letters */
var regex = /HoMe/i;
/* Set the string on which to search */
var str = "Home sweet home";
/* if there's a match, display a message */
if( regex.test(str) ){
    /* action to execute if there's a match */
    alert("Match found");
}

Let’s analyze this portion of code step by step. We have:

  1. created our regular expression, in this case a simple string of text, setting the case-insensitive flag;
  2. initialized the string which acts as the search text; in a real application this would be the value of a form field, or text taken from a web page;
  3. used the test method in its most natural setting, an if statement: if there is a match the related action is executed, otherwise execution proceeds with the code that follows.

Notice how, after creating the regular expression, we called the “test” method using dot notation: calling “test” without putting the name of the object before it will result in an error.
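Since the search text often comes from a form, the test method lends itself to small validation helpers. The following is only a sketch: the function name looksLikeDate is hypothetical, and the pattern is the date regex seen earlier with the ^ and $ anchors added so that the whole value has to match. It checks the shape of the value only, not whether the date actually exists.

```javascript
// Hypothetical helper: true if the value has the shape NN-NN-NNNN.
function looksLikeDate(value) {
    // ^ and $ force the pattern to cover the entire value;
    // no g flag is used, so the regex keeps no state between calls.
    return /^\d{2}-\d{2}-\d{4}$/.test(value);
}

console.log(looksLikeDate("25-12-2011"));    // true
console.log(looksLikeDate("25-12-11"));      // false
console.log(looksLikeDate("on 25-12-2011")); // false
```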

The exec() method

Using the same regular expression and a similar string, let’s see how the “exec” method works. The latter takes as its parameter the string to which the regex is applied, but returns an array containing the match found. This array has an attribute, index, which contains the position of the match within the string (more precisely, the number of characters from the beginning of the string, counting from 0).

Let’s see a simple example:

/* Create the regular expression */
var regex = /HoMe/i;
/* Set the string on which to search */
var str = "Oh, Home sweet home";
/* run the search and save the returned array */
var result = regex.exec(str);
/* print the string found as a match */
alert(result[0]); //Home
/* print the position in the string at which the
 * match was found: counting starts from 0 */
alert(result.index); //4

The creation of the regular expression and of the string is identical to the previous example. The first difference is the return value of the method, which is saved in the “result” variable. This variable is an array containing in its first position the match of the regular expression found in the string, while its “index” attribute holds the position of that match within the string.
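When nothing matches, exec returns null instead of an array, so reading result[0] directly would raise an error. A cautious version of the example might look like the sketch below, where describeFirstMatch is a hypothetical helper name:

```javascript
// Hypothetical helper: describes the first match, or reports that there is none.
function describeFirstMatch(regex, str) {
    var result = regex.exec(str);
    // exec returns null when the string contains no match
    if (result === null) {
        return "no match";
    }
    return result[0] + " at position " + result.index;
}

console.log(describeFirstMatch(/home/i, "Oh, Home sweet home")); // Home at position 4
console.log(describeFirstMatch(/casa/i, "Oh, Home sweet home")); // no match
```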

An interesting thing to note is that, when the “exec” method is called repeatedly without the “g” flag, it always returns only the first occurrence:

/* Create the regular expression */
var regex = /Home/i;
/* Set the string on which to search */
var str = "Oh, Home sweet home";
var result = regex.exec(str);
/* print string found as correspondence */
alert(result[0]); //Home
/* print position on the string in which a
 * correspondence was found: starts from 0 */
alert(result.index); //4
result = regex.exec(str);
result = regex.exec(str);
result = regex.exec(str);
result = regex.exec(str);
/* the string is always the same */
alert(result[0]); //Home
/* and it's always the first, the one in position 4 */
alert(result.index); //4

In order to allow JavaScript to search for successive occurrences of the regex you have to, as mentioned, use the “global” flag in the definition of the regular expression:

/* Create the regular expression: I have specified the g flag */
var regex = /Home/ig;
/* Set the string on which to search */
var str = "Oh, Home sweet home";
var result = regex.exec(str);
/* print string found as correspondence */
alert(result[0]); //Home
/* print the position on the string in which a
 * correspondence has been found: starts from 0 */
alert(result.index); //4
result = regex.exec(str);
/* the string found spells the same word, with a small "h" */
alert(result[0]); //Home
/* this time is the second occurrence, the one in position 15 */
alert(result.index); //15

By specifying the global flag we tell JavaScript that it has to continue the search through the text (a little like the “Find next” function of text editors): when there are no more occurrences the “exec” method returns null, which we have to handle appropriately.
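One common way to handle the null value is to call exec in a loop that stops as soon as null is returned. The sketch below collects the position of every occurrence; the variable names are illustrative:

```javascript
// The g flag is required: without it the loop would never end,
// because exec would keep returning the first occurrence.
var regex = /home/ig;
var str = "Oh, Home sweet home";
var positions = [];
var match;

// exec returns null when there are no more occurrences
while ((match = regex.exec(str)) !== null) {
    positions.push(match.index);
}

console.log(positions); // [4, 15]
```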

Final considerations

Regular expressions can significantly simplify the development of a web application. It has to be mentioned, however, that the regex engine can be expensive in terms of CPU, so it’s good to use regular expressions with care, checking that there is no faster way to obtain the same result. For example, wishing to check whether a string ends in a semicolon, you might be tempted to use the regular expression:

var regex = /;$/;

This simple regular expression could keep your browser busy for several seconds (even minutes, in the case of very long texts): this is because the search is not performed only at the end of the string; instead it starts from the first character and examines the characters one by one until the end, and only then returns the result. But how can we improve this search?

Regular expressions are useful when no assumptions can be made about the text: in this case, instead, we only want to know whether the last character is a semicolon. Why not run the test on the last character only, then?

var str = "Hello everybody;";
if( str.charAt(str.length - 1) == ";" )
    alert("The string ends in a semicolon");

As you can see, I used the charAt method to retrieve the last character and compare it with my search string.

In general, then, when you are dealing with searching or replacing simple strings, it can be a good idea to try first the JavaScript methods which operate on strings (charAt, indexOf, slice, substr, substring), and to use regular expressions only if no alternative solution can be found.
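A few of these string methods are sketched below on the same sample string, as possible replacements for very simple patterns. The variable names are illustrative:

```javascript
var str = "Hello everybody;";

// indexOf returns -1 when the substring is not present
var containsHello = str.indexOf("Hello") !== -1;
// a result of 0 means the substring sits at the very beginning
var startsWithHello = str.indexOf("Hello") === 0;
// slice with a negative index counts from the end of the string
var lastChar = str.slice(-1);
// substring extracts everything except the last character
var withoutLast = str.substring(0, str.length - 1);

console.log(containsHello, startsWithHello, lastChar, withoutLast);
```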