Introduction to regular expressions

Basic idea

A regular expression is a notation for describing patterns in strings. It is used to validate data entries or to search for and extract information from text.

It is just a series of characters that defines an abstract search pattern. Text editors and search tools use such patterns to look for specific words and phrases, or to perform find-and-replace operations on text. The notation comes from formal language theory in computer science.

The theory behind the patterns is mathematical, but using them does not require it. A regex is simply a string in which most characters stand for themselves, while a few special characters (metacharacters), such as . or *, stand for classes or repetitions of characters. To use a regular expression, you first need to determine what to search for. For example, to look for the word ‘cat’, you would use the pattern cat. Adding a space after it, as in “cat ”, restricts the match to occurrences that are followed by a space.

A regex can also contain sub-patterns, written in parentheses. These sub-patterns are commonly known as groups, and the text each one matched can be retrieved separately. The search works by trying the parts of the pattern in order, from left to right, at each position of the text; a match is found when every part fits in sequence. This allows you to find and replace exactly the parts of a string that the whole pattern matches. Because a pattern describes a whole family of strings, it can also find variants that a normal search for one fixed text would leave out.
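The ideas above can be sketched with Python's re module. This is only an illustration: the sample text and the variable names are made up, and the phone-like pattern is an assumption chosen to show sub-patterns in parentheses.

```python
import re

# Made-up sample text for the illustration.
text = "The cat sat; another cat ran. Call 555-1234 or 555-9876."

# A literal pattern: every character in "cat" matches itself.
cats = re.findall(r"cat", text)

# Two sub-patterns in parentheses; each one can be retrieved separately.
phone = re.search(r"(\d{3})-(\d{4})", text)
first_part, second_part = phone.groups()

# Find and replace: every stretch of text the whole pattern matches is replaced.
masked = re.sub(r"\d{3}-\d{4}", "XXX-XXXX", text)

print(cats)
print(first_part, second_part)
print(masked)
```

re.search returns only the first match, which is why the groups hold the parts of 555-1234; re.sub replaces every match.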

Some examples

Here are some examples with brief explanations, to give a general idea:

\d{5}-\d{3}

The pattern of a postcode such as 05432-001: 5 digits, a - (hyphen) and 3 more digits. The sequence \d is a metacharacter, a wild card that matches one digit (0 to 9). The sequence {5} is a quantifier: it indicates that the previous pattern must be repeated 5 times, so \d{5} is the same as \d\d\d\d\d.
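As a minimal sketch, the postcode pattern \d{5}-\d{3} can be tried out in Python. The names POSTCODE, LONGHAND and is_postcode are hypothetical, and fullmatch is used on the assumption that the whole text must be a postcode.

```python
import re

# Five digits, a hyphen, three digits.
POSTCODE = re.compile(r"\d{5}-\d{3}")

# The same pattern with the quantifiers written out by hand.
LONGHAND = re.compile(r"\d\d\d\d\d-\d\d\d")


def is_postcode(text):
    """Return True if the whole text fits the postcode pattern."""
    return POSTCODE.fullmatch(text) is not None


for sample in ["05432-001", "5432-001", "05432-0010"]:
    print(sample, is_postcode(sample))
```

fullmatch rejects texts with extra characters before or after the pattern; search would instead accept any text that merely contains a postcode.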

[012]\d:[0-5]\d

This matches the hours and minutes format, like 03:10 or 23:59. The sequence in brackets [012] defines a set. In this case, the set specifies that the first character should be 0, 1 or 2. Within the [] the hyphen indicates a character range, i.e. [0-5] is an abbreviated form of the set [012345]; the set representing all digits, [0-9], is the same as \d. Note that this regular expression also accepts the text 29:00, which is not a valid time (valid times will be the theme of one of the Exercises).
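The behaviour described above, including the acceptance of 29:00, can be checked with a short sketch. The names TIME and looks_like_time are hypothetical. The re.ASCII flag is an addition not in the original text: in Python 3, \d in a str pattern also matches other Unicode decimal digits, and the flag restricts it to 0-9.

```python
import re

# re.ASCII makes \d equivalent to [0-9].
TIME = re.compile(r"[012]\d:[0-5]\d", re.ASCII)


def looks_like_time(text):
    """Return True if the whole text fits the HH:MM pattern."""
    return TIME.fullmatch(text) is not None


for sample in ["03:10", "23:59", "29:00", "30:00", "12:60"]:
    print(sample, looks_like_time(sample))
```

The pattern checks each character position independently, which is why it cannot tell that 29 is not a valid hour.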

[A-Z]{3}-\d{4}

It is the pattern of a number plate in Brazil: three letters from A to Z followed by a - (hyphen) followed by four digits, like CKD-4592.
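A sketch of how the number plate pattern [A-Z]{3}-\d{4} could be used to extract plates from a longer text. The sample sentence is made up, and the \b word boundaries are an assumption added so that a plate is not picked out of the middle of a longer run of letters or digits.

```python
import re

# Three capital letters, a hyphen, four digits, delimited by word boundaries.
PLATE = re.compile(r"\b[A-Z]{3}-\d{4}\b")

# Made-up sample text for the illustration.
text = "Seen: CKD-4592 and ABC-1234, but not AB-123 or ckd-4592."

plates = PLATE.findall(text)
print(plates)
```

The set [A-Z] contains only capital letters, so ckd-4592 is skipped unless the re.IGNORECASE flag is passed.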

About the “” signs used in this text

When this document describes part of the syntax of regular expressions in a generic way, it uses the symbols “” to indicate a part that must be provided by the user.

For example, the reference to a group has the syntax \“n” where “n” is the number of the group to be retrieved. The “” signs are not part of the syntax, so the reference to the third group is written as \3.

Similarly, the syntax of a lazy (non-greedy) quantifier is “q”?, where “q” is any quantifier, such as * in *? or {1,3} in {1,3}?.
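Both pieces of syntax can be seen in a small sketch. The examples are made up: the palindrome pattern is just one way to show \3, \2 and \1 referring back to groups, and the <b> tags are an arbitrary sample for comparing a quantifier with and without the trailing ?.

```python
import re

# \3, \2 and \1 refer back to what groups 3, 2 and 1 matched,
# so this pattern accepts six-digit palindromes.
PALINDROME = re.compile(r"(\d)(\d)(\d)\3\2\1")

is_palindrome = PALINDROME.fullmatch("123321") is not None
not_palindrome = PALINDROME.fullmatch("123123") is not None

# A quantifier followed by ? matches as little text as possible.
html = "<b>one</b> <b>two</b>"
greedy = re.search(r"<b>.*</b>", html).group()
lazy = re.search(r"<b>.*?</b>", html).group()

greedy_digits = re.match(r"\d{1,3}", "12345").group()
lazy_digits = re.match(r"\d{1,3}?", "12345").group()

print(is_palindrome, not_palindrome)
print(greedy)
print(lazy)
print(greedy_digits, lazy_digits)
```

Without the ?, .* runs to the last </b> in the text; with it, the match stops at the first one.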

For example, to check whether a given text is a number from 0,00 to 9,99 (written with a decimal comma) you can use the regular expression \d,\d\d, because the symbol \d is a wild card that matches a digit.

The verb match is used here in the sense of combine or fit. We say that the expression \d,\d\d matches 1,23 but it doesn’t match 123 (the comma is missing) nor 1,2c (“c” doesn’t match \d, because it is not a digit).
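The three cases above can be verified with a short sketch. It assumes the intended pattern is \d,\d\d (one digit, a comma, two digits), which is what the examples 1,23, 123 and 1,2c imply; the names DECIMAL_COMMA and matches are hypothetical. The last two lines are an addition for readers who use a decimal point: the dot is a metacharacter and has to be escaped.

```python
import re

# One digit, a comma, two digits.
DECIMAL_COMMA = re.compile(r"\d,\d\d")


def matches(text):
    """Return True if the whole text fits the pattern."""
    return DECIMAL_COMMA.fullmatch(text) is not None


for sample in ["1,23", "123", "1,2c"]:
    print(sample, matches(sample))

# With a decimal point, the dot must be escaped as \. because an
# unescaped . matches any character (except a newline).
escaped_ok = re.fullmatch(r"\d\.\d\d", "9.99") is not None
unescaped_too_loose = re.fullmatch(r"\d.\d\d", "9x99") is not None
print(escaped_ok, unescaped_too_loose)
```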

The abbreviations regex and re (the name of the Python module) come from the English term regular expression. In computer science, the term has a very specific meaning (see regular expression in the Glossary).

Online tools

Before relying on a regex, it helps to try it out with one of the many online tools available. Searching the web will also show which patterns are commonly used for a given task. Once you have learned what the most common ones are, you will be able to find and use them more effectively.

To begin, find a reliable site that lets you run a regex against sample text in a free test box, and then use it to try out your own regex. There are also a variety of resources on the internet that offer ready-made patterns.

If a particular site does not offer the tool you want, do not be afraid to look for another one yourself. A simple Google search can turn up some good results. Some tools also offer a variety of options that let you adjust how the regex search is performed.

Another option is to use a website that takes a series of example strings and attempts to create a regex from them. The results are less reliable, but they can still be a useful starting point.

Finally, there is the option to pay for a regex tool. You will find a wide variety of these; whether one is worth the cost compared with the free options depends on your needs.

Once you have seen which kinds of regex are in common use, you will be better placed to choose the tool that works best for you. When you have found a tool that works well with your specific set of patterns, you will be ready to write and test your own regular expressions.
