The Use of Regular Expressions in Python

Regular expressions (regex) are a powerful tool for matching patterns in text. They can be used in Python to search, replace, and manipulate strings. In this blog post, we will explore the basics of regular expressions in Python, and provide some examples of how to use them.

USING REGULAR EXPRESSIONS IN PYTHON

In Python, the re module provides support for regular expressions. The basic steps for using regular expressions in Python are:

  1. Import the re module
  2. Define the pattern you want to match using regular expression syntax
  3. Use the re module to search, replace, or manipulate strings based on the pattern

Here is an example of how to use regular expressions to search for a pattern in a string:

import re
 
text = "My Wi-Fi is awesome!"
pattern = r"Wi-Fi"
 
matches = re.findall(pattern, text)
 
print(matches)

In this example, we import the re module, define the pattern we want to match as “Wi-Fi”, and then use the re.findall() function to find all occurrences of the pattern in the text. Because re.findall() returns a list of matches, the output of this code will be ['Wi-Fi'].
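
The introduction also mentions searching and replacing. As a minimal sketch, here is how re.search() and re.sub() could be used on the same text (the replacement word “Ethernet” is only an illustration):

```python
import re

text = "My Wi-Fi is awesome!"

# re.search() returns a Match object for the first occurrence, or None if there is none
match = re.search(r"Wi-Fi", text)
if match:
    # group() is the matched text; start() and end() are its position in the string
    print(match.group(), match.start(), match.end())

# re.sub() returns a new string with every occurrence of the pattern replaced
replaced = re.sub(r"Wi-Fi", "Ethernet", text)
print(replaced)
```

Note that re.search() returns None when nothing matches, so it is worth checking the result before calling group() on it.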

REGULAR EXPRESSION SYNTAX

Regular expressions use a syntax of special characters to represent patterns. Here are some of the most commonly used characters:

  • . : Matches any character except newline
  • ^ : Matches the start of a string
  • $ : Matches the end of a string
  • * : Matches zero or more occurrences of the preceding character
  • + : Matches one or more occurrences of the preceding character
  • ? : Matches zero or one occurrence of the preceding character
  • {} : Matches the specified number of occurrences of the preceding character
  • [] : Matches any character within the brackets
  • () : Groups characters together

Here is an example of how to use some of these characters to create a pattern:

import re
 
text = "My Wi-Fi is awesome!"
pattern = r"W.+i"
 
matches = re.findall(pattern, text)
 
print(matches)

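In this example, we define the pattern as “W.+i”, which matches a “W”, followed by one or more of any character, followed by an “i”. The + quantifier is greedy: it consumes as many characters as it can, so the match runs up to the last “i” in the text (the one in “is”), not the first. The output of this code will therefore be ['Wi-Fi i'].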
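
Quantifiers such as + are greedy by default, which affects what this pattern captures. The sketch below compares the greedy pattern with its non-greedy form, written by adding a ? after the quantifier:

```python
import re

text = "My Wi-Fi is awesome!"

# Greedy: .+ grabs as much as possible, so the match ends at the LAST "i" it can reach
greedy = re.findall(r"W.+i", text)

# Non-greedy: .+? grabs as little as possible, so the match ends at the first "i" that works
lazy = re.findall(r"W.+?i", text)

print(greedy)  # ['Wi-Fi i']
print(lazy)    # ['Wi-Fi']
```

If the goal is to capture just “Wi-Fi”, the non-greedy form is the one to use here.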

MORE ON REGULAR EXPRESSION SYNTAX

To dive deeper into the syntax of regular expressions, let’s take a closer look at some of the characters we introduced earlier.

The . character matches any single character except a newline. For example, the pattern W. will match “Wi” in “My Wi-Fi is awesome!”: a “W” followed by whatever character comes next.

The ^ character matches the start of a string. This means that it can be used to match a pattern only if it appears at the start of a string. For example, the pattern ^My will match the string “My Wi-Fi is awesome!” because it starts with the word “My”.

The $ character matches the end of a string. This means that it can be used to match a pattern only if it appears at the end of a string. For example, the pattern awesome!$ will match the string “My Wi-Fi is awesome!” because it ends with the word “awesome!”.
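
To see the two anchors in action, here is a small sketch that tests the same phrase with and without a match at the start or end of the string:

```python
import re

text = "My Wi-Fi is awesome!"

# ^ anchors the pattern to the start of the string
starts_with_my = bool(re.search(r"^My", text))        # True
starts_with_wifi = bool(re.search(r"^Wi-Fi", text))   # False: "Wi-Fi" is not at the start

# $ anchors the pattern to the end of the string
ends_with_awesome = bool(re.search(r"awesome!$", text))  # True
ends_with_wifi = bool(re.search(r"Wi-Fi$", text))        # False: "Wi-Fi" is not at the end

print(starts_with_my, starts_with_wifi, ends_with_awesome, ends_with_wifi)
```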

The * character matches zero or more occurrences of the preceding character. This means that it can be used for a part of the pattern that may be absent or repeated any number of times. For example, the pattern co*t will match “ct”, “cot”, and “coot”. Note that on its own, o* matches everywhere, even in a string with no “o” at all, because zero occurrences still count as a match.

The + character matches one or more occurrences of the preceding character. This means that it can be used for a part of the pattern that must appear at least once. For example, the pattern o+ will match the “o” in “cot” and the “oo” in “coot”, but it will not match anything in “ct”.

The ? character matches zero or one occurrence of the preceding character. This means that it can be used for a part of the pattern that is optional: it may be absent, but if it is present, it appears only once. For example, the pattern the? will match the strings “th” and “the”. In “thee”, it matches only the first three characters, “the”.

The {} quantifier matches the specified number of occurrences of the preceding character. This means that it can be used for a part of the pattern that repeats a specific number of times. For example, the pattern o{2} will match two consecutive occurrences of the letter “o”, as in “coot”. A range can also be given: o{2,3} matches two or three consecutive “o” characters.
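
The four quantifiers are easier to compare side by side. The following sketch runs each one against a small made-up word list, using re.fullmatch() so that the whole word has to match the pattern:

```python
import re

# Illustrative word list: "c", then zero to three "o" characters, then "t"
words = ["ct", "cot", "coot", "cooot"]

patterns = {
    "co*t": r"co*t",        # zero or more "o"
    "co+t": r"co+t",        # one or more "o"
    "co?t": r"co?t",        # zero or one "o"
    "co{2}t": r"co{2}t",    # exactly two "o"
}

results = {}
for name, pattern in patterns.items():
    # Keep only the words that match the pattern from start to end
    results[name] = [word for word in words if re.fullmatch(pattern, word)]
    print(name, "->", results[name])
```

This prints all four words for co*t, everything except “ct” for co+t, only “ct” and “cot” for co?t, and only “coot” for co{2}t.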

The [] brackets match any single character listed within them. This means that they can be used to match one character out of a set, or out of a range such as [A-Z] or [0-9]. For example, the pattern [aeiou] will match any single lowercase vowel in a string.

The () parentheses group characters together. This means that they can be used to treat part of a pattern as a single unit, which is useful when using the | character to match one of several alternatives. For example, the pattern (My|is) will match either “My” or “is”. Groups also capture the text they match, so it can be retrieved separately afterwards.
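
As a quick illustration of brackets and groups, this sketch applies both patterns from the paragraphs above to the sample sentence used earlier:

```python
import re

text = "My Wi-Fi is awesome!"

# [aeiou] matches one lowercase vowel at a time
vowels = re.findall(r"[aeiou]", text)
print(vowels)  # ['i', 'i', 'i', 'a', 'e', 'o', 'e']

# (My|is) matches either alternative; with one group, findall returns what the group captured
words = re.findall(r"(My|is)", text)
print(words)  # ['My', 'is']
```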

REGULAR EXPRESSION PATTERN CREATION

My favourite tool to create regular expression patterns is https://regexr.com/.

The website allows you to create regular expressions using a variety of syntax options and provides a live preview of the matches as you build your expression. You can also test your regular expressions against sample text and view the results in real-time. The interface is intuitive and easy to use, with options to customize the expression’s flags and input settings.

Additionally, regexr.com provides a handy reference guide to the regular expression syntax and various operators you can use to create complex patterns. There are also community-contributed expressions and helpful articles on best practices and common use cases for regular expressions.

EXAMPLE

In this example, we are going to try to build a regular expression that will help us to parse AP names and retrieve two pieces of information:

  • Site Name
  • AP Number

Here is the list of access point names we are dealing with:

  • HQ-AP001
  • HQ-AP101
  • HQ-AP201
  • B1-AP01
  • B2-AP21

In order to be able to extract the site name and AP number, we will use groups (using parentheses). Here is what the regular expression looks like:

([A-Z0-9]+)-AP(\d{2,3})

Here is what the Python code looks like when we use this pattern:

import re
 
# Define the regular expression pattern
pattern = r"([A-Z0-9]+)-AP(\d{2,3})"
 
# Define the list of access point names to parse
ap_names = ["HQ-AP001", "HQ-AP101", "HQ-AP201", "B1-AP01", "B2-AP21"]
 
# Loop through each access point name and extract the site name and AP number using the regexp
for ap_name in ap_names:
    # With two groups in the pattern, findall returns a list of (site, number) tuples
    matches = re.findall(pattern, ap_name)

    # Extract the site name and AP number from the first (and only) match
    site_name = matches[0][0]
    ap_number = matches[0][1]

    # Print the results
    print(f"Access point name: {ap_name}")
    print(f"Site name: {site_name}")
    print(f"AP number: {ap_number}")

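This script first defines the regular expression pattern as ([A-Z0-9]+)-AP(\d{2,3}) to match access point names in the format “SITE-APNUMBER”. The first group, [A-Z0-9]+, captures one or more uppercase letters or digits (the site name). The -AP part matches those characters literally. The second group, \d{2,3}, captures two or three digits (the AP number).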

It then defines a list of access point names to parse, and loops through each name to extract the site name and AP number using the re.findall() function. Because the pattern contains two groups, re.findall() returns a list of tuples, which is why the site name is read from matches[0][0] and the AP number from matches[0][1]. For each access point, the script prints three lines: the access point name, the site name, and the AP number.

When run, this script will output the following results:

Access point name: HQ-AP001
Site name: HQ
AP number: 001
Access point name: HQ-AP101
Site name: HQ
AP number: 101
Access point name: HQ-AP201
Site name: HQ
AP number: 201
Access point name: B1-AP01
Site name: B1
AP number: 01
Access point name: B2-AP21
Site name: B2
AP number: 21
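
The script above assumes that every name matches the pattern. If a name did not match, matches would be an empty list and matches[0] would raise an IndexError. Here is one possible variation, as a sketch only: it uses named groups and re.fullmatch(), and returns None for names that do not fit. The helper name parse_ap_name, the group names site and number, and the sample name “lobby-printer” are all made up for this illustration.

```python
import re

# Same pattern as above, with named groups (the names "site" and "number" are illustrative)
AP_PATTERN = re.compile(r"(?P<site>[A-Z0-9]+)-AP(?P<number>\d{2,3})")


def parse_ap_name(ap_name):
    """Return (site_name, ap_number), or None if the name does not fit the pattern."""
    # fullmatch() requires the entire string to match, not just a part of it
    match = AP_PATTERN.fullmatch(ap_name)
    if match is None:
        return None
    return match.group("site"), match.group("number")


for name in ["HQ-AP001", "B2-AP21", "lobby-printer"]:
    print(name, "->", parse_ap_name(name))
```

Using re.fullmatch() is stricter than re.findall(): a name with extra characters before or after the expected format is rejected instead of being partially matched. Whether that is the right behaviour depends on how clean your AP names are.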

CONCLUSION

Regular expressions are a powerful tool for working with strings in Python. By using regular expression syntax, you can search, replace, and manipulate strings based on patterns. With practice, you can become proficient at using regular expressions to solve a wide range of string manipulation tasks. They are quite scary at first but trust me, they can become very useful!
