Easy way to use regular expressions with Python.

REGEX in action.


Regular expressions, or “regex”, are a way of processing text data: they are used to match and filter strings of text such as particular characters, words, or patterns of characters. This means that we can match and extract any text pattern from text data with the help of regular expressions.




A common use of regular expressions is form validation and data extraction: email validation, password validation, phone number extraction and many other common form fields. Regex use cases range from very simple to extremely complex, and building complex regular expressions is a skill that you learn only by practice. The Python module re provides full support for regular-expression tasks.
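As a minimal sketch of form validation with the re module: the two rules below (a phone number of 10 to 15 digits with an optional leading plus, and a password of at least 8 letters, digits or underscores) are hypothetical and chosen only for illustration, as are the names PHONE_RE, PASSWORD_RE and is_valid.

```python
import re

# Hypothetical, deliberately simple rules, for illustration only.
PHONE_RE = re.compile(r"\+?[0-9]{10,15}")      # optional +, then 10-15 digits
PASSWORD_RE = re.compile(r"[A-Za-z0-9_]{8,}")  # at least 8 letters, digits or underscores


def is_valid(pattern, value):
    """Return True only if the whole value matches the pattern."""
    return pattern.fullmatch(value) is not None


print(is_valid(PHONE_RE, "+12025550123"))  # True
print(is_valid(PHONE_RE, "12-34"))         # False
print(is_valid(PASSWORD_RE, "short"))      # False
```

For validation, fullmatch is usually a better fit than findall or search, because it rejects input that merely contains a valid fragment.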

Meta characters:



. Matches any single character (except a newline, by default)
\ Escapes one of the meta characters to treat it as a regular character
[...] Matches a single character that is listed, or falls within a range, inside the brackets
[_-] [-_] Inside brackets the order does not matter, but without brackets the order does matter
+ Matches the preceding element one or more times
? Matches the preceding element zero or one time
* Matches the preceding element zero or more times
{m,n} Matches the preceding element at least m and not more than n times
^ Matches the beginning of a line or string
$ Matches the end of a line or string
[^...] Matches a single character that is not listed, and does not fall within a range, inside the brackets
(?:...|...) "Or" operator: matches either alternative (non-capturing group)
(...) Groups part of an expression and captures the text it matches
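To see the meta characters above in action, here is a small sketch. The two sample strings are made up for the demonstration; only re.findall and re.search from the standard library are used.

```python
import re

sample = "cat cot c.t coat ct"
quant = "ct cot coot cooot"

print(re.findall(r"c.t", sample))             # ['cat', 'cot', 'c.t']   . is any character
print(re.findall(r"c\.t", sample))            # ['c.t']                 \. is a literal dot
print(re.findall(r"c[ao]t", sample))          # ['cat', 'cot']          one of a, o
print(re.findall(r"c[^a]t", sample))          # ['cot', 'c.t']          anything but a
print(re.findall(r"(?:cat|coat)", sample))    # ['cat', 'coat']         either alternative

print(re.findall(r"co+t", quant))             # ['cot', 'coot', 'cooot']        one or more o
print(re.findall(r"co?t", quant))             # ['ct', 'cot']                   zero or one o
print(re.findall(r"co*t", quant))             # ['ct', 'cot', 'coot', 'cooot']  zero or more o
print(re.findall(r"co{2,3}t", quant))         # ['coot', 'cooot']               two to three o
print(re.findall(r"^ct", quant))              # ['ct']      only at the start
print(re.findall(r"cooot$", quant))           # ['cooot']   only at the end

# Parentheses capture part of the match for later use.
print(re.search(r"(c[ao])t", sample).group(1))  # ca
```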



Extracting emails from string:



import re

text = 'To contact my wonderful jokes site please use andrey@python-code.pro instead of example@python-code.pro email address'

# Raw string; the dot before the top-level domain is escaped so it matches a literal "."
pattern = re.compile(r"[^ ]+@[^ ]+\.[a-z]+")
matches = pattern.findall(text)
matches

OUT: ['andrey@python-code.pro', 'example@python-code.pro']


Extracting URLs from text file:


The task here is to extract only .net URLs.



import re

with open('urlsintext.txt', 'r') as file:
    content = file.read()

# Raw string; the dot after "www" is escaped so it matches a literal "."
pattern = re.compile(r"https?://(?:www\.)?[^ \n]+\.net")
matches = pattern.findall(content)
matches

OUT: ['http://stupidname.net']
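If you do not have a text file at hand, the same idea can be tried on an in-memory string. This is only a sketch: the sample text and the example.net / example.org addresses are placeholders, not the contents of urlsintext.txt.

```python
import re

# Placeholder text standing in for the contents of a file.
sample = (
    "Docs live at https://www.example.net and http://example.net/about,\n"
    "but https://example.org is out of scope.\n"
)

# Optional "s" in https, optional "www.", then everything up to the last ".net".
net_url = re.compile(r"https?://(?:www\.)?[^ \n]+\.net")
found = net_url.findall(sample)
print(found)  # ['https://www.example.net', 'http://example.net']
```

Note that the match stops at .net, so a trailing path such as /about is not part of the result.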



Extracting IP addresses from text file:


An additional condition here: keep only the addresses whose third part starts with 33.



import re

with open('ipaddresses.txt', 'r') as file:
    content = file.read()

# Three digits in every part; the third part must start with 33
pattern = re.compile(r"[0-9]{3}\.[0-9]{3}\.33[0-9]\.[0-9]{3}")
matches = pattern.findall(content)
matches

OUT: ['912.121.330.123', '912.121.339.123']
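The pattern above expects exactly three digits in every part. As a hedged variation (my assumption, not part of the original task), the sketch below allows one to three digits per part and uses \b word boundaries so that a match cannot start or end in the middle of a number. The host list is invented for the example.

```python
import re

# Invented sample text, for illustration only.
sample = "hosts: 10.0.33.1, 192.168.133.7, 172.16.335.20, 8.8.8.8"

# 1-3 digits per part; the third part is "33" optionally followed by one more digit.
ip_33 = re.compile(r"\b[0-9]{1,3}\.[0-9]{1,3}\.33[0-9]?\.[0-9]{1,3}\b")
found = ip_33.findall(sample)
print(found)  # ['10.0.33.1', '172.16.335.20']
```

Like the original pattern, this checks only the shape of the address, not that each part is in the 0-255 range, which is why 172.16.335.20 is accepted.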


Extracting filenames according to pattern:


An additional condition here: get only the bills for January 1-20.



import re
from pathlib import Path

root_dir = Path('files')
filenames = root_dir.iterdir()
filenames_str = [filename.name for filename in filenames]

# "jan" plus optional letters, a dash, a day from 1 to 20, then a literal ".txt"
pattern = re.compile(r"jan[a-z]*-(?:[1-9]|1[0-9]|20)\.txt", re.IGNORECASE)
matches = [filename for filename in filenames_str if pattern.search(filename)]
matches

OUT: ['Jan-12.txt', 'bill_Jan-13.txt', 'january-14.txt']
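To check which names the pattern accepts and rejects without creating a folder, you can run it against a plain list. The file names below are hypothetical test cases; the first three mirror the output above.

```python
import re

# Hypothetical file names covering both matching and non-matching cases.
names = [
    "Jan-12.txt",
    "bill_Jan-13.txt",
    "january-14.txt",
    "Jan-21.txt",   # day is outside 1-20
    "feb-03.txt",   # wrong month
    "jan-5.csv",    # wrong extension
]

bill_re = re.compile(r"jan[a-z]*-(?:[1-9]|1[0-9]|20)\.txt", re.IGNORECASE)
selected = [name for name in names if bill_re.search(name)]
print(selected)  # ['Jan-12.txt', 'bill_Jan-13.txt', 'january-14.txt']
```

search is enough here because we only need to know whether a name contains the pattern somewhere, which is also why a prefix such as bill_ is allowed.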




