Unleash the Magic of Regular Expressions with Python.

REGEX in action.


Regular expressions, or “regex”, are a way of processing text data: they match and filter strings of text such as particular characters, words, or patterns of characters. With their help we can match and extract any text pattern from text data.

Regular expressions with Python.

Python Knowledge Base: Make coding great again.
- Updated: 2024-12-01 by Andrey BRATUS, Senior Data Analyst.




A common use of regular expressions is form validation, such as checking emails, passwords, phone numbers and many other common form fields, as well as extracting such values from text. Regex use cases range from very simple to extremely complex, and building complex regular expressions is a skill learned only through practice. The Python module re provides full support for regular expression tasks.
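As a minimal sketch of form validation, the snippet below checks a phone number and a password with `re.fullmatch`, which succeeds only if the whole string fits the pattern. The helper names and the rules themselves (a 10-digit phone number with optional separators, a password of at least 8 non-space characters containing a digit) are assumptions chosen for illustration, not rules from any standard.

```python
import re

# Hypothetical rule: 10 digits, optionally separated by a space, dot or hyphen.
PHONE_RE = re.compile(r"[0-9]{3}[ .-]?[0-9]{3}[ .-]?[0-9]{4}")
# Hypothetical rule: at least 8 characters, none of them a space.
PASSWORD_RE = re.compile(r"[^ ]{8,}")


def is_valid_phone(value: str) -> bool:
    """Return True if the whole string looks like a 10-digit phone number."""
    return PHONE_RE.fullmatch(value) is not None


def is_valid_password(value: str) -> bool:
    """Return True if the string is long enough and contains a digit."""
    long_enough = PASSWORD_RE.fullmatch(value) is not None
    has_digit = re.search(r"[0-9]", value) is not None
    return long_enough and has_digit


print(is_valid_phone("555-123-4567"))   # True
print(is_valid_phone("555-1234"))       # False
print(is_valid_password("secret123"))   # True
print(is_valid_password("short1"))      # False
```

Real forms usually need stricter or locale-specific rules, so treat these patterns as a starting point only.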



  1. Meta characters:


  2. .  Matches any single character (except a newline by default)
    \  Escapes a metacharacter so that it is treated as a regular character
    [...]  Matches a single character or a range that is contained within the brackets
    [_-] [-_]  Inside brackets the order of characters does not matter; outside brackets the order does matter
    +  Matches the preceding element one or more times
    ?  Matches the preceding element zero or one time
    *  Matches the preceding element zero or more times
    {m,n}  Matches the preceding element at least m and not more than n times
    ^  Matches the beginning of a line or string
    $  Matches the end of a line or string
    [^...]  Matches a single character or a range that is not contained within the brackets
    (?:...|...)  “Or” operator: matches either of the alternatives inside a non-capturing group
    (...)  Groups an expression; followed by ? the whole group becomes optional
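The short sketch below runs several of these metacharacters through `re.findall` so their effect can be seen side by side. All sample strings are made up for illustration.

```python
import re

# (pattern, sample text) pairs; the sample strings are invented for this demo.
examples = [
    (r"c.t", "cat cut ct"),                # . needs exactly one character between c and t
    (r"3\.14", "3.14 3914"),               # \. matches only a literal dot
    (r"[_-]", "a_b-c"),                    # same result as [-_]
    (r"[^0-9 ]+", "ab 12 cd"),             # runs of characters that are not digits or spaces
    (r"go+d", "gd god good"),              # one or more "o"
    (r"go*d", "gd god good"),              # zero or more "o"
    (r"colou?r", "color colour"),          # optional "u"
    (r"[0-9]{2,3}", "1 22 333"),           # two or three digits
    (r"^Hello", "Hello world, Hello"),     # only at the beginning of the string
    (r"world$", "world, hello world"),     # only at the end of the string
    (r"(?:cat|dog)s", "cats dogs birds"),  # either alternative followed by "s"
]

results = {pattern: re.findall(pattern, text) for pattern, text in examples}

for pattern, found in results.items():
    print(f"{pattern!r:<18} -> {found}")
```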



  3. Extracting emails from a string:


  4. 
    import re
    
    text = 'To contact my wonderful jokes site please use andrey@python-code.pro instead of example@python-code.pro email address'
    
    # Raw string; the dot is escaped so it matches only a literal "."
    pattern = re.compile(r"[^ ]+@[^ ]+\.[a-z]+")
    matches = pattern.findall(text)
    print(matches)
    

    OUT: ['andrey@python-code.pro', 'example@python-code.pro']
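To illustrate why the dot in front of the domain suffix matters, the sketch below compares an unescaped `.` with an escaped `\.` on a made-up sentence. An unescaped dot matches any character, including a space, so it can glue an unrelated word onto something that is not an email address at all.

```python
import re

# Made-up text: "admin@localhost" has no dot-separated domain suffix.
text = "ping admin@localhost now"

loose = re.compile(r"[^ ]+@[^ ]+.[a-z]+")    # "." matches any character
strict = re.compile(r"[^ ]+@[^ ]+\.[a-z]+")  # "\." matches a literal dot only

loose_matches = loose.findall(text)
strict_matches = strict.findall(text)

print(loose_matches)   # ['admin@localhost now']
print(strict_matches)  # []
```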


  5. Extracting URLs from a text file:


  6. The task here is to extract only .net URLs.


    
    import re
    
    with open('urlsintext.txt', 'r') as file:
        content = file.read()
    
    # Raw string; both dots are escaped so they match a literal "."
    pattern = re.compile(r"https?://(?:www\.)?[^ \n]+\.net")
    matches = pattern.findall(content)
    print(matches)
    

    OUT: ['https://python-code.pro',
    'http://www.python-code.pro',
    'http://stupidname.net']
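The same kind of pattern can be tried without a file. The sketch below is a minimal example that uses a made-up string in place of the file contents; the example.net and example.org addresses are placeholders, not data from the article.

```python
import re

# Made-up text standing in for the contents of a file.
content = (
    "Visit https://example.net for details.\n"
    "Mirror: http://www.example.net/docs and https://example.org\n"
)

# Optional "s" after http, optional "www." prefix, literal ".net" at the end.
pattern = re.compile(r"https?://(?:www\.)?[^ \n]+\.net")
matches = pattern.findall(content)

print(matches)  # ['https://example.net', 'http://www.example.net']
```

Note that each match stops right after `.net`, so the `/docs` path is dropped and the `.org` address is skipped entirely.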



  7. Extracting IP addresses from a text file:


  8. The additional condition here is to get only the addresses whose third part starts with 33.


    
    import re
    
    with open('ipaddresses.txt', 'r') as file:
        content = file.read()
    
    # Every part must have exactly three digits; the third part starts with 33
    pattern = re.compile(r"[0-9]{3}\.[0-9]{3}\.33[0-9]\.[0-9]{3}")
    matches = pattern.findall(content)
    print(matches)
    
    

    OUT: ['912.121.330.123', '912.121.339.123']
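A pattern with `{3}` only finds addresses whose parts all have exactly three digits. As a hedged variation, the sketch below compares it with a `{1,3}` version on a made-up string; the addresses are invented sample data, and neither pattern checks that each part is in the 0-255 range.

```python
import re

# Made-up sample text; these are not real hosts.
content = "hosts: 192.168.330.101, 192.168.133.101, 10.0.33.1, 172.116.335.200"

# Exactly three digits in every part, third part starting with 33.
fixed = re.compile(r"[0-9]{3}\.[0-9]{3}\.33[0-9]\.[0-9]{3}")
# One to three digits in every part, third part is 33 or 33 plus one digit.
flexible = re.compile(r"[0-9]{1,3}\.[0-9]{1,3}\.33[0-9]?\.[0-9]{1,3}")

fixed_matches = fixed.findall(content)
flexible_matches = flexible.findall(content)

print(fixed_matches)     # ['192.168.330.101', '172.116.335.200']
print(flexible_matches)  # ['192.168.330.101', '10.0.33.1', '172.116.335.200']
```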


  9. Extracting filenames according to a pattern:


  10. The additional condition here is to get only the bills for January 1-20.


    
    import re
    from pathlib import Path
    
    root_dir = Path('files')
    filenames_str = [path.name for path in root_dir.iterdir()]
    
    # Days 1-20: a single digit 1-9, then 10-19, then 20; the dot before "txt" is escaped
    pattern = re.compile(r"jan[a-z]*-(?:[1-9]|1[0-9]|20)\.txt", re.IGNORECASE)
    # search() returns a match object (truthy) if the pattern occurs anywhere in the name
    matches = [filename for filename in filenames_str if pattern.search(filename)]
    print(matches)
    
    

    OUT: ['Jan-12.txt', 'bill_Jan-13.txt', 'january-14.txt']
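To see which names such a pattern accepts and rejects without touching the file system, the sketch below runs it over a made-up list of filenames. The list is invented for illustration and includes a few names that should not match.

```python
import re

# Made-up filenames standing in for a directory listing.
filenames = [
    "Jan-12.txt", "bill_Jan-13.txt", "january-14.txt",  # days 1-20, .txt
    "jan-21.txt",   # day outside 1-20
    "jan-05.txt",   # zero-padded day
    "feb-05.txt",   # wrong month
    "Jan-7.csv",    # wrong extension
]

pattern = re.compile(r"jan[a-z]*-(?:[1-9]|1[0-9]|20)\.txt", re.IGNORECASE)
matches = [name for name in filenames if pattern.search(name)]

print(matches)  # ['Jan-12.txt', 'bill_Jan-13.txt', 'january-14.txt']
```

Zero-padded days such as `jan-05.txt` are not matched by this pattern; if the files use that naming, the day part would need to allow a leading zero.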




