Definition
Regular expressions (regex) are a domain-specific language (DSL) for describing text patterns. Python implements them in the re module.
Why It Matters
Without regex, finding patterns in unstructured text means writing long, fragile procedural code. Regex is the difference between an afternoon of manual data entry and a one-second automated script, making it the ‘superpower’ of data processing.
Core Concepts
- The re Workflow:
  - regex = re.compile(r'pattern'): Creates a Regex object (use raw strings).
  - mo = regex.search('text'): Returns a Match object for the first match, or None if nothing matches.
  - mo.group(): Extracts the matched text.
- Special Characters:
  - ? (Optional), * (Zero or more), + (One or more).
  - {n,m} (Repetition), | (Pipe/OR).
  - . (Wildcard), ^ (Starts with), $ (Ends with).
- Greedy vs. Non-greedy: Python’s regexes are greedy by default (match the longest possible string). Add a ? after a quantifier (e.g., *? or {n,m}?) to make it non-greedy (match the shortest).
- findall() Nuance:
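  - If no groups: Returns a list of strings (the full matches).
  - If one group: Returns a list of strings (that group's text only).
  - If two or more groups: Returns a list of tuples of strings, one string per group.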
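
The findall() return shapes can be checked with a short sketch. The sample text and variable names here are illustrative assumptions, not part of the original notes.

```python
import re

# Illustrative sample text (an assumption for this sketch).
text = 'Cell: 415-555-9999 Work: 212-555-0000'

# No groups: each item is the full matched string.
no_groups = re.compile(r'\d{3}-\d{3}-\d{4}').findall(text)
print(no_groups)   # ['415-555-9999', '212-555-0000']

# One group: each item is that group's text only.
one_group = re.compile(r'(\d{3})-\d{3}-\d{4}').findall(text)
print(one_group)   # ['415', '212']

# Two groups: each item is a tuple with one string per group.
two_groups = re.compile(r'(\d{3})-(\d{3}-\d{4})').findall(text)
print(two_groups)  # [('415', '555-9999'), ('212', '555-0000')]
```

If the full match is needed alongside the groups, one option is to wrap the whole pattern in an extra outer group.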
- Substitution: regex.sub('NEW', 'text') replaces every match in 'text' with 'NEW' and returns the resulting string. Use \1, \2 in the replacement (written as a raw string) for backreferences to groups.
- Example Usage:
import re
# Compile a pattern for a US phone number (e.g., 415-555-1011)
phone_regex = re.compile(r'\d{3}-\d{3}-\d{4}')
# Search for the pattern in a string
message = 'Call me at 415-555-1011 tomorrow.'
match = phone_regex.search(message)
if match:
    print(f"Found number: {match.group()}")
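
Greedy versus non-greedy matching and backreference substitution are easier to see side by side. The following is a minimal sketch; the sample strings and variable names are made up for illustration.

```python
import re

# Illustrative input (an assumption for this sketch).
html = '<b>bold</b> and <i>italic</i>'

# Greedy: .* grabs as much as possible, so the match runs to the last '>'.
greedy_match = re.compile(r'<.*>').search(html).group()
print(greedy_match)  # <b>bold</b> and <i>italic</i>

# Non-greedy: .*? stops at the first '>' that completes a match.
lazy_match = re.compile(r'<.*?>').search(html).group()
print(lazy_match)    # <b>

# Backreferences: \1, \2, \3 in the replacement refer to the captured groups.
date_regex = re.compile(r'(\d{4})-(\d{2})-(\d{2})')
reordered = date_regex.sub(r'\3/\2/\1', 'Due 2024-01-31.')
print(reordered)     # Due 31/01/2024.
```

The replacement is written as a raw string so that \1 reaches the re module as a backreference rather than being read by Python as an escape sequence.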
- Flags: re.IGNORECASE, re.DOTALL (dot matches newlines), re.VERBOSE (allows comments/whitespace). Combine with | (e.g., re.I | re.X).
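
As a rough illustration of the flags, the sketch below rewrites the phone pattern with re.VERBOSE and combines re.IGNORECASE with re.DOTALL. The sample strings and variable names are assumptions chosen for the example.

```python
import re

# re.VERBOSE lets the pattern span several lines with comments;
# whitespace inside the pattern is ignored.
phone_regex = re.compile(r'''
    (\d{3})   # area code
    -         # separator
    (\d{3})   # first three digits
    -         # separator
    (\d{4})   # last four digits
''', re.VERBOSE)
parts = phone_regex.search('Call 415-555-1011').groups()
print(parts)  # ('415', '555', '1011')

# Illustrative multi-line text with mixed case.
text = 'START\nmiddle\nEnd'

# Without flags: case differs and '.' will not cross the newlines.
plain = re.search(r'start.*end', text)
print(plain is None)        # True

# Flags are combined with the | operator.
flagged = re.search(r'start.*end', text, re.IGNORECASE | re.DOTALL)
print(flagged is not None)  # True
```

The short aliases re.I, re.S, and re.X are equivalent to re.IGNORECASE, re.DOTALL, and re.VERBOSE.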