Std
StdRegexEngine()
Bases: ScanEngine[S]
Engine that uses the Python standard library's regex engine.
This ScanEngine implementation uses Python's re module for pattern
matching. It compiles regular expressions from byte patterns and uses
them to find state transitions in text decoded from byte buffers.
Type variables
S: An Enum type representing the possible states a file can be in.
Initializes the StdRegexEngine.
Sets up internal lists to store uncompiled and compiled regex patterns along with their associated states.
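A minimal sketch of what a freshly constructed engine holds, assuming Python 3.9+. `StdRegexEngineSketch`, `FileState`, and the attribute names `_patterns` and `_compiled` are hypothetical. The real class derives from ScanEngine[S], which is omitted here, and this reference does not document its internal layout.

```python
import re
from enum import Enum
from typing import Generic, TypeVar

# S is bound to Enum: each member is a state a file can be in.
S = TypeVar("S", bound=Enum)


class StdRegexEngineSketch(Generic[S]):
    """Hypothetical stand-in for StdRegexEngine (the real base, ScanEngine[S], is omitted)."""

    def __init__(self) -> None:
        # Uncompiled byte patterns, each paired with the state it transitions to.
        self._patterns: list[tuple[bytes, S]] = []
        # Compiled patterns; the original bytes are kept so a match can report them.
        self._compiled: list[tuple[re.Pattern[str], bytes, S]] = []


class FileState(Enum):
    """Hypothetical example states."""
    PLAIN = "plain"
    SECRET = "secret"


engine: StdRegexEngineSketch[FileState] = StdRegexEngineSketch()
```

Binding S to Enum mirrors the type-variable description above. Parameterizing the engine with a concrete enum such as FileState lets a type checker confirm that every reported state is a member of that enum.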
compile(patterns)
Compile regex patterns to speed up searching.
This method takes a dictionary that maps each state to its byte patterns,
decodes the byte patterns into strings, and compiles them into re.Pattern
objects. The compiled patterns are stored internally so that stream can
reuse them instead of recompiling on every call.
| PARAMETER | DESCRIPTION |
|---|---|
| patterns | A dictionary mapping each state (a member of the enum S) to a list of byte regex patterns that signal a transition to that state. |
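The compile step can be sketched as a standalone function, assuming UTF-8 for decoding the byte patterns (this reference only says they are decoded into strings). `compile_patterns` and `FileState` are hypothetical names. The documented method stores its result on the engine rather than returning it. Each entry keeps the original bytes next to the compiled pattern so that a later match can report which byte pattern fired.

```python
import re
from enum import Enum
from typing import Mapping, TypeVar

S = TypeVar("S", bound=Enum)


def compile_patterns(
    patterns: Mapping[S, list[bytes]],
) -> list[tuple[re.Pattern[str], bytes, S]]:
    """Hypothetical helper mirroring the documented compile() step."""
    compiled: list[tuple[re.Pattern[str], bytes, S]] = []
    for state, byte_patterns in patterns.items():
        for raw in byte_patterns:
            # Decode so the pattern can match decoded text.
            # UTF-8 is an assumption; the reference does not name the encoding here.
            compiled.append((re.compile(raw.decode("utf-8")), raw, state))
    return compiled


class FileState(Enum):
    """Hypothetical example states."""
    PLAIN = "plain"
    SECRET = "secret"


table = compile_patterns({FileState.SECRET: [rb"BEGIN [A-Z]+ KEY", rb"password\s*="]})
```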
stream(buf)
Stream (match_start, match_end, pattern, target_state) tuples.
Decodes the input byte buffer as UTF-8 (ignoring decoding errors) and then uses the compiled regular expressions to find all matches. For each match, it yields the start and end positions, the original byte pattern that matched, and the associated target state.
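Because matching runs on the decoded string, re reports positions as character indices into that string. This reference does not say whether the engine maps them back to byte offsets. The snippet below uses made-up input to show that, with errors ignored, the two kinds of offset can differ.

```python
import re

buf = b"ab\xffcd"  # made-up input; \xff is not valid UTF-8

# Decoding with errors="ignore" silently drops the invalid byte.
text = buf.decode("utf-8", errors="ignore")

match = re.search("cd", text)
char_offset = match.start()    # index into the decoded str
byte_offset = buf.find(b"cd")  # index into the original bytes
```

Multi-byte UTF-8 characters produce the same kind of gap, since one character then spans several bytes.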
| PARAMETER | DESCRIPTION |
|---|---|
| buf | The byte buffer to scan. |
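| YIELDS | DESCRIPTION |
|---|---|
| tuple[int, int, bytes, S] | A tuple containing: match_start (int), the start position of the match; match_end (int), the end position of the match; pattern (bytes), the original byte pattern that matched; target_state (S), the state associated with that pattern. |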
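A sketch of the streaming loop as a standalone generator, assuming entries of the form (compiled pattern, original bytes, state) like those a compile step would produce. `stream_matches` and `FileState` are hypothetical. This version yields matches grouped by pattern; whether the real engine orders them differently (for example, by position across all patterns) is not stated in this reference.

```python
import re
from enum import Enum
from typing import Iterator, Sequence, TypeVar

S = TypeVar("S", bound=Enum)


def stream_matches(
    compiled: Sequence[tuple[re.Pattern[str], bytes, S]],
    buf: bytes,
) -> Iterator[tuple[int, int, bytes, S]]:
    """Hypothetical helper mirroring the documented stream() behaviour."""
    # Decode once; invalid UTF-8 bytes are dropped instead of raising.
    text = buf.decode("utf-8", errors="ignore")
    for pattern, raw, state in compiled:
        # finditer yields non-overlapping matches, left to right, for this pattern.
        for m in pattern.finditer(text):
            yield m.start(), m.end(), raw, state


class FileState(Enum):
    """Hypothetical example state."""
    SECRET = "secret"


compiled = [(re.compile(r"token=\w+"), rb"token=\w+", FileState.SECRET)]
hits = list(stream_matches(compiled, b"x token=abc y"))
```

Yielding the original bytes rather than the decoded pattern string matches the documented yield type, tuple[int, int, bytes, S].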