Std

StdRegexEngine()

Bases: ScanEngine[S]

Engine that uses the Python standard library's regex engine.

This ScanEngine implementation leverages Python's re module to perform pattern matching. It compiles regular expressions from byte patterns and uses them to find state transitions within text decoded from byte buffers.

Type variables

S: An Enum type representing the possible states a file can be in.
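As an illustration of the type variable, the sketch below shows one way an Enum-bound `S` could parameterize an engine. The `bound=Enum` constraint, the `FileState` enum, and the `ToyEngine` class are all hypothetical stand-ins, not names taken from this library.

```python
from enum import Enum
from typing import Generic, TypeVar

# Assumption: S is constrained to Enum subclasses; the real declaration may differ.
S = TypeVar("S", bound=Enum)


class FileState(Enum):
    """Hypothetical states a scanned file could be in."""
    CODE = 1
    COMMENT = 2


class ToyEngine(Generic[S]):
    """Hypothetical stand-in for an engine parameterized by a state enum."""

    def __init__(self) -> None:
        # One list of states seen so far; purely illustrative.
        self.states: list[S] = []


engine: ToyEngine[FileState] = ToyEngine()
engine.states.append(FileState.CODE)
```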

Initializes the StdRegexEngine.

Sets up internal lists to store uncompiled and compiled regex patterns along with their associated states.

compile(patterns)

Compile regex patterns to speed up searching.

This method takes a dictionary of state-to-pattern mappings, decodes the byte patterns into strings, and compiles them into re.Pattern objects. These compiled patterns are stored internally for efficient streaming.

PARAMETER DESCRIPTION

patterns

A dictionary mapping each state (a member of the enum S) to a list of byte patterns (regular expressions), any of which signals a transition to that state.

TYPE: dict[S, list[bytes]]
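The compile step described above can be approximated with the standard library alone. This is a minimal sketch, not the engine's actual code: `compile_patterns` and `FileState` are hypothetical names, and decoding the patterns as UTF-8 is an assumption.

```python
import re
from enum import Enum


class FileState(Enum):
    """Hypothetical states, for illustration only."""
    HEADER = "header"
    BODY = "body"


def compile_patterns(patterns):
    """Decode each byte pattern and compile it, keeping the raw bytes and state."""
    compiled = []
    for state, byte_patterns in patterns.items():
        for raw in byte_patterns:
            # Assumption: patterns are UTF-8 encoded; the real engine may decode differently.
            compiled.append((re.compile(raw.decode("utf-8")), raw, state))
    return compiled


compiled = compile_patterns({
    FileState.HEADER: [rb"^#\s+\w+"],
    FileState.BODY: [rb"^---$", rb"^BEGIN\b"],
})
```

Keeping the original byte pattern next to its compiled form lets a later scan report which pattern matched without re-encoding anything.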

stream(buf)

Stream (match_start, match_end, pattern, target_state) tuples.

Decodes the input byte buffer into a UTF-8 string (ignoring errors) and then uses the compiled regular expressions to find all matches. For each match, it yields the start and end positions, the original byte pattern that matched, and the associated target state.

PARAMETER DESCRIPTION

buf

The byte buffer to be scanned.

TYPE: bytes

YIELDS DESCRIPTION
tuple[int, int, bytes, S]

A tuple containing:

  • int: The inclusive starting byte offset of the match.
  • int: The exclusive ending byte offset of the match.
  • bytes: The original byte pattern that was matched.
  • S: The target state associated with the matched pattern.
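The scan described above can be sketched with `re.finditer`. This is an approximation under stated assumptions: `stream_matches` and `FileState` are hypothetical names, and yielding matches pattern by pattern (rather than sorted by position) is a guess, since the ordering is not documented here.

```python
import re
from enum import Enum


class FileState(Enum):
    """Hypothetical states, for illustration only."""
    HEADER = "header"
    BODY = "body"


def stream_matches(compiled, buf):
    """Yield (start, end, raw_pattern, state) for every match in the decoded buffer."""
    # Decode as UTF-8 and drop undecodable bytes, as the description states.
    text = buf.decode("utf-8", errors="ignore")
    for regex, raw, state in compiled:
        for match in regex.finditer(text):
            yield match.start(), match.end(), raw, state


compiled = [
    (re.compile(r"BEGIN"), b"BEGIN", FileState.HEADER),
    (re.compile(r"END"), b"END", FileState.BODY),
]

ascii_matches = list(stream_matches(compiled, b"BEGIN data END"))

# With non-ASCII input, positions are indices into the decoded string.
shifted = list(stream_matches(compiled, "é END".encode("utf-8")))
```

One caveat when matching on decoded text: `re` reports positions as indices into the decoded string. These equal byte offsets for pure ASCII buffers, but in the second call the match starts at index 2 in the text while `END` begins at byte 3 of the buffer. Whether the real engine corrects for this is not stated here.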