
Parser

S = TypeVar('S')

FileParser

Bases: ABC, Generic[S]

Main parser orchestrator for a specific file type.

This abstract base class provides the core functionality for scanning a file to identify state transitions and then parsing the identified regions using appropriate StateParser instances.

Type variables

S: An Enum type representing the possible states a file can be in.
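The sketch below shows how a concrete subclass binds S to a state Enum. It is a reduced stand-in, not the real class: it keeps only parse_file (the one method documented as abstract), and CsvState, CsvFileParser and the dict return value are hypothetical.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import Generic, TypeVar

S = TypeVar("S")  # declared as in the module; intended for Enum state types


class FileParser(ABC, Generic[S]):
    """Reduced stand-in: only parse_file, which the docs mark as abstract."""

    @abstractmethod
    def parse_file(self, file_path: str) -> object:
        ...


class CsvState(Enum):
    """Hypothetical states for an illustrative CSV-like format."""
    HEADER = 1
    ROWS = 2


class CsvFileParser(FileParser[CsvState]):
    """Subscripting FileParser[CsvState] ties S to this Enum for type checkers."""

    def parse_file(self, file_path: str) -> object:
        # Placeholder result; a real subclass would return a ParsedFile.
        return {"file_path": file_path, "states": [s.name for s in CsvState]}
```

Because the abstract method is unimplemented on the base class, `FileParser()` raises `TypeError`; only concrete subclasses can be instantiated.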

get_parser()

Return the parser for a specific state, or None if no parsing is needed.

Despite this summary and the StateParser[S] | None annotation, the method currently returns the whole collection of state parsers rather than the parser for one specific state. It may be a placeholder or an incomplete implementation.

RETURNS DESCRIPTION
StateParser[S] | None

The dictionary of StateParser instances. The annotation also allows None (no parsers configured), but the current implementation always returns the dictionary.
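For comparison, the per-state behavior that the summary line describes could look like the following. This is a hypothetical sketch, not the current implementation: the `_parsers` attribute, the `State` members and `UpperParser` are all invented for illustration.

```python
from enum import Enum, auto
from typing import Any, Optional


class State(Enum):
    """Hypothetical states used only for this illustration."""
    HEADER = auto()
    DATA = auto()
    BLANK = auto()  # a state that needs no parsing


class UpperParser:
    """Toy stand-in for a StateParser; decodes and upper-cases the bytes."""

    def parse(self, data: bytes, state: State) -> dict[str, Any]:
        return {"text": data.decode().upper()}


class Registry:
    def __init__(self) -> None:
        # Hypothetical storage: one parser per state that needs parsing.
        self._parsers: dict[State, UpperParser] = {
            State.HEADER: UpperParser(),
            State.DATA: UpperParser(),
        }

    def get_parser(self, state: State) -> Optional[UpperParser]:
        # Per-state lookup: None signals "no parsing needed" for this state.
        return self._parsers.get(state)
```

`dict.get` returns None for states without a registered parser, which is exactly the "None if no parsing needed" contract.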

get_scanner()

Return the scanner for this file type.

RETURNS DESCRIPTION
StateScanner

The StateScanner instance configured for this file type.

parse_file(file_path)

Perform a full parse of the file.

This abstract method must be implemented by concrete FileParser subclasses to orchestrate the complete parsing process: scanning the file, identifying regions, and then parsing each region using the appropriate StateParser.

PARAMETER DESCRIPTION

file_path

The path to the file to be fully parsed.

TYPE: str

RETURNS DESCRIPTION
ParsedFile

A ParsedFile object containing all extracted data and metadata.
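A minimal sketch of the scan, slice, parse sequence a concrete subclass might follow, assuming a made-up format in which `#HEADER` and `#DATA` marker lines start new regions. The file type `"toy_v1"`, the `num_regions` metadata key, the marker regex and the plain-function parsers (standing in for StateParser instances) are assumptions, not the library's behavior.

```python
import re
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Any, Callable


class State(Enum):
    """Hypothetical states for a toy format with '#HEADER' / '#DATA' markers."""
    HEADER = "HEADER"
    DATA = "DATA"


@dataclass
class ParsedRegion:
    state: State
    byte_range: tuple[int, int]  # (inclusive start, exclusive end)
    data: dict[str, Any]


@dataclass
class ParsedFile:
    file_path: str
    file_type: str
    regions: list[ParsedRegion]
    metadata: dict[str, Any]


def parse_header(data: bytes, state: State) -> dict[str, Any]:
    # 'key=value' lines -> dict
    return dict(line.split("=", 1) for line in data.decode().splitlines() if line)


def parse_data(data: bytes, state: State) -> dict[str, Any]:
    # Whitespace-separated integers, one row per line.
    return {"rows": [[int(x) for x in line.split()] for line in data.decode().splitlines() if line]}


class ToyFileParser:
    """Sketch of a concrete FileParser for the hypothetical 'toy_v1' format."""

    _marker = re.compile(rb"^#(HEADER|DATA)\n", re.MULTILINE)
    _parsers: dict[State, Callable[[bytes, State], dict[str, Any]]] = {
        State.HEADER: parse_header,
        State.DATA: parse_data,
    }

    def scan_bytes(self, buf: bytes) -> list[tuple[State, int, int]]:
        # A region spans from the end of one marker line to the next marker.
        matches = list(self._marker.finditer(buf))
        out = []
        for i, m in enumerate(matches):
            end = matches[i + 1].start() if i + 1 < len(matches) else len(buf)
            out.append((State(m.group(1).decode()), m.end(), end))
        return out

    def parse_file(self, file_path: str) -> ParsedFile:
        buf = Path(file_path).read_bytes()                # 1. read the file once
        regions = []
        for state, start, end in self.scan_bytes(buf):    # 2. identify regions
            parser = self._parsers.get(state)
            if parser is None:
                continue                                   # no parser: skip region
            data = parser(buf[start:end], state)           # 3. parse region bytes
            regions.append(ParsedRegion(state, (start, end), data))
        return ParsedFile(file_path, "toy_v1", regions, {"num_regions": len(regions)})
```

Regions are appended in scan order, so `regions` preserves their order of appearance in the file.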

scan_bytes(buf)

Scan a byte buffer to identify state transitions.

This method delegates the scanning operation to the internal StateScanner to find all state changes within a given byte sequence.

PARAMETER DESCRIPTION

buf

The byte buffer to be scanned.

TYPE: bytes

RETURNS DESCRIPTION
list[StateTransition]

A list of StateTransition objects, detailing the start and end positions of each state and the transition patterns.
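The delegation described above might look like this. The sketch assumes a hypothetical `StateTransition` layout (`state`, `start`, `end`, `pattern`) and a toy regex scanner in place of a real StateScanner; the actual field names and scanning rules may differ.

```python
import re
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    """Hypothetical states for a toy format with '#HEADER' / '#DATA' marker lines."""
    HEADER = "HEADER"
    DATA = "DATA"


@dataclass
class StateTransition:
    """Hypothetical field layout; the real StateTransition may differ."""
    state: State
    start: int       # first byte of the region (inclusive)
    end: int         # one past the last byte (exclusive)
    pattern: bytes   # marker bytes that triggered the transition


class MarkerScanner:
    """Toy StateScanner: a '#NAME' line switches to the matching State member."""

    _marker = re.compile(rb"^#(HEADER|DATA)\n", re.MULTILINE)

    def scan(self, buf: bytes) -> list[StateTransition]:
        matches = list(self._marker.finditer(buf))
        transitions = []
        for i, m in enumerate(matches):
            # A region runs from the end of its marker to the start of the next one.
            end = matches[i + 1].start() if i + 1 < len(matches) else len(buf)
            state = State(m.group(1).decode())
            transitions.append(StateTransition(state, m.end(), end, m.group(0)))
        return transitions


class ToyFileParser:
    def __init__(self) -> None:
        self._scanner = MarkerScanner()

    def scan_bytes(self, buf: bytes) -> list[StateTransition]:
        # Pure delegation: no I/O, the internal scanner does all the work.
        return self._scanner.scan(buf)
```

Keeping `scan_bytes` free of file I/O makes it easy to test on in-memory buffers.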

scan_file(file_path)

Scan the file for states and their transitions in a single pass.

Reads the entire file into memory and then scans it for state transitions.

PARAMETER DESCRIPTION

file_path

The path to the file to be scanned.

TYPE: str

RETURNS DESCRIPTION
list[StateTransition]

A list of StateTransition objects, detailing the start and end positions of each state and the transition patterns.
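Reading the whole file and then scanning it can be sketched as below. `LineScanningParser` is hypothetical, and its `scan_bytes` is a toy that reports each line as a `(start, end)` pair instead of returning StateTransition objects.

```python
from pathlib import Path


class LineScanningParser:
    def scan_bytes(self, buf: bytes) -> list[tuple[int, int]]:
        # Toy stand-in: report each line as a half-open (start, end) region.
        regions, start = [], 0
        for line in buf.splitlines(keepends=True):
            regions.append((start, start + len(line)))
            start += len(line)
        return regions

    def scan_file(self, file_path: str) -> list[tuple[int, int]]:
        # Read the whole file into memory in one call, then scan the buffer.
        # Simple and single-pass, but memory use grows with file size.
        buf = Path(file_path).read_bytes()
        return self.scan_bytes(buf)
```

The trade-off is the one the description implies: one read and one scan, at the cost of holding the entire file in memory.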

ParsedFile(file_path, file_type, regions, metadata)

Complete parsed file result.

This dataclass encapsulates all information extracted from a file after a full parsing operation, including its path, type, a list of parsed regions, and any global metadata.

file_path

The absolute or relative path to the file that was parsed.

file_type

A string identifier for the type of file that was parsed (e.g., "amber_v22").

metadata

A dictionary containing any high-level metadata about the parsed file, such as the number of regions found.

regions

A list of ParsedRegion objects, each representing a distinct, parsed section of the file. The order of regions in this list corresponds to their order of appearance in the file.
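A sketch of the result objects, with field types inferred from the descriptions. The path, the `num_regions` metadata key, the string states and the `in_file_order` helper are hypothetical; the helper only checks the documented ordering guarantee.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ParsedRegion:
    state: Any                    # an Enum member in real use
    byte_range: tuple[int, int]
    data: dict[str, Any]


@dataclass
class ParsedFile:
    file_path: str
    file_type: str
    regions: list[ParsedRegion]
    metadata: dict[str, Any]


def in_file_order(pf: ParsedFile) -> bool:
    """Hypothetical check: regions appear in the order they occur in the file."""
    starts = [r.byte_range[0] for r in pf.regions]
    return starts == sorted(starts)


pf = ParsedFile(
    file_path="md/run1.out",       # hypothetical path
    file_type="amber_v22",         # identifier example taken from the docs
    regions=[
        ParsedRegion("HEADER", (0, 120), {}),
        ParsedRegion("STEP", (120, 480), {}),
    ],
    metadata={"num_regions": 2},   # key name is an assumption
)
```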

ParsedRegion(state, byte_range, data)

Container for the parsed data from a single state region.

byte_range

A tuple (start_byte, end_byte) indicating the inclusive start and exclusive end byte positions of this region within the original file.

data

A dictionary containing the parsed data specific to this region's state. The structure and content of this dictionary will vary depending on the StateParser used for this region.

state

The Enum member representing the state of the content from which this region was parsed.
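Because byte_range is half-open (inclusive start, exclusive end), it maps directly onto a Python slice of the original buffer. In this sketch the `State` Enum and the `raw` helper are hypothetical additions for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class State(Enum):
    HEADER = "header"   # hypothetical state


@dataclass
class ParsedRegion:
    state: State
    byte_range: tuple[int, int]   # (inclusive start, exclusive end)
    data: dict[str, Any]

    def raw(self, buf: bytes) -> bytes:
        # Half-open range == Python slice semantics; length is end - start.
        start, end = self.byte_range
        return buf[start:end]


buf = b"#HEADER\nname=x\n"
region = ParsedRegion(State.HEADER, (8, 15), {"name": "x"})
```

With this convention, adjacent regions can share a boundary value (one region's end is the next one's start) without overlapping.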

StateParser

Bases: ABC, Generic[S]

Abstract parser for parsing specific state regions.

This abstract base class defines the interface for parsers that are responsible for extracting structured data from a specific byte range corresponding to a particular state within a file.

Type variables

S: An Enum type representing the possible states a file can be in.

parse(data, state)

Parse the bytes for a specific state region.

This method should be implemented by concrete StateParser subclasses to transform raw bytes from a file region into a structured dictionary representation.

PARAMETER DESCRIPTION

data

The bytes corresponding to the specific region of the file that needs to be parsed.

TYPE: bytes

state

The Enum member indicating the current state of the content, which guides how the data should be parsed.

TYPE: S

RETURNS DESCRIPTION
dict[str, Any]

A dictionary containing the structured data extracted from the provided data bytes, relevant to the given state.
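A minimal concrete StateParser, assuming two hypothetical states whose content is either `key=value` lines or whitespace-separated tables. The abstract base mirrors the documented `parse(data, state)` signature; `Section` and `SectionParser` are invented for illustration.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import Any, Generic, TypeVar

S = TypeVar("S")


class StateParser(ABC, Generic[S]):
    """Interface as documented: parse(data, state) -> dict[str, Any]."""

    @abstractmethod
    def parse(self, data: bytes, state: S) -> dict[str, Any]:
        ...


class Section(Enum):
    """Hypothetical states for the example."""
    KEYVALUE = "keyvalue"
    TABLE = "table"


class SectionParser(StateParser[Section]):
    """Example concrete parser; the state argument selects the parsing rule."""

    def parse(self, data: bytes, state: Section) -> dict[str, Any]:
        lines = [ln for ln in data.decode("utf-8").splitlines() if ln.strip()]
        if state is Section.KEYVALUE:
            return {k.strip(): v.strip() for k, v in (ln.split("=", 1) for ln in lines)}
        if state is Section.TABLE:
            return {"rows": [ln.split() for ln in lines]}
        raise ValueError(f"unsupported state: {state}")
```

Passing the state into `parse` lets one parser class serve several related states; separate StateParser subclasses per state would work equally well.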