Parser
S = TypeVar('S')
FileParser
Bases: ABC, Generic[S]
Main parser orchestrator for a specific file type.
This abstract base class provides the core functionality for scanning
a file to identify state transitions and then parsing the identified
regions using appropriate StateParser instances.
Type variables
S: An Enum type representing the possible states a file can be in.
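Here is a minimal sketch of how the type variable S is meant to be filled in, assuming states are modelled as a plain Enum. AmberState, MiniFileParser and AmberParser are hypothetical stand-ins, not names from this module. Bounding S to Enum is an illustrative choice, because the documented TypeVar('S') is unbounded.

```python
from abc import ABC, abstractmethod
from enum import Enum, auto
from typing import Generic, TypeVar

# Bounding S to Enum is an illustrative choice; the documented TypeVar('S') is unbounded.
S = TypeVar("S", bound=Enum)


class AmberState(Enum):
    """Hypothetical states an output file might move through."""
    HEADER = auto()
    ENERGIES = auto()
    FOOTER = auto()


class MiniFileParser(ABC, Generic[S]):
    """Hypothetical, stripped-down analogue of FileParser."""

    @abstractmethod
    def states(self) -> list[S]:
        """Return the states this parser can encounter."""


class AmberParser(MiniFileParser[AmberState]):
    """Concrete parser: S is bound to AmberState here."""

    def states(self) -> list[AmberState]:
        return list(AmberState)


parser = AmberParser()
print([s.name for s in parser.states()])  # ['HEADER', 'ENERGIES', 'FOOTER']
```

The abstract base cannot be instantiated directly. Each concrete subclass fixes S to its own state Enum.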
get_parser()
Return the parser for a specific state, or None if no parsing is needed.
This method appears intended to return the collection of state parsers rather than a single parser for a specific state (it takes no state argument), so it may be a placeholder or an incomplete implementation.
| RETURNS | DESCRIPTION |
|---|---|
| StateParser[S] \| None | The dictionary of … |
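One way to get the documented "parser or None" behaviour is a per-state lookup table. This sketch assumes parsers can be modelled as plain callables that map bytes to a dict. State, PARSERS, parse_header and parse_data are hypothetical. Unlike the documented method, this get_parser takes a state argument.

```python
from enum import Enum, auto
from typing import Any, Callable, Optional


class State(Enum):
    HEADER = auto()
    DATA = auto()
    BLANK = auto()


# Hypothetical: each parser is modelled as a callable from raw bytes to a dict.
ParseFn = Callable[[bytes], dict[str, Any]]


def parse_header(data: bytes) -> dict[str, Any]:
    return {"title": data.decode().strip()}


def parse_data(data: bytes) -> dict[str, Any]:
    return {"values": [float(x) for x in data.decode().split()]}


# State -> parser table; states missing from the table need no parsing.
PARSERS: dict[State, ParseFn] = {
    State.HEADER: parse_header,
    State.DATA: parse_data,
}


def get_parser(state: State) -> Optional[ParseFn]:
    """Return the parser registered for state, or None if no parsing is needed."""
    return PARSERS.get(state)


print(get_parser(State.DATA)(b"1.0 2.5"))  # {'values': [1.0, 2.5]}
print(get_parser(State.BLANK))             # None
```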
get_scanner()
Return the scanner for this file type.
| RETURNS | DESCRIPTION |
|---|---|
| StateScanner | The … |
parse_file(file_path)
Perform a full parse of a file.
This abstract method must be implemented by concrete FileParser
subclasses to orchestrate the complete parsing process: scanning
the file, identifying regions, and then parsing each region using
the appropriate StateParser.
| PARAMETER | DESCRIPTION |
|---|---|
| file_path | The path to the file to be fully parsed. |
| RETURNS | DESCRIPTION |
|---|---|
| ParsedFile | A … |
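The orchestration described above (scan the file, identify regions, parse each region) can be sketched as follows. Every name is a hypothetical stand-in:

- Transition plays the role of StateTransition.
- The "first line is header, rest is data" scanner replaces a real StateScanner.
- The "demo" file type and the num_regions metadata key are assumptions.

The work happens on an in-memory buffer, so the logic runs without a file on disk. parse_file only reads the bytes and delegates.

```python
from dataclasses import dataclass
from enum import Enum, auto
from pathlib import Path
from typing import Any, Callable


class State(Enum):
    HEADER = auto()
    DATA = auto()


@dataclass
class Transition:
    """Hypothetical stand-in for StateTransition: a state and its byte span."""
    state: State
    start: int
    end: int  # exclusive


@dataclass
class ParsedRegion:
    state: State
    byte_range: tuple[int, int]
    data: dict[str, Any]


@dataclass
class ParsedFile:
    file_path: str
    file_type: str
    regions: list[ParsedRegion]
    metadata: dict[str, Any]


def scan(buf: bytes) -> list[Transition]:
    """Hypothetical scanner: the first line is HEADER, everything after it is DATA."""
    cut = buf.find(b"\n") + 1  # 0 if there is no newline
    return [Transition(State.HEADER, 0, cut), Transition(State.DATA, cut, len(buf))]


# Hypothetical per-state parsers; a state absent from this table is skipped.
PARSERS: dict[State, Callable[[bytes], dict[str, Any]]] = {
    State.HEADER: lambda b: {"title": b.decode().strip()},
    State.DATA: lambda b: {"values": [float(x) for x in b.decode().split()]},
}


def parse_buffer(file_path: str, buf: bytes) -> ParsedFile:
    """Scan the buffer, then parse each region with the parser for its state."""
    regions = []
    for t in scan(buf):
        parser = PARSERS.get(t.state)
        if parser is None:  # no parsing needed for this state
            continue
        regions.append(ParsedRegion(t.state, (t.start, t.end), parser(buf[t.start:t.end])))
    return ParsedFile(file_path, "demo", regions, {"num_regions": len(regions)})


def parse_file(file_path: str) -> ParsedFile:
    """Read the file into memory and hand the bytes to parse_buffer."""
    return parse_buffer(file_path, Path(file_path).read_bytes())


result = parse_buffer("example.out", b"My run\n1.0 2.0 3.0\n")
print(result.metadata)         # {'num_regions': 2}
print(result.regions[1].data)  # {'values': [1.0, 2.0, 3.0]}
```

States with no registered parser are skipped. This mirrors a parser lookup that returns None.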
scan_bytes(buf)
Scan a byte buffer to identify state transitions.
This method delegates the scanning operation to the internal
StateScanner to find all state changes within a given byte
sequence.
| PARAMETER | DESCRIPTION |
|---|---|
| buf | The byte buffer to be scanned. |
| RETURNS | DESCRIPTION |
|---|---|
| list[StateTransition] | A list of … |
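This is a minimal sketch of what a scanner behind scan_bytes might do, assuming each state is introduced by a marker line. Several details are assumptions loosely modelled on the StateTransition description:

- the marker syntax ("== HEADER ==");
- the State values;
- the Transition fields (state, start, end, pattern).

Each state is taken to run from its marker up to the next marker or the end of the buffer.

```python
import re
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    HEADER = "HEADER"
    DATA = "DATA"


@dataclass
class Transition:
    """Hypothetical stand-in for StateTransition."""
    state: State
    start: int      # offset of the marker that opens the state (inclusive)
    end: int        # offset where the next state begins (exclusive)
    pattern: bytes  # the marker text that triggered the transition


# Hypothetical marker syntax: a line that reads exactly "== HEADER ==" or "== DATA ==".
MARKER = re.compile(rb"^== (HEADER|DATA) ==$", re.MULTILINE)


def scan_bytes(buf: bytes) -> list[Transition]:
    """Single pass: each state runs from its marker to the next marker or end of buffer."""
    matches = list(MARKER.finditer(buf))
    transitions = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(buf)
        transitions.append(Transition(State(m.group(1).decode()), m.start(), end, m.group(0)))
    return transitions


buf = b"== HEADER ==\nrun 1\n== DATA ==\n1 2 3\n"
print([(t.state.name, t.start, t.end) for t in scan_bytes(buf)])
# [('HEADER', 0, 19), ('DATA', 19, 36)]
```

In this sketch, any bytes before the first marker are not assigned to a state.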
scan_file(file_path)
Scan a file in a single pass to identify states and their transitions.
Reads the entire file into memory and then scans it for state transitions.
| PARAMETER | DESCRIPTION |
|---|---|
| file_path | The path to the file to be scanned. |
| RETURNS | DESCRIPTION |
|---|---|
| list[StateTransition] | A list of … end positions of each state and the transition patterns. |
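Reading the whole file and then delegating to the byte-level scan can be sketched as below. The line-per-span scan_bytes is a hypothetical placeholder for the real StateScanner. It is there only so the example runs.

```python
import tempfile
from pathlib import Path
from typing import Union


def scan_bytes(buf: bytes) -> list[tuple[int, int]]:
    """Hypothetical placeholder scanner: one (start, end) span per line, end exclusive."""
    spans, start = [], 0
    for line in buf.splitlines(keepends=True):
        spans.append((start, start + len(line)))
        start += len(line)
    return spans


def scan_file(file_path: Union[str, Path]) -> list[tuple[int, int]]:
    """Read the whole file into memory, then scan the bytes in one pass."""
    return scan_bytes(Path(file_path).read_bytes())


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.out"
    path.write_bytes(b"header\n1 2 3\n")
    print(scan_file(path))  # [(0, 7), (7, 13)]
```

Reading everything up front keeps the scan a single pass over one contiguous buffer. The cost is memory proportional to the file size. For very large files, a memory-mapped buffer from Python's mmap module is one possible alternative.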
ParsedFile(file_path, file_type, regions, metadata)
Complete parsed file result.
This dataclass encapsulates all information extracted from a file after a full parsing operation, including its path, type, a list of parsed regions, and any global metadata.
file_path
The absolute or relative path to the file that was parsed.
file_type
A string identifier for the type of file that was parsed (e.g., "amber_v22").
metadata
A dictionary containing any high-level metadata about the parsed file, such as the number of regions found.
regions
A list of ParsedRegion objects, each representing a
distinct, parsed section of the file. The order of regions
in this list corresponds to their order of appearance in the file.
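This sketch assembles a ParsedFile-like result so that region order matches the order of appearance in the file. Region and ParsedFileSketch are hypothetical stand-ins that reuse the documented field names. The num_regions metadata key is an assumed example of the high-level metadata described above.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Region:
    """Hypothetical stand-in for ParsedRegion (state kept as a plain string)."""
    state: str
    byte_range: tuple[int, int]  # (start, end), end exclusive
    data: dict[str, Any]


@dataclass
class ParsedFileSketch:
    """Hypothetical stand-in mirroring ParsedFile(file_path, file_type, regions, metadata)."""
    file_path: str
    file_type: str
    regions: list[Region]
    metadata: dict[str, Any]


def make_parsed_file(path: str, file_type: str, regions: list[Region]) -> ParsedFileSketch:
    """Order regions by their start byte and record the region count as metadata."""
    ordered = sorted(regions, key=lambda r: r.byte_range[0])
    return ParsedFileSketch(path, file_type, ordered, {"num_regions": len(ordered)})


pf = make_parsed_file(
    "prod.out",
    "amber_v22",
    [Region("DATA", (40, 90), {}), Region("HEADER", (0, 40), {})],
)
print([r.state for r in pf.regions], pf.metadata)
# ['HEADER', 'DATA'] {'num_regions': 2}
```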
ParsedRegion(state, byte_range, data)
Container for parsed data from a state region.
byte_range
A tuple (start_byte, end_byte) indicating the inclusive start and exclusive
end byte positions of this region within the original file.
data
A dictionary containing the parsed data specific to this
region's state. The structure and content of this dictionary
will vary depending on the StateParser used for this region.
state
The Enum member representing the state of the content from which this region was parsed.
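Because byte_range is start-inclusive and end-exclusive, it maps directly onto Python slice semantics. Adjacent regions can also share a boundary offset without overlapping. The sketch below uses a hypothetical ParsedRegion stand-in with the documented fields and a hypothetical region_bytes helper.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Any


class State(Enum):
    HEADER = auto()
    DATA = auto()


@dataclass
class ParsedRegion:
    """Hypothetical stand-in with the documented fields."""
    state: State
    byte_range: tuple[int, int]  # (start inclusive, end exclusive)
    data: dict[str, Any]


def region_bytes(region: ParsedRegion, buf: bytes) -> bytes:
    """Hypothetical helper: recover a region's raw bytes from the full file buffer."""
    start, end = region.byte_range
    return buf[start:end]  # slicing is also start-inclusive, end-exclusive


buf = b"title\n1 2 3\n"
header = ParsedRegion(State.HEADER, (0, 6), {"title": "title"})
body = ParsedRegion(State.DATA, (6, 12), {"values": [1, 2, 3]})
print(region_bytes(header, buf), region_bytes(body, buf))  # b'title\n' b'1 2 3\n'
```

The header ends at offset 6 and the data region starts at offset 6. The two regions therefore tile the buffer with no gap and no overlap.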
StateParser
Bases: ABC, Generic[S]
Abstract parser for parsing specific state regions.
This abstract base class defines the interface for parsers that are responsible for extracting structured data from a specific byte range corresponding to a particular state within a file.
Type variables
S: An Enum type representing the possible states a file can be in.
parse(data, state)
Parse the bytes for a specific state region.
This method should be implemented by concrete StateParser
subclasses to transform raw bytes from a file region into a
structured dictionary representation.
| PARAMETER | DESCRIPTION |
|---|---|
| data | The bytes corresponding to the specific region of the file that needs to be parsed. |
| state | The … |
| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | A dictionary containing the structured data extracted from the provided … |
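This is a minimal sketch of a concrete StateParser, assuming the region holds "key = value" lines. StateParserSketch mirrors the documented parse(data, state) interface but is a hypothetical stand-in. State and KeyValueParser are also hypothetical. Values are kept as strings rather than guessing at their types.

```python
from abc import ABC, abstractmethod
from enum import Enum, auto
from typing import Any, Generic, TypeVar

S = TypeVar("S")


class StateParserSketch(ABC, Generic[S]):
    """Hypothetical mirror of the documented StateParser interface."""

    @abstractmethod
    def parse(self, data: bytes, state: S) -> dict[str, Any]:
        ...


class State(Enum):
    SETTINGS = auto()


class KeyValueParser(StateParserSketch[State]):
    """Parse 'key = value' lines; the input format is an assumption."""

    def parse(self, data: bytes, state: State) -> dict[str, Any]:
        result: dict[str, Any] = {"state": state.name}
        for line in data.decode().splitlines():
            key, sep, value = line.partition("=")
            if sep:  # skip lines without '='
                result[key.strip()] = value.strip()
        return result


parsed = KeyValueParser().parse(b"nstlim = 500\ndt = 0.002\n", State.SETTINGS)
print(parsed)  # {'state': 'SETTINGS', 'nstlim': '500', 'dt': '0.002'}
```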