pyben
Classes
PyBenDecoder – Iterator over assignments in a BEN or XBEN file.
PyBenEncoder – Encoder for Binary Ensemble (.ben) files.
Functions
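compress_jsonl_to_ben – Converts a JSONL file to a BEN file.
compress_ben_to_xben – Converts a BEN file to an XBEN file.
compress_jsonl_to_xben – Converts a JSONL file to an XBEN file.
decompress_ben_to_jsonl – Converts a BEN file to a JSONL file.
decompress_xben_to_jsonl – Converts an XBEN file to a JSONL file.
decompress_xben_to_ben – Converts an XBEN file to a BEN file.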
Package Contents
- class pyben.PyBenDecoder(file_path: str | pathlib.Path, mode: Literal['ben', 'xben'] = 'ben')
Iterator over assignments in a BEN or XBEN file.
Opens a decoder over a BEN (.ben) or XBEN (.xben) file.
- Parameters:
file_path – Path to the input file.
mode ({"ben", "xben"}, default "ben") – Select container format.
- Raises:
OSError – If the file cannot be opened.
Exception – If the underlying decoder fails to initialize.
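A minimal sketch of using the decoder as an iterator. It assumes each yielded item is a sequence of integer labels like those passed to PyBenEncoder.write; the helper name count_labels_per_sample is hypothetical, and the import is deferred so the helper can be defined without pyben installed.

```python
from pathlib import Path


def count_labels_per_sample(file_path, mode="ben"):
    """Return the number of distinct labels in each decoded assignment."""
    # Deferred import: pyben is only required when the helper is called.
    from pyben import PyBenDecoder

    decoder = PyBenDecoder(Path(file_path), mode=mode)
    # Assumption: every item yielded by the decoder is a sequence of ints.
    return [len(set(assignment)) for assignment in decoder]
```

Calling count_labels_per_sample("output.ben") on a file written with two labels per assignment would be expected to return a list of 2s, one entry per sample.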
- subsample_indices(indices: Iterable[int]) PyBenDecoder
Keep only the given 1-based sample indices.
Duplicates are ignored and order is irrelevant; the indices are sorted and deduplicated internally.
- Parameters:
indices – Iterable of 1-based sample indices to keep.
- Returns:
The same decoder (fluent API).
- Return type:
PyBenDecoder
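The selection rule can be modeled in plain Python. This is an illustrative sketch of the documented semantics (1-based indices, duplicates ignored, input order irrelevant), not pyben's implementation; select_indices is a hypothetical name.

```python
def select_indices(samples, indices):
    """Plain-Python model of 1-based index selection (not pyben code)."""
    keep = set(indices)  # duplicates collapse; input order no longer matters
    # Enumerate from 1 so positions follow the 1-based convention.
    return [s for i, s in enumerate(samples, start=1) if i in keep]


samples = ["a", "b", "c", "d", "e"]
print(select_indices(samples, [4, 2, 2, 1]))  # ['a', 'b', 'd']
```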
- subsample_range(start: int, end: int) PyBenDecoder
Keep only samples in the inclusive 1-based range [start, end].
- Parameters:
start – 1-based index of the first sample to keep.
end – 1-based index of the last sample to keep.
- Returns:
The same decoder (fluent API).
- Return type:
PyBenDecoder
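An inclusive 1-based range [start, end] corresponds to the 0-based half-open slice [start - 1, end). The sketch below models that mapping in plain Python; select_range and its argument checks are hypothetical and say nothing about how pyben validates its inputs.

```python
def select_range(samples, start, end):
    """Plain-Python model of the inclusive 1-based range [start, end]."""
    if start < 1 or end < start:
        # Validation chosen for this sketch only.
        raise ValueError("expected 1 <= start <= end")
    # 1-based inclusive [start, end] == 0-based half-open [start - 1, end)
    return samples[start - 1:end]


print(select_range(list("abcde"), 2, 4))  # ['b', 'c', 'd']
```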
- subsample_every(step: int, offset: int = 1) PyBenDecoder
Keep every step-th sample, starting at the 1-based index offset.
- Parameters:
step – Step size (keep every step-th sample).
offset – 1-based index of the first sample to keep (default: 1).
- Returns:
The same decoder (fluent API).
- Return type:
PyBenDecoder
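As an illustration of the step and offset semantics, the kept samples are those at 1-based positions offset, offset + step, offset + 2 * step, and so on. The plain-Python model below is a sketch under that reading; select_every and its argument checks are hypothetical, not part of pyben.

```python
def select_every(samples, step, offset=1):
    """Plain-Python model of keeping every step-th sample from a 1-based offset."""
    if step < 1 or offset < 1:
        # Validation chosen for this sketch only.
        raise ValueError("step and offset must be >= 1")
    # The 1-based offset maps to 0-based position offset - 1.
    return samples[offset - 1::step]


samples = list(range(1, 11))  # sample numbers 1..10
print(select_every(samples, 3))            # [1, 4, 7, 10]
print(select_every(samples, 3, offset=2))  # [2, 5, 8]
```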
- class pyben.PyBenEncoder(file_path: str | pathlib.Path, overwrite: bool = False, variant: Literal['standard', 'mkv_chain'] | None = None)
Encoder for Binary Ensemble (.ben) files.
The encoder supports writing assignments to a BEN file using a context manager and the write method.
Example
    from pyben import PyBenEncoder

    assignments = [
        [1, 2, 1, 1, 2, 2],
        [2, 1, 1, 2, 2, 1],
        [1, 1, 2, 1, 2, 2],
    ]

    with PyBenEncoder("output.ben", overwrite=True) as encoder:
        for assignment in assignments:
            encoder.write(assignment)
- write(assignment: list[int]) None
Write a single assignment to the BEN file.
- Parameters:
assignment – List of integers representing the assignment.
- close() None
Closes the encoder and the underlying file, flushing any buffered data.
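Where a with block is inconvenient, close() can presumably be called explicitly. The sketch below assumes the encoder may be used outside a context manager, which the reference above implies but does not state; write_ensemble is a hypothetical helper name.

```python
def write_ensemble(file_path, assignments, overwrite=False):
    """Write every assignment to a BEN file and return how many were written."""
    # Deferred import: pyben is only required when the helper is called.
    from pyben import PyBenEncoder

    encoder = PyBenEncoder(file_path, overwrite=overwrite)
    written = 0
    try:
        for assignment in assignments:
            encoder.write(list(assignment))  # write() is documented to take list[int]
            written += 1
    finally:
        # Always close so buffered data is flushed even if a write fails.
        encoder.close()
    return written
```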
- pyben.compress_jsonl_to_ben(in_file: str | pathlib.Path, out_file: str | pathlib.Path, overwrite: bool = False, variant: Literal['standard', 'mkv_chain'] | None = None) None
Converts a JSONL file to a BEN file.
- Parameters:
in_file – Path to the input JSONL file.
out_file – Path to the output BEN file.
overwrite – Whether to overwrite the output file if it exists. Defaults to False.
variant ({"standard", "mkv_chain"}, optional) – Select the BEN variant. If None, defaults to "mkv_chain" (the Markov-chain variant).
- Raises:
OSError – If the input file cannot be opened or the output file cannot be created.
ValueError – If the input file is not a valid JSONL file or if the variant cannot be inferred.
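A sketch of preparing a JSONL input and converting it. The record layout used here, one object per line with "assignment" and "sample" keys, is an assumption about the expected JSONL schema and is not stated in this reference; write_jsonl and jsonl_to_ben are hypothetical helper names.

```python
import json


def write_jsonl(path, assignments):
    """Write one JSON object per line, numbering samples from 1."""
    with open(path, "w", encoding="utf-8") as handle:
        for sample, assignment in enumerate(assignments, start=1):
            # Assumed schema: {"assignment": [...], "sample": n}
            record = {"assignment": list(assignment), "sample": sample}
            handle.write(json.dumps(record) + "\n")


def jsonl_to_ben(in_file, out_file, variant="standard"):
    """Convert a JSONL file to BEN, overwriting any existing output."""
    # Deferred import: pyben is only required when the helper is called.
    from pyben import compress_jsonl_to_ben

    compress_jsonl_to_ben(in_file, out_file, overwrite=True, variant=variant)
```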
- pyben.compress_ben_to_xben(in_file: str | pathlib.Path, out_file: str | pathlib.Path, overwrite: bool = False, n_threads: int | None = None, compression_level: int | None = None) None
Converts a BEN file to an XBEN file.
- Parameters:
in_file – Path to the input BEN file.
out_file – Path to the output XBEN file.
overwrite – Whether to overwrite the output file if it exists. Defaults to False.
n_threads – Number of threads to use for compression. If None, defaults to the number of CPU cores.
compression_level – Compression level to use for LZMA compression (0-9). If None, defaults to 9 (highest).
- Raises:
OSError – If the input file cannot be opened or the output file cannot be created.
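A small wrapper sketch that checks the documented 0-9 compression range before calling the converter. The checks and the name ben_to_xben are hypothetical; how pyben itself reacts to out-of-range values is not described here.

```python
def ben_to_xben(in_file, out_file, n_threads=None, compression_level=None,
                overwrite=False):
    """Validate options, then convert a BEN file to XBEN."""
    if compression_level is not None and not 0 <= compression_level <= 9:
        raise ValueError("compression_level must be between 0 and 9")
    if n_threads is not None and n_threads < 1:
        raise ValueError("n_threads must be at least 1")

    # Deferred import: pyben is only required once the arguments are valid.
    from pyben import compress_ben_to_xben

    # None is passed through so pyben applies its documented defaults.
    compress_ben_to_xben(
        in_file,
        out_file,
        overwrite=overwrite,
        n_threads=n_threads,
        compression_level=compression_level,
    )
```

A lower compression_level generally trades a larger output file for faster compression, which can matter for large ensembles.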
- pyben.compress_jsonl_to_xben(in_file: str | pathlib.Path, out_file: str | pathlib.Path, overwrite: bool = False, variant: Literal['standard', 'mkv_chain'] | None = None, n_threads: int | None = None, compression_level: int | None = None) None
Converts a JSONL file to an XBEN file.
- Parameters:
in_file – Path to the input JSONL file.
out_file – Path to the output XBEN file.
overwrite – Whether to overwrite the output file if it exists. Defaults to False.
variant ({"standard", "mkv_chain"}, optional) – Select the BEN variant. If None, defaults to "mkv_chain" (the Markov-chain variant).
n_threads – Number of threads to use for compression. If None, defaults to the number of CPU cores.
compression_level – Compression level to use for LZMA compression (0-9). If None, defaults to 9 (highest).
- Raises:
OSError – If the input file cannot be opened or the output file cannot be created.
ValueError – If the input file is not a valid JSONL file or if the variant cannot be inferred.
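One possible way to handle the two documented exception types is sketched below. run_conversion is a hypothetical helper that takes the conversion function as an argument, so it works with any of the converters in this package.

```python
def run_conversion(convert, in_file, out_file, **kwargs):
    """Call a converter and report the outcome as (ok, message)."""
    try:
        convert(in_file, out_file, **kwargs)
    except OSError as exc:
        # Documented: input cannot be opened or output cannot be created.
        return False, f"I/O problem: {exc}"
    except ValueError as exc:
        # Documented: invalid JSONL input or variant cannot be inferred.
        return False, f"invalid input: {exc}"
    return True, "ok"
```

For example, run_conversion(pyben.compress_jsonl_to_xben, "plans.jsonl", "plans.xben", overwrite=True) would return (True, "ok") on success; the file names are placeholders.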
- pyben.decompress_ben_to_jsonl(in_file: str | pathlib.Path, out_file: str | pathlib.Path, overwrite: bool = False) None
Converts a BEN file to a JSONL file.
- Parameters:
in_file – Path to the input BEN file.
out_file – Path to the output JSONL file.
overwrite – Whether to overwrite the output file if it exists. Defaults to False.
- Raises:
OSError – If the input file cannot be opened or the output file cannot be created.
- pyben.decompress_xben_to_jsonl(in_file: str | pathlib.Path, out_file: str | pathlib.Path, overwrite: bool = False) None
Converts an XBEN file to a JSONL file.
- Parameters:
in_file – Path to the input XBEN file.
out_file – Path to the output JSONL file.
overwrite – Whether to overwrite the output file if it exists. Defaults to False.
- Raises:
OSError – If the input file cannot be opened or the output file cannot be created.
- pyben.decompress_xben_to_ben(in_file: str | pathlib.Path, out_file: str | pathlib.Path, overwrite: bool = False) None
Converts an XBEN file to a BEN file.
- Parameters:
in_file – Path to the input XBEN file.
out_file – Path to the output BEN file.
overwrite – Whether to overwrite the output file if it exists. Defaults to False.
- Raises:
OSError – If the input file cannot be opened or the output file cannot be created.
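The six conversion functions cover every ordered pair of the three formats, so a caller could pick one from the file extensions. The dispatch table below is a sketch of that idea; converter_name and convert are hypothetical helpers, and choosing the format from the suffix is a convention of this sketch, not something pyben does.

```python
from pathlib import Path

# (input suffix, output suffix) -> name of the pyben function listed above
_CONVERTERS = {
    (".jsonl", ".ben"): "compress_jsonl_to_ben",
    (".ben", ".xben"): "compress_ben_to_xben",
    (".jsonl", ".xben"): "compress_jsonl_to_xben",
    (".ben", ".jsonl"): "decompress_ben_to_jsonl",
    (".xben", ".jsonl"): "decompress_xben_to_jsonl",
    (".xben", ".ben"): "decompress_xben_to_ben",
}


def converter_name(in_file, out_file):
    """Return the name of the pyben function matching the two suffixes."""
    key = (Path(in_file).suffix.lower(), Path(out_file).suffix.lower())
    if key not in _CONVERTERS:
        raise ValueError(f"no conversion from {key[0]!r} to {key[1]!r}")
    return _CONVERTERS[key]


def convert(in_file, out_file, **kwargs):
    """Look up the converter by suffix and call it with the given options."""
    # Deferred import: pyben is only required when a conversion is run.
    import pyben

    getattr(pyben, converter_name(in_file, out_file))(in_file, out_file, **kwargs)
```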