pyben
=====

.. py:module:: pyben


Classes
-------

.. autoapisummary::

   pyben.PyBenDecoder
   pyben.PyBenEncoder


Functions
---------

.. autoapisummary::

   pyben.compress_jsonl_to_ben
   pyben.compress_ben_to_xben
   pyben.compress_jsonl_to_xben
   pyben.decompress_ben_to_jsonl
   pyben.decompress_xben_to_jsonl
   pyben.decompress_xben_to_ben


Package Contents
----------------

.. py:class:: PyBenDecoder(file_path: str | pathlib.Path, mode: Literal['ben', 'xben'] = 'ben')

   Iterator over assignments in a BEN or XBEN file.

   Open a decoder over a BEN (``.ben``) or XBEN (``.xben``) file.

   :param file_path: Path to the input file.
   :param mode: Select the container format.
   :type mode: {"ben", "xben"}, default "ben"
   :raises OSError: If the file cannot be opened.
   :raises Exception: If the underlying decoder fails to initialize.


   .. py:method:: subsample_indices(indices: Iterable[int]) -> PyBenDecoder

      Keep only the given **1-based** sample indices.

      Duplicates are ignored and order is irrelevant; the indices are sorted and
      deduplicated internally.

      :param indices: Iterable of 1-based sample indices to keep.
      :returns: The same decoder (fluent API).
      :rtype: PyBenDecoder


   .. py:method:: subsample_range(start: int, end: int) -> PyBenDecoder

      Keep only samples in the inclusive **1-based** range [``start``, ``end``].

      :param start: 1-based index of the first sample to keep.
      :param end: 1-based index of the last sample to keep.
      :returns: The same decoder (fluent API).
      :rtype: PyBenDecoder


   .. py:method:: subsample_every(step: int, offset: int = 1) -> PyBenDecoder

      Keep every ``step``-th sample, starting at the **1-based** index ``offset``.

      :param step: Step size (keep every ``step``-th sample).
      :param offset: 1-based index of the first sample to keep. Defaults to 1.
      :returns: The same decoder (fluent API).
      :rtype: PyBenDecoder


.. py:class:: PyBenEncoder(file_path: str | pathlib.Path, overwrite: bool = False, variant: Literal['standard', 'mkv_chain'] | None = None)

   Encoder for Binary Ensemble (``.ben``) files.

   Assignments are written one at a time with :py:meth:`write`. Use the encoder
   as a context manager, as in the example below, or call :py:meth:`close` when
   finished.

   :param file_path: Path to the output BEN file.
   :param overwrite: Whether to overwrite the output file if it exists. Defaults to False.
   :param variant: Select the BEN variant.
   :type variant: {"standard", "mkv_chain"}, optional

   .. rubric:: Example

   .. code-block:: python

      from pyben import PyBenEncoder

      # Each inner list is one assignment (a list of integers).
      assignments = [
          [1, 2, 1, 1, 2, 2],
          [2, 1, 1, 2, 2, 1],
          [1, 1, 2, 1, 2, 2],
      ]

      # The context manager closes the encoder when the block exits.
      with PyBenEncoder("output.ben", overwrite=True) as encoder:
          for assignment in assignments:
              encoder.write(assignment)


   .. py:method:: write(assignment: list[int]) -> None

      Write a single assignment to the BEN file.

      :param assignment: List of integers representing the assignment.


   .. py:method:: close() -> None

      Close the encoder and the underlying file, flushing any buffered data.


.. py:function:: compress_jsonl_to_ben(in_file: str | pathlib.Path, out_file: str | pathlib.Path, overwrite: bool = False, variant: Literal['standard', 'mkv_chain'] | None = None) -> None

   Convert a JSONL file to a BEN file.

   :param in_file: Path to the input JSONL file.
   :param out_file: Path to the output BEN file.
   :param overwrite: Whether to overwrite the output file if it exists. Defaults to False.
   :param variant: Select the BEN variant. If None, defaults to ``"mkv_chain"`` (the Markov-chain variant).
   :type variant: {"standard", "mkv_chain"}, optional
   :raises OSError: If the input file cannot be opened or the output file cannot be created.
   :raises ValueError: If the input file is not a valid JSONL file or if the variant cannot be inferred.


.. py:function:: compress_ben_to_xben(in_file: str | pathlib.Path, out_file: str | pathlib.Path, overwrite: bool = False, n_threads: int | None = None, compression_level: int | None = None) -> None

   Convert a BEN file to an XBEN file.

   :param in_file: Path to the input BEN file.
   :param out_file: Path to the output XBEN file.
   :param overwrite: Whether to overwrite the output file if it exists. Defaults to False.
   :param n_threads: Number of threads to use for compression. If None, defaults to the number of CPU cores.
   :param compression_level: LZMA compression level (0-9). If None, defaults to 9 (highest).
   :raises OSError: If the input file cannot be opened or the output file cannot be created.


.. py:function:: compress_jsonl_to_xben(in_file: str | pathlib.Path, out_file: str | pathlib.Path, overwrite: bool = False, variant: Literal['standard', 'mkv_chain'] | None = None, n_threads: int | None = None, compression_level: int | None = None) -> None

   Convert a JSONL file to an XBEN file.

   :param in_file: Path to the input JSONL file.
   :param out_file: Path to the output XBEN file.
   :param overwrite: Whether to overwrite the output file if it exists. Defaults to False.
   :param variant: Select the BEN variant. If None, defaults to ``"mkv_chain"`` (the Markov-chain variant).
   :type variant: {"standard", "mkv_chain"}, optional
   :param n_threads: Number of threads to use for compression. If None, defaults to the number of CPU cores.
   :param compression_level: LZMA compression level (0-9). If None, defaults to 9 (highest).
   :raises OSError: If the input file cannot be opened or the output file cannot be created.
   :raises ValueError: If the input file is not a valid JSONL file or if the variant cannot be inferred.


.. py:function:: decompress_ben_to_jsonl(in_file: str | pathlib.Path, out_file: str | pathlib.Path, overwrite: bool = False) -> None

   Convert a BEN file to a JSONL file.

   :param in_file: Path to the input BEN file.
   :param out_file: Path to the output JSONL file.
   :param overwrite: Whether to overwrite the output file if it exists. Defaults to False.
   :raises OSError: If the input file cannot be opened or the output file cannot be created.


.. py:function:: decompress_xben_to_jsonl(in_file: str | pathlib.Path, out_file: str | pathlib.Path, overwrite: bool = False) -> None

   Convert an XBEN file to a JSONL file.

   :param in_file: Path to the input XBEN file.
   :param out_file: Path to the output JSONL file.
   :param overwrite: Whether to overwrite the output file if it exists. Defaults to False.
   :raises OSError: If the input file cannot be opened or the output file cannot be created.


.. py:function:: decompress_xben_to_ben(in_file: str | pathlib.Path, out_file: str | pathlib.Path, overwrite: bool = False) -> None

   Convert an XBEN file to a BEN file.

   :param in_file: Path to the input XBEN file.
   :param out_file: Path to the output BEN file.
   :param overwrite: Whether to overwrite the output file if it exists. Defaults to False.
   :raises OSError: If the input file cannot be opened or the output file cannot be created.
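
.. rubric:: Example: subsampling semantics

The 1-based selection rules of ``PyBenDecoder.subsample_indices``,
``subsample_range``, and ``subsample_every`` can be modeled in plain Python to
check which sample numbers a given call would keep. This is a minimal sketch,
not pyben's implementation: the ``model_*`` helpers are hypothetical, and the
handling of indices past the end of the file (they match nothing) and of
non-positive ``step`` or ``offset`` values (rejected here) are assumptions, since
the reference above does not specify them.

.. code-block:: python

   from typing import Iterable


   def model_subsample_indices(indices: Iterable[int], n_samples: int) -> list[int]:
       """Hypothetical model of subsample_indices(indices): kept 1-based sample numbers."""
       # Duplicates collapse and input order is discarded (sorted and deduplicated).
       wanted = sorted(set(indices))
       # Assumption: indices outside 1..n_samples simply match no sample.
       return [i for i in wanted if 1 <= i <= n_samples]


   def model_subsample_range(start: int, end: int, n_samples: int) -> list[int]:
       """Hypothetical model of subsample_range(start, end): both endpoints inclusive."""
       # Assumption: the range is clipped to the samples that exist in the file.
       return list(range(max(start, 1), min(end, n_samples) + 1))


   def model_subsample_every(step: int, n_samples: int, offset: int = 1) -> list[int]:
       """Hypothetical model of subsample_every(step, offset)."""
       # Assumption: this sketch rejects non-positive values; pyben's behavior is undocumented here.
       if step < 1 or offset < 1:
           raise ValueError("step and offset must be positive")
       # Keeps offset, offset + step, offset + 2 * step, ... up to the last sample.
       return list(range(offset, n_samples + 1, step))


   if __name__ == "__main__":
       n = 10  # pretend the file holds 10 samples
       print(model_subsample_indices([7, 3, 3, 1], n))  # [1, 3, 7]
       print(model_subsample_range(4, 6, n))  # [4, 5, 6]
       print(model_subsample_every(3, n, offset=2))  # [2, 5, 8]

Because every subsampling method returns the same decoder, a call such as
``PyBenDecoder("output.ben").subsample_every(3, offset=2)`` can be used directly
as the iterable in a ``for`` loop.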