# The data contract
binary-ensemble is intentionally small at the storage layer: it stores integer
assignments, optional graph assets, and optional metadata. The important part is making sure
those pieces agree. This page spells out the contract that every encoder, decoder, and
converter assumes.
## Assignment shape
An assignment is a plain sequence of integers:
```python
assignment = [1, 1, 2, 2, 3, 3]
```
The list position is the graph node position. The value is the district id assigned to that node.
| Rule | Why it matters |
|---|---|
| Every assignment in one stream must have the same length. | A stream represents one ensemble over one fixed dual graph. |
| Values must be district ids in | The binary format stores district ids compactly. |
| The order must match the graph order you intend to use when reading. | BEN cannot infer geographic meaning from the values alone. |
| Missing nodes are not represented. | Use one entry per graph node, even for islands or zero-population units. |
```python
from binary_ensemble import BendlDecoder

decoder = BendlDecoder("ensemble.bendl")
graph = decoder.read_graph()  # None if the bundle carries no graph
first_assignment = next(iter(decoder))
# One entry per graph node.
assert graph is None or len(first_assignment) == graph.number_of_nodes()
```
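The shape rules in the table can also be checked before encoding, without touching the library. The following is a minimal sketch in plain Python. `check_assignments` is a hypothetical helper, not part of binary_ensemble, and because the permitted id range depends on the format, the bounds `min_id` and `max_id` are left as optional parameters that you would set yourself.

```python
from typing import Iterable, Optional, Sequence


def check_assignments(
    assignments: Iterable[Sequence[int]],
    n_nodes: Optional[int] = None,
    min_id: Optional[int] = None,  # assumption: caller supplies the valid range
    max_id: Optional[int] = None,
) -> int:
    """Check the shape rules and return the number of assignments seen.

    Hypothetical helper for illustration; not part of binary_ensemble.
    """
    expected = n_nodes
    count = 0
    for index, assignment in enumerate(assignments):
        if expected is None:
            expected = len(assignment)  # the first assignment fixes the length
        if len(assignment) != expected:
            raise ValueError(
                f"assignment {index} has length {len(assignment)}, expected {expected}"
            )
        for node, district in enumerate(assignment):
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(district, bool) or not isinstance(district, int):
                raise TypeError(f"assignment {index}, node {node}: not an integer")
            if min_id is not None and district < min_id:
                raise ValueError(f"assignment {index}, node {node}: id below {min_id}")
            if max_id is not None and district > max_id:
                raise ValueError(f"assignment {index}, node {node}: id above {max_id}")
        count += 1
    return count


checked = check_assignments([[1, 1, 2, 2, 3, 3], [1, 2, 2, 3, 3, 3]], n_nodes=6)
print(checked)
```

Passing `n_nodes` ties the check to a known graph size; leaving it out only enforces that every assignment matches the first one.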
## Node order
Node order is the most important invariant in the system. Assignment index i means “the
node at position i in the dual graph.” If you change the graph order without changing the
assignment order, the file still decodes successfully but describes the wrong plans.
Bundles are the recommended default because they keep the graph and assignment stream together:
```python
from binary_ensemble import BendlDecoder

decoder = BendlDecoder("ensemble.bendl")
graph = decoder.read_graph()  # NetworkX graph in assignment order, or None
metadata = decoder.read_metadata()
```
When you call BendlEncoder.add_graph(...) with sort="mlc", sort="rcm", or sort="key", the
encoder reorders the graph and returns the reordered graph. Write assignments in that
returned order, and keep the permutation map that the bundle creates for you.
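```python
import networkx as nx
from binary_ensemble import BendlEncoder

# Build a small path graph with integer node labels.
graph = nx.convert_node_labels_to_integers(nx.path_graph(4))
adjacency = nx.adjacency_data(graph)

encoder = BendlEncoder("ordered.bendl", overwrite=True)
# add_graph returns the reordered graph; its node order is the write order.
ordered_graph = encoder.add_graph(adjacency, sort="rcm")
write_order = list(ordered_graph.nodes)
assert len(write_order) == 4

with encoder.ben_stream() as ensemble:
    # Entry i is the district of write_order[i], not of original node i.
    ensemble.write([1, 1, 2, 2])
```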
Warning
Reordering the graph is lossless, but it is not cosmetic. Once you choose an order, every assignment written to that stream must use that exact order.
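To make the warning concrete, the sketch below moves one assignment into a new node order and back again, using plain Python. The function names and the map convention (`old_to_new[i]` is the new position of the node that was at position `i`) are assumptions for illustration; check the direction of the map stored in your bundle before relying on it.

```python
from typing import List, Sequence


def reorder_assignment(assignment: Sequence[int], old_to_new: Sequence[int]) -> List[int]:
    """Move each value from its old node position to its new node position."""
    if sorted(old_to_new) != list(range(len(assignment))):
        raise ValueError("old_to_new must be a permutation of 0..n-1")
    reordered = [0] * len(assignment)
    for old_pos, new_pos in enumerate(old_to_new):
        reordered[new_pos] = assignment[old_pos]
    return reordered


def invert_map(old_to_new: Sequence[int]) -> List[int]:
    """Turn an old-to-new map into the matching new-to-old map."""
    new_to_old = [0] * len(old_to_new)
    for old_pos, new_pos in enumerate(old_to_new):
        new_to_old[new_pos] = old_pos
    return new_to_old


original = [1, 1, 2, 3]
old_to_new = [2, 0, 3, 1]  # hypothetical reordering of a four-node graph

reordered = reorder_assignment(original, old_to_new)
restored = reorder_assignment(reordered, invert_map(old_to_new))

# Same plan, different list: reading `reordered` against the old graph order
# would silently describe a different plan.
assert reordered != original
assert restored == original
```

The round trip is what "lossless" means here: the permutation can always be undone, but only if the map is kept alongside the stream.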
## JSONL input
The whole-file codec helpers expect JSON Lines with one JSON object per line and an
assignment field:
```json
{"assignment": [1, 1, 2, 2], "sample": 1}
{"assignment": [1, 2, 2, 2], "sample": 2}
```
Extra fields such as sample, scores, or metadata can be present in the input JSONL, but
only the assignment stream is encoded into .ben or .xben. If you need graph metadata,
sampler settings, or scores to travel with the file, put the stream in a .bendl file and
attach those payloads as assets.
```python
from binary_ensemble import encode_jsonl_to_ben

encode_jsonl_to_ben("plans.jsonl", "plans.ben", overwrite=True)
```
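If you want to inspect a JSONL file before handing it to the codec helpers, the input shape described above can be read with the standard library alone. This is a rough sketch: `iter_assignments` is a hypothetical helper, and skipping blank lines is an assumption about what you want, not a statement about what the library tolerates.

```python
import io
import json
from typing import Iterator, List, TextIO


def iter_assignments(lines: TextIO) -> Iterator[List[int]]:
    """Yield the assignment list from each JSON Lines record."""
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        record = json.loads(line)
        if not isinstance(record, dict) or "assignment" not in record:
            raise ValueError(f"line {line_number}: expected an object with 'assignment'")
        # Extra fields such as "sample" are simply not read.
        yield record["assignment"]


sample = io.StringIO(
    '{"assignment": [1, 1, 2, 2], "sample": 1}\n'
    '{"assignment": [1, 2, 2, 2], "sample": 2}\n'
)
plans = list(iter_assignments(sample))
print(plans)
```

For a file on disk, pass an open text handle instead of the `io.StringIO` stand-in.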
## Bundle assets
A .bendl file can carry well-known assets and custom assets:
| Asset | Reader helper | Typical payload |
|---|---|---|
|  |  | NetworkX adjacency JSON |
|  |  | Sampler name, seed, date, chain settings |
|  |  | Reversible old-to-new node order map |
| Custom JSON/text/binary asset |  | Scores, notes, provenance, run manifests, geometry blobs |
```python
from binary_ensemble import BendlDecoder

decoder = BendlDecoder("ensemble.bendl")
# Print the name, type, and flags of every asset in the bundle.
for asset in decoder.list_assets():
    print(asset["name"], asset["type"], asset["flags"])
```
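As an illustration of what a metadata or custom JSON payload might contain, the sketch below builds a small run manifest and serializes it to bytes with the standard library. Every field name and value here is hypothetical, and the step that attaches the payload to a bundle is deliberately left out because it depends on the encoder API.

```python
import json

# Hypothetical run manifest; choose field names that suit your own pipeline.
manifest = {
    "sampler": "example-sampler",
    "seed": 12345,
    "date": "2024-01-01",
    "chain": {"steps": 1000, "burn_in": 100},
}

# Sorted keys give a stable byte representation that is easy to diff.
payload = json.dumps(manifest, sort_keys=True).encode("utf-8")

# Anyone reading the payload back recovers the same settings.
restored = json.loads(payload.decode("utf-8"))
assert restored == manifest
```

Keeping the manifest as plain JSON means it stays readable even by tools that know nothing about the bundle format.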
## Variants and formats
The data contract is independent of the stream variant. standard, mkv_chain, and
twodelta all decode to the same thing: a sequence of list[int] assignments. Choose the
variant for compression behavior, not for downstream semantics.
Likewise, .ben, .xben, and .bendl carry the same assignment stream at different
packaging layers:
| Format | Carries assignments | Carries graph/assets | Best use |
|---|---|---|---|
|  | yes | no | Active work, fast streaming and subsampling |
|  | yes | no | Small plain-stream archive |
|  | yes | yes | Recommended working bundle |
|  | yes | yes | Recommended share/archive bundle |
## Validation checklist
Before you encode a real ensemble, check these points:
- Decide the node order once, before the first sample is written.
- If you reorder a graph, run the sampler or assignment extraction in the reordered graph's node order.
- Keep assignment length constant for the whole stream.
- Store the graph in the bundle unless every reader already has the exact matching graph.
- Store sampler settings, random seed, scoring definitions, and provenance as metadata or custom assets.
- Use BEN while sampling and iterating; recompress to XBEN when the file is ready to share.