Binary Data Files
Here, I will regularly publish pre-compiled .bin files for zelph that you can load and use directly. These files contain prepared semantic networks, mainly based on Wikidata data, but also for other domains. The focus is on efficiency: Compared to JSON files (which can take hours to read), .bin files load in just a few minutes (depending on hardware).
I plan to regularly upload new .bin files based on the current Wikidata dumps (see Wikidata Dumps for transparency), but also for other data sources in the future.
Available Files
All .bin files are available on Hugging Face.
Currently, I offer the following Wikidata variants:
wikidata-20251222-all.bin: The full Wikidata dump, serialized for fast loading. Requires about 210 GB RAM.wikidata-20251222-pruned.bin: A reduced version of the full Wikidata dump, optimized for lower RAM requirements. This enables users with limited hardware to work with zelph and Wikidata – outside of the removed knowledge domains. Requires about 16 GB RAM.
Using the Files
To load a .bin file in zelph, start zelph in interactive mode and use the command:
.load /path/to/file.bin
This loads the network directly into memory. Afterward, you can execute queries, define rules, or start inferences (e.g., with .run). For Wikidata-specific work, first load the script wikidata.zph (see Wikidata Integration) and adjust the language:
.import sample_scripts/wikidata.zph
.lang wikidata
Tip: If you work with the full JSON file, zelph automatically creates a .bin cache file on the first import to speed up future runs.
Generation of wikidata-20251222-pruned.bin
The file wikidata-20251222-pruned.bin was created by systematically pruning (removing) large knowledge domains from the full wikidata-20251222-all.json (approx. 1.7 TB). The goal was to reduce biological, chemical, astronomical, and geographical domains to lower the RAM requirement without losing the core data. The process involved loading the data, targeted removal of nodes and facts based on instance (P31) and subclass relationships (P279), and cleanup operations.
Here is a tabular overview of the steps (based on the protocol; I have consolidated redundant or test-like steps such as loading a 50 GB test and made educated guesses for transitions, e.g., that prunes were continued in sequential sessions):
| Step | Command / Action | Description / Removed Domain | Removed Nodes |
|---|---|---|---|
| 1 | .load wikidata-20251222-all.json (implicit via cache or direct) |
Loading the full Wikidata data (113 million items) into zelph. Creates a .bin cache for quick reloading. |
N/A |
| 2 | .lang wikidata |
Switch to Wikidata language for correct ID handling. | N/A |
| 3 | .prune-nodes A P31 Q8054 |
Remove all instances of protein (Q8054). | 990416 |
| 4 | .prune-nodes A P279 Q8054 |
Remove all subclasses of protein (Q8054). | 17565 |
| 5 | .prune-nodes A P31 Q7187 |
Remove all instances of gene (Q7187). | 1074168 |
| 6 | .prune-nodes A P279 Q7187 |
Remove all subclasses of gene (Q7187). | 38756 |
| 7 | .prune-nodes A P31 Q11173 |
Remove all instances of chemical compound (Q11173). | 83 |
| 8 | .prune-nodes A P279 Q11173 |
Remove all subclasses of chemical compound (Q11173). | 1061177 |
| 9 | .prune-nodes A P31 Q13442814 |
Remove all instances of scholarly article (Q13442814). | 45381672 |
| 10 | .prune-nodes A P31 Q16521 |
Remove all instances of taxon (Q16521). | 3799221 |
| 11 | .prune-nodes A P31 Q5 |
Remove all instances of human (Q5). | 12930031 |
| 12 | .prune-nodes A P131 B |
Remove all administrative location relationships (P131: located in administrative territorial entity). | 13608039 |
| 13 | .cleanup |
Remove isolated nodes after pruning. | N/A |
| 14 | .prune-nodes A P31 Q4167836 |
Remove all instances of Wikimedia category (Q4167836). | 5725423 |
| 15 | .prune-nodes A P31 Q523 |
Remove all instances of star (Q523). | 3275598 |
| 16 | .prune-nodes A P31 Q318 |
Remove all instances of galaxy (Q318). | 2100179 |
| 17 | .prune-nodes A P31 Q4167410 |
Remove all instances of Wikimedia disambiguation page (Q4167410). | 1512307 |
| 18 | .prune-nodes A P31 Q113145171 |
Remove all instances of type of chemical entity (Q113145171). | 285332 |
| 19 | .prune-nodes A P31 Q11266439 |
Remove all instances of Wikimedia template (Q11266439). | 803009 |
| 20 | .prune-nodes A P31 Q79007 |
Remove all instances of street (Q79007). | 3903 |
| 21 | .prune-nodes A P31 Q13433827 |
Remove all instances of encyclopedia article (Q13433827). | 654456 |
| 22 | .prune-nodes A P31 Q101352 |
Remove all instances of family name (Q101352). | 661987 |
| 23 | .prune-nodes A P31 Q13100073 |
Remove all instances of village of the People's Republic of China (Q13100073). | 10695 |
| 24 | .prune-nodes A P279 Q277338 |
Remove all subclasses of pseudogene (Q277338). | 43974 |
| 25 | .prune-nodes A P31 Q277338 |
Remove all instances of pseudogene (Q277338). | 11172 |
| 26 | .prune-nodes A P1433 B |
Remove all publication relationships (P1433: published in). | 1110396 |
| 27 | .prune-nodes A P31 Q3305213 |
Remove all instances of painting (Q3305213). | 1038138 |
| 28 | .prune-nodes A P31 Q4022 |
Remove all instances of river (Q4022). | 70687 |
| 29 | .prune-nodes A P31 Q8502 |
Remove all instances of mountain (Q8502). | 102503 |
| 30 | .prune-nodes A P31 Q486972 |
Remove all instances of human settlement (Q486972). | 61361 |
| 31 | .prune-nodes A P31 Q2668072 |
Remove all instances of collection (Q2668072). | 502805 |
| 32 | .prune-nodes A P31 Q3331189 |
Remove all instances of version, edition or translation (Q3331189). | 685955 |
| 33 | .prune-nodes A P407 Q7850 |
Remove all language relationships to Chinese (Q7850: language of work or name). | 144497 |
| 34 | .prune-nodes A P407 Q7737 |
Remove all language relationships to Russian (Q7737: language of work or name). | 37121 |
| 35 | .prune-nodes A P921 B |
Remove all main subject relationships (P921: main subject). | 554601 |
| 36 | .prune-nodes A P17 B |
Remove all country relationships (P17: country). | 4903773 |
| 37 | .cleanup |
Final removal of isolated nodes. | 2519014 |
| 38 | .save wikidata-20251222-pruned.bin |
Save the final pruned file. | N/A |
After these steps, the file was ready for publication. The pruning steps focused on the largest data volumes (e.g., biology, chemistry, astronomy, geography) to make it easier for users with standard hardware to get started.