Description
Kaitai Struct is a declarative language for describing binary data
structures laid out in files or in memory, e.g. binary file formats,
network stream packet formats, etc.
The main idea is that a particular format is described in the Kaitai
Struct language only once and can then be compiled with ksc into
source files in one of the supported programming languages. These
modules include generated code for a parser that can read the
described data structure from a file / stream and give access to it
through a clean, easy-to-comprehend API.
Kaitai Struct alternatives and similar packages
Based on the "Specific Formats Processing" category.
- PyPDF2: A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
- csvkit: A suite of utilities for converting to and working with CSV, the king of tabular file formats.
- PyMuPDF: PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
- xlwings: xlwings is a Python library that makes it easy to call Python from Excel and vice versa. It works with Excel on Windows and macOS as well as with Google Sheets and Excel on the web.
- unoconv: Universal Office Converter - Convert between any document format supported by LibreOffice/OpenOffice.
- pdftabextract: A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
- Meltano Singer SDK: Write 70% less code by using the SDK to build custom extractors and loaders that adhere to the Singer standard: https://sdk.meltano.com
- Python Schema Matching by XGboost and Sentence-Transformers: A python tool using XGboost and sentence-transformers to perform schema matching task on tables.
README
Kaitai Struct
Note: if you want to make changes to the project, do not fork this repository kaitai_struct. Instead, choose the component you want to modify in the file tree above and fork that individual component instead.
This is an umbrella repository, containing the components only as submodules to make it easier to check out the entire project. Unless you want to modify this README, it is not the repo where you can make edits.
What is Kaitai Struct?
Kaitai Struct is a declarative language for describing binary data structures laid out in files or in memory, e.g. binary file formats, network stream packet formats, etc.
The main idea is that a particular format is described in the Kaitai Struct language only once and can then be compiled with ksc into source files in one of the supported programming languages. These modules include generated code for a parser that can read the described data structure from a file / stream and give access to it through a clean, easy-to-comprehend API.
What is it used for?
Have you ever found yourself writing repetitive, error-prone and hard-to-debug code that reads binary data structures from file / network stream and somehow represents them in memory for easier access?
Kaitai Struct tries to make this job easier: you only have to describe the binary format once, and then everybody can use it from their own programming language, cross-language and cross-platform.
Kaitai Struct includes a growing collection of format descriptions, available in the formats submodule repository.
Can you give me a quick example?
Sure. Consider this simple .ksy format description file that describes the header of a GIF file (a popular web image format):
meta:
  id: gif
  file-extension: gif
  endian: le
seq:
  - id: header
    type: header
  - id: logical_screen
    type: logical_screen
types:
  header:
    seq:
      - id: magic
        contents: 'GIF'
      - id: version
        size: 3
  logical_screen:
    seq:
      - id: image_width
        type: u2
      - id: image_height
        type: u2
      - id: flags
        type: u1
      - id: bg_color_index
        type: u1
      - id: pixel_aspect_ratio
        type: u1
It declares that GIF files usually have a .gif extension and use little-endian integer encoding. The file itself starts with two blocks: first comes header and then comes logical_screen:
- "Header" consists of a "magic" string of 3 bytes ("GIF") that marks the start of a GIF file, followed by 3 more bytes that identify the format version (87a or 89a).
- "Logical screen descriptor" is a block of integers:
  - image_width and image_height are 2-byte unsigned ints
  - flags, bg_color_index and pixel_aspect_ratio are 1-byte unsigned ints each
This .ksy file can be compiled into Gif.cs / Gif.java / Gif.js / Gif.php / gif.py / gif.rb, and then one can immediately load a .gif file and access, for example, its width and height.
In C#
Gif g = Gif.FromFile("path/to/some.gif");
Console.WriteLine("width = " + g.LogicalScreen.ImageWidth);
Console.WriteLine("height = " + g.LogicalScreen.ImageHeight);
In Java
Gif g = Gif.fromFile("path/to/some.gif");
System.out.println("width = " + g.logicalScreen().imageWidth());
System.out.println("height = " + g.logicalScreen().imageHeight());
In JavaScript
See JavaScript notes in the documentation for a more complete quick start guide.
var g = new Gif(new KaitaiStream(someArrayBuffer));
console.log("width = " + g.logicalScreen.imageWidth);
console.log("height = " + g.logicalScreen.imageHeight);
In Lua
local g = Gif:from_file("path/to/some.gif")
print("width = " .. g.logical_screen.image_width)
print("height = " .. g.logical_screen.image_height)
In Nim
let g = Gif.fromFile("path/to/some.gif")
echo "width = " & $g.logicalScreen.imageWidth
echo "height = " & $g.logicalScreen.imageHeight
In PHP
$g = Gif::fromFile('path/to/some.gif');
printf("width = %d\n", $g->logicalScreen()->imageWidth());
printf("height = %d\n", $g->logicalScreen()->imageHeight());
In Python
from gif import Gif  # gif.py is the module compiled from the .ksy file
g = Gif.from_file("path/to/some.gif")
print("width = %d" % g.logical_screen.image_width)
print("height = %d" % g.logical_screen.image_height)
In Ruby
g = Gif.from_file("path/to/some.gif")
puts "width = #{g.logical_screen.image_width}"
puts "height = #{g.logical_screen.image_height}"
Of course, this example shows only a very limited subset of what Kaitai Struct can do. Please refer to the tutorials and documentation for more insights.
Supported languages
The official Kaitai Struct compiler now supports compiling .ksy files into source modules for the following languages:
- C#
- Java
- JavaScript
- Lua
- Nim
- PHP
- Python
- Ruby
Downloading and installing
The easiest way to check out the whole Kaitai Struct project is to download the main project repository that already imports all other parts as submodules. Use:
git clone --recursive https://github.com/kaitai-io/kaitai_struct.git
Note the --recursive option.
Alternatively, one can check out the individual subprojects that constitute the Kaitai Struct suite. They are:
- kaitai_struct_compiler: compiler that translates .ksy into parser source code written in a target programming language
- kaitai_struct_tests: tests & specs to ensure that the compiler works as planned
- Runtime libraries
  - kaitai_struct_cpp_stl_runtime: for C++/STL
  - kaitai_struct_csharp_runtime: for C#
  - kaitai_struct_java_runtime: for Java
  - kaitai_struct_javascript_runtime: for JavaScript
  - kaitai_struct_nim_runtime: for Nim
  - kaitai_struct_lua_runtime: for Lua
  - kaitai_struct_python_runtime: for Python
  - kaitai_struct_ruby_runtime: for Ruby
  - kaitai_struct_swift_runtime: for Swift
- kaitai_struct_formats: library of widely used formats and binary structures described as .ksy files
Using KS in your project
Typically, using formats described in KS in your project involves the following steps:
- Describe the format, i.e. create a .ksy file
- Compile the .ksy file into a target language source file and include that file in your project
- Add the KS runtime library for your particular language to your project (don't worry, it's small and it's there mostly to ensure readability of generated code)
- Use the generated class(es) to parse your binary file / stream and access its components
Check out the tutorial and documentation for more information.
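As a rough sketch of the compile step, the helper below assembles a compiler command line and runs it only if a compiler binary and the input file are present. It rests on several assumptions: that the compiler is installed as `kaitai-struct-compiler` or `ksc` on your PATH, that it accepts the `--target` and `--outdir` options, and that the Python runtime is installed separately (for example as the `kaitaistruct` package). The function names `find_ksc` and `build_ksc_command` are hypothetical; check the compiler's own help output before relying on any of this.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def find_ksc() -> Optional[str]:
    """Return the path of the compiler binary, if one is on PATH.

    The two executable names tried here are assumptions about how the
    compiler was installed on this machine.
    """
    for name in ("kaitai-struct-compiler", "ksc"):
        path = shutil.which(name)
        if path:
            return path
    return None


def build_ksc_command(ksy_file, target: str = "python", outdir=".",
                      compiler: str = "kaitai-struct-compiler") -> List[str]:
    """Assemble the command for step 2 (compile .ksy to target source)."""
    return [compiler, "--target", target,
            "--outdir", str(outdir), str(ksy_file)]


if __name__ == "__main__":
    ksy = Path("gif.ksy")          # step 1: the format description
    compiler = find_ksc()
    if compiler and ksy.exists():  # step 2: compile it
        subprocess.run(build_ksc_command(ksy, compiler=compiler), check=True)
    else:
        print("Would run:", " ".join(build_ksc_command(ksy)))
    # Step 3: add the runtime library for your language to the project.
    # Step 4: import the generated class and parse, e.g. in Python:
    #     from gif import Gif
    #     g = Gif.from_file("path/to/some.gif")
```

Keeping the command assembly separate from its execution makes it possible to check or log the command without having the compiler installed.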
Licensing
- Compiler: GPLv3+
- Runtime libraries: MIT or Apache v2 (=> you can include generated code even in proprietary applications); see individual libraries for details