Serialization-Deserialization
Serialization converts a data structure or object state into a format that can be stored or transmitted (e.g., file, message, or network).
Deserialization is the reverse process, reconstructing the original object from the serialized form.
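For example, an object in Python, Scala, or Rust is serialized to JSON, and a program written in any of those languages can deserialize that JSON back into an object of its own.
A useful analogy is translation: a text in Spanish is translated into English (acting as the universal language), and from English it can be translated into German.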
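The round trip described above can be sketched in Python with only the standard library. The `Person` class and its field values are illustrative assumptions, not part of the original notes.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Person:
    # Hypothetical example type; any plain data object would work the same way.
    name: str
    age: int
    city: str


original = Person(name="Alice", age=25, city="New York")

# Serialization: object -> dict -> JSON text that can be stored or sent.
wire_format = json.dumps(asdict(original))

# Deserialization: JSON text -> dict -> a new, equal object.
restored = Person(**json.loads(wire_format))

print(wire_format)
print(restored == original)
```

The JSON string in the middle plays the role of the "universal language": the receiving side does not need to be Python, only to understand JSON.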
JSON
JavaScript Object Notation (JSON) is a lightweight, human-readable, and machine-parsable text format.
Pros
- Easy to read and debug.
- Supported by almost all programming languages.
- Ideal for APIs and configuration files.
Cons
- Text-based → larger on disk and over the network than binary formats.
- No native schema enforcement.
import json
# Serialization
data = {"name": "Alice", "age": 25, "city": "New York"}
json_str = json.dumps(data)
print(json_str)
# Deserialization
obj = json.loads(json_str)
print(obj["name"])
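The "no native schema enforcement" point can be seen directly: JSON silently changes some Python types on a round trip and accepts any shape of data, so checks have to be written by hand. The sketch below uses only the standard library; `validate_person` and its expected fields are hypothetical helpers made up for this illustration.

```python
import json

# 1. Types are not preserved: tuples become lists, non-string keys become strings.
record = {"id": 1, "tags": ("a", "b"), 2024: "year"}
round_tripped = json.loads(json.dumps(record))
print(round_tripped)


# 2. Nothing stops a producer from sending the wrong shape, so the consumer
#    must validate manually (hypothetical helper, not a library function).
def validate_person(obj: dict) -> list:
    expected = {"name": str, "age": int, "city": str}
    problems = []
    for field, expected_type in expected.items():
        if field not in obj:
            problems.append(f"{field}: missing")
        elif not isinstance(obj[field], expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(obj[field]).__name__}"
            )
    return problems


bad = json.loads('{"name": "Alice", "age": "twenty-five"}')
problems = validate_person(bad)
print(problems)
```

`json.loads` parses the malformed record without complaint; only the hand-written check notices that `age` is a string and `city` is absent.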
AVRO
Apache Avro is a binary serialization format designed for efficiency, compactness, and schema evolution.
- Compact & Efficient: Binary encoding → typically smaller and faster to parse than JSON.
- Schema Evolution: Supports backward/forward compatibility.
- Rich Data Types: Handles nested, array, map, union types.
- Language Independent: Works across Python, Java, Scala, Rust, etc.
- Big Data Integration: Works seamlessly with Hadoop, Kafka, Spark.
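- Self-Describing: Avro container files (.avro) store the writer's schema in the file header, so the schema travels with the data. In messaging systems such as Kafka, the schema is often kept in a schema registry instead.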
Schemas
An Avro schema defines the structure of the Avro data format. It’s a JSON document that describes your data types and protocols, ensuring that even complex data structures are adequately represented. The schema is crucial for data serialization and deserialization, allowing systems to interpret the data correctly.
Example of Avro Schema
{
  "type": "record",
  "name": "Person",
  "namespace": "com.example",
  "fields": [
    {"name": "firstName", "type": "string"},
    {"name": "lastName", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
Avro supports the following data types:
- Primitive: null, boolean, int, long, float, double, bytes, string
- Complex: record, enum, array, map, union, fixed
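To show why the binary encoding is compact, here is a hand-rolled sketch that encodes one `Person` record following the Avro binary encoding rules: ints and longs are zig-zag varints, strings are a length followed by UTF-8 bytes, a union is the branch index followed by the value, and a record is just its field values in schema order. The helper names and the sample values are assumptions for illustration. In practice a library such as `fastavro` or the official `avro` package does this work.

```python
import json


def encode_long(n: int) -> bytes:
    """Avro int/long: zig-zag encode, then emit 7 bits per byte (varint)."""
    z = (n << 1) ^ (n >> 63)  # zig-zag keeps small negative numbers small
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string: byte length as a long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_person(p: dict) -> bytes:
    """Fields are written in schema order; field names are NOT written."""
    out = encode_string(p["firstName"])
    out += encode_string(p["lastName"])
    out += encode_long(p["age"])
    # Union ["null", "string"]: branch index first, then the value (null has no bytes).
    if p["email"] is None:
        out += encode_long(0)
    else:
        out += encode_long(1) + encode_string(p["email"])
    return out


person = {"firstName": "Alice", "lastName": "Smith", "age": 25, "email": None}
avro_bytes = encode_person(person)
json_bytes = json.dumps(person).encode("utf-8")

print(avro_bytes)
print(len(avro_bytes), "bytes as Avro vs", len(json_bytes), "bytes as JSON")
```

The Avro payload is smaller mainly because the field names live in the schema rather than being repeated in every record. This also means the bytes cannot be interpreted without that schema.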
JSON vs Avro
Feature | JSON | Avro |
---|---|---|
Format Type | Text-based (human-readable) | Binary (machine-efficient) |
Size | Larger (verbose) | Smaller (compact) |
Speed | Slower to serialize/deserialize (text parsing) | Typically faster (binary encoding) |
Schema | Optional / loosely defined | Mandatory; stored with the data in Avro container files |
Schema Evolution | No built-in support | Supported via schema resolution (backward & forward compatible) |
Data Types | Basic (string, number, bool, array, object) | Rich (records, enums, arrays, maps, unions, fixed) |
Readability | Human-friendly | Not human-readable |
Integration | Common in APIs, configs | Common in Big Data (Kafka, Spark) |
Use Case | Simple data exchange (REST APIs) | High-performance data pipelines, streaming systems |
In short,
- Use JSON when simplicity & readability matter.
- Use Avro when performance, compactness, and schema evolution matter (especially in Big Data systems).
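Schema evolution can be illustrated with a simplified sketch of Avro's resolution rules, applied here to an already-decoded record: the reader takes each field named in its own schema, uses the field's default when the writer did not supply it, and ignores writer fields it does not know. The `resolve` function and both schema versions are hypothetical. Real Avro libraries apply these rules while decoding the binary data.

```python
# Reader's (newer) schema: "age" was dropped and "email" was added with a default.
READER_SCHEMA_V2 = {
    "type": "record",
    "name": "Person",
    "fields": [
        {"name": "firstName", "type": "string"},
        {"name": "email", "type": ["null", "string"], "default": None},
    ],
}


def resolve(record: dict, reader_schema: dict) -> dict:
    """Project a decoded record onto the reader's schema (simplified)."""
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]       # field known to both sides
        elif "default" in field:
            resolved[name] = field["default"]   # missing in old data: use default
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return resolved                             # extra writer fields are ignored


# Record written earlier with the older schema (firstName, age).
old_record = {"firstName": "Alice", "age": 25}
new_view = resolve(old_record, READER_SCHEMA_V2)
print(new_view)
```

This is why new fields are normally added with a default: without one, old data cannot be read with the new schema.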
Example code is available in this repository:
git clone https://github.com/gchandra10/python_serialization_deserialization_examples.git
Parquet vs Avro
Feature | Avro | Parquet |
---|---|---|
Format Type | Row-based binary format | Columnar binary format |
Best For | Streaming, message passing, row-oriented reads/writes | Analytics, queries, column-oriented reads |
Compression | Moderate (row blocks) | Very high (per column) |
Read Pattern | Reads entire rows | Reads only required columns → faster for queries |
Write Pattern | Fast row inserts / appends | Best for batch writes (not streaming-friendly) |
Schema | Embedded JSON schema, supports evolution | Embedded schema, supports evolution (with constraints) |
Data Evolution | Flexible backward/forward compatibility | Supported, but limited (column addition/removal) |
Use Case | Kafka, Spark streaming, data ingestion pipelines | Data warehouses, lakehouse tables, analytics queries |
Integration | Hadoop, Kafka, Spark, Hive | Spark, Hive, Trino, Databricks, Snowflake |
Readability | Not human-readable | Not human-readable |
Typical File Extension | .avro | .parquet |
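The row-based versus columnar distinction in the table can be pictured with plain Python structures. This is only a conceptual sketch with made-up sample data. It does not produce real Avro or Parquet files.

```python
rows = [
    {"name": "Alice", "age": 25, "city": "New York"},
    {"name": "Bob", "age": 32, "city": "Boston"},
    {"name": "Carol", "age": 41, "city": "New York"},
]

# Row-oriented (Avro-like): each record's values are stored together,
# so appending or reading one whole record is cheap.
row_layout = [(r["name"], r["age"], r["city"]) for r in rows]

# Column-oriented (Parquet-like): all values of one column are stored together,
# so a query can read only the columns it needs.
column_layout = {key: [r[key] for r in rows] for key in rows[0]}

# Average age from the row layout: every full record is visited.
avg_from_rows = sum(record[1] for record in row_layout) / len(row_layout)

# Average age from the column layout: only the "age" column is touched.
ages = column_layout["age"]
avg_from_columns = sum(ages) / len(ages)

print(row_layout)
print(column_layout)
print(avg_from_rows == avg_from_columns)
```

Storing similar values next to each other, such as the repeated "New York" in the `city` column, is also what lets columnar formats compress well.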