App Features
View Parquet File Details
Quickly preview file contents, including metadata, row count, and file size.
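For readers who want the same details programmatically, here is a minimal pyarrow sketch (the file name data.parquet is a placeholder):

import os
import pyarrow.parquet as pq

pf = pq.ParquetFile('data.parquet')  # placeholder path
meta = pf.metadata
print('rows:', meta.num_rows)
print('row groups:', meta.num_row_groups)
print('created by:', meta.created_by)
print('file size (bytes):', os.path.getsize('data.parquet'))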
View Parquet Schema
View detailed schema information, including column types, compression algorithms, and encoding schemes, presented with intuitive visualizations.
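The same schema details can be inspected programmatically; a minimal pyarrow sketch (the file name and the column index 0 are placeholders):

import pyarrow.parquet as pq

pf = pq.ParquetFile('data.parquet')
print(pf.schema_arrow)                    # column names and logical types
col = pf.metadata.row_group(0).column(0)  # first column of first row group
print(col.compression)                    # e.g. SNAPPY
print(col.encodings)                      # e.g. ('PLAIN', 'RLE')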
What is Parquet?
Apache Parquet is an open-source, column-oriented storage format designed for analytical workloads. Instead of storing records row by row, it stores each column's values together and groups rows into row groups, which enables high compression and lets readers load only the columns and row groups a query needs.
Parquet vs Other Formats
Parquet vs JSON
- Storage Efficiency: Parquet's columnar storage typically offers 3-5x better compression than row-based JSON
- Schema Evolution: Parquet supports explicit schema definition with backward/forward compatibility
- Query Performance: Columnar format enables efficient column pruning and predicate pushdown (see the sketch after this list)
- Best Use: JSON for APIs/web data, Parquet for analytical workloads
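A short pyarrow sketch of both optimizations (the file name, column names, and filter value are hypothetical):

import pyarrow.parquet as pq

# Column pruning: decode only the columns the query needs.
table = pq.read_table('data.parquet', columns=['user_id', 'amount'])

# Predicate pushdown: row groups whose min/max statistics cannot
# match the filter are skipped without being decoded.
table = pq.read_table('data.parquet', filters=[('amount', '>', 100)])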
Parquet vs CSV
- Type Safety: Parquet enforces data types while CSV relies on inference
- Compression: Parquet achieves 75%+ compression rates vs CSV's typical 20-40%
- Chunking: Parquet supports efficient data partitioning with row groups
- Metadata: Built-in statistics in Parquet enable optimization without full scans
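Those per-row-group statistics can be read directly; a minimal pyarrow sketch (file name and column index are placeholders):

import pyarrow.parquet as pq

meta = pq.ParquetFile('data.parquet').metadata
for i in range(meta.num_row_groups):
    stats = meta.row_group(i).column(0).statistics  # first column; may be None
    print(i, stats.min, stats.max, stats.null_count)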
Working with Parquet
Java Implementation
Read Parquet:
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.hadoop.ParquetReader;

ParquetReader<GenericRecord> reader = AvroParquetReader
    .<GenericRecord>builder(new Path("data.parquet"))
    .withConf(new Configuration())
    .build();
GenericRecord record;
while ((record = reader.read()) != null) {
    // Process each record
}
reader.close();
Write Parquet:
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;

// AvroParquetWriter takes an Avro Schema, not a Parquet MessageType
Schema schema = new Schema.Parser().parse(schemaJson); // schemaJson: your Avro schema as a JSON string
ParquetWriter<GenericRecord> writer = AvroParquetWriter
    .<GenericRecord>builder(new Path("output.parquet"))
    .withSchema(schema)
    .build();
writer.write(record);
writer.close();
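Both snippets assume the org.apache.parquet:parquet-avro and org.apache.hadoop:hadoop-client libraries are on the classpath.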
JavaScript Implementation
Read Parquet:
import { ParquetReader } from 'parquetjs';

const reader = await ParquetReader.openFile('data.parquet');
const cursor = reader.getCursor();
let record;
while ((record = await cursor.next())) {
  // Process each record
}
await reader.close();
Write Parquet:
import { ParquetWriter, ParquetSchema } from 'parquetjs';

const schema = new ParquetSchema({ /* schema definition */ });
const writer = await ParquetWriter.openFile(schema, 'output.parquet');
await writer.appendRow({ /* row data */ });
await writer.close();
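Note: parquetjs is distributed as a CommonJS package, so depending on your toolchain you may need const { ParquetReader, ParquetWriter, ParquetSchema } = require('parquetjs'); in place of the ESM imports shown above.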
Python Implementation
Read Parquet:
import pandas as pd
df = pd.read_parquet('data.parquet', engine='pyarrow')
print(df.head())
Write Parquet:
df.to_parquet('output.parquet',
engine='pyarrow',
compression='snappy')
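When the data is usually filtered on a single column, a partitioned layout can speed up later reads; a minimal sketch assuming a hypothetical year column:

# Writes one subdirectory per distinct value of 'year',
# e.g. output_dir/year=2023/part-0.parquet
df.to_parquet('output_dir', engine='pyarrow', partition_cols=['year'])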