
Like awk, sed, cut, join, and sort for name-indexed data (CSV, JSON, etc)
Miller is a command-line data processing tool that combines the functionality of traditional Unix utilities such as awk, sed, cut, join, and sort, but is designed for structured data formats, including CSV, TSV, JSON, JSON Lines, and positionally indexed data. Where those tools address fields by integer position, Miller operates on key-value records, using insertion-ordered hash maps as its native data structure, so you can refer to fields by name instead of counting columns.
Miller is well suited to data cleaning, transformation, and analysis. You can add new fields as functions of existing ones, drop unwanted columns, sort records, compute statistical aggregations, convert between formats, and pretty-print output, all while staying format-aware (for example, keeping CSV headers intact across operations). Miller is also streaming: most operations hold only a single record in memory at a time, which makes it suitable for datasets larger than available RAM.
Miller is particularly useful for data scientists, system administrators, and developers who need to preprocess data before feeding it into analysis tools such as R or pandas, clean log files, post-process database query output, or convert between formats. It ships as a single binary with zero runtime dependencies, which makes it portable and easy to deploy across systems. Its throughput is competitive with traditional Unix tools, and it adds support for modern data formats and for heterogeneous records, as found in NoSQL-style data.
# via Homebrew
brew install miller
# via APT
apt-get install miller
# via Go Install
go install github.com/johnkerl/miller/v6/cmd/mlr@latest
