Command-line program for indexing, slicing, analyzing, splitting and joining CSV files with high performance.
xsv is a command-line toolkit for CSV file manipulation that emphasizes speed and composability. It provides about 20 commands for common CSV operations, including counting rows, selecting columns, filtering data, generating statistics, joining files, and reformatting output. It can also build an index for a CSV file, which makes slicing a constant-time operation and speeds up statistics generation.
The toolkit includes commands like stats for generating descriptive statistics, frequency for building frequency tables, join for combining datasets, search for regex-based filtering, and sample for random row selection using reservoir sampling. Operations can be chained together using Unix pipes, allowing complex data processing workflows. For example, you can slice specific rows, select certain columns, and format the output as aligned tables in a single pipeline.
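The slice, select, and table pipeline described above might look like the following minimal sketch. The file name cities.csv and its contents are made up for illustration, and the guard around the pipeline only ensures the snippet does nothing harmful when xsv is not installed.

```shell
# Hypothetical sample data, used only for this illustration.
cat > cities.csv <<'EOF'
id,name,score
1,ann,90
2,bob,85
3,cyd,70
4,dee,60
EOF

if command -v xsv >/dev/null 2>&1; then
  # Take two records starting at zero-based record index 1,
  # keep two of the three columns, and print them as an aligned table.
  xsv slice -s 1 -l 2 cities.csv | xsv select name,score | xsv table
else
  echo "xsv not found on PATH; skipping pipeline" >&2
fi
```

Each stage reads CSV on stdin and writes CSV on stdout, so stages can be reordered or dropped without changing the others.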
xsv is designed for performance-critical CSV processing tasks. Creating an index allows operations like slice to jump directly to the requested records in a large file rather than scanning from the beginning. The frequency and stats commands can also run in parallel when an index is present. However, the project is now unmaintained, and the author recommends qsv or xan as alternatives.
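As a rough sketch of the indexed workflow, the commands below build an index and then run slice, frequency, and stats against the same file. The file data.csv is a tiny hypothetical stand-in for a large CSV; on a file this small the index makes no measurable difference, so treat this as a demonstration of the commands rather than a benchmark.

```shell
# Hypothetical sample file standing in for a large CSV.
printf 'id,name,score\n1,ann,90\n2,bob,85\n3,cyd,70\n' > data.csv

if command -v xsv >/dev/null 2>&1; then
  # Build the index; xsv stores it next to the data as data.csv.idx.
  xsv index data.csv

  # Fetch a single record by its zero-based index.
  xsv slice -i 2 data.csv

  # Frequency table for one column, then descriptive statistics for all columns.
  xsv frequency -s name data.csv
  xsv stats data.csv | xsv table
else
  echo "xsv not found on PATH; skipping indexed commands" >&2
fi
```

The index should be rebuilt whenever the CSV file changes, since it records positions within the file as it existed at indexing time.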
The tool targets data analysts, researchers, and developers who need to process large CSV files efficiently from the command line. It handles Unicode data correctly and supports various CSV dialects through configurable delimiters and quoting rules.
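A short sketch of the delimiter handling, assuming a semicolon-separated input file named semi.csv (hypothetical, created here for illustration). The -d option sets the input delimiter, and fmt with -t sets the output delimiter.

```shell
# Hypothetical semicolon-delimited file containing non-ASCII names.
printf 'id;name\n1;José\n2;Zoë\n' > semi.csv

if command -v xsv >/dev/null 2>&1; then
  # Read semicolon-delimited input and select one column.
  xsv select -d ';' name semi.csv

  # Rewrite the same data as tab-delimited output.
  xsv fmt -d ';' -t '\t' semi.csv
else
  echo "xsv not found on PATH; skipping delimiter examples" >&2
fi
```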
# via Cargo
cargo install xsv