Like jq, but for HTML: extract data using CSS selectors
htmlq is a command-line tool that brings jq-style querying to HTML documents. Instead of JSON paths, it uses familiar CSS selectors to extract specific content from HTML files, which makes it well suited to web scraping, content extraction, and general HTML processing.
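The tool offers several output options: printing only text content (--text), printing the value of a single attribute (--attribute), or pretty-printing the selected HTML (--pretty). It can also remove unwanted nodes before output (--remove-nodes), resolve links against a base URL that is either given explicitly (--base) or detected from the document's <base> tag (--detect-base), and skip text nodes that consist entirely of whitespace (--ignore-whitespace). htmlq reads from stdin by default or from a file (--filename), so it fits naturally at the end of a pipeline, for example after curl in a web scraping workflow.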
htmlq is aimed at developers, system administrators, and data analysts who need to extract structured data from HTML documents. It is particularly useful for automating web scraping tasks, processing HTML in shell scripts, or quickly pulling specific information out of web pages without writing custom parsing code.
# via Homebrew
brew install htmlq
# via Cargo
cargo install htmlq
