HTML parser that uses CSS selectors to extract content from HTML files, similar to jq for JSON
htmlq is a command-line HTML parser that applies CSS selectors to extract specific content from HTML documents. It is the HTML counterpart to jq: users query HTML with familiar CSS selector syntax instead of writing their own parsing logic.
The tool supports three output modes: the raw HTML of the matched elements (the default), text-only content (--text), and the value of a single attribute (--attribute). Users can target elements by ID, class, or any other CSS selector, extract attributes such as href from links, or retrieve only the text content of the selected elements. It can also remove unwanted nodes before output (--remove-nodes) and pretty-print the result (--pretty).
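The output modes described above can be tried on a small local file. The following is a minimal sketch: the file name sample.html and its contents are invented for illustration, and it assumes htmlq is already installed and on PATH.

```shell
# Sample document; the file name and markup are made up for this example.
cat > sample.html <<'EOF'
<html><body>
  <h1 id="title">Release notes</h1>
  <div class="content">
    <p>See the <a href="/docs">docs</a> or the <a href="/blog">blog</a>.</p>
    <script>console.log("noise")</script>
  </div>
</body></html>
EOF

# Raw HTML of the element with id "title" (default output mode)
htmlq --filename sample.html '#title'

# Text content only
htmlq --filename sample.html --text '#title'

# Value of the href attribute on every link
htmlq --filename sample.html --attribute href a

# Drop <script> nodes, then pretty-print what remains of .content.
# The selector comes before --remove-nodes because that option
# accepts several values and would otherwise consume the selector.
htmlq --filename sample.html --pretty '.content' --remove-nodes script
```

Selectors that contain # or other shell metacharacters are quoted so the shell passes them to htmlq unchanged.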
Common workflows include scraping pages fetched with curl, extracting links or text content from web pages, and processing HTML files in shell scripts. The tool reads from stdin and writes to stdout by default, which makes it a natural fit for command pipelines; it can also read from a file (--filename) and write to one (--output). It is particularly useful for developers and system administrators who need to parse HTML programmatically without writing custom parsing code.
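A rough sketch of these workflows follows. The function name page_links, the file names, and the example URL are placeholders rather than anything htmlq defines, and the snippet assumes curl and htmlq are both on PATH.

```shell
# Print every link target found in the page at the given URL.
# page_links is a hypothetical helper name chosen for this example.
page_links() {
  curl --silent --fail "$1" | htmlq --attribute href a
}

# Small local page so the remaining commands work offline
# (file name and contents are illustrative).
printf '<p>first</p><p>second</p>' > page.html

# Read from a file and write the paragraph text to another file
htmlq --filename page.html --output paragraphs.txt --text p

# stdin works the same way, so htmlq slots into a longer pipeline
cat page.html | htmlq --text p | sort
```

Calling the helper on a real page would look like page_links https://example.com/ (a placeholder URL). Keeping the fetch and the query in separate pipeline stages means the same htmlq invocation works unchanged on saved files.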
# via Homebrew
brew install htmlq
# via Cargo
cargo install htmlq
# via FreeBSD pkg
pkg install htmlq
