Releases: aborruso/scrape-cli
v1.2.3
Agent-friendly CLI improvements
- All error messages now go to stderr (stdout stays clean for data)
- Missing -e error replaced with a concise, actionable message plus an example
- --help extended with examples for all major use cases (XPath, CSS, text, attributes, URL, stdin, check-existence)
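Keeping stdout clean for data means a downstream step (a line count, a grep, another parser) never mistakes an error message for a result. Below is a minimal Python sketch of that split. The function report, its signature, and the message wording are hypothetical and not scrape-cli's code; only the -e option comes from these notes.

```python
import io
from typing import Iterable, Optional, TextIO


def report(expression: Optional[str], matches: Iterable[str],
           out: TextIO, err: TextIO) -> int:
    """Hypothetical CLI core: results go to `out`, diagnostics go to `err`."""
    if not expression:
        # Short, actionable message with an example, written to stderr only.
        err.write("error: no expression given; pass one with -e, e.g. -e '//a/@href'\n")
        return 1
    for match in matches:
        out.write(match + "\n")  # stdout carries nothing but data
    return 0


# Usage: capture both streams to show that stdout stays empty on error.
stdout, stderr = io.StringIO(), io.StringIO()
status = report(None, [], stdout, stderr)
print(status, repr(stdout.getvalue()), stderr.getvalue().strip())
```

Returning a non-zero status alongside the stderr message lets scripts and agents detect failure without parsing any text.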
v1.2.2
v1.2.1
What's changed
- Added automated pytest coverage for XPath/CSS detection and CLI options, including URL/file/stdin input paths and error handling.
- Hardened runtime behavior by adding timeout=30 to URL fetches and replacing a bare except: with except Exception in charset detection.
- Raised requires-python to >=3.8, removed the legacy setup.py, and expanded .gitignore for local test/venv artifacts.
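The two hardening changes can be pictured with a small standard-library sketch. It assumes urllib for fetching and a "try the declared charset, fall back to UTF-8" strategy; the notes do not say how scrape-cli detects charsets, and read_source and decode_html are hypothetical names. A bare except: also traps KeyboardInterrupt and SystemExit, so Ctrl-C could be swallowed; except Exception lets those propagate.

```python
import sys
import urllib.request
from typing import Optional


def decode_html(raw: bytes, declared: Optional[str]) -> str:
    """Decode with the declared charset, falling back to UTF-8 with replacement."""
    try:
        return raw.decode(declared or "utf-8")
    except Exception:
        # Catches LookupError (unknown charset) and UnicodeDecodeError,
        # but not KeyboardInterrupt or SystemExit as a bare `except:` would.
        return raw.decode("utf-8", errors="replace")


def read_source(source: Optional[str]) -> str:
    """Hypothetical input reader covering the URL, file, and stdin paths."""
    if source is None or source == "-":
        return sys.stdin.read()
    if source.startswith(("http://", "https://")):
        # Without an explicit timeout, urlopen can hang on an unresponsive server.
        with urllib.request.urlopen(source, timeout=30) as resp:
            return decode_html(resp.read(), resp.headers.get_content_charset())
    with open(source, "rb") as fh:
        return decode_html(fh.read(), None)
```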
Validation
- pytest: 18 passed
- twine check dist/*: passed
Release v1.2.0
Bug Fix
- Fixed XPath detection for expressions wrapped in parentheses
- XPath expressions like (//div[@class='coordinate lat'])[1] are now correctly recognized as XPath instead of being treated as CSS selectors
- Enhanced the is_xpath function with additional pattern recognition for XPath-specific syntax, including attribute predicates, position predicates, and XPath functions
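One way to implement this kind of detection is sketched below with regular expressions. This is a hypothetical reimplementation (looks_like_xpath is not the package's is_xpath, and the function list is an assumption), meant only to show why each pattern is XPath-specific: CSS attribute selectors never start with @, CSS has no bare [1] index, and XPath functions are not preceded by a CSS pseudo-class colon.

```python
import re

# Hypothetical patterns; scrape-cli's real is_xpath may differ.
_ATTR_PREDICATE = re.compile(r"\[\s*@")              # e.g. [@class='x']
_POSITION_PREDICATE = re.compile(r"\[\s*\d+\s*\]")   # e.g. [1]
_XPATH_FUNCTION = re.compile(
    r"(?<![\w:-])(?:contains|starts-with|text|position|last|normalize-space|count)\s*\("
)


def looks_like_xpath(expr: str) -> bool:
    s = expr.strip()
    # Skip leading grouping parentheses: "(//div[...])[1]" -> "//div[...])[1]"
    core = s.lstrip("(").lstrip()
    if core.startswith("/") or "::" in s:
        return True
    return bool(
        _ATTR_PREDICATE.search(s)
        or _POSITION_PREDICATE.search(s)
        or _XPATH_FUNCTION.search(s)
    )
```

Checking the parenthesis-stripped prefix is what lets the grouped expression from the notes be classified as XPath, while a / that only appears inside a CSS attribute value is ignored.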
Installation
pip install scrape_cli==1.2.0
v1.1.9: CSS Selector Fix
What Changed
🐛 Bug Fix: Fixed CSS selectors being incorrectly identified as XPath
Details
- Fixed the is_xpath() function to properly distinguish CSS selectors from XPath expressions
- CSS selectors like a[href*="/talk/"] now work correctly
- Made the selector recognition logic more restrictive about what it treats as XPath
Technical Changes
- Updated XPath detection to recognize only expressions starting with / or // or containing ::
- This prevents CSS attribute selectors with square brackets from being misidentified as XPath
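The v1.1.9 rule is simple enough to state in a few lines of Python. This sketch (the name looks_like_xpath_v119 is hypothetical) implements exactly the rule described above and shows why a[href*="/talk/"] now counts as CSS: its / appears only inside the brackets, not at the start. It also shows the gap fixed later in v1.2.0, since a parenthesized expression such as (//div)[1] fails the rule.

```python
def looks_like_xpath_v119(expr: str) -> bool:
    """Rule from the v1.1.9 notes: leading '/' (which also covers '//') or any '::'."""
    s = expr.strip()
    return s.startswith("/") or "::" in s


for expr in ['a[href*="/talk/"]', "//a[@href]", "descendant::p", "(//div)[1]"]:
    kind = "XPath" if looks_like_xpath_v119(expr) else "CSS"
    # Under this rule, "(//div)[1]" is classified as CSS.
    print(f"{expr!r:24} -> {kind}")
```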
Installation
pip install --upgrade scrape-cli
v1.1.8
What's Changed
Features
- Added text extraction functionality with -t option
- Extract only text content without HTML tags
- Automatically excludes text from script and style tags
- Cleans up whitespace for better readability
- Particularly useful for LLMs and text processing workflows
- Can be combined with CSS selectors or XPath expressions for targeted text extraction
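The behavior listed above (drop tags, skip script and style contents, normalize whitespace) can be approximated with the standard library's html.parser. This is a sketch under that assumption, not how -t is implemented; TextExtractor and extract_text are hypothetical names, and the sketch does no CSS/XPath selection.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping anything inside <script> or <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse every run of whitespace (including newlines) into one space.
    return " ".join(" ".join(parser.chunks).split())
```

Joining text nodes with a space keeps words from separate block elements apart, at the cost of occasionally inserting a space where an inline tag splits a word.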
v1.1.7
What's Changed
Features
- Improved XPath detection with support for complex expressions:
- Added support for predicates and square brackets
- Added support for XPath functions (last(), position(), contains(), text())
- Added support for XPath axes and attributes
- Better handling of complex XPath expressions
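To make the predicate forms above concrete, the sketch below runs a few of them through Python's xml.etree.ElementTree. ElementTree implements only a limited XPath subset (positional predicates, last(), attribute predicates) and does not support functions such as contains() or text(); these notes do not say which XPath engine scrape-cli uses, so this only illustrates what the expressions select.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<ul><li class='a'>one</li><li>two</li><li class='a'>three</li></ul>"
)

# Position predicate with last(): the final <li> child.
print([li.text for li in doc.findall("./li[last()]")])
# Numeric position predicate (XPath positions are 1-based).
print([li.text for li in doc.findall("./li[2]")])
# Attribute predicate.
print([li.text for li in doc.findall("./li[@class='a']")])
```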