Releases: aborruso/scrape-cli

v1.2.3

06 Apr 09:05

Agent-friendly CLI improvements

  • All error messages now go to stderr (stdout stays clean for data)
  • The missing -e error is now a concise, actionable message with an example
  • --help extended with examples for all major use cases (XPath, CSS, text, attributes, URL, stdin, check-existence)
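
The stdout/stderr split described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the names fail and run and the example usage string are hypothetical.

```python
import sys


def fail(message: str, example: str) -> int:
    """Write an actionable error to stderr and return a non-zero exit code."""
    print(f"Error: {message}", file=sys.stderr)
    print(f"Example: {example}", file=sys.stderr)
    return 1


def run(expression, html: str) -> int:
    """Hypothetical entry point: data goes to stdout, diagnostics to stderr."""
    if not expression:
        # Illustrative wording; the real message is not quoted in these notes.
        return fail("missing -e expression", "scrape -e '//h1' page.html")
    print(html)  # stand-in for the extracted result
    return 0
```

Keeping stdout free of diagnostics means a downstream pipe or an agent parsing the output only ever sees data, while the exit code and stderr carry the failure.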

v1.2.2

22 Feb 23:32

  • Added -u/--user-agent option for HTTP requests
  • Default browser-like User-Agent to avoid 403 errors (e.g. Wikipedia)
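
As a rough sketch of the idea, a request can carry a browser-like User-Agent unless the caller overrides it. The example below uses only the standard library; the default string and the build_request helper are assumptions for illustration, since the notes do not state the exact value or HTTP client the tool uses.

```python
from typing import Optional
from urllib.request import Request

# Assumed default; the actual string shipped with the tool is not given here.
DEFAULT_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url: str, user_agent: Optional[str] = None) -> Request:
    """Build a request whose User-Agent is the override or the default."""
    return Request(url, headers={"User-Agent": user_agent or DEFAULT_USER_AGENT})
```

Some sites reject the default identifiers sent by HTTP libraries, which is why a browser-like value avoids the 403 responses mentioned above.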

v1.2.1

22 Feb 12:00

What's changed

  • Added automated pytest coverage for XPath/CSS detection and CLI options, including URL/file/stdin input paths and error handling.
  • Hardened runtime behavior by adding timeout=30 to URL fetches and replacing a bare except: with except Exception in charset detection.
  • Raised requires-python to >=3.8, removed legacy setup.py, and expanded .gitignore for local test/venv artifacts.
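
The two hardening changes can be illustrated with a short standard-library sketch. The helpers fetch and safe_call are hypothetical names, and the tool's own HTTP client may differ; only the timeout value and the except Exception pattern come from the note above.

```python
from urllib.request import urlopen

FETCH_TIMEOUT_SECONDS = 30  # value named in the release note


def fetch(url: str) -> bytes:
    """Fetch a URL, giving up instead of hanging if the server stalls."""
    with urlopen(url, timeout=FETCH_TIMEOUT_SECONDS) as response:
        return response.read()


def safe_call(func, default):
    """Run func and return default on ordinary errors only."""
    try:
        return func()
    except Exception:
        # Unlike a bare "except:", this lets KeyboardInterrupt and
        # SystemExit propagate, so Ctrl-C still stops the program.
        return default
```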

Validation

  • pytest: 18 passed
  • twine check dist/*: passed

Release v1.2.0

07 Sep 11:41

Bug Fix

  • Fixed XPath detection for expressions wrapped in parentheses
  • XPath expressions like (//div[@class='coordinate lat'])[1] are now correctly recognized as XPath instead of being incorrectly treated as CSS selectors
  • Enhanced the is_xpath function with additional pattern recognition for XPath-specific syntax including attribute predicates, position predicates, and XPath functions
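
One way such detection could work is sketched below. This is a simplified heuristic written for illustration, not the project's actual is_xpath implementation; the specific patterns are assumptions based on the categories listed above.

```python
import re


def is_xpath(expression: str) -> bool:
    """Heuristically decide whether an expression is XPath rather than CSS."""
    # Drop leading grouping parentheses so "(//div)[1]" is judged by "//div)[1]".
    core = expression.strip().lstrip("(").lstrip()
    if core.startswith("/") or core.startswith("./") or "::" in core:
        return True
    if re.search(r"\[\s*@", core):  # attribute predicate, e.g. [@class='x']
        return True
    if re.search(r"\[\s*\d+\s*\]", core):  # position predicate, e.g. [1]
        return True
    # A few XPath functions; CSS attribute selectors do not match these.
    return bool(re.search(r"\b(?:last|position|contains|text)\s*\(", core))
```

With this sketch, (//div[@class='coordinate lat'])[1] is classified as XPath, while a CSS attribute selector such as a[href*="/talk/"] is not.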

Installation

pip install scrape_cli==1.2.0

v1.1.9: CSS Selector Fix

14 Aug 07:29

What Changed

🐛 Bug Fix: Fixed CSS selectors being incorrectly identified as XPath

Details

  • Fixed is_xpath() function to properly distinguish CSS selectors from XPath expressions
  • CSS selectors like a[href*="/talk/"] now work correctly
  • Improved selector recognition logic to be more restrictive for XPath detection

Technical Changes

  • XPath detection now recognizes only expressions that start with / or // or that contain ::
  • This prevents CSS attribute selectors with square brackets from being misidentified as XPath
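
The rule stated above is small enough to express directly. The function name is_xpath_strict is hypothetical, and this is only a literal reading of the note, not the project's source.

```python
def is_xpath_strict(expression: str) -> bool:
    """Treat an expression as XPath only if it starts with '/' or contains '::'."""
    expr = expression.strip()
    # startswith("/") also covers expressions beginning with "//".
    return expr.startswith("/") or "::" in expr
```

Because square brackets no longer count as evidence of XPath, a[href*="/talk/"] falls through to the CSS path. The same strictness means an expression that begins with a parenthesis is not recognized by this rule.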

Installation

pip install --upgrade scrape-cli

v1.1.8

02 Jun 13:11

What's Changed

Features

  • Added text extraction with the -t option
    • Extract only text content without HTML tags
    • Automatically excludes text from script and style tags
    • Cleans up whitespace for better readability
    • Particularly useful for LLMs and text processing workflows
    • Can be combined with CSS selectors or XPath expressions for targeted text extraction
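
The behavior described above can be approximated with the standard library alone. The sketch below is an illustration under that assumption; TextExtractor and extract_text are hypothetical names, and the tool itself may rely on a different parser.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring anything inside script or style tags."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self) -> str:
        # Joining with a space keeps adjacent blocks apart, at the cost of
        # splitting a word that spans inline tags; split() collapses whitespace.
        return " ".join(" ".join(self._chunks).split())


def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```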

v1.1.7

04 May 13:39

What's Changed

Features

  • Improved XPath detection with support for complex expressions:
    • Added support for predicates and square brackets
    • Added support for XPath functions (last(), position(), contains(), text())
    • Added support for XPath axes and attributes
    • Better handling of complex XPath expressions

v1.1.6

02 May 17:53

  • Added charset detection from HTML meta tags
  • Added support for ISO-8859-1 encoding fallback
  • Improved HTML parsing with better encoding handling
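
A minimal sketch of this kind of decoding is shown below. The regular expression, the decode_html name, and the order of attempts (declared charset, then UTF-8, then ISO-8859-1) are assumptions for illustration, not the project's code.

```python
import re

# Matches both <meta charset="..."> and the http-equiv Content-Type form.
META_CHARSET = re.compile(rb"<meta[^>]+charset=[\"']?([\w-]+)", re.IGNORECASE)


def decode_html(raw: bytes) -> str:
    """Decode HTML bytes using the declared charset, with safe fallbacks."""
    candidates = []
    match = META_CHARSET.search(raw[:4096])  # the declaration sits near the top
    if match:
        candidates.append(match.group(1).decode("ascii"))
    candidates.append("utf-8")
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except (LookupError, UnicodeDecodeError):
            continue  # unknown charset name or invalid bytes: try the next one
    # ISO-8859-1 maps every byte to a character, so this decode cannot fail.
    return raw.decode("iso-8859-1")
```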

v1.1.1

02 Nov 14:50

update

1.1

02 Nov 14:06

Bump version to 0.2