Migration Guide: v1.x to v2.0.0

This guide covers all breaking changes, deprecations, and new features in v2.0.0. For the full list of changes, see the Changelog.

Breaking Changes

Default behavior: No FastAPI server

v1.x: GMapsExtractor automatically started a FastAPI server in a background thread. All requests to Google Maps were routed through this local server.

v2.0.0: GMapsExtractor makes direct HTTP requests to Google Maps using GMapsClient (powered by httpx). No server is started by default. FastAPI and uvicorn are no longer required dependencies.

Before (v1.x):

from gmaps_extractor import GMapsExtractor

# Server auto-started in background
with GMapsExtractor(proxy="http://user:pass@host:port") as extractor:
    result = extractor.collect("NYC", "lawyers")

After (v2.0.0) -- no changes needed for basic usage:

# Same API, but no server started -- direct HTTP
with GMapsExtractor(proxy="http://user:pass@host:port") as extractor:
    result = extractor.collect("NYC", "lawyers")

If you need the server (e.g., for CLI usage or external API access):

# Opt-in to server mode
with GMapsExtractor(proxy="...", use_server=True) as extractor:
    result = extractor.collect("NYC", "lawyers")

This requires installing the server extra:

pip install gmaps-extractor[server]

Core dependencies reduced

v1.x: Required fastapi, uvicorn, pydantic, and httpx.

v2.0.0: The core install requires only httpx>=0.25.0. FastAPI, uvicorn, and pydantic are available via the [server] optional extra.

If your code imports from fastapi or uvicorn, add the server extra:

pip install gmaps-extractor[server]
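
Before enabling server mode, it can help to confirm that the optional packages are importable. The following is a minimal sketch using only the standard library. The helper name missing_server_deps is hypothetical and not part of gmaps-extractor; the package list is taken from the [server] extra described above.

```python
from importlib.util import find_spec

# Packages this guide lists under the [server] extra.
SERVER_PACKAGES = ("fastapi", "uvicorn", "pydantic")


def missing_server_deps(packages=SERVER_PACKAGES):
    """Return the names of packages that cannot be imported (hypothetical helper)."""
    return [name for name in packages if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_server_deps()
    if missing:
        print("Missing:", ", ".join(missing), "-> pip install gmaps-extractor[server]")
    else:
        print("Server extra is available.")
```

Running a check like this at startup gives a clearer message than an ImportError raised deep inside server code.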

auto_start_server parameter deprecated

v1.x: auto_start_server=True (default) controlled server auto-start.

v2.0.0: Use use_server=True instead. Setting auto_start_server=True explicitly still works (it implies use_server=True) but emits a deprecation warning.

Before:

GMapsExtractor(auto_start_server=False)  # Don't start server

After:

GMapsExtractor(use_server=False)  # Default -- no server
GMapsExtractor(use_server=True)   # Explicitly use server
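
To find leftover auto_start_server call sites, one option is to promote deprecation warnings to errors while running your test suite. The sketch below uses a stand-in function, legacy_constructor, to imitate a call that warns; it is not the real class. It assumes the library emits the standard DeprecationWarning category, which this guide states explicitly only for ExtractorConfig.apply().

```python
import warnings


def legacy_constructor(auto_start_server=None, use_server=False):
    """Stand-in for a constructor that accepts a deprecated keyword (not the real class)."""
    if auto_start_server is not None:
        warnings.warn(
            "auto_start_server is deprecated; use use_server instead",
            DeprecationWarning,
            stacklevel=2,
        )
        if auto_start_server:
            use_server = True  # Mirrors "auto_start_server=True implies use_server=True".
    return use_server


def find_deprecated_call(func, **kwargs):
    """Return the warning message if calling func triggers a DeprecationWarning, else None."""
    with warnings.catch_warnings():
        # Inside this block, DeprecationWarning is raised as an exception.
        warnings.simplefilter("error", DeprecationWarning)
        try:
            func(**kwargs)
        except DeprecationWarning as exc:
            return str(exc)
    return None


print(find_deprecated_call(legacy_constructor, auto_start_server=True))
print(find_deprecated_call(legacy_constructor, use_server=True))
```

The same effect is available without code changes by running Python or pytest with -W error::DeprecationWarning.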

Deprecations

ExtractorConfig.apply() is deprecated

The apply() method that monkey-patched module-level globals now emits a DeprecationWarning. It is still called internally for backward compatibility, but new code should use GMapsSettings and GMapsClient instead.

verbose parameter

The verbose=True parameter still works but now operates through Python's logging module instead of direct print() calls. When verbose=True and no user-configured logging handlers exist, a StreamHandler is added to the gmaps_extractor logger at INFO level.
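
The handler rule described above can be sketched roughly as follows. This illustrates the described behavior only and is not the library's code: enable_verbose is a hypothetical name, and treating a NullHandler as "not user-configured" is an assumption.

```python
import logging


def enable_verbose(logger_name="gmaps_extractor"):
    """Add an INFO-level StreamHandler unless a real handler is already attached."""
    logger = logging.getLogger(logger_name)
    real_handlers = [
        h for h in logger.handlers if not isinstance(h, logging.NullHandler)
    ]
    if real_handlers:
        return False  # The application already configured logging; leave it alone.
    handler = logging.StreamHandler()
    handler.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return True
```

The practical consequence is that attaching your own handler before constructing the extractor should keep verbose=True from adding a second one.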

New Features

Async API

Three new async methods on GMapsExtractor:

import asyncio

from gmaps_extractor import GMapsExtractor

async def main():
    async with GMapsExtractor(proxy="http://user:pass@host:port") as extractor:
        # Batch collection (returns CollectionResult)
        result = await extractor.async_collect_v2("NYC", "lawyers", enrich=True)

        # Convenience alias (same as async_collect_v2)
        result = await extractor.async_collect("NYC", "lawyers")

        # Streaming (yields businesses one at a time)
        async for biz in extractor.stream_collect_v2("NYC", "lawyers"):
            print(biz["name"])

asyncio.run(main())

CollectionResult also supports async iteration:

# await and async for need a running event loop, so place this inside an async function
result = await extractor.async_collect_v2("NYC", "lawyers")
async for biz in result:
    process(biz)
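
One reason to use the async methods is to run several collections concurrently. The sketch below shows the general asyncio pattern with a stand-in coroutine, fake_collect, in place of extractor.async_collect_v2. The helper names and the concurrency limit are assumptions for illustration, and this guide does not state whether a single extractor instance supports concurrent calls.

```python
import asyncio


async def fake_collect(area, query):
    """Stand-in for an awaitable collection call; returns a small list of dicts."""
    await asyncio.sleep(0.01)  # Simulate network I/O.
    return [{"name": f"{query} in {area} #{i}"} for i in range(2)]


async def collect_many(jobs, max_concurrent=2):
    """Run (area, query) jobs concurrently, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(area, query):
        async with semaphore:
            return await fake_collect(area, query)

    # gather() returns results in the same order as the jobs were passed in.
    results = await asyncio.gather(*(run_one(a, q) for a, q in jobs))
    return dict(zip(jobs, results))


if __name__ == "__main__":
    jobs = [("NYC", "lawyers"), ("NYC", "dentists"), ("Boston", "lawyers")]
    for job, businesses in asyncio.run(collect_many(jobs)).items():
        print(job, len(businesses))
```

The semaphore keeps the number of in-flight collections bounded, which matters when every job shares one proxy.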

AsyncGMapsClient

Standalone async client for custom workflows:

import asyncio

from gmaps_extractor import AsyncGMapsClient
from gmaps_extractor.settings import GMapsSettings

settings = GMapsSettings(proxy_url="http://user:pass@host:port")

async def main():
    async with AsyncGMapsClient(settings) as client:
        businesses = await client.search("lawyers", lat=40.7, lng=-74.0)
        details = await client.place_details(hex_id="0x...:0x...", name="Acme")
        reviews = await client.reviews(hex_id="0x...:0x...", limit=20)

asyncio.run(main())

GMapsClient (direct HTTP)

Sync client that bypasses the server entirely:

from gmaps_extractor.client import GMapsClient
from gmaps_extractor.settings import GMapsSettings

settings = GMapsSettings(proxy_url="http://user:pass@host:port")
client = GMapsClient(settings)

businesses = client.search("lawyers", lat=40.7, lng=-74.0)
details = client.place_details(hex_id="0x...:0x...", name="Acme")

GMapsSettings

Centralized configuration dataclass:

from gmaps_extractor.settings import GMapsSettings

# From environment variables
settings = GMapsSettings.from_env(default_workers=30)

# From config.py
settings = GMapsSettings.from_config(proxy_url="http://override:8080")

# Explicit
settings = GMapsSettings(
    proxy_url="http://user:pass@host:port",
    default_workers=30,
    delay_between_cells=0.1,
    cookies_ttl=1800,
)
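
Because the settings object is described as a dataclass, the standard dataclasses helpers should apply to it. The sketch below uses a stand-in class, DemoSettings, with the four field names shown above. The default values are invented for illustration, and whether GMapsSettings is frozen or has further fields is not covered here.

```python
from dataclasses import asdict, dataclass, replace
from typing import Optional


@dataclass
class DemoSettings:
    """Stand-in with the four fields shown above; the defaults are invented."""

    proxy_url: Optional[str] = None
    default_workers: int = 10
    delay_between_cells: float = 0.0
    cookies_ttl: int = 3600


base = DemoSettings(proxy_url="http://user:pass@host:port", default_workers=30)

# Derive a slower profile without mutating the base settings.
slow = replace(base, default_workers=5, delay_between_cells=0.5)

print(asdict(slow))
```

Deriving variants with replace() keeps one base configuration as the single source of truth for values such as the proxy URL.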

Event system

Lifecycle hooks for monitoring collection progress:

from gmaps_extractor import GMapsExtractor, EventType, EventEmitter

emitter = EventEmitter()
emitter.on(EventType.CELL_COMPLETE, lambda e: print(f"+{e.data['businesses_found']}"))
emitter.on(EventType.COLLECTION_COMPLETE, lambda e: print(f"Total: {e.data['total_businesses']}"))

with GMapsExtractor(proxy="...", events=emitter) as extractor:
    result = extractor.collect_v2("NYC", "lawyers")

Available event types: COLLECTION_START, CELL_COMPLETE, BUSINESS_FOUND, ENRICHMENT_START, ENRICHMENT_COMPLETE, RATE_LIMIT, CHECKPOINT_SAVED, SEARCH_COMPLETE, COLLECTION_COMPLETE, ERROR.
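
Handlers do not have to print. As a sketch, the class below accumulates totals from event payloads. It is exercised here with simulated events built from SimpleNamespace, so it runs without the library. The payload key businesses_found comes from the example above; treating it as a per-cell count is an assumption.

```python
from types import SimpleNamespace


class CollectionStats:
    """Accumulates counts from event payloads instead of printing them."""

    def __init__(self):
        self.cells = 0
        self.businesses = 0

    def on_cell_complete(self, event):
        # Payload key taken from the CELL_COMPLETE example above.
        self.cells += 1
        self.businesses += event.data["businesses_found"]


stats = CollectionStats()

# Simulated events; with the real library the emitter would call the handler.
for found in (12, 0, 7):
    stats.on_cell_complete(SimpleNamespace(data={"businesses_found": found}))

print(stats.cells, stats.businesses)  # prints: 3 19
```

With the real emitter, registration would follow the pattern shown earlier: emitter.on(EventType.CELL_COMPLETE, stats.on_cell_complete).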

ProgressReporter

Pluggable progress output that attaches to the event system:

import logging

from gmaps_extractor.events import EventEmitter
from gmaps_extractor.progress import ProgressReporter

my_logger = logging.getLogger("my_app")  # any application logger

emitter = EventEmitter()
reporter = ProgressReporter(output_fn=my_logger.info)
reporter.attach(emitter)

GMapsExtractor auto-creates a ProgressReporter when verbose=True.
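
Any callable can serve as the output function. The sketch below is a small in-memory sink that keeps only the most recent progress lines, which can be handy for showing status in a UI. LineBuffer is a hypothetical helper, and it assumes output_fn is called with a single string argument, as the lambda above suggests.

```python
from collections import deque


class LineBuffer:
    """Callable sink that keeps the most recent progress lines in memory."""

    def __init__(self, max_lines=100):
        # A bounded deque silently discards the oldest line when full.
        self.lines = deque(maxlen=max_lines)

    def __call__(self, message):
        self.lines.append(message)

    def tail(self, n=5):
        """Return the last n lines, oldest first."""
        return list(self.lines)[-n:]


buffer = LineBuffer(max_lines=3)
for i in range(5):
    buffer(f"cell {i} done")  # Simulates the reporter calling output_fn.

print(buffer.tail(2))  # prints: ['cell 3 done', 'cell 4 done']
```

An instance would be passed as ProgressReporter(output_fn=buffer).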

Structured logging

All print() statements have been replaced with calls to Python's logging module:

import logging

# Configure logging manually
logging.getLogger("gmaps_extractor").setLevel(logging.DEBUG)
logging.getLogger("gmaps_extractor").addHandler(logging.StreamHandler())

The library attaches a NullHandler by default, producing no output. When verbose=True, a StreamHandler is added at INFO level.

Cookie handling

  • Auto-retry with fresh cookies on HTTP 429, consent redirects, and empty responses
  • Proactive cookie refresh every 500 requests during long collections
  • Fresh SOCS consent cookie generation with current timestamp
  • Per-client cookie cache (no global state)
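
The refresh rules in the list above can be sketched as a small per-instance cache. This is an illustration of the idea, not the library's implementation: CookieCache and its parameters are hypothetical, the 500-request interval comes from the list, and the 1800-second TTL mirrors the cookies_ttl example in the GMapsSettings section.

```python
import itertools
import time


class CookieCache:
    """Per-instance cookie cache refreshed on TTL expiry or after N requests."""

    def __init__(self, fetch_cookies, ttl=1800, refresh_every=500, clock=time.monotonic):
        self._fetch = fetch_cookies
        self._ttl = ttl
        self._refresh_every = refresh_every
        self._clock = clock
        self._cookies = None
        self._fetched_at = 0.0
        self._uses = 0

    def get(self):
        """Return cookies for the next request, refreshing them when stale."""
        expired = self._clock() - self._fetched_at >= self._ttl
        if self._cookies is None or expired or self._uses >= self._refresh_every:
            self.invalidate()
            self._cookies = self._fetch()
            self._fetched_at = self._clock()
        self._uses += 1
        return self._cookies

    def invalidate(self):
        """Drop cached cookies, e.g. after an HTTP 429 or a consent redirect."""
        self._cookies = None
        self._uses = 0


# Demo with a fake fetcher and a frozen clock, refreshing every 3 requests.
counter = itertools.count(1)
cache = CookieCache(lambda: {"SOCS": f"v{next(counter)}"}, refresh_every=3, clock=lambda: 0.0)
seen = [cache.get()["SOCS"] for _ in range(4)]
print(seen)  # prints: ['v1', 'v1', 'v1', 'v2']
```

Keeping the state on the instance rather than in a module-level variable is what lets two clients with different proxies hold different cookies.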

Request freshness

  • Rotating User-Agent pool with recent Chrome versions
  • Full browser-like headers (Sec-Fetch-*, Accept-Encoding, Cache-Control)
  • Epoch millisecond timestamp in search protobuf for cache-busting
  • Anti-cache headers to prevent CDN-level stale responses
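
As a rough sketch of the techniques listed above, the helpers below rotate through a User-Agent pool, attach anti-cache headers, and produce an epoch-millisecond value. The function names, the User-Agent strings, and the exact header set are illustrative assumptions; the library's actual values are not documented in this guide.

```python
import itertools
import time

# Illustrative User-Agent strings; a real pool would track current Chrome releases.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
]
_ua_cycle = itertools.cycle(USER_AGENTS)


def fresh_headers():
    """Build request headers with the next User-Agent and anti-cache directives."""
    return {
        "User-Agent": next(_ua_cycle),
        "Accept-Encoding": "gzip, deflate",
        "Cache-Control": "no-cache",
        "Pragma": "no-cache",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Site": "same-origin",
    }


def cache_buster():
    """Current Unix time in milliseconds, usable as a per-request timestamp."""
    return time.time_ns() // 1_000_000
```

Round-robin rotation is the simplest policy; random.choice over the same pool would be an equally valid alternative.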

Dependency Changes

| Dependency | v1.x | v2.0.0 |
| --- | --- | --- |
| httpx | required | required (>= 0.25.0) |
| fastapi | required | optional ([server]) |
| uvicorn | required | optional ([server]) |
| pydantic | required | optional ([server]) |
| pytest | dev | dev |
| pytest-asyncio | -- | dev |
| ruff | -- | dev |
| mypy | -- | dev |

Configuration Changes

| v1.x | v2.0.0 | Notes |
| --- | --- | --- |
| auto_start_server=True | use_server=False | Server no longer starts by default |
| server_port=8000 | server_port=8000 | Only relevant when use_server=True |
| verbose=True (print) | verbose=True (logging) | Uses logging module instead of print() |
| -- | events=EventEmitter() | New: lifecycle callbacks |
| -- | progress=True | New: pluggable progress reporter |
| -- | on_business_found=cb | New: convenience callback shortcut |
| -- | on_collection_complete=cb | New: convenience callback shortcut |

Summary of Action Items

  1. No action needed if you only use GMapsExtractor.collect() / collect_v2() with sync workflows. The API is backward compatible.

  2. Install the [server] extra if you use the CLI commands (gmaps-collect, gmaps-server) or explicitly set use_server=True:

    pip install gmaps-extractor[server]
    

  3. Replace auto_start_server with use_server to silence deprecation warnings.

  4. Configure logging if you relied on verbose=True producing print() output. The output format is similar but now goes through logging.

  5. Consider the async API for better performance in I/O-bound applications.

  6. Consider events for progress monitoring instead of parsing stdout.