Migration Guide: v1.x to v2.0.0
This guide covers all breaking changes, deprecations, and new features in v2.0.0. For the full list of changes, see the Changelog.
Breaking Changes
Default behavior: No FastAPI server
v1.x: GMapsExtractor automatically started a FastAPI server in a background thread. All requests to Google Maps were routed through this local server.
v2.0.0: GMapsExtractor makes direct HTTP requests to Google Maps using GMapsClient (powered by httpx). No server is started by default. FastAPI and uvicorn are no longer required dependencies.
Before (v1.x):
# Server auto-started in background
with GMapsExtractor(proxy="http://user:pass@host:port") as extractor:
    result = extractor.collect("NYC", "lawyers")
After (v2.0.0) -- no changes needed for basic usage:
# Same API, but no server started -- direct HTTP
with GMapsExtractor(proxy="http://user:pass@host:port") as extractor:
    result = extractor.collect("NYC", "lawyers")
If you need the server (e.g., for CLI usage or external API access):
# Opt-in to server mode
with GMapsExtractor(proxy="...", use_server=True) as extractor:
    result = extractor.collect("NYC", "lawyers")
Server mode requires installing the package with the [server] extra.
Core dependencies reduced
v1.x: Required fastapi, uvicorn, pydantic, and httpx.
v2.0.0: Core install only requires httpx>=0.25.0. FastAPI, uvicorn, and pydantic are available via the [server] optional extra.
If your code imports from fastapi or uvicorn, install the [server] extra.
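Because the server packages are now optional, code that enables server mode may want to check up front that they are importable. The following is a minimal sketch using only the standard library; `SERVER_MODULES` and `missing_modules` are hypothetical names for illustration and are not part of gmaps_extractor.

```python
from importlib.util import find_spec

# Packages the guide lists under the [server] extra.
SERVER_MODULES = ("fastapi", "uvicorn", "pydantic")


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(SERVER_MODULES)
    if missing:
        print("Server mode unavailable; missing:", ", ".join(missing))
    else:
        print("Server dependencies are installed.")
```

Running a check like this before passing use_server=True gives a clearer error than an ImportError raised deep inside the library.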
auto_start_server parameter deprecated
v1.x: auto_start_server=True (default) controlled server auto-start.
v2.0.0: Use use_server=True instead. Setting auto_start_server=True explicitly still works (it implies use_server=True) but emits a deprecation warning.
Before:
After:
GMapsExtractor(use_server=False) # Default -- no server
GMapsExtractor(use_server=True) # Explicitly use server
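To find remaining uses of the old parameter, it can help to turn deprecation warnings into errors while migrating. The sketch below uses a hypothetical `resolve_use_server` helper that mimics the behavior described above; it is not the library's implementation, and warning on any explicitly passed value (not only True) is an assumption.

```python
import warnings


def resolve_use_server(use_server=False, auto_start_server=None):
    """Hypothetical helper mirroring the documented parameter mapping."""
    if auto_start_server is not None:
        # Assumption: any explicit value triggers the warning.
        warnings.warn(
            "auto_start_server is deprecated; use use_server instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # Per the guide, auto_start_server=True implies use_server=True.
        return use_server or bool(auto_start_server)
    return use_server


# Escalate DeprecationWarning to an exception so old call sites fail loudly.
with warnings.catch_warnings():
    warnings.simplefilter("error", DeprecationWarning)
    try:
        resolve_use_server(auto_start_server=True)
    except DeprecationWarning as exc:
        print(f"Needs migration: {exc}")
```

The same filter can be applied to a whole test run with `python -W error::DeprecationWarning`.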
Deprecations
ExtractorConfig.apply() is deprecated
The apply() method that monkey-patched module-level globals now emits a DeprecationWarning. It is still called internally for backward compatibility, but new code should use GMapsSettings and GMapsClient instead.
verbose parameter
The verbose=True parameter still works but now operates through Python's logging module instead of direct print() calls. When verbose=True and no user-configured logging handlers exist, a StreamHandler is added to the gmaps_extractor logger at INFO level.
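The handler logic described above can be sketched with the standard logging module. `enable_verbose` is a hypothetical function name, and checking only the named logger's own handlers (rather than the root logger's) is an assumption about what "user-configured" means here.

```python
import logging


def enable_verbose(logger_name="gmaps_extractor"):
    """Attach a StreamHandler at INFO unless a real handler already exists."""
    logger = logging.getLogger(logger_name)
    # A NullHandler does not count as user configuration.
    user_handlers = [
        h for h in logger.handlers if not isinstance(h, logging.NullHandler)
    ]
    if not user_handlers:
        handler = logging.StreamHandler()
        handler.setLevel(logging.INFO)
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger
```

Calling the function twice adds only one handler, which avoids duplicated output when several extractors are created with verbose=True.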
New Features
Async API
Three new async methods on GMapsExtractor:
async with GMapsExtractor(proxy="http://user:pass@host:port") as extractor:
    # Batch collection (returns CollectionResult)
    result = await extractor.async_collect_v2("NYC", "lawyers", enrich=True)
    # Convenience alias (same as async_collect_v2)
    result = await extractor.async_collect("NYC", "lawyers")
    # Streaming (yields businesses one at a time)
    async for biz in extractor.stream_collect_v2("NYC", "lawyers"):
        print(biz["name"])
CollectionResult also supports async iteration:
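As a general illustration of how an object can support `async for`, here is a self-contained sketch. `SketchResult` and `collect_names` are hypothetical stand-ins; the real CollectionResult may be implemented differently.

```python
import asyncio


class SketchResult:
    """Hypothetical stand-in for a result object that supports `async for`."""

    def __init__(self, businesses):
        self._businesses = list(businesses)

    def __aiter__(self):
        # Return an async generator, which implements __anext__.
        return self._iterate()

    async def _iterate(self):
        for biz in self._businesses:
            await asyncio.sleep(0)  # yield control to the event loop
            yield biz


async def collect_names(result):
    """Consume the result with `async for` and gather the names."""
    names = []
    async for biz in result:
        names.append(biz["name"])
    return names


if __name__ == "__main__":
    demo = SketchResult([{"name": "Acme Law"}, {"name": "Smith & Co"}])
    print(asyncio.run(collect_names(demo)))
```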
AsyncGMapsClient
Standalone async client for custom workflows:
from gmaps_extractor import AsyncGMapsClient
from gmaps_extractor.settings import GMapsSettings
settings = GMapsSettings(proxy_url="http://user:pass@host:port")
async with AsyncGMapsClient(settings) as client:
    businesses = await client.search("lawyers", lat=40.7, lng=-74.0)
    details = await client.place_details(hex_id="0x...:0x...", name="Acme")
    reviews = await client.reviews(hex_id="0x...:0x...", limit=20)
GMapsClient (direct HTTP)
Sync client that bypasses the server entirely:
from gmaps_extractor.client import GMapsClient
from gmaps_extractor.settings import GMapsSettings
settings = GMapsSettings(proxy_url="http://user:pass@host:port")
client = GMapsClient(settings)
businesses = client.search("lawyers", lat=40.7, lng=-74.0)
details = client.place_details(hex_id="0x...:0x...", name="Acme")
GMapsSettings
Centralized configuration dataclass:
from gmaps_extractor.settings import GMapsSettings
# From environment variables
settings = GMapsSettings.from_env(default_workers=30)
# From config.py
settings = GMapsSettings.from_config(proxy_url="http://override:8080")
# Explicit
settings = GMapsSettings(
    proxy_url="http://user:pass@host:port",
    default_workers=30,
    delay_between_cells=0.1,
    cookies_ttl=1800,
)
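To make the "environment plus keyword overrides" pattern concrete, here is a minimal sketch of a settings dataclass. Everything in it is hypothetical: `SketchSettings`, the `SKETCH_*` variable names, and the default values are invented for illustration, and treating keyword arguments as overrides (rather than fallbacks) is an assumption. Only the field names come from the example above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class SketchSettings:
    """Hypothetical stand-in; field names follow the example above."""

    proxy_url: Optional[str] = None
    default_workers: int = 10  # assumed default, not the library's
    delay_between_cells: float = 0.0  # assumed default
    cookies_ttl: int = 3600  # assumed default, in seconds

    @classmethod
    def from_env(cls, environ: Optional[Mapping[str, str]] = None, **overrides):
        env = os.environ if environ is None else environ
        values = {}
        # The SKETCH_* variable names are invented for this example.
        if "SKETCH_PROXY_URL" in env:
            values["proxy_url"] = env["SKETCH_PROXY_URL"]
        if "SKETCH_DEFAULT_WORKERS" in env:
            values["default_workers"] = int(env["SKETCH_DEFAULT_WORKERS"])
        # Assumption: keyword overrides win over the environment.
        values.update(overrides)
        return cls(**values)
```

Passing the environment as a mapping keeps the sketch easy to test without touching the real process environment.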
Event system
Lifecycle hooks for monitoring collection progress:
from gmaps_extractor import GMapsExtractor, EventType, EventEmitter
emitter = EventEmitter()
emitter.on(EventType.CELL_COMPLETE, lambda e: print(f"+{e.data['businesses_found']}"))
emitter.on(EventType.COLLECTION_COMPLETE, lambda e: print(f"Total: {e.data['total_businesses']}"))
with GMapsExtractor(proxy="...", events=emitter) as extractor:
    result = extractor.collect_v2("NYC", "lawyers")
Available event types: COLLECTION_START, CELL_COMPLETE, BUSINESS_FOUND, ENRICHMENT_START, ENRICHMENT_COMPLETE, RATE_LIMIT, CHECKPOINT_SAVED, SEARCH_COMPLETE, COLLECTION_COMPLETE, ERROR.
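For readers unfamiliar with the pattern, the sketch below shows a minimal publish/subscribe emitter and a stateful handler that keeps a running total. `SketchEmitter`, `SketchEvent`, `SketchEventType`, and `RunningTotal` are hypothetical stand-ins written for this guide, not the library's classes; only the `on(event_type, callback)` shape and the `businesses_found` key are taken from the example above.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum, auto


class SketchEventType(Enum):
    CELL_COMPLETE = auto()
    COLLECTION_COMPLETE = auto()


@dataclass
class SketchEvent:
    type: SketchEventType
    data: dict = field(default_factory=dict)


class SketchEmitter:
    """Minimal emitter: callbacks are stored per event type."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event_type, callback):
        self._handlers[event_type].append(callback)

    def emit(self, event_type, **data):
        event = SketchEvent(event_type, data)
        # Iterate over a copy so handlers may register others safely.
        for callback in list(self._handlers[event_type]):
            callback(event)


class RunningTotal:
    """Callable handler that accumulates businesses across cells."""

    def __init__(self):
        self.total = 0

    def __call__(self, event):
        self.total += event.data["businesses_found"]
```

A callable object like `RunningTotal` is often easier to test than a lambda, because its state can be inspected after the collection finishes.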
ProgressReporter
Pluggable progress output that attaches to the event system:
from gmaps_extractor.events import EventEmitter
from gmaps_extractor.progress import ProgressReporter
emitter = EventEmitter()
reporter = ProgressReporter(output_fn=lambda s: my_logger.info(s))
reporter.attach(emitter)
GMapsExtractor auto-creates a ProgressReporter when verbose=True.
Structured logging
All print() statements have been replaced with Python's logging module:
import logging
# Configure logging manually
logging.getLogger("gmaps_extractor").setLevel(logging.DEBUG)
logging.getLogger("gmaps_extractor").addHandler(logging.StreamHandler())
The library attaches a NullHandler by default, producing no output. When verbose=True, a StreamHandler is added at INFO level.
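If you want formatted output, or want to capture log records in tests, you can attach your own handler. This sketch writes to an in-memory stream so it can run anywhere. The child logger name "gmaps_extractor.client" and the log message are assumptions made for the demonstration; the library's actual logger names and messages may differ.

```python
import io
import logging

# Capture output in memory; swap in a FileHandler or sys.stderr as needed.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))

logger = logging.getLogger("gmaps_extractor")
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)

# Records from child loggers propagate up to the handler on the parent.
logging.getLogger("gmaps_extractor.client").debug("demo message")

captured = stream.getvalue()
```

Because handlers on the parent logger receive records from all of its children, one handler on "gmaps_extractor" is usually enough.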
Cookie lifecycle improvements
- Auto-retry with fresh cookies on HTTP 429, consent redirects, and empty responses
- Proactive cookie refresh every 500 requests during long collections
- Fresh SOCS consent cookie generation with current timestamp
- Per-client cookie cache (no global state)
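The refresh and retry behavior listed above can be sketched as a small per-client cache. This is an illustration under stated assumptions, not the library's code: `SketchCookieCache` and `request_with_retry` are hypothetical names, `fetch` and `send` are caller-supplied callables, and only the HTTP 429 case is modeled (consent redirects and empty responses are omitted).

```python
import time


class SketchCookieCache:
    """Hypothetical per-client cache; no module-level state is used."""

    def __init__(self, fetch, ttl=1800.0, max_requests=500, clock=time.monotonic):
        self._fetch = fetch  # callable returning a fresh cookie dict
        self._ttl = ttl
        self._max_requests = max_requests
        self._clock = clock
        self._cookies = None
        self._fetched_at = 0.0
        self._uses = 0

    def invalidate(self):
        """Force the next get() to fetch fresh cookies."""
        self._cookies = None

    def get(self):
        expired = self._clock() - self._fetched_at >= self._ttl
        # Refresh when missing, past the TTL, or after max_requests uses.
        if self._cookies is None or expired or self._uses >= self._max_requests:
            self._cookies = self._fetch()
            self._fetched_at = self._clock()
            self._uses = 0
        self._uses += 1
        return self._cookies


def request_with_retry(send, cache, max_attempts=2):
    """Call send(cookies); on HTTP 429, drop the cookies and try again."""
    status = None
    for _ in range(max_attempts):
        status = send(cache.get())
        if status != 429:
            break
        cache.invalidate()
    return status
```

Injecting `clock` and `fetch` keeps the cache deterministic in tests and ensures two clients never share cookies.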
Request freshness
- Rotating User-Agent pool with recent Chrome versions
- Full browser-like headers (Sec-Fetch-*, Accept-Encoding, Cache-Control)
- Epoch millisecond timestamp in search protobuf for cache-busting
- Anti-cache headers to prevent CDN-level stale responses
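As a rough illustration of the header techniques listed above, the sketch below builds a header set with a randomly chosen User-Agent and anti-cache directives, plus an epoch-millisecond timestamp. The User-Agent strings, the exact header values, and the function names are assumptions chosen for the example; the library's actual pool and headers may differ.

```python
import random
import time

# Illustrative strings only; a real pool would track current Chrome releases.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
]


def build_headers(rng=random):
    """Return browser-like headers with a rotated User-Agent."""
    return {
        "User-Agent": rng.choice(USER_AGENTS),
        "Accept-Encoding": "gzip, deflate",
        # Ask caches along the path not to serve a stored response.
        "Cache-Control": "no-cache",
        "Pragma": "no-cache",
        "Sec-Fetch-Dest": "empty",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Site": "same-origin",
    }


def epoch_millis():
    """Current Unix time in milliseconds, usable as a cache-busting value."""
    return int(time.time() * 1000)
```

Passing a seeded `random.Random` instance as `rng` makes the rotation reproducible in tests.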
Dependency Changes
| Dependency | v1.x | v2.0.0 |
|---|---|---|
| httpx | required | required (>= 0.25.0) |
| fastapi | required | optional ([server]) |
| uvicorn | required | optional ([server]) |
| pydantic | required | optional ([server]) |
| pytest | dev | dev |
| pytest-asyncio | -- | dev |
| ruff | -- | dev |
| mypy | -- | dev |
Configuration Changes
| v1.x | v2.0.0 | Notes |
|---|---|---|
| auto_start_server=True | use_server=False | Server no longer starts by default |
| server_port=8000 | server_port=8000 | Only relevant when use_server=True |
| verbose=True (print) | verbose=True (logging) | Uses logging module instead of print() |
| -- | events=EventEmitter() | New: lifecycle callbacks |
| -- | progress=True | New: pluggable progress reporter |
| -- | on_business_found=cb | New: convenience callback shortcut |
| -- | on_collection_complete=cb | New: convenience callback shortcut |
Summary of Action Items
- No action needed if you only use GMapsExtractor.collect() / collect_v2() with sync workflows. The API is backward compatible.
- Install the [server] extra if you use CLI commands (gmaps-collect, gmaps-server) or explicitly set use_server=True.
- Replace auto_start_server with use_server to silence deprecation warnings.
- Configure logging if you relied on verbose=True producing print() output. The output format is similar but now goes through logging.
- Consider the async API for better performance in I/O-bound applications.
- Consider events for progress monitoring instead of parsing stdout.