Resuming Collections
Note: Checkpoint/resume is a V2-only feature. The V1 collector (collect()/gmaps-collect) has no resume capability.
How Checkpoints Work
During collection, V2 periodically saves its state to a checkpoint file:
The checkpoint records:
- Which grid cells have been completed
- Which cells failed
- All collected place_id and hex_id values (for deduplication)
- Which businesses have been enriched
- Start time and last checkpoint time
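To make the list above concrete, here is a minimal sketch of what such a state record could look like when serialized to JSON. The class name, field names, and file format are all assumptions made for illustration; the actual V2 checkpoint layout may differ.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class Checkpoint:
    # Hypothetical field names that mirror the items listed above.
    completed_cells: List[str] = field(default_factory=list)
    failed_cells: List[str] = field(default_factory=list)
    place_ids: List[str] = field(default_factory=list)
    hex_ids: List[str] = field(default_factory=list)
    enriched_ids: List[str] = field(default_factory=list)
    started_at: Optional[str] = None
    last_checkpoint_at: Optional[str] = None


def save_checkpoint(checkpoint: Checkpoint, path: Path) -> None:
    # Write to a sibling temp file, then rename, so an interrupted
    # write does not leave a half-written checkpoint behind.
    tmp_path = path.with_name(path.name + ".tmp")
    tmp_path.write_text(json.dumps(asdict(checkpoint)))
    tmp_path.replace(path)


def load_checkpoint(path: Path) -> Checkpoint:
    return Checkpoint(**json.loads(path.read_text()))


# Round-trip demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "checkpoint.json"
    original = Checkpoint(completed_cells=["cell-1"], place_ids=["place-a"])
    save_checkpoint(original, demo_path)
    restored = load_checkpoint(demo_path)
```

Writing to a temporary file and renaming it is a design choice of this sketch, not something the collector is known to do.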
Auto-Resume
Resume is enabled by default (resume=True). When V2 starts, it checks for an existing checkpoint file. If one exists, it:
- Loads the previous state
- Skips already-completed cells
- Continues from where it left off
- Maintains deduplication across sessions
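The resume steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not V2's code: the helper name pending_cells, the JSON keys, and the file layout are hypothetical. Only the resume flag and its default of True come from this page.

```python
import json
import tempfile
from pathlib import Path


def pending_cells(all_cells, checkpoint_path, resume=True):
    """Return (cells still to process, place IDs already collected)."""
    if resume and checkpoint_path.exists():
        # Load the previous state (hypothetical JSON keys).
        state = json.loads(checkpoint_path.read_text())
        done = set(state.get("completed_cells", []))
        seen = set(state.get("place_ids", []))
    else:
        # No checkpoint, or resume disabled: start from scratch.
        done, seen = set(), set()
    # Skip completed cells; carry the seen IDs forward for deduplication.
    return [cell for cell in all_cells if cell not in done], seen


with tempfile.TemporaryDirectory() as tmp_dir:
    cp_path = Path(tmp_dir) / "checkpoint.json"
    cp_path.write_text(
        json.dumps({"completed_cells": ["c1", "c2"], "place_ids": ["p1"]})
    )
    resumed, seen_ids = pending_cells(["c1", "c2", "c3"], cp_path)
    fresh, fresh_seen = pending_cells(["c1", "c2", "c3"], cp_path, resume=False)
```

With resume left at its default, only c3 remains and p1 is still known as a duplicate. With resume=False the existing file is ignored and all three cells are processed again.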
Force a Fresh Start
To ignore an existing checkpoint and start over:
Python:
CLI:
Checkpoint Interval
Checkpoints are saved every N businesses (default: 100). Adjust with checkpoint_interval:
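As a rough picture of what the interval controls, the following sketch saves state each time the number of collected businesses reaches a multiple of checkpoint_interval. The function collect_with_checkpoints and its save callback are hypothetical; only the parameter name and its default of 100 come from this page.

```python
def collect_with_checkpoints(businesses, save, checkpoint_interval=100):
    """Collect items, calling save() after every checkpoint_interval items."""
    collected = []
    for business in businesses:
        collected.append(business)
        if len(collected) % checkpoint_interval == 0:
            save(collected)
    return collected


# Record how many businesses had been collected at each save.
save_points = []
result = collect_with_checkpoints(
    range(250),
    lambda items: save_points.append(len(items)),
    checkpoint_interval=100,
)
```

A smaller interval means less work is lost on a crash, at the cost of more frequent writes.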
Checkpoint Cleanup
On successful completion with no failed cells, the checkpoint file is automatically deleted. If any cells failed, the checkpoint is preserved so you can resume later.
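The cleanup rule can be expressed as a small decision, sketched here with a hypothetical finalize helper and made-up file names. It assumes the checkpoint is a single file on disk.

```python
import tempfile
from pathlib import Path


def finalize(checkpoint_path, failed_cells):
    """Delete the checkpoint only if no cells failed. Returns True if deleted."""
    if failed_cells:
        # Keep the file so a later run can resume.
        return False
    checkpoint_path.unlink(missing_ok=True)  # requires Python 3.8+
    return True


with tempfile.TemporaryDirectory() as tmp_dir:
    clean_run = Path(tmp_dir) / "clean.json"
    clean_run.write_text("{}")
    deleted_clean = finalize(clean_run, failed_cells=[])
    clean_exists = clean_run.exists()

    failed_run = Path(tmp_dir) / "failed.json"
    failed_run.write_text("{}")
    deleted_failed = finalize(failed_run, failed_cells=["c7"])
    failed_exists = failed_run.exists()
```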
KeyboardInterrupt
If you press Ctrl+C during a V2 CLI collection, the process prints:
The state at the time of interruption is saved. Run the same command again (resume is on by default) to continue.
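One common way to get this behavior in Python is to catch KeyboardInterrupt around the collection loop and persist state before exiting. The sketch below shows that general pattern only; run_collection, process_cell, and save_state are hypothetical names, and V2's actual handler may be structured differently.

```python
def run_collection(cells, process_cell, save_state):
    """Process cells in order; on Ctrl+C, save progress and stop.

    Returns True if every cell was processed, False if interrupted.
    """
    completed = []
    try:
        for cell in cells:
            process_cell(cell)
            completed.append(cell)
    except KeyboardInterrupt:
        # Persist what finished before the interrupt so a rerun can resume.
        save_state(completed)
        return False
    return True


def fake_process(cell):
    # Simulate the user pressing Ctrl+C while the third cell is running.
    if cell == "c3":
        raise KeyboardInterrupt


saved = []
finished = run_collection(["c1", "c2", "c3", "c4"], fake_process, saved.extend)
```

The interrupted cell is not marked as completed, so it would be retried on the next run.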
Manual Checkpoint Cleanup
To force a reset without --no-resume, delete the checkpoint file: