Output Format

All collection methods write results to disk. Default output directory is output/ with filenames derived from the category and area: {category}_in_{area}.json and .csv.

JSON Structure

{
  "metadata": {
    "area": "Manhattan, New York",
    "category": "lawyers",
    "boundary": {
      "name": "Manhattan Region",
      "north": 40.927,
      "south": 40.654,
      "east": -73.862,
      "west": -74.093
    },
    "search_mode": "grid",
    "enrichment": {
      "details_fetched": true,
      "reviews_fetched": true,
      "reviews_limit": 20
    }
  },
  "statistics": {
    "total_collected": 342,
    "duplicates_removed": 58,
    "filtered_outside_boundary": 23,
    "search_time_seconds": 45.2,
    "total_time_seconds": 180.7
  },
  "businesses": [
    {
      "name": "Smith & Associates Law Firm",
      "address": "123 Broadway, New York, NY 10006",
      "place_id": "ChIJabc123def456",
      "hex_id": "0x89c259a8669c0f0d:0x25d4109319b4f5a0",
      "ftid": "/g/1vs5xm_3",
      "rating": 4.5,
      "review_count": 87,
      "latitude": 40.7128,
      "longitude": -74.0060,
      "phone": "+1 212-555-0123",
      "website": "https://www.smithlaw.example.com",
      "category": "Lawyer",
      "categories": ["Lawyer", "Legal Services"],
      "hours": {
        "monday": "9:00 AM - 5:00 PM",
        "tuesday": "9:00 AM - 5:00 PM",
        "wednesday": "9:00 AM - 5:00 PM",
        "thursday": "9:00 AM - 5:00 PM",
        "friday": "9:00 AM - 5:00 PM"
      },
      "found_in": "Manhattan, New York, NY, USA",
      "reviews_data": [
        {
          "review_id": "ChdDSUh...",
          "author": "Jane Doe",
          "author_photo": "https://lh3.googleusercontent.com/...",
          "rating": 5,
          "date": "2 months ago",
          "text": "Excellent service..."
        }
      ]
    }
  ]
}

Business Fields

Each business dictionary contains up to 16 fields:

Field	Type	Description
`name`	string	Business name
`address`	string	Full street address
`place_id`	string	Google Places ID (e.g., `"ChIJ..."`)
`hex_id`	string	Hex format ID (e.g., `"0x...:0x..."`) used for details/reviews
`ftid`	string	Feature ID (e.g., `"/g/1vs5xm_3"`)
`rating`	float	Average rating (1.0-5.0), or `null`
`review_count`	int	Number of Google reviews
`latitude`	float	Geographic latitude
`longitude`	float	Geographic longitude
`phone`	string	Phone number (requires enrichment)
`website`	string	Website URL (requires enrichment)
`category`	string	Primary business category
`categories`	list	All categories assigned to the business
`hours`	dict	Operating hours keyed by day of week (requires enrichment)
`found_in`	string	Sub-area or area name where the business was found
`reviews_data`	list	List of review dicts (requires `reviews=True`)

Fields that require enrichment (enrich=True or reviews=True) are null or absent when enrichment is not enabled.

CSV Format

The CSV file contains the same 16 fields as columns:

name, address, place_id, hex_id, ftid, rating, review_count, latitude, longitude, phone, website, category, categories, hours, found_in, reviews_data

Dictionary and list values (categories, hours, reviews_data) are serialized as JSON strings within their CSV cells.

JSONL Streaming (V2 Only)

collect_v2() writes a .jsonl file alongside the JSON output. Each line is a single business as JSON, written as soon as it is collected:

output/lawyers_in_manhattan.jsonl

This is useful for monitoring progress or processing results before collection finishes. The JSONL file contains only the business objects (no metadata or statistics wrapper).

Output File Paths

Default paths use a sanitized form of the area and category:

output/{category}_in_{area}.json
output/{category}_in_{area}.csv
output/{category}_in_{area}.jsonl    (V2 only)

Where {area} is the text before the first comma, lowercased, with spaces replaced by underscores. For example, "Manhattan, New York" becomes manhattan.

Custom paths:

# Python
result = extractor.collect_v2("NYC", "lawyers", output_file="my_data.json", output_csv="my_data.csv")

# CLI
gmaps-collect-v2 "NYC" "lawyers" -o my_data.json --csv my_data.csv

Note: Output files are always written to disk, even when using the Python library. There is no option to suppress file output.