Batch Geocoding Sample

This sample demonstrates how to batch-geocode addresses with the LightBox API. It includes a Jupyter Notebook walkthrough, a runnable Python script, and sample input data.

Jupyter Notebook


LightBox API - Purpose of Search Endpoint

Returns an address based on the full-text string passed in the 'text' parameter. Each result includes a representative point for the address and references to related parcels. If there is no exact match, the endpoint returns the best possible match. The '$ref' value within each 'parcels' object for an address can be used to retrieve information about that parcel.
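As a rough illustration of how the '$ref' values might be collected from a parsed Search response, here is a minimal sketch. It assumes each entry in 'addresses' carries a 'parcels' list of objects with a '$ref' key; that shape is inferred from the description above rather than confirmed, so check it against the LightBox API reference before relying on it. The function name collect_parcel_refs is hypothetical.

```python
from typing import Any, Dict, List


def collect_parcel_refs(search_response: Dict[str, Any]) -> List[str]:
    """Return every parcel '$ref' found in a parsed Search response.

    Assumed shape (not confirmed by the sample):
    {"addresses": [{"parcels": [{"$ref": "..."}, ...]}, ...]}
    """
    refs: List[str] = []
    for address in search_response.get("addresses", []):
        parcels = address.get("parcels", [])
        # Tolerate a single parcel object as well as a list of them.
        if isinstance(parcels, dict):
            parcels = [parcels]
        for parcel in parcels:
            ref = parcel.get("$ref")
            if ref:
                refs.append(ref)
    return refs


# Hand-written response for illustration; the URL is a placeholder, not a real endpoint.
example = {
    "addresses": [
        {"parcels": [{"$ref": "https://example.com/parcels/123"}]},
        {"parcels": []},
    ]
}
print(collect_parcel_refs(example))  # ['https://example.com/parcels/123']
```

If '$ref' turns out to be an absolute URL, each value could then be requested with the same 'x-api-key' header used for the Search call; that too is an assumption worth verifying.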

Batch Geocoding Addresses with Search

This batch geocoder processes addresses sequentially: it splits the input into batches, then sends one request per address, one at a time, waiting for each response before sending the next. This approach is straightforward, but it does not use concurrency techniques such as multi-threading or asynchronous requests, so total run time grows roughly in proportion to the number of addresses.
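For comparison, a concurrent variant could hand the per-address calls to a thread pool. This is a minimal sketch of that swapped-in technique, not part of the LightBox sample: geocode_fn is a placeholder for any callable that takes one address (for example functools.partial(geocode_address, lightbox_api_key)), and max_workers=4 is an arbitrary choice that should be kept within whatever rate limits apply to your API key.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


def geocode_concurrently(
    addresses: List[str],
    geocode_fn: Callable[[str], T],
    max_workers: int = 4,
) -> List[T]:
    """Call geocode_fn on every address using a pool of worker threads.

    executor.map yields results in input order, so result i corresponds to
    addresses[i] even when the underlying calls finish out of order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(geocode_fn, addresses))


# Stand-in for a real API call so the sketch runs without network access.
def fake_geocode(address: str) -> str:
    return address.upper()


print(geocode_concurrently(["1 a st", "2 b ave"], fake_geocode))  # ['1 A ST', '2 B AVE']
```

Threads fit this workload because each call spends most of its time waiting on the network rather than using the CPU.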

This notebook provides a step-by-step guide to batch geocoding addresses using the LightBox API. The process involves several key steps, each detailed in subsequent sections:

  1. Setup

    • Importing necessary Python libraries.
    • Defining global configurations and API keys.
  2. Function Definitions

    • geocode_address: Function to geocode a single address.
    • read_addresses_from_csv: Function to read and format addresses from a CSV file.
    • batch_geocode_addresses: Function to process addresses in batches and geocode them.
  3. API Key

    • Entering your API key for authorization.
  4. Reading Input Data

    • Reading and formatting addresses from a user-specified CSV file.
  5. Batch Geocoding Process

    • Executing the batch geocoding process using the defined functions.
    • Handling successful matches, addresses with no match, and failed requests.
  6. Saving Results

    • Saving the geocoded data to a CSV file.
    • Format and content of the output data.

Additional Materials: LightBox Developer Portal

1. Import the required Python packages

import requests
import pandas as pd
from typing import Dict, List

2. Define the functions

# Function to geocode a single address using the LightBox API.
def geocode_address(lightbox_api_key: str, address: str) -> requests.Response:
    """
    Geocodes the provided address using the LightBox API.

    Args:
        lightbox_api_key (str): The API key for accessing the LightBox API.
        address (str): The address string for matching.

    Returns:
        requests.Response: The raw HTTP response. Check status_code, then call
        .json() to get the geocoded address information.
    """
    # API endpoint configuration
    BASE_URL = "https://api.lightboxre.com/v1"
    ENDPOINT = "/addresses/search"
    URL = BASE_URL + ENDPOINT

    # Setting up request parameters and headers
    params = {'text': address}
    headers = {'x-api-key': lightbox_api_key}

    # Sending request to the LightBox API; the timeout (seconds) keeps a stalled call from hanging the run
    response = requests.get(URL, params=params, headers=headers, timeout=30)

    return response

# Function to read addresses from a CSV file and format them.
def read_addresses_from_csv(file_path: str) -> List[str]:
    """
    Reads addresses from a CSV file and formats them into 'Address, City State Zip Code'.

    Args:
        file_path (str): Path to the CSV file.

    Returns:
        List[str]: A list of formatted address strings.
    """
    # Read 'Zip Code' as text so leading zeros (e.g. 07728) are preserved
    df = pd.read_csv(file_path, dtype={'Zip Code': str})

    # Concatenating address components into a single address string per row
    formatted_addresses = df.apply(
        lambda row: f"{row['Address']}, {row['City']} {row['State']} {row['Zip Code']}",
        axis=1
    )
    return formatted_addresses.tolist()

# Function to batch process addresses for geocoding.
def batch_geocode_addresses(api_key: str, addresses: List[str], batch_size: int = 200) -> pd.DataFrame:
    """
    Sequentially geocodes a list of addresses, grouped into batches.

    Args:
        api_key (str): API key for the geocoding service.
        addresses (List[str]): List of addresses to geocode.
        batch_size (int): Number of addresses to process in each batch.

    Returns:
        pd.DataFrame: One row per input address with latitude, longitude,
        confidence score, and precision code (or 'No match' / 'Failed' markers).
    """
    batched_addresses = [addresses[i:i + batch_size] for i in range(0, len(addresses), batch_size)]
    all_results = []

    for batch in batched_addresses:
        for address in batch:
            try:
                result = geocode_address(api_key, address)
            except requests.RequestException as exc:
                # Network errors and timeouts are recorded instead of aborting the whole run
                all_results.append({
                    "address": address,
                    "latitude": "Failed",
                    "longitude": f"Error: {type(exc).__name__}",
                    "confidence_score": "Failed",
                    "precision_code": "Failed"
                })
                print(f"Failed to geocode address '{address}': {exc}")
                continue

            if result.status_code == 200:
                data = result.json()
                matches = data.get('addresses') or []
                # Extracting data from the first match
                if matches:
                    first_match = matches[0]
                    latitude = first_match['location']['representativePoint']['latitude']
                    longitude = first_match['location']['representativePoint']['longitude']
                    confidence_score = first_match['$metadata']['geocode']['confidence']['score']
                    precision_code = first_match['$metadata']['geocode']['precisionCode']
                    all_results.append({
                        "address": address,
                        "latitude": latitude,
                        "longitude": longitude,
                        "confidence_score": confidence_score,
                        "precision_code": precision_code
                    })
                else:
                    all_results.append({
                        "address": address,
                        "latitude": "No match",
                        "longitude": "No match",
                        "confidence_score": "No match",
                        "precision_code": "No match"
                    })
            else:
                all_results.append({
                    "address": address,
                    "latitude": "Failed",
                    "longitude": f"Status Code: {result.status_code}",
                    "confidence_score": "Failed",
                    "precision_code": "Failed"
                })
                print(f"Failed to geocode address '{address}', Status Code: {result.status_code}")

    return pd.DataFrame(all_results)
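The nested lookups in batch_geocode_addresses raise a KeyError if any level is missing from a match. A more defensive sketch follows; the key paths mirror the ones used above, while the helper names dig and extract_fields are hypothetical and the match dict is hand-written for illustration, not real API output.

```python
from typing import Any, Dict, Optional


def dig(data: Any, *keys: str) -> Optional[Any]:
    """Follow a chain of dict keys, returning None if any level is missing."""
    for key in keys:
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data


def extract_fields(match: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the same four fields the sample extracts from a single match."""
    return {
        "latitude": dig(match, "location", "representativePoint", "latitude"),
        "longitude": dig(match, "location", "representativePoint", "longitude"),
        "confidence_score": dig(match, "$metadata", "geocode", "confidence", "score"),
        "precision_code": dig(match, "$metadata", "geocode", "precisionCode"),
    }


# Hand-written match with illustrative values; precisionCode is deliberately absent.
match = {
    "location": {"representativePoint": {"latitude": 39.27, "longitude": -81.57}},
    "$metadata": {"geocode": {"confidence": {"score": 97}}},
}
print(extract_fields(match))
# {'latitude': 39.27, 'longitude': -81.57, 'confidence_score': 97, 'precision_code': None}
```

Returning None for a missing field keeps the row in the output instead of stopping the whole batch on one unusual response.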

3. Create the variable used to authenticate your calls.

Get your key from the LightBox Developer Portal.

lightbox_api_key = '<YOUR_API_KEY>'
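Hard-coding the key in a notebook makes it easy to commit by accident. One common alternative, sketched here, is to read it from an environment variable; the variable name LIGHTBOX_API_KEY and the helper load_api_key are hypothetical choices, not something the LightBox API requires.

```python
import os


def load_api_key(var_name: str = "LIGHTBOX_API_KEY") -> str:
    """Read the API key from an environment variable (variable name is hypothetical)."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your LightBox API key.")
    return key


# Example: set LIGHTBOX_API_KEY in your shell before starting Jupyter, then:
# lightbox_api_key = load_api_key()
```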

4. Reading input data.

  • Specify the location and name of the input file of addresses.
    • If the file is in the notebook's working directory, enter just its name, e.g. input_file_name.csv.
    • The input CSV must have the headers 'Address', 'City', 'State' and 'Zip Code'; any other columns are ignored.
  • Specify the location and name of the output CSV file.
    • To write it to the notebook's working directory, enter just its name, e.g. output_file_name.csv.
input_file_path = 'input.csv'  # User inputs the file name
output_file_path = 'output.csv'  # User inputs the output file name

# Reading and processing addresses
print("Reading addresses from CSV file...")
addresses = read_addresses_from_csv(input_file_path)
Reading addresses from CSV file...
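Because read_addresses_from_csv assumes the four headers above, a missing or misspelled column only surfaces as a KeyError deep inside df.apply. A minimal pre-check is sketched below. It also reads 'Zip Code' as text, because by default pandas infers that column as an integer, which would turn a ZIP code such as 07728 (Freehold, NJ, in the sample input) into 7728. The names REQUIRED_COLUMNS and load_and_check are hypothetical.

```python
import io

import pandas as pd

REQUIRED_COLUMNS = ["Address", "City", "State", "Zip Code"]


def load_and_check(csv_source) -> pd.DataFrame:
    """Read the input CSV, keeping ZIP codes as strings, and verify required headers."""
    df = pd.read_csv(csv_source, dtype={"Zip Code": str})
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Input CSV is missing required column(s): {missing}")
    return df


sample = io.StringIO(
    "Address,City,State,Zip Code,Source\n"
    "500 West Main Street,Freehold,NJ,07728,AnnexA\n"
)
df = load_and_check(sample)
print(df.loc[0, "Zip Code"])  # 07728
```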

5. Batch Geocoding Process

print("Starting batch geocoding...")
geocoded_data = batch_geocode_addresses(lightbox_api_key, addresses)
print("Batch geocoding completed.")
Starting batch geocoding...
Batch geocoding completed.
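The sample sends each request once, so a transient failure is recorded as 'Failed' for that address. A minimal retry wrapper with exponential backoff is sketched below. It is a swapped-in technique rather than part of the sample, and treating 429 and 5xx responses as retryable is an assumption; consult the LightBox documentation for its actual rate-limit behavior. send is any zero-argument callable returning an object with a status_code attribute, such as lambda: geocode_address(lightbox_api_key, address).

```python
import time
from types import SimpleNamespace
from typing import Any, Callable

# Assumed set of transient status codes; adjust to the API's documented behavior.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def send_with_retry(
    send: Callable[[], Any],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send() until it returns a non-retryable status or attempts run out.

    The wait doubles after each retryable response: base_delay, 2 * base_delay, ...
    """
    response = send()
    for attempt in range(1, max_attempts):
        if response.status_code not in RETRYABLE_STATUS:
            break
        sleep(base_delay * 2 ** (attempt - 1))
        response = send()
    return response


# Fake responses and a recording "sleep" so the sketch runs instantly and offline.
codes = iter([503, 429, 200])
delays = []
result = send_with_retry(lambda: SimpleNamespace(status_code=next(codes)), sleep=delays.append)
print(result.status_code, delays)  # 200 [1.0, 2.0]
```

Injecting sleep keeps the wrapper testable; in real use the default time.sleep applies.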

6. Saving Results

# Saving geocoded data to output file
geocoded_data.to_csv(output_file_path, index=False)
print(f"Geocoded data saved to '{output_file_path}'.")
Geocoded data saved to 'output.csv'.
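Because unmatched and failed rows are marked with the strings 'No match' and 'Failed' in the latitude column, a quick summary of a run can be computed from the saved file. A minimal sketch, assuming the column layout produced by batch_geocode_addresses; the function name summarize_geocoding is hypothetical.

```python
import pandas as pd


def summarize_geocoding(df: pd.DataFrame) -> dict:
    """Count geocoded, unmatched, and failed rows using the sample's marker strings."""
    lat = df["latitude"].astype(str)
    no_match = int((lat == "No match").sum())
    failed = int((lat == "Failed").sum())
    return {
        "total": len(df),
        "geocoded": len(df) - no_match - failed,
        "no_match": no_match,
        "failed": failed,
    }


# Usage with the file written above:
# print(summarize_geocoding(pd.read_csv(output_file_path)))
```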

Python Script

import requests
import pandas as pd
from typing import List


# ----------------------------
# Function Definitions
# ----------------------------

# Function to geocode a single address using the LightBox API.
def geocode_address(lightbox_api_key: str, address: str) -> requests.Response:
    """
    Geocodes the provided address using the LightBox API.

    Args:
        lightbox_api_key (str): The API key for accessing the LightBox API.
        address (str): The address string for matching.

    Returns:
        requests.Response: The raw HTTP response. Check status_code, then call
        .json() to get the geocoded address information.
    """
    # API endpoint configuration
    BASE_URL = "https://api.lightboxre.com/v1"
    ENDPOINT = "/addresses/search"
    URL = BASE_URL + ENDPOINT

    # Setting up request parameters and headers
    params = {'text': address}
    headers = {'x-api-key': lightbox_api_key}

    # Sending request to the LightBox API; the timeout (seconds) keeps a stalled call from hanging the run
    response = requests.get(URL, params=params, headers=headers, timeout=30)

    return response

# Function to read addresses from a CSV file and format them.
def read_addresses_from_csv(file_path: str) -> List[str]:
    """
    Reads addresses from a CSV file and formats them into 'Address, City State Zip Code'.

    Args:
        file_path (str): Path to the CSV file.

    Returns:
        List[str]: A list of formatted address strings.
    """
    # Read 'Zip Code' as text so leading zeros (e.g. 07728) are preserved
    df = pd.read_csv(file_path, dtype={'Zip Code': str})

    # Concatenating address components into a single address string per row
    formatted_addresses = df.apply(
        lambda row: f"{row['Address']}, {row['City']} {row['State']} {row['Zip Code']}",
        axis=1
    )
    return formatted_addresses.tolist()

# Function to batch process addresses for geocoding.
def batch_geocode_addresses(api_key: str, addresses: List[str], batch_size: int = 200) -> pd.DataFrame:
    """
    Sequentially geocodes a list of addresses, grouped into batches.

    Args:
        api_key (str): API key for the geocoding service.
        addresses (List[str]): List of addresses to geocode.
        batch_size (int): Number of addresses to process in each batch.

    Returns:
        pd.DataFrame: One row per input address with latitude, longitude,
        confidence score, and precision code (or 'No match' / 'Failed' markers).
    """
    batched_addresses = [addresses[i:i + batch_size] for i in range(0, len(addresses), batch_size)]
    all_results = []

    for batch in batched_addresses:
        for address in batch:
            try:
                result = geocode_address(api_key, address)
            except requests.RequestException as exc:
                # Network errors and timeouts are recorded instead of aborting the whole run
                all_results.append({
                    "address": address,
                    "latitude": "Failed",
                    "longitude": f"Error: {type(exc).__name__}",
                    "confidence_score": "Failed",
                    "precision_code": "Failed"
                })
                print(f"Failed to geocode address '{address}': {exc}")
                continue

            if result.status_code == 200:
                data = result.json()
                matches = data.get('addresses') or []
                # Extracting data from the first match
                if matches:
                    first_match = matches[0]
                    latitude = first_match['location']['representativePoint']['latitude']
                    longitude = first_match['location']['representativePoint']['longitude']
                    confidence_score = first_match['$metadata']['geocode']['confidence']['score']
                    precision_code = first_match['$metadata']['geocode']['precisionCode']
                    all_results.append({
                        "address": address,
                        "latitude": latitude,
                        "longitude": longitude,
                        "confidence_score": confidence_score,
                        "precision_code": precision_code
                    })
                else:
                    all_results.append({
                        "address": address,
                        "latitude": "No match",
                        "longitude": "No match",
                        "confidence_score": "No match",
                        "precision_code": "No match"
                    })
            else:
                all_results.append({
                    "address": address,
                    "latitude": "Failed",
                    "longitude": f"Status Code: {result.status_code}",
                    "confidence_score": "Failed",
                    "precision_code": "Failed"
                })
                print(f"Failed to geocode address '{address}', Status Code: {result.status_code}")

    return pd.DataFrame(all_results)


# Testing function for verifying the response status of the geocode_address function
def test_geocode_address_response_status(lightbox_api_key: str) -> None:
    # Test cases for different scenarios
    # Each test case asserts the expected HTTP status code

    # Test case for successful request (HTTP status code 200)
    address = '25482 Buckwood Land Forest, Ca, 92630'
    address_search_data = geocode_address(lightbox_api_key, address)
    assert address_search_data.status_code == 200, f"Expected status code 200, but got {address_search_data.status_code}"

    # Test case for request with empty address (HTTP status code 400)
    address = ''
    address_search_data = geocode_address(lightbox_api_key, address)
    assert address_search_data.status_code == 400, f"Expected status code 400, but got {address_search_data.status_code}"

    # Test case with invalid API key (HTTP status code 401)
    address = '25482 Buckwood Land Forest, Ca, 92630'
    address_search_data = geocode_address("Invalid-LightBox-Key", address)
    assert address_search_data.status_code == 401, f"Expected status code 401, but got {address_search_data.status_code}"

    # Test case with incomplete address (HTTP status code 404)
    address = '25482 Buckwood Land Forest'
    address_search_data = geocode_address(lightbox_api_key, address)
    assert address_search_data.status_code == 404, f"Expected status code 404, but got {address_search_data.status_code}"


if __name__ == "__main__":
    # ----------------------------
    # API Usage
    # ----------------------------

    lightbox_api_key = 'your_api_key'  # Replace with the actual API key
    input_file_path = input('Enter your input file name: ')  # User inputs the file name
    output_file_path = input('Enter your output file name: ')  # User inputs the output file name

    # Reading and processing addresses
    print("Reading addresses from CSV file...")
    addresses = read_addresses_from_csv(input_file_path)
    print("Starting batch geocoding...")
    geocoded_data = batch_geocode_addresses(lightbox_api_key, addresses)
    print("Batch geocoding completed.")

    # Saving geocoded data to output file
    geocoded_data.to_csv(output_file_path, index=False)
    print(f"Geocoded data saved to '{output_file_path}'.")

    # ----------------------------
    # API Testing
    # ----------------------------

    print("Starting API tests...")
    test_geocode_address_response_status(lightbox_api_key)
    print("API tests completed.")
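test_geocode_address_response_status calls the live API, so its results depend on the network, the key, and the service's current data. An offline alternative is to exercise the status-code branches with fake response objects. The sketch below uses unittest.mock.Mock; row_for_response is a hypothetical, simplified helper (latitude and longitude only) that mirrors the per-address branching in batch_geocode_addresses, and the response payloads are made up.

```python
from typing import Any, Dict
from unittest.mock import Mock


def row_for_response(address: str, response: Any) -> Dict[str, Any]:
    """Mirror the script's branching: first match, empty result, or HTTP error."""
    if response.status_code != 200:
        return {"address": address, "latitude": "Failed",
                "longitude": f"Status Code: {response.status_code}"}
    matches = response.json().get("addresses") or []
    if not matches:
        return {"address": address, "latitude": "No match", "longitude": "No match"}
    point = matches[0]["location"]["representativePoint"]
    return {"address": address, "latitude": point["latitude"], "longitude": point["longitude"]}


# Fake responses standing in for requests.Response objects.
ok = Mock(status_code=200)
ok.json.return_value = {
    "addresses": [{"location": {"representativePoint": {"latitude": 1.5, "longitude": -2.5}}}]
}
empty = Mock(status_code=200)
empty.json.return_value = {"addresses": []}
unauthorized = Mock(status_code=401)

assert row_for_response("a", ok)["latitude"] == 1.5
assert row_for_response("b", empty)["latitude"] == "No match"
assert row_for_response("c", unauthorized)["longitude"] == "Status Code: 401"
print("offline checks passed")
```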

Sample Input CSV


input.csv

Address,City,State,Zip Code,Source
807 Farson Street,Belpre,OH,45714,AnnexA
28595 Orchard Lake Road,Farmington Hills,MI,48334,AnnexA
1600 State Street,Salem,OR,97301,AnnexA
13657 West McDowell Road,Goodyear,AZ,85395,AnnexA
65 International Drive,Greenville,SC,29615,AnnexA
818 Forest Lane,Waterford,WI,53185,AnnexA
8545 Common Road,Warren,MI,48093,AnnexA
500 West Main Street,Freehold,NJ,07728,AnnexA
757 45th Street,Munster,IN,46321,AnnexA
2631 Centennial Boulevard,Tallahassee,FL,32308,AnnexA
2712 Lawrenceville Highway,Decatur,GA,30033,AnnexA
900 East Division Street,Wautoma,WI,54982,AnnexA
875 North Greenfield Road,Gilbert,AZ,85234,AnnexA
4282 East Rockton Road,Roscoe,IL,61073,AnnexA
20095 Gilbert Road,Big Rapids,MI,49307,AnnexA
805 Sir Thomas Court,Harrisburg,PA,17109,AnnexA
850 Johns Hopkins Drive,Greenville,NC,27834,AnnexA
233 College Avenue,Lancaster,PA,17603,AnnexA
2140 Fisher Road,Mechanicsburg,PA,17055,AnnexA
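To show what read_addresses_from_csv produces from this file, here is a stdlib-only sketch of the same 'Address, City State Zip Code' formatting applied to two of the rows above. Reading with csv.DictReader keeps every field as text, so the leading zero in Freehold's ZIP code survives. The function name format_addresses is hypothetical.

```python
import csv
import io
from typing import List


def format_addresses(csv_text: str) -> List[str]:
    """Build 'Address, City State Zip Code' strings, matching the sample's format."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [f"{row['Address']}, {row['City']} {row['State']} {row['Zip Code']}" for row in reader]


sample = (
    "Address,City,State,Zip Code,Source\n"
    "807 Farson Street,Belpre,OH,45714,AnnexA\n"
    "500 West Main Street,Freehold,NJ,07728,AnnexA\n"
)
for line in format_addresses(sample):
    print(line)
# 807 Farson Street, Belpre OH 45714
# 500 West Main Street, Freehold NJ 07728
```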