Batch Geocoding Sample
This sample demonstrates batch geocoding addresses using the LightBox API. It includes a Jupyter Notebook walkthrough, a runnable Python script, and sample input data.
Jupyter Notebook
LightBox API - Purpose of Search Endpoint
The Search endpoint returns addresses that match the full-text string passed in the 'text' parameter. Each result includes a representative point for the address and references to related parcels. If there is no exact match, the endpoint returns the best possible match. The '$ref' value within each 'parcels' object for a given address can be used to retrieve information about that parcel.
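To make the response structure concrete, here is a minimal sketch that pulls the representative point and the parcel references out of an already-parsed Search response. The addresses/location/representativePoint path mirrors what batch_geocode_addresses reads further down. The assumption that 'parcels' is a list of objects each carrying a '$ref' key, the helper name summarize_first_match, and the sample values are illustrative only and are not taken from the LightBox documentation.

```python
from typing import Any, Dict, List, Optional


def summarize_first_match(data: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the first match's point and parcel references, or None if there is no match."""
    addresses: List[Dict[str, Any]] = data.get("addresses") or []
    if not addresses:
        return None
    first = addresses[0]
    point = first["location"]["representativePoint"]
    # Assumption: 'parcels' is a list of objects, each holding a '$ref' link to the parcel.
    parcel_refs = [p["$ref"] for p in first.get("parcels", []) if "$ref" in p]
    return {
        "latitude": point["latitude"],
        "longitude": point["longitude"],
        "parcel_refs": parcel_refs,
    }


# Made-up, heavily trimmed response used only to exercise the function.
sample_response = {
    "addresses": [
        {
            "location": {"representativePoint": {"latitude": 40.0, "longitude": -80.0}},
            "parcels": [{"$ref": "https://example.com/parcels/placeholder"}],
        }
    ]
}

print(summarize_first_match(sample_response))
print(summarize_first_match({"addresses": []}))  # None
```

Per the endpoint description, each collected '$ref' can then be used to look up parcel details. How that follow-up request is made is not covered by this sample.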
Batch Geocoding Addresses with Search
This batch geocoder processes addresses sequentially. Within each batch, it sends one request per address to the LightBox API and waits for the response before sending the next. This approach is straightforward but does not use concurrency techniques such as multi-threading or asynchronous requests, so the total run time grows roughly in proportion to the number of addresses.
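The difference between the sequential approach used here and a concurrent one can be sketched without calling the API. Both helpers below take any single-address geocoding callable. geocode_sequential mirrors the batch-then-loop structure of this sample. geocode_concurrent is a swapped-in alternative that uses a standard-library thread pool and is not part of the original sample. The helper names and the max_workers default are illustrative assumptions. Before using concurrency against the real API, check it against any rate limits that apply to your key.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


def geocode_sequential(addresses: List[str], geocode: Callable[[str], T],
                       batch_size: int = 200) -> List[T]:
    """Send one request at a time, batch by batch, as this sample does."""
    results: List[T] = []
    for start in range(0, len(addresses), batch_size):
        for address in addresses[start:start + batch_size]:
            results.append(geocode(address))  # waits for each call to finish
    return results


def geocode_concurrent(addresses: List[str], geocode: Callable[[str], T],
                       max_workers: int = 4) -> List[T]:
    """Keep up to max_workers calls in flight; executor.map preserves input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(geocode, addresses))


def fake_geocode(address: str) -> str:
    # Stand-in for a real API call so the sketch runs offline.
    return address.upper()


addrs = ["807 Farson Street, Belpre OH 45714", "1600 State Street, Salem OR 97301"]
print(geocode_sequential(addrs, fake_geocode, batch_size=1))
print(geocode_concurrent(addrs, fake_geocode))
```

With the sample's own function, the callable could be `lambda a: geocode_address(lightbox_api_key, a)`.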
This notebook provides a step-by-step guide to batch geocoding addresses using the LightBox API. The process involves several key steps, each detailed in subsequent sections:
- Setup
  - Importing necessary Python libraries.
  - Defining global configurations and API keys.
- Function Definitions
  - geocode_address: Function to geocode a single address.
  - read_addresses_from_csv: Function to read and format addresses from a CSV file.
  - batch_geocode_addresses: Function to process addresses in batches and geocode them.
- API Key
  - Entering your API key for authorization.
- Reading Input Data
  - Reading and formatting addresses from a user-specified CSV file.
- Batch Geocoding Process
  - Executing the batch geocoding process using the defined functions.
  - Handling different scenarios and errors during the geocoding process.
- Saving Results
  - Saving the geocoded data to a CSV file.
  - Format and content of the output data.
Additional Materials: LightBox Developer Portal
1. Import the required Python packages
```python
import requests
import pandas as pd
from typing import List
```
2. Define the helper functions
```python
# Function to geocode a single address using the LightBox API.
def geocode_address(lightbox_api_key: str, address: str) -> requests.Response:
    """
    Geocodes the provided address using the LightBox API.

    Args:
        lightbox_api_key (str): The API key for accessing the LightBox API.
        address (str): The address string for matching.

    Returns:
        requests.Response: The HTTP response. Check .status_code, then call
        .json() to get the geocoded address information.
    """
    # API endpoint configuration
    BASE_URL = "https://api.lightboxre.com/v1"
    ENDPOINT = "/addresses/search"
    URL = BASE_URL + ENDPOINT

    # Setting up request parameters and headers
    params = {'text': address}
    headers = {'x-api-key': lightbox_api_key}

    # Sending request to the LightBox API
    response = requests.get(URL, params=params, headers=headers)

    return response


# Function to read addresses from a CSV file and format them.
def read_addresses_from_csv(file_path: str) -> List[str]:
    """
    Reads addresses from a CSV file and formats them into 'Address, City State Zip Code'.

    Args:
        file_path (str): Path to the CSV file.

    Returns:
        List[str]: A list of formatted address strings.
    """
    # Read every column as text so ZIP codes such as 07728 keep their leading zero
    df = pd.read_csv(file_path, dtype=str)

    # Concatenating address components into a single address string per row
    formatted_addresses = df.apply(
        lambda row: f"{row['Address']}, {row['City']} {row['State']} {row['Zip Code']}",
        axis=1
    )
    return formatted_addresses.tolist()


# Function to batch process addresses for geocoding.
def batch_geocode_addresses(api_key: str, addresses: List[str], batch_size: int = 200) -> pd.DataFrame:
    """
    Batch processes a list of addresses for geocoding.

    Args:
        api_key (str): API key for the geocoding service.
        addresses (List[str]): List of addresses to geocode.
        batch_size (int): Number of addresses to process in each batch.

    Returns:
        pd.DataFrame: DataFrame containing original addresses and expanded geocoded data.
    """
    # Split the input into batches; each address is still sent as its own request
    batched_addresses = [addresses[i:i + batch_size] for i in range(0, len(addresses), batch_size)]
    all_results = []

    for batch in batched_addresses:
        for address in batch:
            result = geocode_address(api_key, address)
            if result.status_code == 200:
                data = result.json()
                # Extracting data from the first match, if there is one
                if data.get('addresses'):
                    first_match = data['addresses'][0]
                    latitude = first_match['location']['representativePoint']['latitude']
                    longitude = first_match['location']['representativePoint']['longitude']
                    confidence_score = first_match['$metadata']['geocode']['confidence']['score']
                    precision_code = first_match['$metadata']['geocode']['precisionCode']
                    all_results.append({
                        "address": address,
                        "latitude": latitude,
                        "longitude": longitude,
                        "confidence_score": confidence_score,
                        "precision_code": precision_code
                    })
                else:
                    # The request succeeded but returned no candidate address
                    all_results.append({
                        "address": address,
                        "latitude": "No match",
                        "longitude": "No match",
                        "confidence_score": "No match",
                        "precision_code": "No match"
                    })
            else:
                # The request failed; the HTTP status code is recorded in the longitude column
                all_results.append({
                    "address": address,
                    "latitude": "Failed",
                    "longitude": f"Status Code: {result.status_code}",
                    "confidence_score": "Failed",
                    "precision_code": "Failed"
                })
                print(f"Failed to geocode address '{address}', Status Code: {result.status_code}")

    return pd.DataFrame(all_results)
```
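batch_geocode_addresses records any non-200 response as 'Failed' and moves on. If some failures are transient, one option is to retry before giving up. The sketch below is a generic wrapper with exponential backoff. The set of retryable status codes (429 and common 5xx codes), the helper name with_retries, and the delay values are assumptions, not documented LightBox behavior.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable

# Assumption: these statuses indicate a temporary problem worth retrying.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def with_retries(send: Callable[[], Any], max_attempts: int = 3,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        response = send()
        if response.status_code not in RETRYABLE_STATUS or attempt == max_attempts:
            return response
        sleep(base_delay * 2 ** (attempt - 1))  # waits 1s, 2s, 4s, ... by default


# Offline demo: a fake sender that fails once with 503, then succeeds.
statuses = iter([503, 200])
waits = []
result = with_retries(lambda: SimpleNamespace(status_code=next(statuses)), sleep=waits.append)
print(result.status_code, waits)  # 200 [1.0]
```

Inside batch_geocode_addresses, the request line would become `result = with_retries(lambda: geocode_address(api_key, address))`. Passing sleep as a parameter keeps the wrapper testable without real waiting.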
3. Create the variable used to authenticate your calls.
Get your API key from the LightBox Developer Portal.
```python
lightbox_api_key = '<YOUR_API_KEY>'  # Replace with your actual API key
```
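Hard-coding the key works for a quick test, but it is easy to share or commit by accident. Here is a minimal alternative. It assumes you export the key in an environment variable named LIGHTBOX_API_KEY, a name chosen here only for illustration.

```python
import os


def load_api_key(var_name: str = "LIGHTBOX_API_KEY") -> str:
    """Read the API key from an environment variable (the variable name is hypothetical)."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your LightBox API key.")
    return key


# Usage in the notebook:
# lightbox_api_key = load_api_key()
```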
4. Reading input data
- Specify the location and name of the input CSV file of addresses. If the file is in the current working directory (typically the notebook's folder), enter just its name, for example input_file_name.csv.
- The input CSV file must have the headers 'Address', 'City', 'State' and 'Zip Code'.
- Specify the location and name of the output CSV file. To write it to the current working directory, enter just its name, for example output_file_name.csv.
```python
input_file_path = 'input.csv'  # Name or path of the input file
output_file_path = 'output.csv'  # Name or path of the output file

# Reading and processing addresses
print("Reading addresses from CSV file...")
addresses = read_addresses_from_csv(input_file_path)
```
Reading addresses from CSV file...
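read_addresses_from_csv assumes the four required headers are present. The standard-library sketch below checks the headers first and fails with a clear message. The function name read_formatted_addresses is hypothetical. Because the csv module returns every field as a string, ZIP codes with leading zeros (such as 07728 in the sample input) are kept intact.

```python
import csv
import io
from typing import List, TextIO

REQUIRED_COLUMNS = ["Address", "City", "State", "Zip Code"]


def read_formatted_addresses(csv_file: TextIO) -> List[str]:
    """Validate the headers, then build 'Address, City State Zip Code' strings."""
    reader = csv.DictReader(csv_file)
    missing = [col for col in REQUIRED_COLUMNS if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"Input CSV is missing required column(s): {missing}")
    # csv yields strings, so a ZIP code like 07728 keeps its leading zero.
    return [
        f"{row['Address']}, {row['City']} {row['State']} {row['Zip Code']}"
        for row in reader
    ]


sample = io.StringIO(
    "Address,City,State,Zip Code,Source\n"
    "500 West Main Street,Freehold,NJ,07728,AnnexA\n"
)
print(read_formatted_addresses(sample))  # ['500 West Main Street, Freehold NJ 07728']
```

For a file on disk, open it with `open(input_file_path, newline="")` and pass the file object in.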
5. Batch Geocoding Process
```python
print("Starting batch geocoding...")
geocoded_data = batch_geocode_addresses(lightbox_api_key, addresses)
print("Batch geocoding completed.")
```
Starting batch geocoding...
Batch geocoding completed.
6. Saving Results
```python
# Saving geocoded data to output file
geocoded_data.to_csv(output_file_path, index=False)
print(f"Geocoded data saved to '{output_file_path}'.")
```
Geocoded data saved to 'output.csv'.
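Rows that could not be geocoded carry the strings 'No match' or 'Failed' in the latitude column. One way to collect them for a follow-up run is sketched below with the standard csv module. write_unresolved is a hypothetical helper, the column list mirrors the keys built in batch_geocode_addresses, and the rows in the demo are made up.

```python
import csv
import io
from typing import Dict, Iterable, TextIO

FIELDS = ["address", "latitude", "longitude", "confidence_score", "precision_code"]
UNRESOLVED = {"No match", "Failed"}


def write_unresolved(rows: Iterable[Dict[str, object]], out: TextIO) -> int:
    """Write only the 'No match' / 'Failed' rows to out and return how many were written."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for row in rows:
        if row["latitude"] in UNRESOLVED:
            writer.writerow(row)
            count += 1
    return count


rows = [
    {"address": "a", "latitude": 1.0, "longitude": 2.0, "confidence_score": 0.9, "precision_code": "X"},
    {"address": "b", "latitude": "No match", "longitude": "No match",
     "confidence_score": "No match", "precision_code": "No match"},
]
buffer = io.StringIO()
print(write_unresolved(rows, buffer))  # 1
```

With the notebook's DataFrame, the rows can be passed as `geocoded_data.to_dict("records")` together with a file opened using `newline=""`.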
Python Script
```python
import requests
import pandas as pd
from typing import List


# ----------------------------
# Function Definitions
# ----------------------------

# Function to geocode a single address using the LightBox API.
def geocode_address(lightbox_api_key: str, address: str) -> requests.Response:
    """
    Geocodes the provided address using the LightBox API.

    Args:
        lightbox_api_key (str): The API key for accessing the LightBox API.
        address (str): The address string for matching.

    Returns:
        requests.Response: The HTTP response. Check .status_code, then call
        .json() to get the geocoded address information.
    """
    # API endpoint configuration
    BASE_URL = "https://api.lightboxre.com/v1"
    ENDPOINT = "/addresses/search"
    URL = BASE_URL + ENDPOINT

    # Setting up request parameters and headers
    params = {'text': address}
    headers = {'x-api-key': lightbox_api_key}

    # Sending request to the LightBox API
    response = requests.get(URL, params=params, headers=headers)

    return response


# Function to read addresses from a CSV file and format them.
def read_addresses_from_csv(file_path: str) -> List[str]:
    """
    Reads addresses from a CSV file and formats them into 'Address, City State Zip Code'.

    Args:
        file_path (str): Path to the CSV file.

    Returns:
        List[str]: A list of formatted address strings.
    """
    # Read every column as text so ZIP codes such as 07728 keep their leading zero
    df = pd.read_csv(file_path, dtype=str)

    # Concatenating address components into a single address string per row
    formatted_addresses = df.apply(
        lambda row: f"{row['Address']}, {row['City']} {row['State']} {row['Zip Code']}",
        axis=1
    )
    return formatted_addresses.tolist()


# Function to batch process addresses for geocoding.
def batch_geocode_addresses(api_key: str, addresses: List[str], batch_size: int = 200) -> pd.DataFrame:
    """
    Batch processes a list of addresses for geocoding.

    Args:
        api_key (str): API key for the geocoding service.
        addresses (List[str]): List of addresses to geocode.
        batch_size (int): Number of addresses to process in each batch.

    Returns:
        pd.DataFrame: DataFrame containing original addresses and expanded geocoded data.
    """
    # Split the input into batches; each address is still sent as its own request
    batched_addresses = [addresses[i:i + batch_size] for i in range(0, len(addresses), batch_size)]
    all_results = []

    for batch in batched_addresses:
        for address in batch:
            result = geocode_address(api_key, address)
            if result.status_code == 200:
                data = result.json()
                # Extracting data from the first match, if there is one
                if data.get('addresses'):
                    first_match = data['addresses'][0]
                    latitude = first_match['location']['representativePoint']['latitude']
                    longitude = first_match['location']['representativePoint']['longitude']
                    confidence_score = first_match['$metadata']['geocode']['confidence']['score']
                    precision_code = first_match['$metadata']['geocode']['precisionCode']
                    all_results.append({
                        "address": address,
                        "latitude": latitude,
                        "longitude": longitude,
                        "confidence_score": confidence_score,
                        "precision_code": precision_code
                    })
                else:
                    # The request succeeded but returned no candidate address
                    all_results.append({
                        "address": address,
                        "latitude": "No match",
                        "longitude": "No match",
                        "confidence_score": "No match",
                        "precision_code": "No match"
                    })
            else:
                # The request failed; the HTTP status code is recorded in the longitude column
                all_results.append({
                    "address": address,
                    "latitude": "Failed",
                    "longitude": f"Status Code: {result.status_code}",
                    "confidence_score": "Failed",
                    "precision_code": "Failed"
                })
                print(f"Failed to geocode address '{address}', Status Code: {result.status_code}")

    return pd.DataFrame(all_results)


# Testing function for verifying the response status of the geocode_address function.
# Note: these checks send live requests to the LightBox API.
def test_geocode_address_response_status(lightbox_api_key: str) -> None:
    # Test cases for different scenarios
    # Each test case asserts the expected HTTP status code

    # Test case for successful request (HTTP status code 200)
    address = '25482 Buckwood Land Forest, Ca, 92630'
    address_search_data = geocode_address(lightbox_api_key, address)
    assert address_search_data.status_code == 200, f"Expected status code 200, but got {address_search_data.status_code}"

    # Test case for request with empty address (HTTP status code 400)
    address = ''
    address_search_data = geocode_address(lightbox_api_key, address)
    assert address_search_data.status_code == 400, f"Expected status code 400, but got {address_search_data.status_code}"

    # Test case with invalid API key (HTTP status code 401)
    address = '25482 Buckwood Land Forest, Ca, 92630'
    address_search_data = geocode_address("Invalid-LightBox-Key", address)
    assert address_search_data.status_code == 401, f"Expected status code 401, but got {address_search_data.status_code}"

    # Test case with incomplete address (HTTP status code 404)
    address = '25482 Buckwood Land Forest'
    address_search_data = geocode_address(lightbox_api_key, address)
    assert address_search_data.status_code == 404, f"Expected status code 404, but got {address_search_data.status_code}"


if __name__ == "__main__":
    # ----------------------------
    # API Usage
    # ----------------------------

    lightbox_api_key = 'your_api_key'  # Replace with the actual API key
    input_file_path = input('Enter your input file name: ')  # User inputs the file name
    output_file_path = input('Enter your output file name: ')  # User inputs the output file name

    # Reading and processing addresses
    print("Reading addresses from CSV file...")
    addresses = read_addresses_from_csv(input_file_path)
    print("Starting batch geocoding...")
    geocoded_data = batch_geocode_addresses(lightbox_api_key, addresses)
    print("Batch geocoding completed.")

    # Saving geocoded data to output file
    geocoded_data.to_csv(output_file_path, index=False)
    print(f"Geocoded data saved to '{output_file_path}'.")

    # ----------------------------
    # API Testing
    # ----------------------------

    print("Starting API tests...")
    test_geocode_address_response_status(lightbox_api_key)
    print("API tests completed.")
```
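The script asks for file names with input(), which blocks unattended runs. To pass them on the command line instead, one option is argparse from the standard library. The positional argument names and the --skip-tests flag below are hypothetical and not part of the original script. The flag reflects that the API tests send live requests.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse input/output paths from the command line (hypothetical interface)."""
    parser = argparse.ArgumentParser(description="Batch geocode addresses with the LightBox API.")
    parser.add_argument("input_file", help="CSV with Address, City, State and Zip Code columns")
    parser.add_argument("output_file", help="Where to write the geocoded CSV")
    parser.add_argument("--skip-tests", action="store_true",
                        help="Do not run the live API tests after geocoding")
    return parser.parse_args(argv)


args = parse_args(["input.csv", "output.csv", "--skip-tests"])
print(args.input_file, args.output_file, args.skip_tests)  # input.csv output.csv True
```

In the script, the two input() calls would then be replaced by args.input_file and args.output_file, and the test call would run only when args.skip_tests is false.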
Sample Input CSV
input.csv

Address | City | State | Zip Code | Source
---|---|---|---|---
807 Farson Street | Belpre | OH | 45714 | AnnexA
28595 Orchard Lake Road | Farmington Hills | MI | 48334 | AnnexA
1600 State Street | Salem | OR | 97301 | AnnexA
13657 West McDowell Road | Goodyear | AZ | 85395 | AnnexA
65 International Drive | Greenville | SC | 29615 | AnnexA
818 Forest Lane | Waterford | WI | 53185 | AnnexA
8545 Common Road | Warren | MI | 48093 | AnnexA
500 West Main Street | Freehold | NJ | 07728 | AnnexA
757 45th Street | Munster | IN | 46321 | AnnexA
2631 Centennial Boulevard | Tallahassee | FL | 32308 | AnnexA
2712 Lawrenceville Highway | Decatur | GA | 30033 | AnnexA
900 East Division Street | Wautoma | WI | 54982 | AnnexA
875 North Greenfield Road | Gilbert | AZ | 85234 | AnnexA
4282 East Rockton Road | Roscoe | IL | 61073 | AnnexA
20095 Gilbert Road | Big Rapids | MI | 49307 | AnnexA
805 Sir Thomas Court | Harrisburg | PA | 17109 | AnnexA
850 Johns Hopkins Drive | Greenville | NC | 27834 | AnnexA
233 College Avenue | Lancaster | PA | 17603 | AnnexA
2140 Fisher Road | Mechanicsburg | PA | 17055 | AnnexA