ESRI Geocoder Plugin

This plugin is for GeoKettle, the geospatial version of Pentaho Data Integration (PDI). You can get GeoKettle from www.geokettle.org. As of this writing, it is based on the latest version of PDI, v3.2.

This plugin will allow you to geocode addresses using ESRI's software. You can geocode using two different interfaces:

  1. SOAP interface
  2. ESRI Desktop Arc Editor license

The SOAP interface does not require you to have ESRI software installed on your desktop. It will call a web service to do the geocoding. The other interface requires ESRI software to be installed on the computer from which you are running the geocoder, and you must have arcobjects.jar (from version 9.3 of ESRI).

The reason there are two choices has to do with the way ESRI licenses their commercial geocoding. If you purchase geocodes for use with the desktop software, you cannot register to also use the SOAP (or REST, or any other web service) interface. If you purchase the web service options, you cannot geocode using the desktop software.

Also, please note that this plugin will allow you to use the free versions of ESRI's online geocoding services. These services are subject to certain licensing restrictions, so you should read the terms and conditions for using these services on ESRI's website before you use them! I am in no way responsible for your use of this plugin if you break ESRI's licensing agreements.

I will warn you that this component is somewhat slow, for several reasons. First, ESRI's version 9.3 desktop Java interface does not currently support batch mode geoprocessing, so each address must be sent to the server one at a time. Second, the free SOAP interface does not support batches of more than 10 addresses, and I do not have access to the commercial SOAP interface to test larger batches, so 10 addresses per web service call is the limit. In my testing, each interface generally gets about 9 records per second throughput. By comparison, geocoding directly in the ArcGIS desktop software runs at more like 25 records per second. Perhaps future ESRI versions will add support for batching.
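To make the batching limit and the throughput figure concrete, here is a minimal Java sketch. It is not the plugin's actual code: the class name AddressBatcher and its methods are hypothetical, and the only values taken from the text above are the 10-address limit and the 9 records per second rate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the plugin: illustrates the 10-address limit.
class AddressBatcher {
    // Batch limit stated above for the free SOAP interface.
    static final int MAX_BATCH_SIZE = 10;

    // Splits the input into consecutive batches of at most batchSize items.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    // Rough run time in seconds for a record count at a given throughput.
    static double estimatedSeconds(long records, double recordsPerSecond) {
        return records / recordsPerSecond;
    }

    public static void main(String[] args) {
        // Made-up sample addresses, for illustration only.
        List<String> addresses = new ArrayList<>();
        for (int i = 1; i <= 23; i++) {
            addresses.add(i + " Main St");
        }
        List<List<String>> batches = partition(addresses, MAX_BATCH_SIZE);
        System.out.println(batches.size() + " web service calls needed");

        double hours = estimatedSeconds(100_000, 9.0) / 3600.0;
        System.out.printf("100,000 records at 9 per second: about %.1f hours%n", hours);
    }
}
```

With 23 addresses this yields three batches (10, 10 and 3). At 9 records per second, 100,000 records take roughly 11,100 seconds, or a little over three hours, so plan long runs accordingly.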

Another item to note is that there is a memory leak somewhere when using the desktop interface. I am pretty sure that it is not within the component: I have used the Heap Analysis Tool to analyze a running GeoKettle process, and the evidence points to ArcObjects as the culprit, which I can do nothing about. ArcObjects makes a lot of native calls, and I think some memory is not being released. So be advised that a process may crash when geocoding lots of records (> 100,000) using the desktop interface.

Download

Installation

Close GeoKettle if it is open.

Download the zip file and unpack it into the ${geokettlehome}/plugins/steps folder. You can name the folder whatever you want, but I recommend naming it ESRIMultiGeocoderPlugin. If you plan to use the Desktop interface, you must also copy arcobjects.jar from your ESRI installation into the plugin directory. The jar should be located in Program Files/ArcGIS/java/lib/. Be sure to copy, not move!

Reopen GeoKettle. You will see the ESRI Multi Geocoder Plugin in the Geospatial folder of available transformation steps.

Documentation

Plugin Configuration

  • Step name - name your step, must be unique within the transformation
  • Use SOAP Interface/Use Desktop Interface - choose which interface you want to use. Depending on which you choose, you will have different input options.
    • SOAP Interface
      • ESRI SOAP Url - supply the URL to the SOAP service that will do your geocoding. The two defaults are the publicly available ones from ESRI. Keep in mind you could use your own URL here if you have ESRI Server with a published geocoding service. If you use the premium ESRI task, you must append the token you received from ESRI to the end of the URL.
    • Desktop Interface
      • ESRI Desktop Catalog URL - supply the URL to the ESRI server catalog that performs your geocoding. The two defaults are the publicly available ones from ESRI. Keep in mind you could use your own URL here if you have ESRI Server with a published geocoding service.
      • Username - If you are using a URL that requires authentication, then supply your username here.
      • Password - If you are using a URL that requires authentication, then supply your password here.
      • Retrieve Available Locators button - after selecting your Desktop Catalog URL and setting username/password (if required), click this button to get a list of Address Locators that the Catalog URL supports.
      • Address Locator - Choose the locator you wish to use. Please note that at this time the plugin only supports North American address locators. Other locators will be supported in a future release.
  • Address, City, State, Zip, Zip4, Country - Set these to the fields in your data that contain this information. Note that none of the fields are required, so you can, for instance, geocode just a list of zips, zip4s, or city/states. Of course, not providing any information will result in no geocode matches.

Result Fields

Depending on the interface (SOAP or Desktop) and the address locator you choose, not all of these fields will be available. See ESRI's online documentation for the locator you are using for more information.

  • loc_name - the name of the locator that was used to geocode the address. Will be null if no match was made.
  • status - the match status: U - Unmatched, T - Tied, M - Matched
  • score - percentage-based score of how confident ESRI is in the geocode
  • side - Side of the street the address is on, L - Left, or R - Right
  • longitude - the longitude
  • latitude - the latitude
  • match_addr - the address in ESRI's database that the input was matched to.
  • zip4_type - the type of centroid identified by the US_Zip4 locator. Type 1 indicates a Zip+4 centroid (more accurate); Type 2 indicates a Zip+2 centroid (less accurate).
  • url - the URL used to do the geocoding. This is a copy of the ESRI Desktop Catalog URL or ESRI SOAP URL entered in the plugin configuration.
  • point - a GeoKettle spatial object representing the latitude/longitude location of the address. You can use this field in GeoKettle's spatial filtering functions.
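As an illustration of how the status and score fields might be used downstream, here is a minimal Java sketch that keeps only confident matches. It is an assumption-laden example and not part of the plugin: the Result class, the method names, the sample addresses and the threshold of 80 are all hypothetical, and it assumes score arrives as a number on a 0 to 100 scale.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical post-processing helper, not part of the plugin.
class GeocodeResultFilter {
    // Hypothetical holder for three of the result fields listed above.
    static final class Result {
        final String status;    // "U", "T" or "M"; may be null
        final double score;     // assumed to be on a 0-100 scale
        final String matchAddr; // may be null when nothing matched

        Result(String status, double score, String matchAddr) {
            this.status = status;
            this.score = score;
            this.matchAddr = matchAddr;
        }
    }

    // Translates the one-letter status code into its documented meaning.
    static String describeStatus(String status) {
        if (status == null) {
            return "Unknown";
        }
        switch (status) {
            case "M": return "Matched";
            case "T": return "Tied";
            case "U": return "Unmatched";
            default:  return "Unknown";
        }
    }

    // Keeps only rows with status M and a score of at least minScore.
    static List<Result> confidentMatches(List<Result> results, double minScore) {
        List<Result> kept = new ArrayList<>();
        for (Result r : results) {
            if ("M".equals(r.status) && r.score >= minScore) {
                kept.add(r);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample rows, for illustration only.
        List<Result> rows = new ArrayList<>();
        rows.add(new Result("M", 95.0, "1 Example St"));
        rows.add(new Result("M", 60.0, "2 Example St"));
        rows.add(new Result("T", 99.0, "3 Example St"));
        rows.add(new Result("U", 0.0, null));

        for (Result r : confidentMatches(rows, 80.0)) {
            System.out.println(describeStatus(r.status) + ": " + r.matchAddr);
        }
    }
}
```

Tied results are dropped here even when their score is high, since a tie means more than one candidate scored equally. Whether to keep them, and what threshold to use, depends on your data and the locator.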