In this lab, we will explore the spatial weights functionality in GeoDa that is based on the notion of contiguity between polygons. We will use the U.S. Homicides sample data set that comes pre-installed with GeoDa. It contains values for homicides and several socio-economic determinants for the 3085 counties in the continental United States.
We will create contiguity-based weights (rook and queen), as well as higher order contiguity. We will examine their characteristics using the connectivity histogram and explore the connectivity map. We will close by taking another look at the Project File, which is a way to remember project settings (such as spatial weights) between sessions.
Technical details on spatial weights are contained in Chapters 3 and 4 of Anselin and Rey (2014), although the software illustrations in that book are based on the interface of an earlier version of GeoDa.
Construct contiguity-based spatial weights
Compute higher order contiguity weights
Identify the neighbors of selected observations
Store the weights information in a GeoDa Project File
Assess the characteristics of spatial weights
With GeoDa launched and all previous projects closed, we start a new project and load the U.S. Homicides sample data set from the Sample Data tab of the Connect to Data Source dialog (second data set from the top). This yields the familiar themeless base map of the U.S. counties.
In order to make sure that the weights files we are about to construct end up in the same directory as the data, we need to save a copy of the sample data as a new file in a working directory. Select Save As and choose ESRI Shapefile as the file type to create the copy.
Close the project and load the new file in the usual fashion. In the example that follows, the file name is natregimes (with four matching files with extensions .shp, .shx, .dbf, and .prj). We are now ready to proceed.
We invoke the weights creation through the Weights Manager icon in the toolbar, or by selecting Tools > Weights Manager in the menu.
Clicking on the W icon brings up the Weights Manager dialog. At this point, it should be totally empty. By selecting the Create button, we can start constructing the weights.
The actual construction of the weights is implemented through the Weights File Creation interface. This organizes all the different options in one place.
The first item to specify is the ID Variable. This variable is a critical element to make sure that the weights are connected to the correct observations in the data table. In other words, the ID variable is a so-called key that links the data to the weights. In GeoDa, it is best for the ID Variable to be an integer. In practice, the available identifier is often stored as a string instead. One way to deal with this is to use the Edit Variable Properties functionality in the table to turn a string into an integer, as we have seen earlier. However, sometimes there is no easy way to identify an ID variable. In that case, the Add ID Variable button provides the solution: the added ID variable is simply an integer sequence number that is added to the data table (as always, you must Save the data to make the addition permanent).
For the natregimes data set, we use fipsno as the ID variable. This is the county FIPS code turned into an integer value. Once the ID variable is entered, the various options for the weights become available. We can now proceed to the Contiguity Weight panel in the interface (the top panel) to create spatial weights.
We first consider Rook contiguity, i.e., when only common sides of the polygons are considered to define the neighbor relation (common vertices are ignored). With the Rook contiguity radio button checked, a click on Create will start the weights construction process. First, a file dialog appears in which a file name for the weights must be specified (the file extension GAL is added automatically). For example, we could use natregimes_r.gal. Since there are no real metadata in a spatial weights file, it is a good practice to make the file name something meaningful, so that you can remember what type of weight you just created. In our example, we added _r to the name of the data set to suggest rook weights. However, as we will see below, if a Project File is saved, some of the characteristics of the weights are remembered.
After entering a file name and clicking on OK, the weights are computed and written to the file. At the end of this operation, a success message will appear (or an Error message if something went wrong).
A useful option in the weights file creation dialog is the specification of a Precision threshold (highlighted in the figure). In most cases, this is not needed, but in some instances the precision of the underlying shape file is insufficient to allow for an exact match of coordinates (to determine which polygons are neighbors). When this happens, GeoDa suggests a default error band to allow for a fuzzy comparison. For example, this would be needed to create contiguity weights in the NYC sample data set.
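The idea behind a fuzzy coordinate comparison can be illustrated with a minimal Python sketch. This is not GeoDa's actual algorithm: the function share_vertex, the threshold parameter, and the coordinates are hypothetical and serve only to show why an exact match can fail where a small error band succeeds.

```python
def share_vertex(poly_a, poly_b, threshold=0.0):
    """True if some vertex of poly_a is within `threshold` of a vertex
    of poly_b in both the x and the y direction."""
    return any(
        abs(xa - xb) <= threshold and abs(ya - yb) <= threshold
        for xa, ya in poly_a
        for xb, yb in poly_b
    )

# Two unit squares that should touch along x = 1, but the second one
# was digitized with a tiny offset (hypothetical coordinates).
left = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
right = [(1.0000001, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0000001, 1.0)]

print(share_vertex(left, right))                  # False: exact match fails
print(share_vertex(left, right, threshold=1e-6))  # True: fuzzy match succeeds
```

A threshold that is too large would start to connect polygons that are merely close, so the error band should stay well below the smallest real gap in the layer.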
After the weights are created, the corresponding file name is listed in the weights manager under the Weights Name. In addition, several summary properties are given, including the type (rook), whether the weights are inherently symmetric or not (symmetric), the full file name (natregimes_r.gal), the id variable (fipsno) and the order of contiguity (1).
The GAL weights file is a simple text file that contains, for each observation, the number of neighbors and their identifiers. The format was suggested in the 1980s by the Geometric Algorithms Lab at Nottingham University and achieved widespread use after its inclusion in SpaceStat (Anselin 1992), and subsequent adoption by the R spdep package.
The one innovation SpaceStat added was the inclusion of a header line, with some metadata for the weights, such as the number of observations, the name of the shape file from which the weights were derived, and the name of the ID variable. For each observation, the number of neighbors is listed after its ID (e.g., for county 27077, there are 3 neighbors), followed by the IDs of the neighbors (27135 27071 27007).
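To make the layout concrete, the following minimal Python sketch reads GAL-style text into a dictionary. Only the record for county 27077 is taken from the text above. The function parse_gal and the exact header contents (a leading 0, the number of observations, the layer name, and the ID variable) are assumptions made for illustration, not a specification of the format.

```python
def parse_gal(text):
    """Parse GAL-style text into (header fields, {id: [neighbor ids]})."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    lines = [tokens for tokens in lines if tokens]  # drop blank lines
    header = lines[0]
    neighbors = {}
    i = 1
    while i < len(lines):
        # First line of a record: observation id and number of neighbors.
        obs_id, count = int(lines[i][0]), int(lines[i][1])
        i += 1
        # Second line of a record: the neighbor ids (absent for isolates).
        nbrs = []
        if count > 0:
            nbrs = [int(tok) for tok in lines[i]]
            i += 1
        if len(nbrs) != count:
            raise ValueError(f"observation {obs_id}: expected {count} neighbors")
        neighbors[obs_id] = nbrs
    return header, neighbors

# Assumed header line, followed by the single record quoted in the text.
sample = """0 3085 natregimes fipsno
27077 3
27135 27071 27007
"""
header, nbrs = parse_gal(sample)
print(header)        # ['0', '3085', 'natregimes', 'fipsno']
print(nbrs[27077])   # [27135, 27071, 27007]
```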
Since the GAL file is a simple text file, it can easily be edited (e.g., to add or remove neighbors), although this is not recommended: it is easy to break the inherent symmetry of the contiguity weights.
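A hand edit can be checked for broken symmetry with a short Python sketch such as the one below. It assumes the weights are held as a dictionary of neighbor lists. The function asymmetric_pairs and the three observation IDs are hypothetical.

```python
def asymmetric_pairs(neighbors):
    """Return sorted (i, j) pairs where j is listed as a neighbor of i
    but i is not listed as a neighbor of j."""
    return sorted(
        (i, j)
        for i, nbrs in neighbors.items()
        for j in nbrs
        if i not in neighbors.get(j, [])
    )

# Hypothetical example with three observations.
weights = {1: [2, 3], 2: [1], 3: [1]}
print(asymmetric_pairs(weights))   # []

# A careless edit: remove 1 from the list for 3, but not 3 from the list for 1.
weights[3] = []
print(asymmetric_pairs(weights))   # [(1, 3)]
```

An empty result means that every neighbor relation is reciprocated, which is what contiguity weights require.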
We proceed in the same fashion to construct queen contiguity weights. The difference between the rook and queen criterion to determine neighbors is that the latter also includes common vertices. This makes the greatest difference for regular grids (square polygons), where the rook criterion will result in four neighbors (except for edge cases) and the queen criterion will yield eight. For irregular polygons (like most areal units encountered in practice), the differences will be slight. In order to deal with potential inaccuracies in the polygon file (such as rounding errors), using the queen criterion is recommended in practice.
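The contrast between the two criteria on a regular grid can be verified with a minimal Python sketch. It does not use GeoDa. The function grid_neighbors and the indexing of cells by (row, column) are illustrative assumptions.

```python
def grid_neighbors(nrows, ncols, criterion="rook"):
    """Return {cell: sorted neighbor cells} for a regular grid.

    Cells are (row, col) tuples. "rook" uses shared sides only,
    "queen" also counts shared corners (vertices).
    """
    side_steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    corner_steps = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    steps = side_steps if criterion == "rook" else side_steps + corner_steps
    neighbors = {}
    for r in range(nrows):
        for c in range(ncols):
            neighbors[(r, c)] = sorted(
                (r + dr, c + dc)
                for dr, dc in steps
                if 0 <= r + dr < nrows and 0 <= c + dc < ncols
            )
    return neighbors

rook = grid_neighbors(3, 3, "rook")
queen = grid_neighbors(3, 3, "queen")
print(len(rook[(1, 1)]), len(queen[(1, 1)]))   # interior cell: 4 8
print(len(rook[(0, 0)]), len(queen[(0, 0)]))   # corner cell: 2 3
```

The interior cell reproduces the four versus eight neighbors mentioned above, while the corner cell shows the edge effect.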
After checking the Queen contiguity radio button, clicking on Create and entering a file name (e.g., natregimes_q.gal), the new weights will be saved to the weights file. At this point, the file name for the queen weights is included among the weights names listed in the weights manager.
In the dialog, the highlighted weights are the active ones. This will determine what weights are used in any analysis, but also drives the properties that are listed in the dialog and what is generated by the Connectivity Histogram and the Connectivity Map (discussed below). In the example, the summary properties for the queen weights are shown, since that is the highlighted weights name.
The Weights Manager can also be used to Load weights files that are already available on disk. To start with a clean slate, we first Remove the two weights currently in the list (highlight the file name and click on Remove). Next, we select the Load button (center top) and specify the name of the weights file. However, unlike the weights created on the fly in the current session, only limited items will be contained in the properties list, since there are no metadata for the weights files.
In our example, GeoDa has no way of knowing whether the loaded file represents queen or rook contiguity (given as custom in the properties list), whether it is symmetric (unknown), or the order of contiguity (not listed). Therefore, it is highly recommended to use a Project File to store the weights characteristics.
Some useful characteristics of the currently selected weights are provided in the Weights Manager by means of the Connectivity Histogram and the Connectivity Map. In order to use the Project File later on, we first quickly re-create the rook and queen contiguity from scratch (remove any currently listed weights that were loaded, and either use new file names, or overwrite the current files when creating the weights).
The Histogram button produces a connectivity histogram that shows the number of observations for each value of the cardinality of neighbors (i.e., how many observations have the given number of neighbors). The graph is a standard GeoDa histogram, with a number of options available, some of which are generic, and some specific to the connectivity histogram.
The histogram for the queen contiguity associated with the U.S. counties is given below.
The overall pattern is quite symmetric, with a mode of 6 (i.e., more counties have 6 neighbors than any other number). In addition to the visual inspection, the usual statistics of the distribution can be added to the bottom of the graph by means of the View > Display Statistics option.
From the descriptive statistics listed at the bottom of the graph, we can see that the median number of neighbors is 6, the average is 5.89, and the maximum is 14. In addition, the number of observations in each interval is listed as well.
In standard GeoDa fashion, the connectivity histogram is connected to all the other views through linking and brushing. For example, by selecting the modal bar, all 1037 counties with six neighbors are highlighted in the U.S. county map.
It is good practice to check the connectivity histogram for any “strange” patterns, such as observations with only one neighbor and neighborless observations (isolates). The latter will be covered in the discussion of distance-based weights.
Ideally, we like the distribution of the cardinalities to be nice and symmetric, and are on the lookout for bimodal distributions (some observations have few neighbors and some many) and other deviations from symmetry.
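The quantities behind the connectivity histogram are simple to compute from a list of neighbors, as the following Python sketch suggests. It is independent of GeoDa: the function connectivity_summary and the five-observation example (one enclosing unit, three units with a single neighbor, and one isolate) are hypothetical.

```python
from collections import Counter
from statistics import mean, median

def connectivity_summary(neighbors):
    """Summarize neighbor cardinalities for {id: [neighbor ids]}."""
    cards = {i: len(nbrs) for i, nbrs in neighbors.items()}
    histogram = Counter(cards.values())
    return {
        # number of observations for each neighbor count
        "histogram": dict(sorted(histogram.items())),
        "min": min(cards.values()),
        "max": max(cards.values()),
        "mean": mean(cards.values()),
        "median": median(cards.values()),
        # observations worth a closer look
        "isolates": sorted(i for i, k in cards.items() if k == 0),
        "single_neighbor": sorted(i for i, k in cards.items() if k == 1),
    }

# Hypothetical weights: unit 1 touches units 2, 3 and 4; unit 5 is an isolate.
weights = {1: [2, 3, 4], 2: [1], 3: [1], 4: [1], 5: []}
summary = connectivity_summary(weights)
print(summary["histogram"])        # {0: 1, 1: 3, 3: 1}
print(summary["isolates"])         # [5]
print(summary["single_neighbor"])  # [2, 3, 4]
```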
An interesting feature of GeoDa that becomes available as soon as a weights file has been created or loaded is the selection of the neighbors of a selection. We illustrate this with a special characteristic of several counties in the state of Virginia.
When selecting the first (left-most) bar in the histogram, i.e., the observations with only one neighbor, we find that several of the 24 selected counties are in the state of Virginia. This becomes clear after we add a base layer (select the Base Map icon in the map toolbar and choose Nokia Day) and zoom in on the state of Virginia. We find 10 of the counties in question in this state.
Upon closer examination (or, using prior knowledge), we can see that these counties are actually cities within a surrounding county, which results in them having only a single neighbor (the surrounding county).
We can illustrate this further by utilizing a feature of the table Selection Tool. This tool is activated in the usual way by right clicking anywhere in the table. The Add Neighbors to Selection button is central in the middle panel. Note that the Weights must be specified. By default, the drop-down list will show the currently active weights, i.e., natregimes_q in our example.
Alternatively, this option can also be selected from any map (with an active current selection), using Selection and Neighbors from the option menu (right click on the map). Here again, the option will only work if there is a currently active spatial weights file. If not, a warning will be generated.
Either option will take the currently selected observations, i.e., in the example, this would be the selected city-counties in Virginia, and add their neighbors (i.e., the surrounding counties) to the selection. As shown below, the larger counties have a small polygon within them corresponding to the city-counties.
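In terms of logic, adding neighbors to a selection amounts to a set union, as in the Python sketch below. The function add_neighbors_to_selection and the IDs are hypothetical: 10 plays the role of a surrounding county, 11 and 12 of enclosed city-counties, and 20 of another adjacent county.

```python
def add_neighbors_to_selection(selection, neighbors):
    """Return the selection extended with the neighbors of each selected id."""
    expanded = set(selection)
    for i in selection:
        expanded.update(neighbors.get(i, []))
    return expanded

# Hypothetical weights as {id: [neighbor ids]}.
weights = {10: [11, 12, 20], 11: [10], 12: [10], 20: [10]}
selected = {11, 12}
print(sorted(add_neighbors_to_selection(selected, weights)))  # [10, 11, 12]
```

County 20 is not added, since it is a neighbor of the surrounding county but not of either selected city-county.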
A second useful option of the connectivity histogram is to save the neighbor cardinality to the data table as an additional column/variable to be used in further analysis. The Save Connectivity to Table option is invoked in the usual way by right clicking on the view.
This brings up a small dialog in which the name for the new variable can be specified (the default is a generic NUM_NBRS, which may not be the most insightful when different spatial weights are being compared).
Upon clicking OK, an additional column is added to the data table with the number of neighbors for that observation (and that particular spatial weights specification). The number of neighbors is an important input into the calculation of the significance of the local join count statistic, covered in a later chapter.
The button on the right at the bottom of the Weights Manager interface brings up a Connectivity Map. This is a standard GeoDa map view (with all the usual map features invoked through toolbar icons, including zooming and panning), but with a special functionality that highlights the neighbors of any selected observation. In our example, this starts with a map of all the U.S. counties.
As soon as the pointer is moved over one of the observations, its neighbors are highlighted. For example, we can zoom in on Virginia again and point at one of the counties. This shows its neighbors on the map (including the city county enclosed by it as the tiny point within the polygon), and lists their ID values in the status bar. In this instance, the county with FIPS code 51003 (Albemarle, VA) has 9 neighbors using the queen contiguity criterion.
Higher order contiguity weights are constructed in the same manner as the first order weights we just covered, but by specifying a value larger than 1 (which is the default) in the Order of contiguity drop down list.
As before, the weights are saved to a file after selecting the Create button and specifying a file name.
One important aspect of higher order contiguity weights is whether or not the lower order neighbors are included in the weights file. This is determined by a check box (highlighted in the figure above). Conceptually, there is quite a difference between the two notions. The pure higher order contiguity does not include any lower order neighbors. By contrast, an encompassing notion of second order neighbors would include the first order neighbors as well (since there are two steps, back and forth, connecting each observation to its first order neighbor). The pure notion is the one appropriate for use in a statistical analysis of spatial autocorrelation for different spatial lag orders. However, in practice, it is often useful to treat increasing orders of contiguity like increasing distance bands, so that the second order weights consist of both first and second order neighbors.
We proceed and create two forms of second order queen contiguity weights, one without the box checked (pure second order contiguity), and one with the box checked (inclusive second order contiguity). Both weights should now be listed in the weights manager.
The difference between the two concepts is easily illustrated by means of the connectivity map. For example, focusing on Solano county, CA (in the Bay Area, FIPS code 6095), we first consider the pure second order contiguity (weights file natregimes_q2 in the weights manager should be selected). Using this definition, the county has 9 second order neighbors. In particular, notice the band of unselected first order neighbors right next to the county with the pointer above it.
Compare this to the inclusive second order contiguity (weights file natregimes_q2inc), where there are now 13 neighbors. Also, there are no gaps in the neighbor structure.
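The distinction between pure and inclusive second order contiguity can be sketched in a few lines of Python. This is an illustration of the concept, not GeoDa's implementation: the function second_order and the chain of five observations are hypothetical.

```python
def second_order(neighbors, inclusive=False):
    """Derive second order neighbors from first order {id: [neighbor ids]}.

    Pure: reachable in two steps, excluding the observation itself and
    its first order neighbors. Inclusive: pure plus first order neighbors.
    """
    result = {}
    for i, first in neighbors.items():
        first = set(first)
        two_steps = set()
        for j in first:
            two_steps.update(neighbors[j])
        two_steps.discard(i)      # an observation is not its own neighbor
        pure = two_steps - first  # drop the lower order neighbors
        result[i] = sorted(pure | first) if inclusive else sorted(pure)
    return result

# Hypothetical chain of observations: 1 - 2 - 3 - 4 - 5
weights = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(second_order(weights)[3])                  # [1, 5]
print(second_order(weights, inclusive=True)[3])  # [1, 2, 4, 5]
```

For observation 3, the pure version leaves a gap at its first order neighbors 2 and 4, which mirrors the band of unselected counties in the connectivity map, while the inclusive version fills that gap.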
Upon closing the current project, the information on the characteristics of the spatial weights is lost. When you later reload the same polygon layer (e.g., natregimes), you have to reload each weights file using the Load button in the weights manager. As we have seen, the summary properties listed for those loaded weights are not very informative.
A far better alternative is to create and save a project file. As we saw in the discussion of the custom category editor, this file contains information about the project, such as the variables contained in it (especially important when space-time variables are created), as well as the spatial weights.
The project information is kept in memory. As we saw earlier, it is saved by means of the Save Project item in the File menu. This will prompt for a file name and save the project file with a file extension gda.
The project file itself is an editable XML text file. It contains all the information on the source file, variable names, any transformations carried out and any space-time (grouped) variables created. It also keeps all the characteristics of the spatial weights. As we see below, the four weights created so far are included, with the properties as they were listed in the weights manager.
Once the project gda file has been saved, a new project should be started by loading the project file instead of opening a shape file (or other geographical layer). This will automatically load all the spatial weights contained in the project file and list their properties in the weights manager. Also, all transformations and space-time variables that were constructed earlier will be recreated. In addition, different project files can be created for the same data set, each with their own set of weights, space-time variables, etc.
Anselin, Luc. 1992. SpaceStat. National Center for Geographic Information and Analysis, University of California, Santa Barbara, CA.
Anselin, Luc, and Sergio Rey. 2014. Modern Spatial Econometrics in Practice. GeoDa Press, LLC, Chicago, IL.