Basic Mapping


Luc Anselin1

10/18/2017 (updated)


http://geodacenter.github.io/workbook/3a_mapping/lab3a.html

Introduction

In this lab, we will explore a range of mapping and geovisualization options. We start with a review of common thematic map classifications and the way these are implemented in GeoDa. We next focus on different statistical maps, in particular maps that are designed to highlight extreme values or outliers. We also illustrate maps for categorical variables (unique value maps), and their extension to multiple categories in the form of co-location maps. We close with a review of some special approaches to geovisualization, i.e., conditional maps, the cartogram and map animation.

Even though there is substantial mapping functionality in GeoDa, it is worth noting that it is not cartographic software. The main objective is to use the maps as part of a framework of dynamic graphics, to interact with the data as part of the exploration process. As a result, maps in GeoDa do not have some standard cartographic features, such as a directional arrow, or a scale bar. However, any map can be saved as an image file for further manipulation in specialized graphics software.

For this exercise, we will again be using the data set with demographic and socio-economic information for 55 New York City sub-boroughs that comes built-in with GeoDa.

Objectives

After completing this lab, you should be familiar with the following operations and analyses:

  • Create commonly used thematic and statistical maps

  • Manipulate the map by zooming, panning, and selection

  • Save the map as an image

  • Save map classifications as a categorical variable

  • Display and/or save the mean centers and centroids of polygons

  • Create custom intervals

  • Identify outliers using the box map and standard deviation map

  • Create a map for a categorical variable

  • Examine multivariate co-location patterns with a co-location map

  • Assess variable interaction effects using a conditional map

  • Construct and interpret a cartogram

  • Visually explore patterns using map animation

  • Use the project file in GeoDa

GeoDa functions covered

  • Map > Quantile Map
    • select number of categories
  • Map > Natural Breaks Map
    • select number of categories
  • Map > Equal Intervals Map
    • select number of categories
  • Map toolbar options
  • Map selection
    • select all observations in a legend category
  • Map options
    • Change current map type
    • Save Categories
    • Save the map as an image
    • Extract mean centers and/or centroids
    • Changing the look of a map
    • Making map outlines invisible
  • Map > Percentile Map
  • Map > Box Map
    • Set hinges as 1.5 or 3.0
  • Map > Standard Deviation Map
  • Map > Unique Values Map
  • Map > Co-location Map
  • Map > Create New Custom
    • Category Editor
    • applying a custom classification to a map
    • applying a custom classification to a histogram
  • File > Save Project
    • saving a custom classification in the project file
  • Map > Conditional Map
    • adjusting the breaks for the conditioning variables
  • Map > Cartogram
    • improving the cartogram fit
  • Map > Map Movie
    • setting animation controls


Thematic Maps – Overview

We start GeoDa, or, if continuing from a previous exercise, close the project and load the NYC sub-borough data set from the Sample Data tab in the Connect to Data Source dialog. One click on the NYC Data icon brings up the by now familiar green themeless map. To see the different mapping options, we select Map from the menu, or click on the Maps and Rates icon in the toolbar.

Map Toolbar icon

The selection brings up the list of map types that can be created in GeoDa.

Map type options

We ignore the Rates-Calculated Maps item for now, which will be considered separately. Among the other options, there are three where an additional choice is required (the right pointing arrow) consisting of the number of intervals to be specified (choose the number from a drop down list). All the other options have pre-specified settings.

Common map classifications include the Quantile Map, Natural Breaks Map, and Equal Intervals Map. Specialized classifications that are designed to bring out extreme values include the Percentile Map, Box Map (with two options for the hinge), and the Standard Deviation Map. The Unique Values Map does not involve a classification algorithm, since it uses the integer values of a categorical variable itself as the map categories. The Co-location Map is an extension of this principle to multiple categorical variables. Finally, Custom Breaks allows for the use of customized classifications by means of the Category Editor.

We consider each in turn.

Common map classifications

Quantile map

We already saw an example of a quintile map (5 categories) in the exercise where we mapped the number of abandoned vehicles by community area in Chicago. As a quick review, select Quantile Map > 4 to create a quartile map (four categories).

Quartile map

In the variable selection dialog that follows, we select rent2008 for the median rent in 2008.

Variable selection dialog

This brings up the quartile map, with four categories in the legend, each matching a quartile in the data. The values in parentheses give the number of observations in each category.

Quartile map for median rent in 2008

Upon closer examination, something doesn’t seem to be quite right. With 55 total observations, we should expect roughly 14 (55/4 = 13.75) observations in each group. But the first group only has 7, and the third group has 19! This illustrates a common problem with quantile maps whenever ties are present. If we open up the Table (click on the icon if it is not open) and sort on the variable rent2008 (click on the field name), we see where the problem lies.

Sorted median rent in 2008

Ignoring the zero entries for now (those are a potential problem in their own right), we see that observations starting in row 8 up to row 21 all have a value of 1000. The cut-off for the first quartile is at row 14, highlighted in yellow in the figure. In a non-spatial analysis, this is not an issue: the first quartile value is simply given as 1000. But in a map, the observations are locations that need to be assigned to a group (with a separate color). Other than an arbitrary assignment, there is no way to classify observations with a rent of 1000 in either category 1 or category 2. To deal with these ties, GeoDa moves all the observations with a value of 1000 to the second category. As a result, even though the value of the first quartile is given as 1000 in the map legend, only those observations with rents less than 1000 are included in the first quartile category. As we see from the table, there are 7 such observations.

Any time there are ties in the ranking of observations that align with the values for the breakpoints, the classification in a quantile map will be problematic and result in categories with an unequal number of observations.
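The effect of ties on a quartile classification can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rank-based rule for the break points and the function names are ours, not necessarily what GeoDa uses internally, and the rent values are made up to mimic the pattern in the table (three zeros, four values below 1000, fourteen ties at 1000).

```python
from bisect import bisect_right
from collections import Counter


def quantile_breaks(values, k):
    """Rank-based breaks (an assumed rule): break i is the value at
    sorted position ceil(i * n / k), counting from 1."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[-(-i * n // k) - 1] for i in range(1, k)]


def classify_ties_up(values, breaks):
    """Categories 1..k; a value equal to a break goes to the higher category."""
    return [bisect_right(breaks, v) + 1 for v in values]


# Hypothetical rents (NOT the actual NYC data): 3 zeros, 4 values below 1000,
# 14 tied at 1000, and 34 distinct higher values, 55 observations in total.
rents = [0] * 3 + [800, 850, 900, 950] + [1000] * 14 \
    + [1000 + 10 * j for j in range(1, 35)]

breaks = quantile_breaks(rents, 4)
counts = Counter(classify_ties_up(rents, breaks))
print(breaks)
print(sorted(counts.items()))
```

With these made-up values the first break equals 1000, and because all ties at the break move up, the first category holds only the 7 observations below 1000 instead of the expected 14.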

Natural breaks map

A natural breaks map uses a nonlinear algorithm to group observations such that the within-group homogeneity is maximized, following the pathbreaking work of Fisher (1958) and Jenks (1977). In essence, this is a clustering algorithm in one dimension to determine the break points that yield groups with the largest internal similarity.

To create such a map with four categories, we select Natural Breaks Map > 4 from the list of options and again choose rent2008 as the variable, in the same way as for the quantile map. This yields a natural breaks map.

Natural breaks map for median rent in 2008

In comparison to the quartile map, the natural breaks criterion is better at grouping the extreme observations. The three observations with zero values make up the first category, whereas the five high rent areas in Manhattan make up the top category. Note also that in contrast to the quantile maps, the number of observations in each category can be highly unequal.
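To make the idea of one-dimensional clustering concrete, the following is a minimal sketch that finds the grouping with the smallest total within-group sum of squared deviations by dynamic programming. It illustrates the optimization criterion only; it is not GeoDa's implementation, the function name is hypothetical, and the values are invented.

```python
def natural_breaks(values, k):
    """Split the sorted values into k contiguous groups that minimize the
    total within-group sum of squared deviations (requires len(values) >= k)."""
    x = sorted(values)
    n = len(x)
    # Prefix sums give the squared-deviation cost of any slice in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(x):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(i, j):
        """Sum of squared deviations of x[i:j] around its own mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[g][j]: minimal cost of splitting x[:j] into g groups.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for g in range(1, k + 1):
        for j in range(g, n + 1):
            for i in range(g - 1, j):
                c = best[g - 1][i] + cost(i, j)
                if c < best[g][j]:
                    best[g][j] = c
                    cut[g][j] = i
    # Walk back through the cut table to recover the groups.
    groups, j = [], n
    for g in range(k, 0, -1):
        i = cut[g][j]
        groups.append(x[i:j])
        j = i
    return groups[::-1]


# Invented values: a few zeros, a middle cluster, and a few very high values.
sample = [0, 0, 0, 900, 950, 1000, 1000, 1100, 1200, 2500, 2600, 2700]
groups = natural_breaks(sample, 3)
print(groups)
```

As in the map, the extreme observations end up in small groups of their own, and the group sizes are unequal.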

Equal intervals map

An equal intervals map uses the same principle as a histogram to organize the observations into categories that divide the range of the variable into equal bins. Using Equal Intervals Map > 4 and rent2008 for the variable provides the following result.

Equal intervals map for median rent in 2008

As in the case of natural breaks, the equal interval approach can yield categories with highly unequal numbers of observations. In our example, the three zero observations again get grouped in the first category, but the second range (from 725 to 1450) contains the bulk of the spatial units (42).
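The break points of an equal intervals classification follow directly from the range of the variable. The sketch below assumes a minimum of 0 and a maximum of 2900, values inferred from the reported second range (725 to 1450) rather than read from the data; the sample values and the convention that a value equal to a break falls in the higher bin are assumptions as well.

```python
from bisect import bisect_right


def equal_interval_breaks(lo, hi, k):
    """Return the k - 1 interior break points that split [lo, hi] into k equal bins."""
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def bin_counts(values, breaks):
    """Count observations per bin; a value equal to a break goes to the higher bin."""
    counts = [0] * (len(breaks) + 1)
    for v in values:
        counts[bisect_right(breaks, v)] += 1
    return counts


breaks = equal_interval_breaks(0, 2900, 4)
# Invented sample values, for illustration only.
sample = [0, 0, 0, 800, 1000, 1000, 1200, 1500, 2300, 2900]
print(breaks)
print(bin_counts(sample, breaks))
```

The same computation underlies a histogram with four bins, which is why the two views share their ranges.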

To illustrate the similarity with the histogram, we select Explore > Histogram for rent2008 and set the Choose Intervals option to 4. Also, set View > Display Statistics. The resulting histogram has the exact same ranges for each interval as the equal intervals map. The number of observations in each category is the same as well (see the values in the descriptive statistics below the graph).

Histogram (4) for median rent in 2008

Finally, we illustrate the equivalence between the two graphs by selecting a category in the map legend. We click on the dark brown rectangle in the map legend associated with the highest category, which selects the corresponding observations in both the map and the histogram. As expected, the selected bins match completely.

Histogram-Equal Intervals map equivalence

Map options

Once a map is created, there are two types of options available. One set pertains directly to the manipulation of the map and is implemented by means of the icons on the map toolbar. We have already seen some of its features in an earlier exercise, but revisit it in more detail here. The second set of options is triggered by right clicking on the map window, which invokes the options dialog.

Map options

The top item (Change Current Map Type) brings up the same list with map types as invoked by clicking on the Map icon on the main toolbar, with one additional entry for a custom classification (Create New Custom). Selecting a different map type from a current map in this fashion removes the need to choose a variable again, but it also overwrites the current map. We consider the various features in further detail below, except for the Rates and Save Rates, and Selection and Neighbors, which are covered in a separate chapter.

The map toolbar

The map toolbar contains eight icons that facilitate selection and viewing of the map.

Map toolbar

We have already covered the Select (left most) icon in a previous exercise, as well as the Base Map icon (second from right). The right most icon is simply to Refresh the image, in case something went wrong.

The other icons, from left to right, allow the following actions:

  • Invert Select, switches to the complement of the current selection (same functionality as Table > Invert Selection)

  • Zoom In, zooms in on the map by drawing a rectangle for the new map extent

  • Zoom Out, zooms out of the map by repeatedly clicking on the map

  • Pan, implements panning by dragging the map in any given direction with the pointer

  • Full Extent, returns the map to its default full extent

Saving the classification as a categorical variable

The classification associated with a given map can be added to the data table as a new categorical variable by means of the Save Categories option. This brings up a variable selection dialog in which we can specify the name for the new variable (the default is CATEGORIES). In our example, we use natbreak.

Map categories variable

Upon selecting OK, a new field is added to the data table. The categories are labeled from low to high, starting with the value 1. The resulting categorical variable can be used as input into a Unique Values Map or a Co-location Map (see below). However, all information on the ranges from which the categories were derived, and on the type of legend (e.g., sequential, diverging), is lost.

Map categories variable in table

Saving the map as an image

The three options at the bottom of the list, Save Selection, Copy Image to Clipboard, and Save Image As operate in the same way as for the statistical graphs we reviewed earlier. However, there is one important change compared to earlier versions of GeoDa.

The map image can be saved in four different formats: png (the default), bmp, SVG and Postscript. More importantly, GeoDa now saves the legend with the image (in earlier versions only the map image was saved, without the legend). For example, using Save Image As for the natural breaks map in our example yields the following png image.

Saved map image

Shape centers

The options Shape Centers and Thiessen Polygons apply respectively to polygon and point layers. We will use Thiessen polygons in the lab that deals with distance-based spatial weights. The Shape Centers item allows the mean center or centroid to be added to either the display itself, or to the table, or to be saved as a separate point layer file. In our current example, since we are dealing with polygons, the options for Thiessen Polygons are not available (greyed out in the interface).

The sub-menu offers the different options for Shape Centers. All options pertain to either the Mean Center or to the Centroid. The former is obtained as the simple average of the X and Y coordinates that define the vertices of the polygon. The latter is more complex, and is the actual center of mass of the polygon (imagine a cardboard cutout of the polygon: the centroid is the central point where a pin would hold up the cutout in a stable equilibrium). Both suffer when the polygons are highly irregular, and in those instances they can end up being located outside the polygon. Nevertheless, the shape centers are a handy way to convert a polygon layer to a corresponding point layer with the same underlying geography.
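The difference between the two shape centers can be illustrated with a small sketch. It computes the mean center as the average of the vertex coordinates and the centroid with the standard area-weighted (shoelace) formula for a simple polygon. The function names and the L-shaped polygon are ours, chosen only for illustration.

```python
def mean_center(vertices):
    """Simple average of the X and Y coordinates of the polygon vertices."""
    xs, ys = zip(*vertices)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def centroid(vertices):
    """Center of mass of a simple polygon (shoelace formula).
    Vertices are listed in order, without repeating the first one at the end."""
    area2 = cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross              # accumulates twice the signed area
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)


# Hypothetical L-shaped polygon: a 4 x 1 base with a 1 x 3 column on its left end.
l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
mc = mean_center(l_shape)
ct = centroid(l_shape)
print(mc, ct)
```

For this irregular shape both points have X and Y coordinates larger than 1, which places them in the empty corner of the L, i.e., outside the polygon.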

At present, GeoDa can only deal with one geography at a time, but this geography can be represented in a number of different ways. For example, the NYC sub-boroughs can be shown as polygons (as we do here), or as their shape centers in a point layer (same geography), or even as the Thiessen polygons constructed from that point layer. The key factor is that all three representations are connected to the same cross-sectional data set.

There are three options for the shape centers: add their coordinates as fields to the current data table; display them on the map; and save them to an external point layer file.

Shape Center options

To illustrate these features we select Add Mean Centers to Table. This brings up a dialog to specify the variable names for the X and Y coordinates. For now, we stick with the default of COORD_X and COORD_Y.

Mean center variable selection

Pressing OK will add the two fields to the data table.

Mean center coordinates in table

To show the points on the current map, we choose Display Mean Centers. They are added as small circles within the polygons with which they correspond.

Mean centers displayed on map

The third option, Save Mean Centers, will bring up the familiar file name selection dialog that will create a new file (or database table) with the point layer.

Other map options

The remaining three options pertain to the look of the map view. Show Status Bar is on by default, which means that the number of observations, the number of selected observations, etc., will be displayed on the status bar. The Selection Shape is set by default to Rectangle, with the other two options as Circle and Line. This allows for the selection of observations on the map that lie within the specified shape, as drawn by the pointer. As we have seen earlier, any selection is instantaneously identified in all other views as well through the process of linking.

Map appearance options

Finally, the Color options determine how the map is displayed. By default, the polygon outlines (i.e., in our example, the boundaries of the sub-borough neighborhoods) are shown, with Outlines Visible checked. By unchecking this option, the lines disappear. This is particularly helpful when the map contains many small areas that will tend to be dominated by their boundary lines. In our example (with the mean center points left in), this results in an impression of much larger areas, because many adjoining sub-boroughs ended up in the same category.

Map without outlines visible

The Background Color determines the background against which the map is drawn. In most instances, the default of white is the best choice.

Extreme Value Maps

Extreme value maps are variations of common choropleth maps where the classification is designed to highlight extreme values at the lower and upper end of the scale, with the goal of identifying outliers. These maps were developed in the spirit of spatializing EDA, i.e., adding spatial features to commonly used approaches in non-spatial EDA (Anselin 1994).

GeoDa currently supports three such map types in the Map menu: a percentile map, a box map, and a standard deviation map. These are briefly described below. Only their distinctive features are highlighted, since they share all the same options with the other choropleth map types.

Percentile map

The percentile map is a variant of a quantile map that would start off with 100 categories. However, rather than having these 100 categories, the map classification is reduced to six ranges, the lowest 1%, 1-10%, 10-50%, 50-90%, 90-99% and the top 1%. This is shown below for the rent2008 variable. We select Map > Percentile Map from the map menu or the map toolbar icon and choose the variable to create the map.

Percentile map

Note how the extreme values are much better highlighted, especially at the upper end of the distribution. The classification also illustrates some common problems with this type of map. First of all, since there are fewer than 100 observations, in a strict sense there is no 1% of the distribution. This is handled (arbitrarily) by rounding, so that the highest category has one observation, but the lowest does not have any. Also, since the values are sorted from low to high to determine the cut points, there can be an issue with ties. As we have seen, this is a generic problem for all quantile maps. As pointed out, GeoDa handles ties by moving observations to the next highest category. For example, when there are a lot of observations with zero values (e.g., in the crime rate map for the U.S. counties), the lowest percentile can easily end up without observations, since all the zeros will be moved to the next category.
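The empty lowest category can be reproduced with a small sketch. The rank-based rule for the cut points used here is an assumption made for illustration (GeoDa's exact rounding rule is not spelled out above), combined with the convention that ties move to the higher category. The data are simply the integers 1 to 55, not the rent values.

```python
from bisect import bisect_right
from collections import Counter

PERCENTILES = [1, 10, 50, 90, 99]


def percentile_breaks(values, percentiles=PERCENTILES):
    """Assumed rank-based rule: the break for percentile p is the value at
    sorted position ceil(p * n / 100), counting from 1."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[-(-p * n // 100) - 1] for p in percentiles]


def percentile_categories(values):
    """Categories 1..6; a value equal to a break goes to the higher category."""
    breaks = percentile_breaks(values)
    return [bisect_right(breaks, v) + 1 for v in values]


# 55 distinct made-up values, matching only the number of sub-boroughs.
values = list(range(1, 56))
counts = Counter(percentile_categories(values))
print([counts[c] for c in range(1, 7)])
```

With 55 observations the first cut point coincides with the minimum, so no observation falls strictly below it, while the last cut point coincides with the maximum, leaving a single observation in the top category.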

Box map

A box map is the mapping counterpart of the idea behind a box plot. The point of departure is again a quantile map, more specifically, a quartile map. But the four categories are extended to six bins, to separately identify the lower and upper outliers. The map menu has two items for the box map, one for each option for the hinges (1.5 and 3.0), identical to what we had for the box plot. For example, we invoke this map using Map > Box Map (Hinge=1.5) from the menu or the map icon on the toolbar. After selecting rent2008 as the variable, the map is created.

Box map, hinge=1.5

Compared to the quartile map above, the box map separates the three lower outliers (the observations with zero values) from the other four observations in the first quartile. They are depicted in dark blue. Similarly, it separates the 6 outliers in Manhattan from the 8 other observations in the upper quartile. The upper outliers are colored dark red. To illustrate the correspondence between the box plot and the box map, we put the two side by side (Explore > Box Plot) and select the upper outliers in the box plot. Only the matching 6 outliers in the box map are highlighted. Note that the outliers in the box plot do not provide any information on the fact that these locations are also adjoining in space. This is the spatial perspective that the box map adds to the data exploration.

Outliers in box plot and box map

From the current map, we can switch the box map between the hinge criterion of 1.5 and 3.0 by opening the options menu (right click on the map) and selecting Change Current Map Type > Box Map (Hinge = 3.0). Alternatively, we can also open a new map window from the main menu or map toolbar icon and select Box Map (Hinge=3.0) as the option. The resulting box map no longer has lower outliers and has only 5 upper outliers.

Box map, hinge=3.0

The box map is the preferred method to quickly and efficiently identify outliers and broad spatial patterns in a data set.
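The six box map categories follow from the quartiles and the fences at a multiple (the hinge) of the inter-quartile range. The sketch below uses Python's statistics.quantiles with the inclusive method to obtain the quartiles; this choice, the treatment of values that fall exactly on a boundary, and the data are assumptions for illustration and may differ from what GeoDa does.

```python
import statistics


def box_map_categories(values, hinge=1.5):
    """Assign each value to one of six bins: 1 = lower outlier, 2-5 = the four
    quartile groups (outliers excluded), 6 = upper outlier."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - hinge * iqr
    upper_fence = q3 + hinge * iqr

    def category(v):
        if v < lower_fence:
            return 1
        if v < q1:
            return 2
        if v < q2:
            return 3
        if v < q3:
            return 4
        if v <= upper_fence:
            return 5
        return 6

    return [category(v) for v in values]


# Invented values with one low and one high extreme.
sample = [500, 900, 950, 1000, 1050, 1100, 1150, 1200, 2500]
cats_15 = box_map_categories(sample, hinge=1.5)
cats_30 = box_map_categories(sample, hinge=3.0)
print(cats_15)
print(cats_30)
```

In this made-up example the low value is an outlier with a hinge of 1.5 but not with a hinge of 3.0, whereas the high value is flagged under both criteria, similar to what happens when switching hinges in the map.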

Standard deviation map

The third type of extreme values map is a standard deviation map. In some way, this is a parametric counterpart to the box map, in that the standard deviation is used as the criterion to identify outliers, instead of the inter-quartile range.

In a standard deviation map, the variable under consideration is transformed to standard deviational units (with mean 0 and standard deviation 1). The number of categories in the classification depends on the range of values, i.e., how many standard deviational units cover the range from lowest to highest. It is also quite common that some categories do not contain any observations.

We continue with the variable rent2008 and bring up a standard deviation map by means of Map > Standard Deviation Map from the main menu or from the map toolbar icon.

Standard deviation map

In the map, there are 5 neighborhoods with median rent more than two standard deviations above the mean, and 3 with a median rent less than two standard deviations below the mean. Both sets would be labeled outliers in standard statistical practice. Also, note how the second lowest category does not contain any observations (so the corresponding color is not present in the map).
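The classification behind a standard deviation map can be sketched as follows. Each value is converted to standard deviational units and assigned to a one-unit band, with everything beyond two units collected in the outer bands. Whether the population or the sample standard deviation is used, and how values exactly on a band edge are treated, are assumptions here; the data are invented.

```python
import math
import statistics


def std_bands(values):
    """Return a band index per value: -3 for z < -2, -2 for -2 <= z < -1, ...,
    1 for 1 <= z < 2, and 2 for z >= 2."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation (assumed)
    bands = []
    for v in values:
        z = (v - mean) / sd
        bands.append(max(-3, min(2, math.floor(z))))
    return bands


# Invented data with mean 5 and standard deviation 2.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
bands = std_bands(sample)
print(bands)
```

Bands without any observations simply do not show up, which corresponds to legend categories that are listed but have no matching color in the map.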

Mapping Categorical Variables

So far, our maps have pertained to continuous variables, with a clear order from low to high. GeoDa also contains some functionality to map categorical variables, for which the numerical values are distinct, but not necessarily meaningful in and of themselves. Most importantly, the numerical values typically do not imply any ordering of the categories. The two functions in GeoDa are the unique value map, for a single variable, and the co-location map, where the categories for multiple variables are compared.

Unique Value map

To illustrate the categorical map, we create a box map (with hinge 1.5) for each of the variables kids2000 (the percentage of households with kids under 18) and pubast00 (the percentage of households receiving public assistance). In each of these maps, we save the categories (Save Categories in the options menu), respectively as kidscat and asstcat. The category labels go from 1 to 6, but not all categories necessarily have observations. For example, the map for public assistance has neither lower outliers (label 1) nor upper outliers (label 6), only observations in categories 2–5. For the kids map, we have categories 1–5.

Now, we ignore the meaning of the categories and create a categorical map for the classifications in the box map for the kidscat variable by means of Map > Unique Value Map from the main menu or from the map toolbar icon.

Unique value map

The categorical map has values in the legend from 1 to 5, but unlike for the box map from which they originated, they do not imply any ordering. This is also reflected in the colors, which are generated from the ColorBrewer categorical map palette. In addition, it is possible to move colors between categories. This is carried out by moving an item in the legend by dragging the label up or down. In the example, we drag the label 5 (associated with the rosy color) up to the next level, where the color is green and the label is 4.

Switching categories

As soon as we release the label, it becomes 4 again (to emphasize that the label in and of itself is meaningless in terms of value).

Categories switched

However, all observations that used to be colored rosy are now colored dark green, and vice versa.

Unique value map with relabeled categories

It is important to keep in mind that the values in the legend are just labels, represented as integers for convenience, but they could just as well have been A, B, C, etc., or other descriptors. This is a fundamental difference between the categorical maps and the other choropleth map types.

Co-location map

The idea of a co-location map is the extension of the unique value map concept to a multivariate context. In essence, it is the implementation of the idea of map overlay or map algebra.2 The labels for different categorical variables are compared and the matches identified on the map. It is up to the user to ensure that the categories across variables are meaningful, since the co-location is based on the variables having the same code. For example, this is useful when comparing the extent to which the quartiles across different variables occur at the same locations. Or, as we will see later, whether significant patterns of local spatial autocorrelation match across several variables. But it is also very easy to generate nonsensical results, for example, when the labels are not comparable.

The co-location map is invoked as Map > Co-location Map from the main menu or from the map toolbar icon. The next dialog allows for the selection of the different categorical variables in the top panel. If there are matches between the codes for the variables, these are displayed at the bottom, with the labels and a proposed color scheme. We select kidscat and asstcat as the two categories we want to compare. Four labels match between the two variables, and their values and a proposed color theme are listed in the bottom panel.

Co-location map variable selection

The proposed color scheme is Unique Values by default. But in this example, that is not the appropriate shading, since we are comparing the classifications between two box maps (or, equivalently, box plots). The Select color scheme drop down list provides a set of pre-coded color schemes for a range of likely comparisons between categories. For our example, we select Box Map.

Co-location map color schemes

This changes the proposed color scheme in the bottom panel to match the palette for a box map (but only 4 of the 6 colors are being used).

Co-location map box map color schemes

Clicking on the OK button brings up the co-location map.

Co-location map

In our example, 25 of the 55 sub-boroughs have observations in matching categories for the two variables. The 30 that do not are shaded light grey. The matching observations are shaded in the color of the box map category to which they belong. We find that 10 (out of the 14) neighborhoods belong to the upper quartile (> 75%, but not outliers) for both variables, suggesting an association. This association is also something we already observed in the LOWESS smoother of the scatter plot between these two variables. The main difference with a scatter plot is that the co-location map can be applied to categorical variables. It can also easily be extended to more than two variables.
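The logic of the co-location comparison amounts to checking, location by location, whether all categorical variables carry the same code. A minimal sketch, with a hypothetical function name and made-up category codes for five locations:

```python
def co_location(*variables):
    """For each location, return the shared code when all variables agree,
    and None when they do not (the light grey areas in the map)."""
    return [codes[0] if len(set(codes)) == 1 else None
            for codes in zip(*variables)]


# Made-up box map categories for five locations (not the NYC values).
kidscat = [2, 3, 5, 4, 5]
asstcat = [2, 4, 5, 4, 3]
matches = co_location(kidscat, asstcat)
print(matches)
```

Since the function accepts any number of variables, the same comparison extends directly to three or more categorical variables, as long as their codes are comparable.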

Custom Classifications

In addition to the range of pre-defined classifications available for choropleth maps in GeoDa, it is also possible to create a custom classification. This is often useful when substantive concerns dictate the cut points, rather than data driven criteria. For example, this may be appropriate when specific income categories are specified for certain government programs. It is also useful when comparing the spatial distribution of a variable over time. All pre-defined classifications are relative and would be re-computed for each time period. For example, when mapping crime rates over time, in an era of declining rates, the observations in the upper quartile in a later period may have crime rates that correspond to a much lower category in an earlier period. This would not be apparent using the pre-defined approaches, but could be visualized by setting the same break points for each time period.

The Category Editor in GeoDa is a complex tool that allows for the creation of a fully customized classification scheme. It is invoked from the map options menu, as Change Current Map Type > Create New Custom.

Custom category

Alternatively, the category editor can also be invoked directly by selecting the corresponding icon from the toolbar, immediately to the left of the histogram icon.

Category Editor toolbar icon

Category editor

The opening dialog asks for a new title for the category, with the default set to Custom Breaks.

Category editor initialization

We specify the name for the new classification as custom1.

Specifying a custom category name

After clicking OK, the category editor dialog is updated with the name for the new classification, which appears in the drop down list below Create Custom Breaks. The variable on which the classification will be based is listed, here kids2000 (but all other variables are available from the drop down list), as well as a default definition of Breaks (here, Quantile), Color Scheme (here, diverging), and Categories (here, 6). The panel on the right shows a histogram with the current classification applied to the associated variable. Moving the pointer over a bar in the histogram displays some summary statistics in the status bar, such as the range and its associated number of observations.

Category editor interface

As soon as the custom editing process is initiated, the map from which it was started is synchronized with the current state of the category editor. In our example, this turns the original classification into six groups with a diverging color scheme, matching the number of observations in each category that is also shown in the category editor histogram (for example, 9 observations in the first bin).

Map synchronization

Besides the number of categories, the two main criteria to define a new classification are the break points and the associated colors. Each item has a number of options in a drop down list. For the Breaks, the five options consist of four traditional criteria, and one User Defined item. We select the latter.

Breaks dialog

The associated colors are selected from the Color Scheme drop down list, and include sequential, diverging, thematic (or, categorical), and custom. The custom colors are defined by clicking on the colored boxes next to each category under the heading Edit Custom Breaks. In our example, we will use a sequential coloring scheme.

Color scheme dialog

As we adjusted the classification criteria, the histogram in the right-hand panel is immediately updated to reflect the latest selection (and so is the synchronized map).

Updated category editor dialog

We are now ready to edit the break points. There are two approaches to carry this out. One is to use the handle in the vertical slider next to the list of button breaks. One break is dealt with at a time, selected by means of the matching radio button to the left. In our example, this is Button break 1. When we move the slider down, the value in the box changes (increases). For example, in the figure below it is set at 19.987. As the slider moves up or down, the histogram (and the map) is instantaneously updated to reflect the new break points. With Automatic Labels checked (the default), the category labels are updated as well.

Edit custom breaks – pointer

A second approach to edit the break points is to type them in directly, as shown below. Again, as new values are entered, the histogram, map and labels are updated.

Edit custom breaks – manual entry

We complete the process by entering 20, 30, 40, 45 and 50 as the new break points, as shown below.

New categories defined
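To make the effect of the five break points concrete, the following is a minimal Python sketch (not GeoDa code) of how user defined breaks partition a variable into six categories and how range labels could be derived from them. The sample values are made up, the label format is only illustrative, and the rule that a value equal to a break point falls in the upper category is an assumption; GeoDa may treat boundary values differently.

```python
from bisect import bisect_right

def classify(value, breaks):
    """Return the 0-based category index of value for ascending breaks."""
    # Assumption: a value equal to a break point goes to the upper category.
    return bisect_right(breaks, value)

def automatic_labels(breaks):
    """Build one range label per category (len(breaks) + 1 in total)."""
    labels = [f"< {breaks[0]}"]
    labels += [f"{lo} - {hi}" for lo, hi in zip(breaks, breaks[1:])]
    labels.append(f">= {breaks[-1]}")
    return labels

breaks = [20, 30, 40, 45, 50]                    # break points from the text
labels = automatic_labels(breaks)
sample = [12.5, 20.0, 33.1, 44.9, 47.2, 58.0]    # made-up kids2000 values
categories = [classify(v, breaks) for v in sample]

for value, cat in zip(sample, categories):
    print(f"{value:5.1f} -> category {cat + 1} ({labels[cat]})")
```

Five break points always yield six categories, which is why the number of categories in the editor and the number of break boxes differ by one.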

The corresponding map classification has the updated labels and categories. Note how the legend heading indicates the use of custom1 as the classification for the variable kids2000.

Updated map classification

Finally, even though in the current application it is not actually needed, we uncheck the Automatic Labels box to illustrate the custom labels feature. In the boxes, we enter cat1, cat2, etc. as the labels. The labels in the map are updated accordingly.

Custom category labels

To proceed, we check the Automatic Labels box again to return to the more informative labels that show the range for each bin.

Applications of custom categories

Once created, the custom classification becomes available to any application where classification is involved. Most directly, this is the mapping functionality, where now the custom category is listed as an option when selecting Change Current Map Type.

Custom map classification

The custom1 classification could be used to track the spatial distribution of the kids variable over time. For example, we can create a box map (with hinge 1.5) for the kids2005 variable, and proceed with Change Current Map Type to select custom1. The resulting map reflects the new custom categories. We observe an overall shift towards the lower categories, i.e., more sub-boroughs have a smaller share of households with children in 2005 compared to 2000.

kids2005 with custom classification
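The shift between the two years can also be summarized by counting the observations in each category under the same fixed breaks. The Python sketch below illustrates the idea outside of GeoDa. Both value lists are hypothetical stand-ins for kids2000 and kids2005 (not the actual NYC data), and values equal to a break point are assumed to fall in the upper category.

```python
from bisect import bisect_right
from collections import Counter

def bin_counts(values, breaks):
    """Count observations per category for a fixed set of ascending breaks."""
    counts = Counter(bisect_right(breaks, v) for v in values)
    return [counts.get(i, 0) for i in range(len(breaks) + 1)]

breaks = [20, 30, 40, 45, 50]                      # custom1 break points
kids2000 = [18.0, 27.5, 36.0, 42.0, 47.0, 52.0]    # hypothetical values
kids2005 = [15.0, 22.0, 31.0, 38.5, 44.0, 49.0]    # hypothetical values

counts_2000 = bin_counts(kids2000, breaks)
counts_2005 = bin_counts(kids2005, breaks)
print("2000:", counts_2000)
print("2005:", counts_2005)
```

Because the breaks are held fixed, any difference between the two count vectors reflects a change in the data rather than a change in the classification.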

In addition, the custom classification is also available to the histogram. Selecting Histogram Classification in the options shows custom1 as a classification. Note that the category editor can also be initiated from the histogram options.

Custom classification for histogram

The corresponding histogram is the same as the one shown at the end of the breaks editing process in the category editor.

Customized breaks in histogram

The custom categories can also be used as break points in conditional plots. This is the approach to fully customize the break points for the conditioning variables.

Saving the custom categories – the Project file

GeoDa has the ability to save a so-called project file that contains information on various variable transformations and other operations. We will revisit the project file in more detail when we discuss spatial weights. One useful feature of the project file is that it also contains the definition of any custom categories that were created. If this definition is not saved in a project file, then it will be lost, and will need to be recreated from scratch the next time the data set is analyzed.

The project file is created from the menu as File > Save Project.

Save Project

This is followed by the usual file name dialog. The file is saved with a file extension of gda. It is a text file that includes XML encoding. For example, when we examine the just created project file with a text editor, we can locate the section pertaining to custom_classifications, as shown below.

Custom category definition in project file

The custom classification section contains all the aspects needed for the definition of the custom category.
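Since the project file is XML-encoded text, the custom classification section can also be located programmatically. The Python sketch below uses only the standard library. The embedded sample is a hypothetical stand-in for a project file: only the element name custom_classifications is taken from the text, and every other tag name is invented for illustration, so the sketch lists whatever children it finds rather than assuming a particular schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a saved project file. Only the element name
# custom_classifications comes from the text; every other tag is invented.
SAMPLE_GDA = """<project>
  <custom_classifications>
    <definition>
      <title>custom1</title>
      <break>20</break>
      <break>30</break>
    </definition>
  </custom_classifications>
</project>"""

def find_custom_section(root):
    """Return the first custom_classifications element, or None."""
    return next(root.iter("custom_classifications"), None)

def outline(element, depth=0):
    """List tag names (and any text) of an element and its descendants."""
    text = (element.text or "").strip()
    lines = ["  " * depth + element.tag + (f": {text}" if text else "")]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

section = find_custom_section(ET.fromstring(SAMPLE_GDA))
section_outline = outline(section) if section is not None else []
print("\n".join(section_outline))
```

For an actual project file, the root element would instead be obtained with ET.parse(path).getroot(), where path points to the saved gda file.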

Once a project file has been saved, GeoDa should be started by loading the gda file, rather than the shape file or any other data set. This ensures that all the meta-information is taken into account, all variable transformations updated and custom categories stored. The options menu for any map will now contain custom1 as one of the available categories.

Once we move to more complex analyses, with multiple variable transformations, spatial weights, and other items created during various operations, the most efficient way to operate is to always save and use a project file.

Conditional Map

In the discussion of conditional plots in EDA, one of the options was a conditional map, also known as a conditioned choropleth map, or a micromap matrix (for an extensive discussion, see Carr and Pickle 2010). This functionality can be invoked from the menu as Map > Conditional Map.

Conditional Map from Map menu

Alternatively, it is also one of the options associated with the conditional plot toolbar icon.

Conditional Map from Conditional Plot icon

Selecting this option brings up a variable selection dialog containing three columns (compare this to the four columns for the conditional scatter plot). The first column pertains to the conditioning variable for the horizontal axis, the second to the conditioning variable for the vertical axis. The third column, Map Theme, selects the variable that will be mapped. In our example, we use forhis08 (% of Hispanic population not born in the U.S.) and hhsiz08 (average number of people per household) as the two conditioning variables, and rent2008 (median rent) as the focus variable. All values are for 2008.

Conditional Map variable selection

Clicking OK brings up the default conditional map, with three categories (quantiles) for each of the conditioning variables, and thus 9 micro maps in total. On the horizontal axis, forhis08 is listed with two break points. The vertical axis variable is given as hhsiz08, also with two break points. The maps themselves are box maps for the median rent variable. Each of the micro maps contains only those observations that match the categories on the horizontal and vertical axes.

3x3 Conditional Map
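The logic by which observations are allocated to the micro maps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not GeoDa's implementation: the quantile break points come from Python's statistics.quantiles (whose interpolation rule may differ from the one used in GeoDa), a value equal to a break point is assigned to the upper category, and the data are made up.

```python
from bisect import bisect_right
from statistics import quantiles

def conditional_cells(x_values, y_values, bins=3):
    """Group observation indices by (vertical bin, horizontal bin)."""
    # statistics.quantiles returns bins - 1 cut points for each variable.
    x_breaks = quantiles(x_values, n=bins)
    y_breaks = quantiles(y_values, n=bins)
    cells = {}
    for i, (x, y) in enumerate(zip(x_values, y_values)):
        # Assumption: a value equal to a break goes to the upper bin.
        cell = (bisect_right(y_breaks, y), bisect_right(x_breaks, x))
        cells.setdefault(cell, []).append(i)
    return cells

# Made-up conditioning variables for nine observations.
horizontal = [1, 2, 3, 4, 5, 6, 7, 8, 9]
vertical = [9, 8, 7, 6, 5, 4, 3, 2, 1]

cells_3x3 = conditional_cells(horizontal, vertical, bins=3)
cells_2x2 = conditional_cells(horizontal, vertical, bins=2)
for cell, members in sorted(cells_3x3.items()):
    print(cell, members)
```

Each micro map then displays only the observations listed for its cell. With 55 observations spread over nine cells, some cells will necessarily contain only a handful of areas, and cells can be empty when the two conditioning variables are strongly related, as in this made-up example.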

Since there are only 55 observations in our example, the sample size for each of the nine subsets tends to be very small. To obtain larger subsets, we use the Options menu (right click) to change the classification. The structure of this menu is the same as for the other conditional plots, including changing the conditioning bins through Vertical Bins Breaks and Horizontal Bins Breaks. There are seven preset classifications, as well as the option to create custom breaks by means of the category editor. Note that if you continued with the previous example, the custom category custom1 will be included as one of the options.

In the illustration below, the number of categories for both the vertical and the horizontal conditioning variable was changed from the default of 3 quantiles to 2. The result is a 2 x 2 matrix of micro maps.

2x2 Conditional Map

The classification suggests that sub-boroughs with larger household size tend to have lower rents (the two maps at the top), whereas there does not seem to be an effect of the % Hispanic that is foreign-born (the maps on the left and right do not seem to be drastically different in the range of values they contain). By manipulating the break points, further insight can be gained into the presence of interaction effects (or lack thereof). This can be investigated more formally by means of an analysis of variance.

Cartogram

GeoDa includes a circular cartogram, in which the areal units are represented as circles, whose size (and color) is proportional to the variable observed at that location.3 The changed shapes remove the misleading effect that the area of the unit might have on perception of magnitude. For example, in the case of median rent in NYC sub-boroughs, some of the smaller areas in Manhattan have the highest rent, and consequently they can barely be noticed in a standard choropleth map.

The cartogram is invoked by its toolbar button, situated in the center of the three mapping icons, or by selecting Map > Cartogram in the Map menu.

Cartogram icon

Next follows the Cartogram Variables dialog that contains two columns, one for the Circle Size and one for the Circle Color. It is highly recommended to select the same variable for both. In our example, this is again rent2008. The circle color variable can be different from the circle size variable, but then it may be difficult (and sometimes confusing) to keep the two separate.

Cartogram variable selection

Clicking OK brings up the cartogram view. The default is to use the Box Map classification for the circle colors (with hinge at 1.5 IQR).

Cartogram
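As a reminder of what the default circle colors represent, the Python sketch below assigns box map categories using fences at 1.5 times the interquartile range beyond the first and third quartiles. It is a sketch under assumptions: the rent values are made up, the category labels are only illustrative, and the quartiles are computed with Python's statistics.quantiles, which may not match GeoDa's quartile calculation exactly.

```python
from statistics import quantiles

def box_map_category(value, q1, median, q3, hinge=1.5):
    """Assign one of six box map categories based on quartiles and fences."""
    iqr = q3 - q1
    lower_fence = q1 - hinge * iqr
    upper_fence = q3 + hinge * iqr
    if value < lower_fence:
        return "lower outlier"
    if value < q1:
        return "< 25%"
    if value < median:
        return "25% - 50%"
    if value < q3:
        return "50% - 75%"
    if value <= upper_fence:
        return "> 75%"
    return "upper outlier"

# Made-up median rents; the last value is deliberately extreme.
rents = [800, 850, 900, 950, 1000, 1050, 1100, 1150, 2500]
q1, median, q3 = quantiles(rents, n=4)
box_categories = [box_map_category(r, q1, median, q3) for r in rents]

for rent, cat in zip(rents, box_categories):
    print(rent, cat)
```

Setting hinge=3.0 instead would flag only the more extreme observations as outliers.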

The cartogram is most insightful when used in conjunction with a regular choropleth map. Selecting an observation in the cartogram then immediately links it with the corresponding area in the choropleth map, here illustrated for one of the Manhattan neighborhoods.

Linked Cartogram and map

The cartogram has all the same options as a regular choropleth map, with one addition. The positioning of the circles in the cartogram is the result of a non-linear optimization algorithm that tries to locate the center of the circle as close as possible to the centroid of the areal unit with which it corresponds, while respecting the contiguity structure as much as possible. There is no unique solution to this problem, and it is often good practice to experiment with further iterations that will slightly reposition the circles. This is implemented in the Improve Cartogram option. A number of different iteration options are listed, together with the estimated time. The latter is particularly useful for larger data sets (but not in our example).

Improve Cartogram

Map Animation

In GeoDa, map animation, or, more generally, any kind of animation, is carried out through the animation tool. This is invoked by the Map Movie toolbar icon, or from the menu, as Map > Map Movie.

Map Movie icon

This brings up the Animation dialog, the control center through which the various aspects of the animation are controlled. The first item to specify is the Variable from the drop down list. We continue with rent2008. At the bottom of the dialog are the main controls: the start > button, step-by-step forward >> or backward <<, whether the animation loops or stops at the end, an option to Reverse the progress, the speed of the animation, and whether the order followed is ascending or descending. The defaults are usually good, with Cumulative checked (i.e., the selection grows as the animation progresses) and Ascending order.

Map Movie variable selection

Once the forward button is activated, each observation is selected in turn, starting with the lowest value. This selection is not only for the map (the term map movie is a left-over from earlier versions), but for all currently active windows. The slider in the Animation dialog moves from left to right, and under the variable name the currently selected observation and its value are listed. In our example, the dialog looks as follows after 17 steps.

Animation tool

The currently selected observation is 51, with a value of 1000. All 17 lowest valued observations are highlighted in all currently open views, such as in the box map below (note how the status bar confirms that 17 observations were selected).

Map animation
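The selection logic behind the animation can be sketched in Python as follows. This is an illustration, not GeoDa's implementation: the rent values are made up, the function names are hypothetical, observations are identified by 0-based position, and ties are kept in their original order.

```python
def animation_order(values, ascending=True):
    """Return observation indices in the order the animation visits them."""
    return sorted(range(len(values)), key=lambda i: values[i],
                  reverse=not ascending)

def selection_at_step(order, step, cumulative=True):
    """Return the indices selected after the given number of steps."""
    if step <= 0:
        return []
    # Cumulative: the selection grows. Otherwise only the current
    # observation is selected.
    return order[:step] if cumulative else [order[step - 1]]

rents = [1200, 900, 1500, 1000, 800]    # made-up values
order = animation_order(rents)
selected_cumulative = selection_at_step(order, 3)
selected_single = selection_at_step(order, 3, cumulative=False)
print(order, selected_cumulative, selected_single)
```

With the cumulative option, the number of selected observations equals the number of steps taken, which is why 17 observations are selected after 17 steps in the example above.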

The animation tool can be paused at any point, reversed, changed from continuous change to step-by-step, etc., using the controls provided. The main point of the animation is to visually check for any patterns, such as all the lowest or highest values occurring in one location, or an increase in value that follows a given spatial trend (e.g., core-periphery, or East-West). Of course, this visual impression is only that, and will need to be confirmed with the more formal pattern detection methods that we will cover later.


References

Anselin, Luc. 1994. “Exploratory Spatial Data Analysis and Geographic Information Systems.” In New Tools for Spatial Analysis, by Marco Painho (Ed.), 45–54. Eurostat, Luxembourg.

Carr, Daniel B., and Linda Williams Pickle. 2010. Visualizing Data Patterns with Micromaps. Chapman & Hall/CRC.

Fisher, W.D. 1958. “On Grouping for Maximum Homogeneity.” Journal of the American Statistical Association 53: 789–98.

Jenks, G.F. 1977. “Optimal Data Classification for Choropleth Maps.” Occasional Paper No. 2. Department of Geography, University of Kansas.

Tobler, Waldo. 2004. “Thirty Five Years of Computer Cartograms.” Annals, Association of American Geographers 94: 58–73.


  1. University of Chicago, Center for Spatial Data Science – anselin@uchicago.edu

  2. While map algebra tends to be geared to applications for raster data, since the polygons for the different variables are identical, the same principles can be applied to the categorical map context.

  3. See Tobler (2004) for an extensive discussion of various aspects of the cartogram.