The recover() function performs the "Recover" (Step 3) operation from the QOR Method.

Usage

recover(
  units = NULL,
  polygons = NULL,
  zipcodes = NULL,
  unit_id = "unit_id",
  unit_zip = "postalcode",
  polygon_id = "polygon_id",
  zip_id = "postalcode",
  state_shape = NULL,
  used_NCES = FALSE,
  FIPS_code = NULL,
  FIPS_col = NULL
)

Arguments

units

Data frame or tibble of units. It must contain a unique identifier column (see unit_id) and a zipcode column (see unit_zip).

polygons

sf object containing polygon geometries (e.g., school districts).

zipcodes

sf object containing zipcode tabulation area geometries (ZCTAs).

unit_id

Name of the column in the units data frame that contains the unique identifier for each unit (default: "unit_id"). Preferably a character (string) column.

unit_zip

Name of the column in the units data frame that contains the zipcodes (default: "postalcode"). Preferably a character (string) column. NOTE: users are strongly recommended to first compare the formatting of unit_zip with that of the zip_id column in the zipcodes object.

polygon_id

Name of the column in the polygons sf object that contains the unique identifier for each polygon (default: "polygon_id"). Preferably a character (string) column.

zip_id

Name of the column in the zipcodes sf object that contains the unique identifier for each zipcode (default: "postalcode"). Preferably a character (string) column.

state_shape

sf object containing the shape of the state (used to filter zipcodes to the state).

used_NCES

Logical indicating whether the polygons are NCES school district shapefiles, or other shapefiles that carry a state FIPS code (default: FALSE).

FIPS_code

State FIPS code to filter NCES school district shapefiles (default: NULL, which means no filtering by state).

FIPS_col

Name of the column in the polygons object that contains the state FIPS code. Preferably a character (string) column (default: NULL, which means no filtering by state).
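
The formatting comparison recommended for unit_zip can be sketched in base R. This is only an illustration under stated assumptions: the units data frame and the zcta_ids vector (a stand-in for the zip_id column of the zipcodes object) are made-up example data, and the five-digit, zero-padded format is assumed to be the one used by the ZCTA file.

```r
# Hypothetical example data; column names follow the function defaults.
units <- data.frame(
  unit_id    = c("u1", "u2", "u3"),
  postalcode = c(2138, 60614, 501),  # stored as numbers, so leading zeros are lost
  stringsAsFactors = FALSE
)

# Stand-in for the zip_id column of the zipcodes sf object (assumed format).
zcta_ids <- c("02138", "60614", "00501")

# Convert the numeric zipcodes to zero-padded five-character strings.
units$postalcode <- sprintf("%05d", as.integer(units$postalcode))

# Compare formats and list unit zipcodes that have no counterpart.
same_width <- all(nchar(units$postalcode) == 5) && all(nchar(zcta_ids) == 5)
unmatched  <- setdiff(units$postalcode, zcta_ids)
```

If unmatched is non-empty after the formats agree, those zipcodes are simply absent from the zipcodes object rather than misformatted.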

Value

A list with two items: (1) a tibble with five columns, in which each unit_id is matched to one polygon_id: unit_id (string), polygon_id (string), postalcode (string), distance (numeric, meters), and matched_byzip (binary flag); (2) a tibble of units that could not be matched to a zipcode (i.e., no recovery possible), with columns unit_id and postalcode.

Details

This function:

  • Assigns units (e.g., voters) to polygon geometries (e.g., school districts) using zipcodes as a spatial crosswalk.

  • Requires that all units, polygons, and zipcodes each have a unique identifier column.

  • Should be called once per timepoint when observations are not unique (e.g., panel data): all voters in one year, then all voters from the next year, and so on.

  • Returns a match that assigns one polygon to each unit: the polygon whose internal point is closest to the centroid of that unit's zipcode. The function also returns units that could not be matched to a zipcode as a second table.

  • Is intended for cases where the "Query" operation fails to convert an address to a point geometry.

  • Returns only one polygon per unit; returning multiple polygons per unit requires modifying the code.
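
The matching rule described above can be illustrated with a minimal base R sketch. It does not reproduce the function's implementation. It assumes that zipcode centroids and polygon internal points have already been reduced to planar x/y coordinates in meters, and it uses plain Euclidean distance. All object names and values are hypothetical, and the matched_byzip flag is omitted.

```r
# Hypothetical planar coordinates in meters (assumed projected CRS).
zip_centroids <- data.frame(
  postalcode = c("02138", "60614"),
  x = c(0, 100), y = c(0, 100),
  stringsAsFactors = FALSE
)
polygon_points <- data.frame(
  polygon_id = c("d1", "d2", "d3"),
  x = c(10, 90, 50), y = c(0, 100, 60),
  stringsAsFactors = FALSE
)
units <- data.frame(
  unit_id    = c("u1", "u2", "u3"),
  postalcode = c("02138", "60614", "99999"),  # "99999" has no centroid
  stringsAsFactors = FALSE
)

# For each zipcode centroid, find the closest polygon internal point.
nearest <- lapply(seq_len(nrow(zip_centroids)), function(i) {
  d <- sqrt((polygon_points$x - zip_centroids$x[i])^2 +
            (polygon_points$y - zip_centroids$y[i])^2)
  j <- which.min(d)
  data.frame(postalcode = zip_centroids$postalcode[i],
             polygon_id = polygon_points$polygon_id[j],
             distance   = d[j],
             stringsAsFactors = FALSE)
})
crosswalk <- do.call(rbind, nearest)

# Join units to the zipcode crosswalk; unknown zipcodes yield NA.
joined <- merge(units, crosswalk, by = "postalcode", all.x = TRUE)

# Split into matched units and units that could not be recovered.
is_matched <- !is.na(joined$polygon_id)
result <- list(
  joined[is_matched, c("unit_id", "polygon_id", "postalcode", "distance")],
  joined[!is_matched, c("unit_id", "postalcode")]
)
```

Because every unit in a zipcode inherits the same nearest polygon, the distance only needs to be computed once per zipcode rather than once per unit.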