FRIDAY, APR 23, 2021
The 2020 Census, already snake-bitten by a global pandemic that delayed its completion and a prolonged controversy over counting undocumented immigrants, is facing another obstacle as it prepares to release data in the fall that will be used to redraw voting boundaries: Accuracy.
Differential privacy, a technique claimed to protect individuals from identification through Census data, has come under fire in a lawsuit by Alabama’s attorney general and 16 other states, which argue “the problem (it) was meant to fix does not exist.” The states claim that the new method, chosen by the Census Bureau and formally adopted only in November, is designed to make data for Census blocks and other geographies inaccurate. They also argue that it is worse at protecting respondents and could make the data unusable, because the results for a given area would be uncertain, especially for geographies with fewer than 500 people. Finally, the states claim the Census Bureau made this decision without the legally required consultation with the states.
This will affect the PL94-171 redistricting data – initially scheduled to be released in March, now delayed until August – which will be inaccurate, especially for small areas. No one will have accurate (or even inaccurate) knowledge of the number of people in an area or the size of its various racial and ethnic groups, making redistricting and many other uses of the basic Census data very difficult.
An analysis by Andrew Beveridge, co-founder of Social Explorer and a demographic expert cited in the Alabama case, noted that 1 of 5 voting districts, 1 of 3 places, and 99 of 100 Census blocks (the smallest unit of Census geography) contain fewer than 500 people.
“This is a real threat to the data, no doubt about it,” Beveridge said.
Traditionally, the Census Bureau has used a combination of methods to protect privacy, most commonly “swapping” characteristics of some respondents from one Census block to another. The Census Bureau now claims that the characteristics of as many as 52 million Americans – their age, gender, race, ethnicity, household type and housing status – could be discovered using the PL94-171 redistricting data and the Summary File 1 data (likely released in 2022). However, an expert for the suing states has shown that the same result could be achieved at random, raising serious questions about the validity of that argument.
The bureau argues that privacy can be protected by inserting intentional errors into the redistricting data, which describe the number of people living in a given geography; the number of people of a given ethnicity (Hispanic or non-Hispanic); the number of people of a certain race (one of six races or any combination); the number of voting-age people; the number of people living in group quarters, such as a college dormitory, prison or nursing home; and the number of occupied and vacant housing units.
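To make the idea of "inserting intentional errors" concrete, here is a minimal sketch in Python. It is not the Census Bureau's actual algorithm; it assumes a textbook Laplace mechanism applied to a single count, and the function names, the `epsilon` value and the sample populations are all hypothetical. It illustrates only the general point that noise of a fixed typical size is a much larger share of a small count than of a large one.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered on zero with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng):
    # Assumption: one person changes a count by at most 1, so the noise
    # scale is 1 / epsilon. Smaller epsilon means more noise.
    noisy = true_count + laplace_noise(1.0 / epsilon, rng)
    # Published counts cannot be negative or fractional.
    return max(0, round(noisy))


def mean_relative_error(true_count, epsilon, trials=10000, seed=0):
    # Average absolute error across many noisy releases, expressed as a
    # fraction of the true count.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += abs(noisy_count(true_count, epsilon, rng) - true_count)
    return total / trials / true_count


if __name__ == "__main__":
    # Hypothetical populations, from a tiny block to a larger area.
    for population in (5, 50, 500, 5000):
        error = mean_relative_error(population, epsilon=0.5)
        print(f"population {population}: mean relative error {error:.1%}")
```

In this toy setup the typical absolute error is about the same for every area, so the relative error shrinks as the population grows. The bureau's production system is considerably more elaborate, and its actual error levels depend on parameters not modeled here.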
The claim that inserting errors will protect privacy while only minimally affecting accuracy has drawn criticism from demographic experts. Steven Ruggles, director of the University of Minnesota Institute for Social Research and Data Innovation, warned that the use of differential privacy could result in enormous errors for small places, possibly exceeding 100 percent of their actual population. He also poured cold water on the Census Bureau’s claims that individuals could be identified from the published data.
“It would be impossible to positively identify the characteristics of any particular individual using the database reconstruction without access to non-public internal Census information,” he said in an affidavit.
Even top Census officials question the wisdom of using differential privacy.
“The data must reflect what is seen in the real world because it is used to change how the real world interacts with itself and with its government,” James Whitehorne, chief of the redistricting and voting rights data office, wrote in a Sept. 30 email to the bureau’s chief scientist. “This does not mean I do not understand our obligation to protect the public’s data, it just appears that in our zeal to protect the data we are harming the very same people we are protecting.”
The Census Bureau is expected to release official population totals for each state by the end of the month. The figures, which originally were due by Dec. 31, 2020, will be used to determine the number of congressional seats assigned to each state. The state data release will not be affected by differential privacy.
The official redistricting data is expected to be released by Sept. 30, with a bare-bones legacy version released by Aug. 16. Alabama officials, however, asked a panel of three federal judges in Birmingham to order the Census Bureau to provide full and accurate redistricting data no later than July 31 so the state has time to redraw voting boundaries before the 2022 election cycle begins.
“There are many ways to remedy these problems,” the Alabama lawyers wrote in a response filed earlier this week. “So long as they comply with the law, how Defendants clean up their mess is their choice.”
All of the filings related to this case are available here: https://thearp.org/
Author: Andy Beveridge