
2020 FCSM Research and Policy Conference

MONDAY, SEPTEMBER 21

Keynote Address

9:00 - 10:00 am

Using Federal Data to Evaluate and Inform: A Case Study on Increasing Upward Mobility in the U.S.

John Friedman, Brown University

Morning Concurrent Sessions
10:15 am - 12:00 pm

AM1-1: The Evidence Act 101

AM1-2: Data Ethics Frameworks

AM1-3: Nonresponse Bias in Federal Surveys: Gaps in Knowledge and Future Opportunities

Afternoon Concurrent Sessions 1
1:30 - 3:00 pm

PM1-1: Implementing the Evidence Act: The Journey Thus Far and the Road Ahead

PM1-2: Leveraging Administrative Data

PM1-3: Leveraging Official Statistical Programs to Address Emerging Issues: Providing Information Relevant to the Coronavirus Pandemic

Afternoon Concurrent Sessions 2
3:15 - 5:00 pm

PM2-1: Using Data in New Ways: Leveraging the Evidence Act to Coordinate Evaluation, Statistics and Policy

PM2-2: Linked Data from the Census Bureau for Evidence Building: Accessing the Data and Recent Results

PM2-3: Communicating Fitness for Use

Morning Concurrent Sessions
10:15 am - 12:00 pm

AM1-1: The Evidence Act 101

Organizers: Jennifer Edgar (Bureau of Labor Statistics) and Keenan Dworak-Fisher (Office of Management and Budget)

Moderator: Katharine Abraham (University of Maryland)

Panelists:

Diana Epstein (Office of Management and Budget)

Sharon Boivin (Department of Education)

Monique Eleby (U.S. Census Bureau)

Discussant: Emilda Rivers (Statistical Official, National Science Foundation and Director of the National Center for Science and Engineering Statistics)

AM1-2: Data Ethics Frameworks

Organizer and Session Chair: Jessica Graber (National Center for Health Statistics)

Ethical Considerations for Data Access and Use; Amy O'Hara (Georgetown University)

Policy and Technology: Ensuring Ethics in the Submission and Access of Biomedical Research Data; Dina N. Paltoo (National Institutes of Health)

Ethical Issues in the Development of Complex Machine Learning Algorithms; Sara R. Jordan (Policy Counsel, Artificial Intelligence, Future of Privacy Forum)

Ethical Principles for the All Data Revolution - Repurposing Administrative and Opportunity Data; Stephanie S. Shipp, Sallie Keller, and Aaron Schroeder (University of Virginia)

AM1-3: Nonresponse Bias in Federal Surveys: Gaps in Knowledge and Future Opportunities

Organizer and Moderator: Tala Fakhouri (National Center for Health Statistics)

Constructing an Inventory of Non-response Bias Studies in Federal Surveys; Peter Miller (Professor Emeritus at Northwestern University and U.S. Census Bureau, Retired)

Developing and Assessing Weighting Methods for the Redesigned National Health Interview Survey; James Dahlhamer (National Center for Health Statistics)

Finding the Right Auxiliary Information for Non-response Adjustment Models: In Search of Zs with Desirable Properties; Andy Peytchev (RTI International)

Estimating Survey Non-Response Bias Using Tax Records; Bruce Meyer (University of Chicago, NBER, AEI, and U.S. Census Bureau)

Afternoon Concurrent Sessions 1
1:30 - 3:00 pm

PM1-1: Implementing the Evidence Act: The Journey Thus Far and the Road Ahead

Organizers: Jennifer Edgar (Bureau of Labor Statistics) and Joe Parsons (National Agricultural Statistics Service)

Moderator: Hubert Hamer, Administrator (National Agricultural Statistics Service)

Panelists:

Gregory Fortelny (Chief Data Officer, U.S. Department of Education)

Ted Kaouk (Chief Data Officer, U.S. Department of Agriculture)

William W. Beach (Statistical Official, U.S. Department of Labor and Commissioner of Labor Statistics, Bureau of Labor Statistics)

Kelly Bidwell (Evaluation Officer and Statistical Official, General Services Administration)

Samuel C. "Chris" Haffer (Chief Data Officer, U.S. Equal Employment Opportunity Commission)

PM1-2: Leveraging Administrative Data

Organizer and Session Chair: Erik Scherpf (NORC at the University of Chicago)

Using Administrative Records Data to Produce Business Statistics: The Nonemployer Statistics by Demographics Series (NES-D); James Noon and Adela Luque (U.S. Census Bureau)

Blending Administrative Data with a Probability Sample of Nonparticipants to Produce National Estimates: The NCS-X NIBRS Estimation Project; Marcus Berzofsky and Dan Liao (RTI International) and Alexia Cooper (Bureau of Justice Statistics)

Analyzing Research and Development Trends Using Administrative Data; Kathryn Linehan, Eric Oh, Joel Thurston, Stephanie Shipp, and Sallie Keller (University of Virginia), John Jankowski and Audrey Kindlon (National Center for Science and Engineering Statistics)

Integrating Survey and Administrative Data Across Sources and Across Agencies to Create Statistical Products: A Case Study from Education; Sarah Grady (National Center for Education Statistics) and Emily Isenberg (American Institutes for Research)

An Approach to Tiered Access in the Department of Veterans Affairs; Michael Schwaber (U.S. Department of Veterans Affairs)

PM1-3: Leveraging Official Statistical Programs to Address Emerging Issues: Providing Information Relevant to the Coronavirus Pandemic

Organizers: Jaki McCarthy (National Agricultural Statistics Service) and Jennifer Edgar (Bureau of Labor Statistics)

Session Chair: Jennifer Edgar (Bureau of Labor Statistics)


Adding COVID-19 Questions to the CPS; Emy Sok and Karen Kosanovich (Bureau of Labor Statistics) & Tim Marshall (U.S. Census Bureau)
Near Real-Time Surveillance of COVID-19 Mortality using Data from the National Vital Statistics System; Paul Sutton and Lauren Rossen (Centers for Disease Control and Prevention)

The IRS Office of Research, Applied Analytics and Statistics Used Innovative, Nimble Approaches to Support Decision Making and Evaluation Related to the Coronavirus Pandemic; Holly Donnelly (Internal Revenue Service)

Expanding the Use of NCHS' Research and Development Survey to Quantify Health Characteristics During the Coronavirus Pandemic; Paul Scanlon and Katherine Irimata (National Center for Health Statistics)

New Data for New Purposes; Rolf Schmitt (Bureau of Transportation Statistics)

Discussant: Chris Marokov (Office of Management and Budget)

Afternoon Concurrent Sessions 2
3:15 - 5:00 pm

PM2-1: Using Data in New Ways: Leveraging the Evidence Act to Coordinate Evaluation, Statistics and Policy

Organizers: Jennifer Edgar (Bureau of Labor Statistics) and Erica Zielewski (Office of Management and Budget)

Session Chair: Jennifer Edgar (Bureau of Labor Statistics)

Framing the Evidence Act's Vision for Coordination and Collaboration; Erica Zielewski (Office of Management and Budget)

U.S. Department of Housing and Urban Development's Experience Supporting and Enhancing its Data Infrastructure and Use; Calvin Johnson (U.S. Department of Housing and Urban Development)


Linking State Medicaid Data and Child Welfare Data for Outcomes Research; Valeria Butler (ASPE and ACF/HHS) and Emily Madden (ASPE/HHS)

Department of Labor's Data Exchange and Analysis Platform (DEAP); Christina Yancey (Chief Evaluation Officer, Department of Labor), David Judkins (Abt Associates) and Scott Gibbons (Department of Labor)

PM2-2: Linked Data from the Census Bureau for Evidence Building: Accessing the Data and Recent Results

Organizer and Session Chair: Katie Genadek (U.S. Census Bureau)

Criminal Justice in the US and Economic Inequality: Results from the Criminal Justice Administrative Records System; Keith Finlay (U.S. Census Bureau)

UMETRICS: Data for Examining How Research is Produced and How it Affects the Broader Economy; Joseph Staudt (U.S. Census Bureau)

Results from the Evidence Building Project Series: Health at Birth, Later Life Achievement, and the Intergenerational Transmission of Advantage; Sarah Miller (University of Michigan)

The Census Longitudinal Infrastructure Project - Linked Census Data and Results from the Impact of Preschool on Later-Life Outcomes; Katie Genadek (U.S. Census Bureau)

PM2-3: Communicating Fitness for Use

Organizer and Moderator: Jennifer Parker (National Center for Health Statistics)

Panelists:

Amy Branum (National Center for Health Statistics)

Marilyn Seastrom (National Center for Education Statistics)

Regina Nuzzo (American Statistical Association)

Robert Sivinski (U.S. Office of Management and Budget)

Samantha Tyner (Bureau of Labor Statistics)

 


Abstracts
Morning Concurrent Sessions

AM1-1: The Evidence Act 101

The Evidence Act 101 session will provide a high-level overview of the main components of the Foundations for Evidence-Based Policymaking Act of 2018 (the Evidence Act), as well as the motivation and vision behind it. With an extended introduction from Dr. Katharine Abraham, who served on the original Commission on Evidence-Based Policymaking, the audience will hear about learning agendas and how to cultivate plans for evidence building, data governance and data inventories, and the presumption of accessibility to data that the Act provides. The session will conclude with a discussion of what the Act means for statistical agencies.

AM1-2: Data Ethics Frameworks
Ethical Considerations for Data Access and Use

Amy O'Hara, Georgetown University

Project leads and data owners typically focus on legal and policy requirements when sharing data for research and evaluation, relying on written laws, regulations, standards, and policies. Ethical issues are seldom addressed in the same manner. Limited guidance exists to span the sectors, domains, and disciplines involved. We review materials available to guide decisions that data owners, controllers, analysts, and regulators face about whether and how data can be used responsibly. These decisions address concerns about possible or likely harms affecting individuals and groups, at present and into the future. We discuss cross-sector and interdisciplinary projects that are developing ethical guidelines and identifying best practices, and we identify the role that data intermediaries can play in establishing transparent practices that facilitate ethical data sharing.


Data Ethics & the Fourth Industrial Revolution

Christopher S. Lee, JD, CIPP, Chief Privacy Officer; United States Senate, Sergeant at Arms

The Fourth Industrial Revolution (4IR) has started. 5G networks, cloud computing, and quantum computing are being integrated using artificial intelligence, machine learning, and software. 4IR will be bigger than the dotcom revolution of the 1990s, and it creates opportunities to process and use exponentially more data to make better-informed decisions faster than ever before. The 4IR will usher in a new wave of technical products and services. It will also create tools that can be used to benefit society or to infringe upon privacy and civil liberties. This session will introduce the concept of 4IR and lay the groundwork for identifying and addressing associated data ethics issues.


Policy and Technology: Ensuring Ethics in the Submission and Access of Biomedical Research Data

Dina N. Paltoo, National Institutes of Health

NIH has a large and growing number of valuable data repositories for human research data. Facilitating access to these data safely, and in a manner that honors the privacy of research participants, requires innovative approaches to data submission and access. Accelerating data-driven discovery and providing the best return on investment on existing data and resources, in order to accelerate and improve science and build trust in the research enterprise, requires that the National Institutes of Health (NIH) facilitate the reuse of data collected in one study in future research. The first step is to ensure responsible stewardship of data, through policy and technology, such that data are submitted and made available in a manner consistent with the original conditions (e.g., consent) under which the data were collected, as well as in accordance with Federal regulations for de-identifying data and protecting participant privacy. Effective models exist for sharing data while providing necessary protections, such as controlled access (e.g., to large-scale human genomic data) or results dissemination (e.g., of registered clinical trials), in addition to new models of data stewardship that use cloud-based approaches.

Ethical Issues in the Development of Complex Machine Learning Algorithms

Sara R. Jordan, Virginia Tech

Many statements of ethics for machine learning and artificial intelligence (AI/ML) are written at a high level that does not fully acknowledge the complexity of developing machine learning algorithms. Specifically, while statements that AI/ML ought to be "transparent" or "explicable" are easily laudable, they are not technically feasible except when coupled with extraordinary steps taken by programmer teams and their team leaders. In this presentation, I take ethics for AI/ML down from high-level statements to explanations of in medias res techniques for building explicable and accountable AI. I will focus on the development of neural network models for natural language processing in order to demonstrate where the intersection of ethical norms and technical practices will change conventional technical practices, such as data collection, transformation, model building, and model testing. Model deployment in consumer products will be discussed briefly.

Ethical Principles and Data Science - Repurposing Administrative and Opportunity Data

Stephanie S. Shipp, Sallie Keller, and Aaron Schroeder (University of Virginia)

The data revolution has transformed the conduct of social science research through the incorporation of data science, but ethical dimensions should not be compromised. Researchers can now observe behavior based on repurposing existing administrative and opportunity data without consent or awareness by those providing the data. The principles set forth in the Belmont Report on Ethical Principles and Guidelines for the Protection of Human Subjects of Research are still as applicable as when these principles were first established in 1978. Discussions about ethics need to be a natural part of every research project, especially when repurposing data for analytical purposes. A publicly-shared ethical checklist at each research stage can help researchers identify and frame any potential concerns and evaluate their relative impacts. A key part of this checklist is the assessment of implicit biases. Ethical principles require the implementation of everyday practices around documentation, transparency, ongoing discussion, questioning, and constructive criticism. We will discuss the history of these ethical principles and our experiences implementing them into our research.

AM1-3: Nonresponse Bias in Federal Surveys: Gaps in Knowledge and Future Opportunities


Constructing an Inventory of Nonresponse Bias Studies in Federal Surveys

Peter Miller, Northwestern University and U.S. Census Bureau, Retired.

This presentation summarizes the first systematic review of nonresponse bias (NRB) studies involving Federal surveys since the release of the 2006 OMB Standards and Guidelines for Statistical Surveys. NRB reports were identified through searches on PubMed, Google Scholar, the Current Index to Statistics, and Joint Statistical Meetings proceedings, and through an open call to Federal statistical agencies and associated professional organizations. Some 165 studies were identified - 89 concerning establishment surveys and 76 involving household surveys. About 40 percent of the NRB studies were done during the period shortly after the 2006 OMB guidance. The methods employed for assessing NRB differed for establishment and household surveys. Studies involving establishment surveys mostly compared survey estimates to external (frame) data, while those involving household surveys mostly examined variations of estimates within the response set (e.g., early and late responders). A majority of studies reported some NRB in estimates prior to weighting and some reduction in bias after adjustment. The efficacy of weighting was often not explicitly documented in the reports. This systematic review is a first step in continuing research on NRB in Federal surveys.

Developing and Assessing Weighting Methods for the Redesigned National Health Interview Survey

Ronaldo Iachan (ICF), National Center for Health Statistics

In 2019, the National Health Interview Survey (NHIS) released its first redesigned instrument since 1997. In this paper, we present the results of a collaboration between the National Center for Health Statistics (NCHS) and ICF to evaluate weighting methods for use with the redesigned NHIS. The evaluation focused on the use of machine learning and multilevel logistic regression to assess nonresponse (NR) bias in key NHIS health estimates and develop NR bias adjustments for household, adult, and child sample weights.

We start by reviewing the data sources which provided potential predictors at different levels. The nonresponse models incorporated predictors from the NHIS Contact History Instrument and Neighborhood Observation Instrument, and auxiliary data from the Area Health Resource File and the Census Planning Database. The analysis employed machine learning methods such as lasso, random forest, and decision trees, as well as more traditional single-level and multilevel logistic regression, to find best-fitting models to use in NHIS nonresponse bias adjustments for household, adult, and child weights. We define key health indicators used in the nonresponse bias analysis at these different levels, and discuss data sources, decision processes, methodology and results focused on bias reduction. We also present results from capping the NR adjustment factors to limit variance inflation, as well as overlaying raking on top of NR adjustments to extend the simple demographic post-stratification currently used in the NHIS.
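To make the weighting step concrete, the sketch below fits a response-propensity model and caps the resulting nonresponse adjustment factors. It is an illustration only, not the NCHS/ICF production code: the variable names and data are synthetic stand-ins for the paradata and area-level sources named above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical auxiliary predictors: stand-ins for paradata and area-level
# variables such as those from the Contact History Instrument or the
# Census Planning Database.
X = pd.DataFrame({
    "contact_attempts": rng.poisson(3, n),
    "urban": rng.integers(0, 2, n),
    "area_poverty_rate": rng.beta(2, 8, n),
})

# Simulated response indicator whose propensity depends on the predictors.
true_p = 1 / (1 + np.exp(-(0.5 - 0.2 * X["contact_attempts"] + 0.8 * X["urban"])))
responded = (rng.random(n) < true_p).to_numpy()

# Fit a response-propensity model on the full sample.
model = LogisticRegression().fit(X, responded)
p_hat = model.predict_proba(X)[:, 1]

# Nonresponse adjustment factor = 1 / estimated propensity.
base_weight = np.ones(n)  # design weights would go here
adj = 1.0 / p_hat

# Cap the adjustment factors (here at the respondents' 99th percentile)
# to limit variance inflation, as the abstract describes.
cap = np.quantile(adj[responded], 0.99)
adj = np.minimum(adj, cap)

nr_weight = base_weight * adj
print("first respondent weights:", nr_weight[responded][:5].round(2))
```

In practice a machine-learning model (lasso, random forest) could replace the logistic regression above, and raking to demographic control totals would be layered on top of the capped adjustments.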


Finding the Right Auxiliary Information for Nonresponse Adjustment Models: In Search of Zs with Desirable Properties

Andy Peytchev, RTI International

Declining response rates increase the dependency of survey estimates on postsurvey adjustments. The identification of auxiliary information is becoming increasingly important. This presentation starts with a discussion of flaws in common current practice with regard to nonresponse adjustment models. It is followed by an overview of desirable properties of auxiliary information, and related challenges. In the third part, promising avenues for improvement are introduced, along with several illustrative examples from the research literature.


Estimating Survey Nonresponse Bias Using Tax Records

Bruce Meyer, National Bureau of Economic Research, American Enterprise Institute, and U.S. Census Bureau

Declining survey response rates are a widespread and troubling problem that raises the possibility of bias in key statistics. We propose and implement a new method to determine nonresponse bias by linking income tax records to respondents and nonrespondents by address. In light of the importance of income in assessing poverty, inequality, and material well-being, we focus on income but also examine bias along other dimensions measured on tax returns, such as marital status and family size. To provide a framework, we first describe a theory of testing for differences between populations when linkage to validation data is incomplete. We then apply the methods to the Current Population Survey (CPS), the most widely used economic survey and the source of official employment, income, poverty, and inequality statistics. We link the CPS to IRS Form 1040 records, comparing several characteristics of respondents and nonrespondents, including income, its components, self-employment status, marital status, number of children, and the receipt of social security. We find little evidence of differences between the percentiles of the income distributions of the linked respondents and nonrespondents. We also find little difference between the income distributions of ASEC respondents and CPS Basic respondents who decline to participate in the ASEC (whole imputes). However, we find significant differences between respondents and nonrespondents in marital status, the number of children, and other characteristics.
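As an illustration of the percentile comparison the abstract describes, the sketch below contrasts income distributions for respondents and nonrespondents. The incomes are simulated, not IRS data; only the mechanics of the comparison are shown.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated "linked tax record" incomes; in the study these come from
# IRS Form 1040 records linked to CPS respondents and nonrespondents.
income_resp = rng.lognormal(mean=10.8, sigma=0.9, size=50_000)
income_nonresp = rng.lognormal(mean=10.8, sigma=0.9, size=10_000)

for p in (10, 25, 50, 75, 90):
    r = np.percentile(income_resp, p)
    nr = np.percentile(income_nonresp, p)
    print(f"p{p}: respondents={r:9,.0f}  nonrespondents={nr:9,.0f}  ratio={nr / r:.3f}")

# Ratios near 1 across the percentile grid would be consistent with the
# paper's finding of little nonresponse bias in income.
```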

Afternoon Concurrent Sessions 1

PM1-1: Implementing the Evidence Act: The Journey Thus Far and the Road Ahead

The Foundations for Evidence-Based Policymaking Act of 2018 requires data from federal agencies to be accessible and requires agencies to plan to develop statistical evidence to support policymaking. To facilitate this, three newly designated positions were created: Chief Data Officers (CDOs), Evaluation Officers (EOs), and Statistical Officials (SOs). These are the key players who will lead federal agencies through the changes required to meet the requirements laid out in the Evidence Act, propelling the federal statistical system into a new era. In this session, we bring together CDOs, EOs, and SOs and ask them about their experiences thus far implementing the Evidence Act and their thoughts about the road ahead. There will be ample time for discussion with the panel; we encourage attendees to bring their questions!

PM1-2: Leveraging Administrative Data


Using Administrative Records Data to Produce Business Statistics: The Nonemployer Statistics by Demographics Series (NES-D)

Adela Luque (U.S. Census Bureau)

The Survey of Business Owners (SBO) was the only comprehensive source of information on business demographics. To address increasing nonresponse rates and costs, and a rising demand for more frequent and timely data, the Census Bureau has consolidated three business surveys, one of which is the SBO. The nonemployer component of the SBO will be covered by a new blended-data approach that leverages existing administrative records (AR) and census records to assign demographic characteristics to the universe of nonemployers and produce an annual series that will become the only source of nonemployer demographics estimates. This new series is the Nonemployer Statistics by Demographics series, or NES-D. Meeting the public's needs, NES-D will provide reliable estimates with no respondent burden on a more frequent and timely basis than the SBO. Using the 2014-2016 vintages of nonemployer businesses and demographic information from the decennial census, the American Community Survey, the Census Numident, and AR from the Department of Veterans Affairs, we discuss preliminary results, the challenges encountered along the way, and next steps.


Blending Administrative Data with a Probability Sample of Nonparticipants to Produce National Estimates: The NCS-X NIBRS Estimation Project

Marcus Berzofsky (RTI International), Dan Liao (RTI International), Alexia Cooper (Bureau of Justice Statistics)

Administrative data collected through a set of agencies (e.g., law enforcement, schools) can be a rich source of information, but misleading if the data suffer from quality issues such as item missingness or incomplete coverage. When the data source suffers from incomplete coverage, the data are not representative of the population. If a census is not possible, one alternative is to select a probability sample of nonparticipating agencies, collect their data, and blend them with the reporting agencies. The FBI's National Incident-Based Reporting System (NIBRS) collects incident-based information on all crimes reported to the police. Currently, 33% of law enforcement agencies in the US submit to NIBRS, but these agencies mainly represent less populated parts of the country. The National Crime Statistics Exchange program is recruiting a probability sample of 400 agencies designed to produce nationally representative estimates when blended with the existing reporting agencies. However, the methodology for addressing quality issues and producing estimates is complex. We describe how we intend to address these issues and the plan for developing the appropriate estimation methodology.
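The core of a blended estimator of this kind can be illustrated in a few lines: treat the reporting agencies as a census and expand the sampled nonreporters by the inverse of their inclusion probabilities (a Horvitz-Thompson estimate). All figures below are invented; this is not the NCS-X estimation methodology, which the abstract notes is considerably more complex.

```python
import numpy as np

rng = np.random.default_rng(2)

# Incident counts from all currently reporting agencies (observed in full).
reported_counts = rng.poisson(120, size=6_000)

# A probability sample of nonreporting agencies; each carries an inclusion
# probability from the sample design, and its counts are collected directly.
sample_counts = rng.poisson(300, size=400)   # larger agencies on average
inclusion_prob = np.full(400, 400 / 12_000)  # e.g., 400 drawn from 12,000

# Horvitz-Thompson estimator for the nonreporter stratum: sum of y_i / pi_i.
ht_nonreporters = np.sum(sample_counts / inclusion_prob)

national_estimate = reported_counts.sum() + ht_nonreporters
print(f"blended national estimate: {national_estimate:,.0f}")
```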

Analyzing Research and Development Trends Using Administrative Data

Kathryn Linehan, Eric Oh, Joel Thurston, Stephanie Shipp, and Sallie Keller (University of Virginia)
John Jankowski and Audrey Kindlon (National Center for Science and Engineering Statistics)

The Federal Government accounts for about one-fourth of total Research and Development (R&D) funding in the United States, but what exactly does this public funding support? While the National Center for Science and Engineering Statistics (NCSES) provides high-level data from surveys on the disposition of federal obligations for R&D, more granular research characteristics (e.g., project topic) remain untapped. Federal agencies also release publicly available administrative data that describe projects in far greater detail (e.g., Federal RePORTER). This presentation documents the usefulness of these administrative data to enhance and supplement NCSES surveys of federal funding. Using grant abstracts in Federal RePORTER and topic modeling, we discover latent R&D research topics in the database and analyze their trends over time to discover emerging topics. We also complete a pandemics case study that uses information retrieval techniques along with topic modeling to perform a deeper dive into a specific area of interest that is not captured in enough detail by the topic model on the entire database. Initial results show that we can capture specific R&D research trends from administrative data in Federal RePORTER.
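For readers unfamiliar with the technique, the sketch below shows the basic topic-modeling step on a toy corpus: build a document-term matrix from abstracts and fit latent Dirichlet allocation (LDA). The three "abstracts" are invented stand-ins for Federal RePORTER records, and the presenters' actual pipeline may differ.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy stand-ins for grant abstracts pulled from Federal RePORTER.
abstracts = [
    "influenza vaccine trial immune response antibodies",
    "machine learning model training neural network data",
    "coronavirus transmission epidemiology outbreak surveillance",
]

vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(abstracts)  # document-term matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(dtm)

# Print the top words for each discovered topic.
terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"topic {k}: {', '.join(top)}")
```

Tracking the share of abstracts assigned to each topic by award year is one way to surface the emerging-topic trends the abstract describes.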


Integrating Survey and Administrative Data Across Sources and Across Agencies to Create Statistical Products: A Case Study from Education

Sarah Grady (National Center for Education Statistics), Emily Isenberg (American Institutes for Research)

The National Center for Education Statistics (NCES), within the U.S. Department of Education (ED), developed supplementary geocode data files for the Early Childhood Program Participation, Parent and Family Involvement in Education, and Adult Training and Education surveys of the 2016 National Household Education Surveys Program. The geocode files use sample members' addresses to integrate data from other federal agencies and ED administrative data collections. The data files include new radius-based measures of household proximity to educational opportunities and job search assistance. The presentation will provide an overview of how the geocode files demonstrate some of the goals of evidence-based policymaking, and will discuss the challenges encountered in identifying auxiliary data sources, evaluating them for appropriateness, and assessing the disclosure risk of the resulting files. Data timeliness and cost will also be discussed.
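A radius-based proximity measure of the kind on the geocode files can be computed with a great-circle distance: count the providers within a fixed radius of each household's coordinates. The sketch below uses synthetic coordinates and a hypothetical five-mile radius; it illustrates the idea, not NCES's actual method.

```python
import numpy as np

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 3958.8 * 2 * np.arcsin(np.sqrt(a))

rng = np.random.default_rng(5)
# Synthetic (lat, lon) pairs for sampled households and education providers.
households = rng.uniform([38.8, -77.2], [39.0, -76.9], size=(3, 2))
providers = rng.uniform([38.8, -77.2], [39.0, -76.9], size=(50, 2))

RADIUS_MILES = 5.0  # hypothetical radius
for i, (hlat, hlon) in enumerate(households):
    d = haversine_miles(hlat, hlon, providers[:, 0], providers[:, 1])
    print(f"household {i}: {np.sum(d <= RADIUS_MILES)} providers within "
          f"{RADIUS_MILES:g} miles")
```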

An Approach to Tiered Access in the Department of Veterans Affairs

Michael Schwaber (U.S. Department of Veterans Affairs)

This presentation describes the VA OEI Office of Data Governance and Analytics' (DGA) development of tiered data access in an evolving environment of privacy, as well as future plans to expand access while protecting confidentiality. DGA is an organization that stores, links, processes, and distributes large amounts of veteran data, some of which contain personally identifiable information (PII). Recent updates to data policy and new laws encourage more data access across agencies and to the public. This has led to DGA's examination of its disclosure risk mitigation strategies to better protect its data. DGA is developing access tiers defined by combinations of protection levels on each of the elements of the "Five Safes" framework.

PM1-3: Leveraging Official Statistical Programs to Address Emerging Issues: Providing Information Relevant to the Coronavirus Pandemic

Federal statistical agencies have long-standing programs and data collections to provide official statistics on all aspects of the US: population characteristics, the economy, education, health, and more. These programs, which include time-series data that have been relatively unchanged for many years, are the result of years of effort required to develop, execute, and produce important national estimates. However, information relevant to major unexpected events, such as the sudden development of the Coronavirus pandemic in the US, may not be captured well as part of those ongoing collections, leaving a gap in the information available on which to base policy decisions. In 2020, many Federal statistical agencies nimbly addressed the need for data on the pandemic by leveraging their ongoing survey programs. Examples from BLS, Census, IRS, NCHS, and BTS will illustrate how statistical agencies were able to support evidence-based decision making relevant to a pandemic that was not part of anyone's planning. Some added new information to existing collections, while others found new uses for existing data. But all contributed to efforts to support policy evaluation and impact.

Adding COVID-19 questions to the CPS

Emy Sok, Karen Kosanovich, Bureau of Labor Statistics & Tim Marshall, U.S. Census Bureau

As the coronavirus (COVID-19) pandemic began to spread in the US, both the Census Bureau and the Bureau of Labor Statistics began to consider how the monthly Current Population Survey (CPS) could be utilized to collect information related to the pandemic. The timely release of estimates from the monthly household labor force survey would show the impact of the pandemic, and of efforts to contain it, on measures of employment and unemployment. We had a rare opportunity to quickly add a limited number of questions that might offer more information about how people were affected by the pandemic. The new questions were crafted, reviewed, and submitted in a few weeks as the impact of the pandemic was still unfolding. The programming, testing, and fielding of these new items occurred as survey operations were changing due to the public-health constraints resulting from the pandemic. Emy Sok (BLS), Karen Kosanovich (BLS), and Tim Marshall (Census) will discuss the challenges of adding these new questions to an ongoing monthly survey.

Near Real-Time Surveillance of COVID-19 Mortality Using Data from the National Vital Statistics System

Paul Sutton, Division of Vital Statistics, National Center for Health Statistics, Centers for Disease Control and Prevention

Lauren M. Rossen, Division of Research and Methodology, National Center for Health Statistics, Centers for Disease Control and Prevention

The Vital Statistics Rapid Release (VSRR) program provides access to the timeliest vital statistics for public health surveillance of important mortality outcomes, based on a current flow of vital statistics data from state vital records offices. In response to the urgent need for data to inform decisions related to the COVID-19 pandemic, the VSRR program was quickly expanded to tabulate and publish COVID-19-related provisional mortality data. These data include daily updates of the counts of COVID-19 deaths by week for the United States, and by jurisdiction of occurrence. Several other data files and visualizations are produced and published weekly, including counts of COVID-19 deaths by various demographic factors such as age, race and Hispanic origin, and place of death. Data files are published on an open data platform to ensure accessibility. Additionally, provisional data from NVSS are used to examine the data quality of other sources of information on COVID-19 mortality, and to monitor and disseminate data on excess deaths associated with COVID-19.

Prior to the COVID-19 pandemic, the VSRR program published provisional estimates of mortality with a 3-9 month lag between the data and the date of analysis, depending on the cause of death. This lag was based on analyses of timeliness and completeness of provisional data. The urgent nature of the COVID-19 pandemic necessitated the release of data as quickly as possible; waiting 3 months for the data to become more complete would render the data obsolete. As such, it was necessary to provide more information about the completeness of the mortality data, and to conduct related analyses on an ongoing basis to monitor the data flow and timeliness. These analyses have suggested that the NVSS provisional data on COVID-19 deaths track about two weeks behind other data sources (e.g., media reports and other COVID-19 tracking systems).

The strengths of the NVSS as the most comprehensive and consistent source of mortality data for the US make it a valuable resource for examining data quality across other sources, such as case reporting systems. Additionally, the availability of historic data in the NVSS has allowed for the estimation and ongoing monitoring of excess deaths (the number of deaths from all causes above expected levels). Estimates of excess deaths can provide information about the burden of mortality potentially related to the COVID-19 pandemic, including deaths that are directly or indirectly attributed to COVID-19. This metric may be particularly important for monitoring COVID-19 mortality and related trends, given potential differences in testing and reporting of COVID-19 deaths.
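The excess-deaths metric has a simple core: observed deaths minus an expected baseline derived from historical data. The sketch below illustrates that arithmetic on synthetic weekly counts; NVSS's published estimates rely on more refined baseline models than the prior-year average used here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
weeks = np.arange(1, 53)

# Five years of synthetic all-cause weekly deaths with mild seasonality.
history = pd.DataFrame(
    {yr: rng.poisson(55_000 + 5_000 * np.cos(2 * np.pi * weeks / 52))
     for yr in range(2015, 2020)},
    index=weeks,
)

# A synthetic "current" year with a spring mortality surge.
current = history[2019].to_numpy() + np.where((weeks > 12) & (weeks < 25), 15_000, 0)

expected = history.mean(axis=1)             # baseline: prior-year average by week
excess = np.maximum(current - expected, 0)  # deaths above expected levels

print(f"total excess deaths (synthetic): {excess.sum():,.0f}")
```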

The IRS Office of Research, Applied Analytics and Statistics Used Innovative, Nimble Approaches to Support Decision Making and Evaluation Related to the Coronavirus Pandemic

Holly Donnelly, IRS

The IRS built a new graph data model to analyze entities with employment tax requirements and Form 941 Schedule R filings. The model will be used by Exam, CI, and others to identify entity relationships, the distribution of claims and credits, and patterns of non-compliance (including fraud), to estimate Exam workload, and to optimize case selection activities. The Economic Impact Payment (EIP) is a payment meant to stimulate the economy and put money in the hands of taxpayers and U.S. citizens in a time of need. The EIP also creates an opportunity for identity thieves, as it creates a new mode of entry into the filing population (see https://krebsonsecurity.com/2020/04/new-irs-site-could-make-it-easy-for-thieves-to-intercept-some-stimulus-payments/). Normally, identity thieves must claim a refund on a return in order to profit from ID theft fraud, which subjects a return to multiple selection filters that are looking for ID theft and other forms of refund fraud. The EIP removes this scrutiny, as thieves can now file a fraudulent $0 return, avoid RRP and DDb filters, and receive a stimulus check or direct deposit. RAAS is working closely with RICS to start a system for fraudulent EIP detection and selection. Mechanisms for detection will be driven by RAAS and our contractors. W&I is expecting 18 million marginal returns this year as a result of EIP. ID theft not only allows stimulus payments to go to identity thieves who may or may not spend those checks in the U.S., but also makes it difficult for the appropriate person to claim their rightful stimulus check in a timely manner. Given the absence of a systemic selection process for this vulnerable population, we currently estimate that we need at least $200K in additional funding, and potentially an additional $200K, to prepare for PY2021-related schemes, as unclaimed EIPs will show up as credits on TY2020 accounts.

RAAS estimated the number of businesses eligible for advance tax credits to cover health care costs and paid leave, to assist Collection in anticipating workload. We used counts of employers filing employment tax returns, including Forms 941, 943, 944, and CT-1. RAAS also estimated likely health care costs to assist in identifying claims that necessitated further scrutiny before the tax credits were advanced. Using data from W-2s, we reported health care costs by number of employees to establish guidelines.

We are also building a model to estimate the impact of sudden economic downturn on IRS Collection resources. Using data from our compliance data warehouse paired with published data from other statistical agencies, our model will forecast, by industry and geographic area, how many individual filers will likely owe taxes they cannot pay and enter the Collection workstream. This effort will be challenging because of recent "tax reform" that caused a rise in individual balance due returns for tax year 2019. A related project will project additional work for the insolvency group to secure the government's interests in cases of bankruptcy.

Expanding the Use of NCHS' Research and Development Survey to Quantify Health Characteristics During the Coronavirus Pandemic

Paul Scanlon, Katherine Irimata, National Center for Health Statistics

The National Center for Health Statistics' (NCHS) Research and Development Survey (RANDS) is a series of surveys collected from commercial panels for methodological research purposes. Until now, the goals of RANDS have been solely methodological. On one hand, NCHS has used RANDS to refine mixed-method question evaluation techniques that can be integrated into its cognitive interviewing program. On the other hand, it has explored calibration methods that leverage the strength of NCHS' established core surveys to produce estimates from commercial web panels. These estimates have not been released, however, since they are considered experimental.

In response to the Coronavirus pandemic, NCHS expanded the use of the RANDS platform to rapidly monitor aspects of the public health emergency including the inability to work due to illness with COVID-19, telemedicine before and during the pandemic, and problems accessing specific types of health care due to the pandemic. The RANDS during COVID-19 survey was fielded in two rounds during the summer of 2020 and experimental estimates were publicly released for both rounds. This has been a joint effort between the Division of Research and Methodology's (DRM) Collaborating Center for Questionnaire Design and Evaluation Research (CCQDER) and Collaborating Center for Statistical Research and Survey Design (CCSRSD) and involved the development of the questionnaire, including the development of COVID-19 related questions, as well as the calibration of the RANDS data to NCHS' National Health Interview Survey (NHIS) in an effort to adjust for some of the potential bias in the panel. Through the expansion of this existing experimental platform, NCHS was not only able to evaluate approaches to asking about Coronavirus- and pandemic-related topics in a timely manner, but also rapidly respond to, and provide relevant information about, COVID-19 in the United States.
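Calibration of a web panel to benchmark survey estimates is commonly done by raking (iterative proportional fitting). The sketch below rakes synthetic panel weights to two illustrative margins; it shows the general idea only, and the margins are invented, not actual NHIS figures.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
# Synthetic web panel with a skewed age distribution.
panel = pd.DataFrame({
    "age_grp": rng.choice(["18-44", "45-64", "65+"], 2_000, p=[0.6, 0.3, 0.1]),
    "sex": rng.choice(["F", "M"], 2_000),
})
panel["w"] = 1.0

# Benchmark proportions of the kind a core survey such as NHIS could supply
# (illustrative values only).
margins = {
    "age_grp": {"18-44": 0.45, "45-64": 0.35, "65+": 0.20},
    "sex": {"F": 0.51, "M": 0.49},
}

# Iterative proportional fitting: adjust weights to match each margin in turn.
for _ in range(25):
    for var, targets in margins.items():
        current = panel.groupby(var)["w"].sum() / panel["w"].sum()
        factors = {lvl: targets[lvl] / current[lvl] for lvl in targets}
        panel["w"] *= panel[var].map(factors)

# Weighted proportions now approximate the benchmark margins.
print(panel.groupby("age_grp")["w"].sum() / panel["w"].sum())
```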

New Data for New Purposes

Rolf Schmitt, Bureau of Transportation Statistics

Travel restrictions and warnings had enormous and immediate effects on the U.S. transportation system. The Bureau of Transportation Statistics (BTS) realized from anecdotal evidence that the speed and magnitude of change placed a premium on daily and weekly statistics rather than the Bureau's traditional focus on annual and monthly statistics. The premium on timely statistics also demanded rethinking of deliberative data quality processes. Preliminary estimates were becoming more important than ever just as the stable trends that formed the traditional basis for preliminary estimates were undone. BTS responded to these challenges by tapping new data sources, by taking different approaches to preliminary estimates, and by adopting a rapid prototyping strategy for statistical product development. The response will be summarized by Rolf Schmitt, the BTS Deputy Director.

Afternoon Concurrent Sessions 2

PM2-1: Using Data in New Ways: Leveraging the Evidence Act to Coordinate Evaluation, Statistics and Policy


Framing the Evidence Act's Vision for Coordination and Collaboration

Erica Zielewski, OMB

In brief introductory remarks, this presentation will summarize key elements of the Act, with an emphasis on those places that highlight coordination and collaboration between evaluation and statistics functions and roles, such as the creation of multi-year learning agendas and Annual Evaluation Plans. These remarks will also introduce the examples that follow, each of which brings something different to the discussion: an example of an agency that generally makes data available; an example of how an evaluation shop and statistical unit work together within one organization; an example focused on state data; and an example of leveraging administrative data from an agency for research purposes.


HUD's Approach to Making Data Available for Research and Evaluation

Calvin Johnson, HUD

HUD has traditionally prioritized its data capacity (see, for example, the data section in its Research Roadmap) and has invested effort in getting its data into the hands of people who can use it. HUD has also undertaken unique data linkages (for example, with Census and with NCHS) and, more broadly, has worked to make its data more accessible for use. This presentation will discuss how this has worked in practice.


Linking State Medicaid Data and Child Welfare Data for Outcomes Research

Valeria Butler, ASPE and ACF/HHS, and Emily Madden, ASPE/HHS

HHS recently launched this effort to link and build datasets that can be used for research and policy. The effort is unique and innovative, but nascent, so the presentation will focus more on intentions and plans than on concrete activities and outcomes.


The Department of Labor's Data Exchange and Analysis Platform (DEAP)

Christina Yancey, Chief Evaluation Officer, DOL; Scott Gibbons, Chief Data Officer, DOL; and David Judkins, Abt Associates

This presentation will include speakers from DOL discussing the DEAP tool, and will begin with a discussion of the critical role of capacity-building for ongoing evidence building, followed by an introduction and overview of the DEAP tool. A contractor with experience collaborating with DEAP will provide a specific use case.

PM2-2: Linked Data from the Census Bureau for Evidence Building: Accessing the Data and Recent Results
Criminal Justice in the US and Economic Inequality: Results from the Criminal Justice Administrative Records System

Keith Finlay (U.S. Census Bureau)

CJARS is a joint Census Bureau-University of Michigan project started in 2016 to create a national, integrated, harmonized collection of criminal justice microdata at the Census Bureau. The project has three fundamental goals: (1) improve Census Bureau operations, (2) provide valuable aggregate statistical information to criminal justice agencies, and (3) increase the quality and quantity of criminal justice research by making the data available through the Federal Statistical Research Data Centers. The project highlights the opportunities provided by the Census Bureau's Data Linkage Infrastructure. This paper provides new evidence on how felony conviction and imprisonment rates have changed for 30+ birth year cohorts over 185 distinct commuting zones in the U.S. using a novel piece of data infrastructure we have created called the Criminal Justice Administrative Records System (CJARS). We document striking variation in cumulative exposure to the justice system over geography, between birth cohorts, and across demographic groups, and leverage this newly documented variation to assess how changing risk of contact with the justice system correlates with economic outcomes in the U.S.


UMETRICS: Data For Examining How Research is Produced and How it Affects the Broader Economy

Joseph Staudt (U.S. Census Bureau)

The IMI UMETRICS data include information on awards, wage payments from awards to university research employees, vendor purchases, subcontracts, and the unit performing the funded research for 26 universities. These data can be linked to internal Census Bureau data products, such as the Decennial Census, the American Community Survey, the Longitudinal Employer-Household Dynamics (LEHD) database, and the integrated Longitudinal Business Database, providing researchers with a comprehensive view of the businesses associated with the production of scientific research. This paper provides information on the data available, how researchers can access the data, and results from work in progress by researchers.

Results from the Evidence Building Project Series: Health at Birth, Later Life Achievement, and the Intergenerational Transmission of Advantage

Sarah Miller (University of Michigan)

This paper provides evidence on the long-run and intergenerational impacts of initial health endowments. We link detailed birth certificate records to federally held survey and administrative data on earnings, educational attainment, and public assistance for all individuals born in California between 1960 and 2014, allowing us to observe measures of health at birth and long-run economic outcomes for over 25 million individuals. For a large subset of these individuals, we are also able to observe outcomes for their children, allowing us to trace the transmission of health and advantage across generations. Our analysis is the first to document these effects in the United States using data of this size and scope. We use these data to analyze how health at birth, within twin pairs and within siblings, affects long-run and intergenerational health and achievement. We find that individuals with higher birth weights are better off in adulthood along a number of dimensions, and we find some evidence that this advantage transfers to the next generation in the form of higher birth weights and better economic and health outcomes.

The Census Longitudinal Infrastructure Project - Linked Census Data and Results from the Impact of Preschool on Later-Life Outcomes

Katie Genadek (U.S. Census Bureau)

The Census Longitudinal Infrastructure Project (CLIP) was created to support research using the linked data at the Census Bureau, including linked mandatory-response census and survey data, and to further develop the linked data infrastructure with expansion to historical data. There are currently more than 12 projects using the linked data at the Census Bureau through the FSRDC network. This paper will describe the linked data available and explain how researchers can access this data. Recent research using this data to analyze the effect of the Lanham Act preschools in the 1940s on later life outcomes will also be discussed.

PM2-3: Communicating Fitness for Use

Federal agencies in the United States produce a wide range of estimates from an increasing number of data sources to inform evidence-based policy decisions. Communicating the uncertainty of these estimates, and of associated inferences (e.g., trends, comparisons), is essential to transparent quality reporting and informed decision making. In 2016, the American Statistical Association (ASA) released a statement on the use of significance testing, one tool used for interpreting and communicating the uncertainty of statistical data, recommending a decreased reliance on p-values for decision making. This session brings together a panel to discuss communicating statistical uncertainty for federal agencies, including implications of the 2016 ASA statement, information needs of data users and stakeholders, and some alternatives for communicating statistical uncertainty for evidence-based policy decisions.