2020 FCSM Research and Policy Conference
MONDAY SEPTEMBER 21st
9:00 - 10:00 am
Using Federal Data to Evaluate and Inform:
A Case Study on Increasing Upward Mobility in the U.S.
John Friedman, Brown University
Morning Concurrent Sessions
10:15 am - 12:00 pm
AM1-1: The Evidence Act 101
AM1-2: Data Ethics Frameworks
AM1-3: Nonresponse Bias in Federal Surveys:
Gaps in Knowledge and Future Opportunities
Afternoon Concurrent Sessions 1
1:30 - 3:00 pm
PM1-1: Implementing the Evidence Act: The Journey thus Far and the Road Ahead
PM1-2: Leveraging Administrative Data
PM1-3: Leveraging Official Statistical Programs to Address Emerging Issues: Providing Information Relevant to the Coronavirus Pandemic
<button style="width: 900px;"type="button" class="collapsible"><p><u><strong>Afternoon Concurrent Sessions 2</strong></u></p>
3:15 - 5:00 pm
</button>
PM2-1: Using Data in New Ways: Leveraging the Evidence Act to Coordinate Evaluation, Statistics and Policy
PM2-2: Linked Data from the Census Bureau for Evidence Building: Accessing the Data and Recent Results
PM2-3: Communicating Fitness for Use
Morning Concurrent Sessions
10:15 am - 12:00 pm
AM1-1: The Evidence Act 101
Organizers: Jennifer Edgar (Bureau of Labor Statistics) and Keenan Dworak-Fisher (Office of Management and Budget)
Moderator: Katharine Abraham (University of Maryland)
Panelists:
Diana Epstein (Office of Management and Budget)
Sharon Boivin (Department of Education)
Monique Eleby (U.S. Census Bureau)
Discussant: Emilda Rivers (Statistical Official, National Science Foundation and Director of the National Center for Science and Engineering Statistics)
AM1-2: Data Ethics Frameworks
Organizer and Session Chair: Jessica Graber (National Center for Health Statistics)
Ethical Considerations for Data Access and Use; Amy O'Hara (Georgetown University)
Data Ethics & the Fourth Industrial Revolution; Christopher S. Lee (United States Senate, Sergeant at Arms)
Policy and Technology: Ensuring Ethics in the Submission and Access of Biomedical Research Data; Dina N. Paltoo (National Institutes of Health)
Ethical Issues in the Development of Complex Machine Learning Algorithms; Sara R. Jordan (Policy Counsel, Artificial Intelligence, Future of Privacy Forum)
Ethical Principles for the All Data Revolution - Repurposing Administrative and Opportunity Data; Stephanie S. Shipp, Sallie Keller, and Aaron Schroeder (University of Virginia)
AM1-3: Nonresponse Bias in Federal Surveys: Gaps in Knowledge and Future Opportunities
Organizer and Moderator: Tala Fakhouri (National Center for Health Statistics)
Constructing an Inventory of Nonresponse Bias Studies in Federal Surveys; Peter Miller (Professor Emeritus at Northwestern University and U.S. Census Bureau, Retired)
Developing and Assessing Weighting Methods for the Redesigned National Health Interview Survey; James Dahlhamer (National Center for Health Statistics)
Finding the Right Auxiliary Information for Nonresponse Adjustment Models: In Search of Zs with Desirable Properties; Andy Peytchev (RTI International)
Estimating Survey Nonresponse Bias Using Tax Records; Bruce Meyer (University of Chicago, NBER, AEI, and U.S. Census Bureau)

Afternoon Concurrent Sessions 1
1:30 - 3:00 pm

PM1-1: Implementing the Evidence Act: The Journey thus Far and the Road Ahead
Organizers: Jennifer Edgar (Bureau of Labor Statistics) and Joe Parsons (National Agricultural Statistics Service)
Moderator: Hubert Hamer (Administrator, National Agricultural Statistics Service)
Panelists:
Gregory Fortelny (Chief Data Officer, U.S. Department of Education)
Ted Kaouk (Chief Data Officer, U.S. Department of Agriculture)
William W. Beach (Statistical Official, U.S. Department of Labor and Commissioner of Labor Statistics, Bureau of Labor Statistics)
Kelly Bidwell (Evaluation Officer and Statistical Official, General Services Administration)
Samuel C. "Chris" Haffer (Chief Data Officer, U.S. Equal Employment Opportunity Commission)

PM1-2: Leveraging Administrative Data
Organizer and Session Chair: Erik Scherpf (NORC at the University of Chicago)
Using Administrative Records Data to Produce Business Statistics: The Nonemployer Statistics by Demographics Series (NES-D); James Noon and Adela Luque (U.S. Census Bureau)
Blending Administrative Data with a Probability Sample of Nonparticipants to Produce National Estimates: The NCS-X NIBRS Estimation Project; Marcus Berzofsky, Dan Liao (RTI International) and Alexia Cooper (Bureau of Justice Statistics)
Analyzing Research and Development Trends Using Administrative Data; Kathryn Linehan, Eric Oh, Joel Thurston, Stephanie Shipp, and Sallie Keller (University of Virginia), John Jankowski and Audrey Kindlon (National Center for Science and Engineering Statistics)
Integrating Survey and Administrative Data Across Sources and Across Agencies to Create Statistical Products: A Case Study from Education; Sarah Grady (National Center for Education Statistics) and Emily Isenberg (American Institutes for Research)
An Approach to Tiered Access in the Department of Veterans Affairs; Michael Schwaber (U.S. Department of Veterans Affairs)
PM1-3: Leveraging Official Statistical Programs to Address Emerging Issues: Providing Information Relevant to the Coronavirus Pandemic
Organizers: Jaki McCarthy (National Agricultural Statistics Service) and Jennifer Edgar (Bureau of Labor Statistics)
Session Chair: Jennifer Edgar (Bureau of Labor Statistics)
Adding COVID-19 Questions to the CPS; Emy Sok and Karen Kosanovich (Bureau of Labor Statistics) and Tim Marshall (U.S. Census Bureau)
Near-Real-Time Surveillance of COVID-19 Mortality Using Data from the National Vital Statistics System; Paul Sutton and Lauren Rossen (Centers for Disease Control and Prevention)
The IRS Office of Research, Applied Analytics and Statistics Used Innovative, Nimble Approaches to Support Decision Making and Evaluation Related to the Coronavirus Pandemic; Holly Donnelly (Internal Revenue Service)
Expanding the Use of NCHS' Research and Development Survey to Quantify Health Characteristics During the Coronavirus Pandemic; Paul Scanlon and Katherine Irimata (National Center for Health Statistics)
New Data for New Purposes; Rolf Schmitt (Bureau of Transportation Statistics)
Discussant: Chris Marokov (Office of Management and Budget)

Afternoon Concurrent Sessions 2
3:15 - 5:00 pm
PM2-1: Using Data in New Ways: Leveraging the Evidence Act to Coordinate Evaluation, Statistics and Policy
Organizers: Jennifer Edgar (Bureau of Labor Statistics) and Erica Zielewski (Office of Management and Budget)
Session Chair: Jennifer Edgar (Bureau of Labor Statistics)
Framing the Evidence Act's Vision for Coordination and Collaboration; Erica Zielewski (Office of Management and Budget)
U.S. Department of Housing and Urban Development's Experience Supporting and Enhancing its Data Infrastructure and Use; Calvin Johnson (U.S. Department of Housing and Urban Development)
Linking State Medicaid Data and Child Welfare Data for Outcomes Research; Valeria Butler (ASPE and ACF/HHS) and Emily Madden (ASPE/HHS)
The Department of Labor's Data Exchange and Analysis Platform (DEAP); Christina Yancey (Chief Evaluation Officer, Department of Labor), David Judkins (Abt Associates) and Scott Gibbons (Department of Labor)

PM2-2: Linked Data from the Census Bureau for Evidence Building: Accessing the Data and Recent Results
Organizer and Session Chair: Katie Genadek (U.S. Census Bureau)
Criminal Justice in the US and Economic Inequality: Results from the Criminal Justice Administrative Records System; Keith Finlay (U.S. Census Bureau)
UMETRICS: Data for Examining How Research is Produced and How it Affects the Broader Economy; Joseph Staudt (U.S. Census Bureau)
Results from the Evidence Building Project Series: Health at Birth, Later Life Achievement, and the Intergenerational Transmission of Advantage; Sarah Miller (University of Michigan)
The Census Longitudinal Infrastructure Project - Linked Census Data and Results from the Impact of Preschool on Later-Life Outcomes; Katie Genadek (U.S. Census Bureau)

PM2-3: Communicating Fitness for Use
Organizer and Moderator: Jennifer Parker (National Center for Health Statistics)
Panelists:
Amy Branum (National Center for Health Statistics)
Marilyn Seastrom (National Center for Education Statistics)
Regina Nuzzo (American Statistical Association)
Robert Sivinski (U.S. Office of Management and Budget)
Samantha Tyner (Bureau of Labor Statistics)
Abstracts

Morning Concurrent Sessions

AM1-1: The Evidence Act 101
The Evidence Act 101 session
will provide a high-level overview of the main components of the Foundations
for Evidence-Based Policymaking Act of 2018 (the Evidence Act), as well as the
motivation and vision behind it. With an extended introduction from Dr.
Katharine Abraham, who was involved with the original Commission on
Evidence-Based Policymaking, the audience will hear about learning agendas and
how to cultivate plans for evidence building, data governance and data
inventories, and the presumption of accessibility to data that the Act
provides. The session will conclude with a discussion of what the Act
means for statistical agencies.

AM1-2: Data Ethics Frameworks

Ethical Considerations for Data Access and Use
Amy O'Hara, Georgetown University
Project leads and
data owners typically focus on legal and policy requirements when sharing data
for research and evaluation, relying on written laws, regulations, standards,
and policies. Ethical issues are seldom addressed in the same manner. Limited
guidance exists to span the sectors, domains, and disciplines involved. We
review materials available to guide decisions that data owners, controllers,
analysts, and regulators face about whether and how data can be used
responsibly. These decisions address concerns about possible or likely harms
affecting individuals and groups, at present and into the future. We discuss
cross-sector and interdisciplinary projects that are developing ethical
guidelines and identifying best practices, and we identify the role that data
intermediaries can play in establishing transparent practices that facilitate
ethical data sharing.

Data Ethics & the Fourth Industrial Revolution
Christopher S. Lee, JD, CIPP, Chief Privacy Officer, United States Senate, Sergeant at Arms
The Fourth Industrial Revolution (4IR) has started. 5G networks, Cloud Computing and
Quantum Computing are being integrated using artificial intelligence, machine
learning and software. 4IR will be bigger than the dotcom revolution of the 1990s and will create opportunities to process and use exponentially more data to make better-informed decisions faster than ever before. The 4IR will usher in
a new wave of technical products and services. It will also create tools that
can be used to benefit society or infringe upon privacy and civil liberties.
This session will introduce the concept of 4IR and lay the groundwork for identifying and addressing associated data ethics issues.

Policy and Technology: Ensuring Ethics in the Submission and Access of Biomedical Research Data
Dina N. Paltoo, National Institutes of Health
NIH has a large and
growing number of valuable data repositories for human research data.
Facilitating access to these data safely and in a manner that honors the
privacy of the research participant requires creating innovative approaches to
facilitate data submission and access. Accelerating data-driven discovery and
providing the best return-on-investment on existing data and resources, in
order to accelerate and improve science and build trust in the research
enterprise, necessitates that the National Institutes of Health (NIH)
facilitate the reuse of data collected in one study for use in future research.
The first step is to ensure responsible stewardship of data, through policy and
technology, such that data are submitted and made available in a manner that is
consistent with the original conditions (e.g., consent) under which the data
were collected, as well as in accordance with Federal regulations for
de-identifying data and protecting participant privacy. Effective models exist
for sharing data while providing necessary protections, such as
controlled-access (e.g., to large-scale human genomic data) or results
dissemination (e.g., of registered clinical trials), in addition to new models
of data stewardship that use cloud-based approaches.

Ethical Issues in the Development of Complex Machine Learning Algorithms
Sara R. Jordan, Virginia Tech
Many statements of
ethics for machine learning and artificial intelligence (AI/ML) are written at
a high level that does not acknowledge fully the complexity of developing
machine learning algorithms. Specifically, while statements that AI/ML ought
to be "transparent" or "explicable" are easily laudable, they are not
technically feasible except when coupled with some extraordinary steps taken by
programmer teams and their team leaders. In this presentation, I take ethics
for AI/ML down from high-level statements to explanations of in medias res
techniques for how to build explicable and accountable AI. I will focus on the
development of neural network models in the realm of natural language
processing algorithms alone in order to demonstrate where the intersections of
ethical norms and technical practices will change conventional technical practices,
such as data collection, transformation, model building, and model testing.
Model deployment in consumer products will be discussed briefly.

Ethical Principles and Data Science - Repurposing Administrative and Opportunity Data
Stephanie S. Shipp, Sallie Keller, and Aaron Schroeder (University of Virginia)
The data revolution has transformed the conduct of social science
research through the incorporation of data science, but ethical dimensions
should not be compromised. Researchers can now observe behavior based on
repurposing existing administrative and opportunity data without consent or
awareness by those providing the data. The principles set forth in the Belmont
Report on Ethical Principles and Guidelines for the Protection of Human
Subjects of Research are still as applicable as when these principles were
first established in 1978. Discussions about ethics need to be a natural part
of every research project, especially when repurposing data for analytical
purposes. A publicly-shared ethical checklist at each research stage can help
researchers identify and frame any potential concerns and evaluate their
relative impacts. A key part of this checklist is the assessment of implicit
biases. Ethical principles require the implementation of everyday practices
around documentation, transparency, ongoing discussion, questioning, and
constructive criticism. We will discuss the history of these ethical principles
and our experiences implementing them in our research.

AM1-3: Nonresponse Bias in Federal Surveys: Gaps in Knowledge and Future Opportunities

Constructing an Inventory of Nonresponse Bias Studies in Federal Surveys
Peter Miller, Northwestern University and U.S. Census Bureau, Retired
This presentation summarizes the first systematic review of nonresponse
bias (NRB) studies involving Federal surveys since the release of the 2006 OMB Standards and Guidelines for Statistical Surveys. NRB reports were identified through searches on PubMed, Google
Scholar, Current Index to Statistics, Joint Statistical Meeting proceedings and
through an open call to Federal statistical agencies and associated
professional organizations. Some 165 studies were identified - 89 concerning
establishment surveys and 76 involving household surveys. About 40 percent of
the NRB studies were done during the period shortly after the 2006 OMB
guidance. The methods employed for assessing NRB differed for establishment and
household surveys. Studies involving
establishment surveys mostly compared survey estimates to external (frame)
data, while those involving household surveys mostly examined variations of
estimates within the response set (e.g., early and late responders). A majority
of studies reported some NRB in estimates prior to weighting and some reduction
in bias after adjustment. The efficacy of weighting was often not explicitly
documented in the reports. This systematic review is a first step in continuing
research on NRB in Federal surveys.
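For reference, the quantity these studies estimate can be written compactly. The following is the standard decomposition of nonresponse bias in a respondent mean, stated here for orientation rather than taken from the presentation:

\[
  \bar{Y} = \frac{n_r}{n}\,\bar{y}_r + \frac{n_{nr}}{n}\,\bar{y}_{nr}
  \quad\Longrightarrow\quad
  \operatorname{bias}(\bar{y}_r) = \bar{y}_r - \bar{Y}
  = \frac{n_{nr}}{n}\bigl(\bar{y}_r - \bar{y}_{nr}\bigr),
\]

where \(\bar{y}_r\) and \(\bar{y}_{nr}\) are the respondent and nonrespondent means and \(n = n_r + n_{nr}\). Establishment-survey studies can estimate the difference \(\bar{y}_r - \bar{Y}\) directly because frame data supply \(\bar{Y}\); household-survey studies must instead approximate \(\bar{y}_{nr}\) from within the response set (e.g., from late responders).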
Developing and Assessing Weighting Methods for the Redesigned National Health Interview Survey
Ronaldo Iachan (ICF), National Center for Health Statistics
In 2019, the National Health Interview Survey (NHIS) released its first
redesigned instrument since 1997. In this paper, we present the results of a
collaboration between the National Center for Health Statistics (NCHS) and ICF
to evaluate weighting methods for use with the redesigned NHIS. The evaluation
focused on the use of machine learning and multilevel logistic regression to
assess nonresponse (NR) bias in key NHIS health estimates and develop NR bias
adjustments for household, adult, and child sample weights. We start by reviewing the data sources which provided potential
predictors at different levels. The nonresponse models incorporated predictors
from the NHIS Contact History Instrument and Neighborhood Observation
Instrument, and auxiliary data from the Area Health Resource File and the
Census Planning Database. The analysis employed machine learning methods such
as lasso, random forest, and decision trees, as well as more traditional
single-level and multilevel logistic regression, to find best-fitting models to
use in NHIS nonresponse bias adjustments for household, adult, and child
weights. We define key health indicators used in the nonresponse bias analysis
at these different levels, and discuss data sources, decision processes,
methodology and results focused on bias reduction. We also present results from
capping the NR adjustment factors to limit variance inflation, as well as
overlaying raking on top of NR adjustments to extend the simple demographic
post-stratification currently used in the NHIS.

Finding the Right Auxiliary Information for Nonresponse Adjustment Models: In Search of Zs with Desirable Properties
Andy Peytchev, RTI International
Declining response rates increase the dependence of survey estimates on
postsurvey adjustments. The identification of auxiliary information is becoming
increasingly important. This presentation starts with a discussion of flaws in
common current practice with regard to nonresponse adjustment models. It is
followed by an overview of desirable properties of auxiliary information, and
related challenges. In the third part, promising avenues for improvement are
introduced, along with several illustrative examples from the research
literature.

Estimating Survey Nonresponse Bias Using Tax Records
Bruce Meyer, National Bureau of Economic Research, American Enterprise Institute, and U.S. Census Bureau
Declining survey response rates are a widespread and troubling problem
that raises the possibility of bias in key statistics. We propose and implement
a new method to determine nonresponse bias by linking income tax records
to respondents and nonrespondents by address. In light of the importance of income in assessing
poverty, inequality, and material well-being, we focus on income but also
examine bias along other dimensions measured on tax returns such as marital
status and family
size. To provide a framework, we first describe a theory of testing for
differences between populations when linkage to validation data is incomplete. We then apply
the methods to the Current Population Survey (CPS), the most widely used economic survey and the source of official employment, income, poverty, and inequality statistics. We link the CPS to IRS Form 1040 records, comparing
several characteristics of
respondents and nonrespondents, including income, its components,
self-employment status, marital status, number of children, and the receipt of
social security. We find little evidence of differences between the percentiles
of the income distributions of the linked respondents and nonrespondents. We also find little difference between the income
distributions of ASEC respondents
and CPS Basic respondents who decline to participate in the ASEC (whole imputes).
However, we find significant differences between respondents and nonrespondents in marital status, the number of children, and other characteristics.

Afternoon Concurrent Sessions 1
PM1-1: Implementing the Evidence Act: The Journey thus Far and the Road Ahead
The Foundations for Evidence-Based Policymaking Act of 2018 requires data from federal agencies to be
accessible and requires agencies to plan to develop statistical evidence to
support policymaking. To facilitate this, three newly-designated
positions were created: Chief Data Officers (CDOs), Evaluation Officers (EOs)
and Statistical Officials (SOs). These are the key players who will lead
federal agencies through the changes required to meet the requirements laid out
in the Evidence Act, propelling the federal statistical system into a new era. In this session, we bring together CDOs, EOs and SOs and ask
them about their experiences thus far implementing the Evidence Act, and their
thoughts about the road ahead. There will be ample time for discussion with the panel; we encourage attendees to bring their questions!

PM1-2: Leveraging Administrative Data

Using Administrative Records Data to Produce Business Statistics: The Nonemployer Statistics by Demographics Series (NES-D)
Adela Luque (U.S. Census Bureau)
The Survey of Business Owners (SBO) was the only comprehensive source of
information on business demographics. To address increasing nonresponse rates
and costs, and a rising demand for more frequent and timely data, the Census
Bureau has consolidated three business surveys. One of the consolidated surveys
is the SBO. The nonemployer component of the SBO will be accomplished through a
new blended-data approach that leverages existing administrative (AR) and
census records to assign demographic characteristics to the universe of
nonemployers, and produce an annual series that will become the only source of
nonemployer demographics estimates. This new series is the Nonemployer
Statistics by Demographics series or NES-D. Meeting the public's needs, NES-D
will provide reliable estimates with no respondent burden on a more frequent
and timely basis than the SBO. Using the 2014-2016 vintages of nonemployer
businesses and demographic information from the decennial census, the American
Community Survey, the Census Numident, and AR from the Department of Veterans
Affairs, we discuss preliminary results, the challenges encountered along the
way, and next steps.

Blending Administrative Data with a Probability Sample of Nonparticipants to Produce National Estimates: The NCS-X NIBRS Estimation Project
Marcus Berzofsky, Dan Liao (RTI International), and Alexia Cooper (Bureau of Justice Statistics)
Administrative data collected through a set of agencies (e.g., law enforcement, schools) can be a rich source of information, but misleading if
the data suffer from quality issues such as item missingness or incomplete
coverage. When the data source suffers from incomplete coverage, the data are
not representative of the population. If a census is not possible, one
alternative is to select a probability sample of nonparticipating agencies,
collect their data, and blend them with the reporting agencies. The FBI's National Incident-Based Reporting System
(NIBRS) collects incident-based information on all crimes reported to the
police. Currently, 33% of law enforcement agencies in the US submit to NIBRS,
but these agencies mainly represent less populated parts of the country. The
National Crime Statistics Exchange program is recruiting a probability sample
of 400 agencies designed to produce nationally representative estimates when blended
with the existing reporting agencies. However, the methodology for addressing
quality issues and producing estimates is complex. We describe how we intend to
address these issues and the plan for developing the appropriate estimation
methodology.
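As a rough illustration of the blending idea described above, consider the following minimal sketch with hypothetical numbers; the project's actual estimation methodology is more complex, as the abstract notes:

```python
# Sketch of a blended national estimate: agencies already reporting to
# NIBRS form a self-representing stratum (weight 1), while sampled
# nonparticipating agencies are weighted by the inverse of their
# selection probabilities (a Horvitz-Thompson-style estimator).
# All numbers below are hypothetical.

def blended_total(reporting_counts, sampled_counts, selection_probs):
    certainty_part = sum(reporting_counts)  # census of reporting agencies
    sampled_part = sum(y / p for y, p in zip(sampled_counts, selection_probs))
    return certainty_part + sampled_part

# Three reporting agencies plus two sampled nonparticipants; each sampled
# agency stands in for 1/p agencies like it.
print(blended_total([120, 85, 40], [60, 30], [0.02, 0.05]))
# -> 3845.0 (245 observed + 3600 estimated from the sample)
```

In practice the estimator would also carry nonresponse and coverage adjustments, which is part of the complexity the abstract refers to.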
Analyzing Research and Development Trends Using Administrative Data
Kathryn Linehan, Eric Oh, Joel Thurston, Stephanie Shipp, and Sallie Keller (University of Virginia)
The Federal Government accounts for about one-fourth of total Research and Development (R&D) funding in the United States, but what exactly does
this public funding support? While the National Center for Science and Engineering Statistics (NCSES) provides high-level data from surveys on the
disposition of federal obligations for R&D, more granular research
characteristics (e.g., project topic) remain untapped. Federal agencies also
release publicly available administrative data that describe projects in far
greater detail (e.g., Federal RePORTER). This presentation documents the
usefulness of these administrative data to enhance and supplement NCSES surveys
of federal funding. Using grant abstracts in Federal RePORTER and topic
modeling, we discover latent R&D research topics in the database and
analyze their trends over time to discover emerging topics. We also complete a
pandemics case study that utilizes information retrieval techniques along with
topic modeling to perform a deeper dive into a specific area of interest that
is not captured in enough detail by the topic model on the entire database.
Initial results show that we can capture specific R&D research trends from
administrative data in Federal RePORTER.

Integrating Survey and Administrative Data Across Sources and Across Agencies to Create Statistical Products: A Case Study from Education
Sarah Grady (National Center for Education Statistics) and Emily Isenberg (American Institutes for Research)
The National Center for Education Statistics (NCES), within the U.S.
Department of Education (ED), developed supplementary geocode data files for
the Early Childhood Program Participation, Parent and Family Involvement in
Education, and Adult Training and Education surveys of the 2016 National
Household Education Surveys Program. The geocode files use sample members'
addresses to integrate data from other federal agencies and ED administrative
data collections. The data files include new radius-based measures of household
proximity to educational opportunities and job search assistance. The
presentation will provide an overview of how the geocode files demonstrate some
of the goals of evidence-based policymaking. The presentation will also discuss some of the challenges inherent in creating the files, including identifying auxiliary data sources, evaluating them for appropriateness, and assessing the disclosure risk of the resulting files. Data timeliness and cost will also be discussed.

An Approach to Tiered Access in the Department of Veterans Affairs
Michael Schwaber (U.S. Department of Veterans Affairs)
This presentation describes the VA OEI Office
of Data Governance and Analytics' (DGA) development of tiered data access in an
evolving environment of privacy, as well as future plans to expand access while
protecting confidentiality. DGA is an organization that stores, links,
processes, and distributes large amounts of veteran data, some of which contain
personally identifiable information (PII). Recent updates to data policy and
new laws encourage more data access across agencies and to the public.
This has led to DGA's examination of its disclosure risk mitigation strategies
to better protect its data. DGA is developing access tiers defined by combinations
of protection levels on each of the elements of the "Five Safes" framework.

PM1-3: Leveraging Official Statistical Programs to Address Emerging Issues: Providing Information Relevant to the Coronavirus Pandemic
Federal statistical agencies have long-standing programs and data collections that provide
official statistics on all aspects of the US - population characteristics, the
economy, education, health, and more. These programs, which include
time-series data that have been relatively unchanged for many years, are the
result of years of effort required to develop, execute and produce important
national estimates. However, information relevant to major unexpected events,
such as the sudden development of the Coronavirus Pandemic in the US, may not be
captured well as part of those ongoing collections, leaving a gap in the
information available on which to base policy decisions. In 2020, many Federal
Statistical Agencies nimbly addressed the need for data on the pandemic by
leveraging the ongoing survey programs. Examples from BLS, Census, IRS,
NCHS and BTS will illustrate how statistical agencies were able to support
evidence-based decision making relevant to a pandemic that was not part of
anyone's planning. Some added new information to the existing collections,
while others found new uses for existing data. But all contributed to
efforts to support policy evaluation and impact.

Adding COVID-19 Questions to the CPS
Emy Sok and Karen Kosanovich (Bureau of Labor Statistics) and Tim Marshall (U.S. Census Bureau)
As
the coronavirus (COVID-19) pandemic began to spread in the US, both the Census
Bureau and Bureau of Labor Statistics began to consider how the monthly Current
Population Survey (CPS) could be utilized to collect information related to the
pandemic. The timely release of estimates from the monthly household
labor force survey would show the impact of the pandemic and efforts to contain
it on measures of employment and unemployment. We had a rare opportunity
to quickly add a limited number of questions that might offer more information
about how people were affected by the pandemic. The new
questions were crafted, reviewed, and submitted in a few weeks as the
impact of the pandemic was still unfolding. The programming, testing, and
fielding of these new items occurred as survey operations were changing due to
the public-health constraints resulting from the pandemic. Emy Sok (BLS),
Karen Kosanovich (BLS), and Tim Marshall (Census) will discuss the challenges
of adding these new questions to an ongoing monthly survey.
Near-Real-Time Surveillance of COVID-19 Mortality Using Data from the National Vital Statistics System
Paul Sutton, Division of Vital Statistics, National Center for Health Statistics, Centers for Disease Control and Prevention
Lauren M. Rossen, Division of Research and Methodology, National Center for Health Statistics, Centers for Disease Control and Prevention
The Vital Statistics Rapid Release (VSRR) program provides access to the
timeliest vital statistics for public health surveillance of important
mortality outcomes, based on a current flow of vital statistics data from state
vital records offices. In response to the urgent need for data to inform
decisions related to the COVID-19 pandemic, the VSRR program was quickly
expanded to tabulate and publish COVID-19-related provisional mortality data.
These data include daily updates of the counts of COVID-19 deaths by week for
the United States, and by jurisdiction of occurrence. Several other data files
and visualizations are produced and published weekly, including counts of
COVID-19 deaths by various demographic factors such as age, race and Hispanic
origin, and place of death. Data files are published on an open data platform
to ensure accessibility. Additionally, provisional data from NVSS are used to
examine the data quality of other sources of information on COVID-19 mortality,
and to monitor and disseminate data on excess deaths associated with COVID-19. Prior to the COVID-19 pandemic, the VSRR program published provisional
estimates of mortality with a 3-9 month lag between the data and the date of
analysis, depending on the cause of death. This lag was based on analyses of
timeliness and completeness of provisional data. The urgent nature of the
COVID-19 pandemic necessitated the release of data as quickly as possible;
waiting 3 months for the data to become more complete would render the data
obsolete. As such, it was necessary to provide more information about the
completeness of the mortality data, and to conduct related analyses on an
ongoing basis to monitor the data flow and timeliness. These analyses have
suggested that the NVSS provisional data on COVID-19 deaths track about two
weeks behind other data sources (e.g., media reports and other COVID-19
tracking systems). The strengths of the NVSS as the most comprehensive and consistent source
of mortality data for the US make it a valuable resource for examining data
quality across other sources, such as case reporting systems. Additionally, the
availability of historic data in the NVSS has allowed for the estimation and
ongoing monitoring of excess deaths (the number of deaths from all causes above
expected levels). Estimates of excess deaths can provide information about the
burden of mortality potentially related to the COVID-19 pandemic, including
deaths that are directly or indirectly attributed to COVID-19. This metric may
be particularly important for monitoring COVID-19 mortality and related trends,
given potential differences in testing and reporting of COVID-19 deaths.

The IRS Office of Research, Applied Analytics and Statistics Used Innovative, Nimble Approaches to Support Decision Making and Evaluation Related to the Coronavirus Pandemic
Holly Donnelly, Internal Revenue Service
The IRS built a new graph data model to analyze entities with employment
tax requirements and Form 941 Schedule R filings. The model will be used by
Exam, CI, and others to identify entity relationships, distribution of claims
and credits, patterns of non-compliance (including fraud), estimate Exam
workload, and optimize case selection activities. The Economic Impact Payment
(EIP) is a payment meant to stimulate the economy and put money in the hands of
taxpayers and U.S. citizens in a time of need. The EIP also creates an
opportunity for identity thieves as it creates a new mode of entry into the
filing population (see
https://krebsonsecurity.com/2020/04/new-irs-site-could-make-it-easy-for-thieves-to-intercept-some-stimulus-payments/).
Normally, identity thieves must claim a refund on a return in order to profit
from ID theft fraud, which subjects a return to multiple selection filters that
are looking for ID Theft and other forms of refund fraud. The EIP removes the
scrutiny as thieves can now file a fraudulent $0 return, avoid RRP and DDb
filters, and receive a stimulus check or direct deposit. RAAS is working
closely with RICS to start a system for fraudulent EIP detection and
selection. Mechanisms for detection will be driven by RAAS and our
contractors. W&I is expecting 18 million marginal returns this year as a
result of EIP. ID Theft not only allows stimulus payments to go to identity
thieves who may or may not spend those checks in the U.S., but also makes it
difficult for the appropriate person to claim their rightful stimulus check in a
timely manner. Given the absence of a systemic selection process for this
vulnerable population, we currently estimate that we need at least $200K in
additional funding and potentially an additional $200K to prepare for PY2021
related schemes, as unclaimed EIPs will show up as credits on TY2020 accounts.
RAAS estimated the number of businesses eligible for advance tax credits
to cover health care costs and paid leave to assist Collection in anticipating
workload. We used counts of employers filing employment tax returns including
the 941, 944, 934 and CT-1. RAAS also estimated likely health care costs to
assist in identifying claims that necessitated further scrutiny before the tax
credits were advanced. Using data from the W-2s, we reported health care costs
by number of employees to establish guidelines.
We are also building a model to estimate the impact of a sudden economic
downturn on IRS Collection resources. Using data from our compliance data
warehouse paired with published data from other statistical agencies, our model
will forecast, by industry and geographic area, how many individual filers will
likely owe taxes they cannot pay and enter the Collection workstream. This
effort will be challenging because of recent "tax reform" that caused a rise in
individual balance due returns for tax year 2019. A related project will
project additional work for the insolvency group to secure the government's
interests in cases of bankruptcy.

Expanding the Use of NCHS' Research and Development Survey to Quantify Health Characteristics During the Coronavirus Pandemic
Paul Scanlon and Katherine Irimata, National Center for Health Statistics
The National Center for Health Statistics' (NCHS) Research and
Development Survey (RANDS) is a series of surveys collected from commercial
panels for methodological research purposes. Until now, the goals of RANDS have
been solely methodological. On one hand, NCHS has used RANDS to refine
mixed-method question evaluation techniques that can be integrated into its
cognitive interviewing program. On the other hand, it has explored calibration
methods that leverage the strength of NCHS' established core surveys to produce
estimates from commercial web panels. These estimates have not been released, however, since they are considered experimental.
In response to the Coronavirus pandemic, NCHS expanded the use of the
RANDS platform to rapidly monitor aspects of the public health emergency
including the inability to work due to illness with COVID-19, telemedicine
before and during the pandemic, and problems accessing specific types of health
care due to the pandemic. The RANDS during COVID-19 survey was fielded in two
rounds during the summer of 2020 and experimental estimates were publicly
released for both rounds. This has been a joint effort between the Division of
Research and Methodology's (DRM) Collaborating Center for Questionnaire Design
and Evaluation Research (CCQDER) and Collaborating Center for Statistical
Research and Survey Design (CCSRSD) and involved the development of the questionnaire,
including the development of COVID-19 related questions, as well as the
calibration of the RANDS data to NCHS' National Health Interview Survey (NHIS)
in an effort to adjust for some of the potential bias in the panel. Through
the expansion of this existing experimental platform, NCHS was not only able to
evaluate approaches to asking about Coronavirus- and pandemic-related topics in
a timely manner, but also rapidly respond to, and provide relevant information
about, COVID-19 in the United States.

New Data for New Purposes
Rolf Schmitt, Bureau of Transportation Statistics
Travel restrictions and warnings had enormous and immediate effects on
the U.S. transportation system. The Bureau of Transportation Statistics (BTS)
realized from anecdotal evidence that the speed and magnitude of change placed
a premium on daily and weekly statistics rather than the Bureau's traditional
focus on annual and monthly statistics. The premium on timely statistics also
demanded rethinking of deliberative data quality processes. Preliminary
estimates were becoming more important than ever just as the stable trends that
formed the traditional basis for preliminary estimates were undone. BTS
responded to these challenges by tapping new data sources, by taking different
approaches to preliminary estimates, and by adopting a rapid prototyping
strategy for statistical product development. The response will be summarized
by Rolf Schmitt, the BTS Deputy Director.

Afternoon Concurrent Sessions 2
PM2-1: Using Data in New Ways: Leveraging the Evidence Act to Coordinate Evaluation, Statistics and Policy

Framing the Evidence Act's Vision for Coordination and Collaboration
Erica Zielewski, Office of Management and Budget
In brief introductory remarks, this presentation will summarize key
elements of the Act with an emphasis on those places that highlight
coordination and collaboration between evaluation and statistics functions and
roles, such as the creation of multi-year learning agendas and Annual
Evaluation Plans. These remarks will also introduce the examples that follow, each of which brings something different to the discussion: one example of an
agency that generally makes data available; an example of how an evaluation
shop and statistical unit work together within one organization; an example
focused on state data; and an example of leveraging administrative data from an
agency for research purposes.

HUD's Approach to Making Data Available for Research and Evaluation
Calvin Johnson, HUD
HUD traditionally prioritizes its data capacity (see, for example, the data section in its Research Roadmap) and has worked to get its data into the hands of people who can use it. HUD has also built some unique data linkages (for example, with Census and with NCHS) and, more broadly, has tried to make data more accessible for use. The presentation will discuss how this has worked in practice.

Linking State Medicaid Data and Child Welfare Data for Outcomes Research
Valeria Butler, ASPE and ACF/HHS, and Emily Madden, ASPE/HHS
HHS recently launched this effort to link and build datasets that can be
used for research and policy. It is unique and innovative, but nascent, so the presentation will focus more on intentions and plans than on concrete activities and outcomes.

The Department of Labor's Data Exchange and Analysis Platform (DEAP)
Christina Yancey, Chief Evaluation Officer, DOL; Scott Gibbons, Chief Data Officer, DOL; and David Judkins, Abt Associates
This presentation will include speakers from DOL discussing the DEAP
tool, and will begin with a discussion of the critical role of
capacity-building for ongoing evidence building, followed by an introduction
and overview of the DEAP tool. A contractor with experience collaborating with
DEAP will provide a specific use case.

PM2-2: Linked Data from the Census Bureau for Evidence Building: Accessing the Data and Recent Results

Criminal Justice in the US and Economic Inequality: Results from the Criminal Justice Administrative Records System
Keith Finlay (U.S. Census Bureau)
CJARS is a joint Census Bureau-University of Michigan
project started in 2016 to create a national, integrated, harmonized collection
of criminal justice microdata at the Census Bureau. The project has three
fundamental goals: (1) improve Census Bureau operations, (2) provide valuable
aggregate statistical information to criminal justice agencies, and (3)
increase the quality and quantity of criminal justice research by making the
data available through the Federal Statistical Research Data Centers. The
project highlights the opportunities provided by the Census Bureau's Data
Linkage Infrastructure. This paper provides new evidence on how felony
conviction and imprisonment rates have changed for 30+ birth year cohorts over
185 distinct commuting zones in the U.S. using a novel piece of data
infrastructure we have created called the Criminal Justice Administrative
Records System (CJARS). We document striking variation in cumulative exposure
to the justice system over geography, between birth cohorts, and across
demographic groups, and leverage this newly documented variation to assess how
changing risk of contact with the justice system correlates with economic
outcomes in the U.S.

UMETRICS: Data for Examining How Research is Produced and How it Affects the Broader Economy
Joseph Staudt (U.S. Census Bureau)
The IMI UMETRICS data include information on awards, wage payments from awards to university research employees, vendor purchases, subcontracts, and
the unit performing the funded research for 26 universities. These data can be
linked to internal Census Bureau data products, such as the Decennial Census,
American Community Survey, Longitudinal Employer-Household Dynamics (LEHD) database, and the integrated Longitudinal Business Database, providing researchers with a comprehensive view of the businesses associated with the
production of scientific research. This paper provides information on the data
available, how researchers can access the data, and results from work in
progress by researchers.

Results from the Evidence Building Project Series: Health at Birth, Later Life Achievement, and the Intergenerational Transmission of Advantage
Sarah Miller (University of Michigan)
This paper provides evidence on the long-run and intergenerational
impacts of initial health endowments. We link detailed birth certificate
records to federally-held survey and administrative data on earnings,
educational attainment, and public assistance for all individuals born in
California between 1960 and 2014, allowing us to observe measures of health at
birth and long-run economic outcomes for over 25 million individuals. For a
large subset of these individuals, we are also able to observe outcomes for
their children, allowing us to trace the transmissions of health and advantage
across generations. Our analysis is the first to document these effects in the
United States using data of this size and scope. We use these data to analyze
how health at birth within twin pairs, and within siblings, affects long-run
and intergenerational health and achievement. We find that individuals with higher birth weights are better off in adulthood along a number of dimensions, and we find some evidence that this advantage transfers to the next generation in the form of higher birthweights and better economic and health outcomes.

The Census Longitudinal Infrastructure Project - Linked Census Data and Results from the Impact of Preschool on Later-Life Outcomes
Katie Genadek (U.S. Census Bureau)
The Census Longitudinal Infrastructure Project (CLIP) was created to
support research using the linked data at the Census Bureau, including linked
mandatory-response census and survey data, and to further develop the linked
data infrastructure with expansion to historical data. There are currently more
than 12 projects using the linked data at the Census Bureau through the FSRDC
network. This paper will describe the linked data available and explain how
researchers can access these data. Recent research using these data to analyze the effect of Lanham Act preschools in the 1940s on later-life outcomes will also be discussed.

PM2-3: Communicating Fitness for Use
Federal agencies in the United States produce a wide range of estimates
from an increasing variety of data sources to inform evidence-based policy
decisions. Communicating the uncertainty of these estimates and the
uncertainty of associated inferences (e.g., trends, comparisons) is essential to
transparent quality reporting and making informed decisions. In 2016, the
American Statistical Association (ASA) released a statement on the use of
significance testing, one tool used for interpreting and communicating the
uncertainty of statistical data, recommending a decreased reliance on p-values
for decision making. This session brings together a panel to
discuss communicating statistical uncertainty for federal agencies, including
implications of the 2016 ASA statement, information needs of data users and
stakeholders, and some alternatives for communicating statistical uncertainty
for evidence-based policy decisions.
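To make the communication question concrete, here is one simple alternative of the kind the panel might weigh; this is an illustrative sketch, not drawn from the session itself:

```python
# Sketch: present an estimate with a 95% confidence interval instead of
# reporting only whether it is "statistically significant."
# Illustrative numbers, not agency data.

def format_estimate(estimate, std_error, z=1.96):
    """Format a percentage estimate with its 95% confidence interval."""
    margin = z * std_error
    return (f"{estimate:.1f}% (95% CI: "
            f"{estimate - margin:.1f}% to {estimate + margin:.1f}%)")

# Readers see both the size of the estimate and its precision:
print(format_estimate(14.2, 0.55))  # -> 14.2% (95% CI: 13.1% to 15.3%)
```

An interval communicates magnitude and uncertainty together, which is one of the alternatives to binary significance verdicts suggested by discussions around the 2016 ASA statement.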