Client Success Story

State Government Client
Data de-identification and preparation for analytics research

Large-scale analytics research projects, involving sensitive data from multiple agencies, help improve program effectiveness and the lives of citizens.  

Background

A Navigator government client uses a big data platform to conduct large analytics research projects that often involve bringing together sensitive data from multiple state agencies. Many of these agencies have strict regulations governing the use of their data. For agencies to provide their data, it must be de-identified and prepared quickly, and in a variety of ways, so that the data science teams can get approved access and conduct their research.

Challenge

In many circumstances, state agencies will not share data for analytics research projects unless the data has been de-identified and prepared to meet regulations. To meet the tight deadlines of these research projects, the data de-identification and preparation process must be extremely agile and efficient. De-identified data by itself may also not provide enough value to support research, so additional preparation and transformation logic, applied creatively, is necessary to get the most out of the de-identified data sets.

Approach

Navigator’s approach is to use Alteryx, which provides the agility and efficiency the problem demands, along with the de-identification and preparation functions needed to get the data into a format acceptable for data science research. De-identification and preparation in Alteryx often includes data redaction, masking, filtering, conditional functions, joins and fuzzy matching across data sets, and calculations that provide relevant information without exposing sensitive information. For example, a calculation can provide date differences or ZIP code differences rather than the actual date or ZIP code fields.
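
To make the idea of derived, non-sensitive fields concrete, here is a minimal sketch in Python. It is not the client's Alteryx workflow. The field names (`name`, `ssn`, `enroll_date`, `exit_date`, `zip_at_enroll`, `zip_at_exit`) and the helper functions are hypothetical, and "ZIP code difference" is interpreted here, as an assumption, as a flag for whether the ZIP code changed.

```python
from datetime import date


def mask_id(value: str, visible: int = 4) -> str:
    """Mask every character except the last `visible` ones."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def deidentify(record: dict) -> dict:
    """Return a research-ready copy of a record (hypothetical field names).

    - Redaction: the name is dropped entirely.
    - Masking: only the last four characters of the identifier survive.
    - Derived fields: a day count and a ZIP-changed flag replace the
      actual dates and ZIP codes.
    """
    return {
        "id_masked": mask_id(record["ssn"]),
        "days_enrolled": (record["exit_date"] - record["enroll_date"]).days,
        "zip_changed": record["zip_at_enroll"] != record["zip_at_exit"],
    }


# Illustrative record only; not real data.
sample = {
    "name": "Jane Doe",
    "ssn": "123-45-6789",
    "enroll_date": date(2020, 1, 15),
    "exit_date": date(2020, 3, 1),
    "zip_at_enroll": "12345",
    "zip_at_exit": "12399",
}
clean = deidentify(sample)
```

The output record carries the length of enrollment and whether the person moved, which is often what a research question needs, while the name, the dates, and the ZIP codes themselves never leave the source agency's data set. Whether a masked identifier is acceptable at all depends on each agency's regulations.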

Results

The client has been able to efficiently conduct large-scale analytics research projects, involving sensitive data from multiple agencies, that help improve program effectiveness and the lives of citizens.