
BRFSS Data Platform & Analytics Dashboard

Sep 2025 - Dec 2025

Categories: Data Classification, Data Analytics

Role

Sole contributor: built the end-to-end data science pipeline, including data processing, modeling, and analytics

Course

DS5110 – Essentials of Data Science

Objective

This project is based on the Behavioral Risk Factor Surveillance System (BRFSS), a large-scale public health survey dataset released by the U.S. CDC. The dataset spans multiple years and regions, containing complex demographic and health-related variables. The objective was to design and implement an end-to-end data science pipeline that transforms raw, heterogeneous BRFSS data into structured insights through data cleaning, modeling, exploratory analysis, and visualization, enabling interpretable analysis of health risk factors across populations.

Technologies

Python data analysis (Pandas, NumPy), large-scale data cleaning and ETL pipelines, multi-table and hierarchical data structure modeling, Parquet data storage and performance optimization, exploratory data analysis (EDA), data visualization (Plotly, Matplotlib)

My Contributions

  • Cleaned and processed millions of rows of raw BRFSS survey data
  • Reconstructed hierarchical structures across class, topic, question, and response levels
  • Standardized inconsistent response encodings across years
  • Designed analysis-ready data schemas and optimized storage using Parquet
  • Built reusable aggregation pipelines for multi-dimensional analysis
  • Visualized trends and comparisons across demographic and geographic dimensions
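
A minimal sketch of two of the steps above, standardizing response encodings and running a reusable aggregation, assuming Pandas. The column names, sample values, and helper functions are hypothetical illustrations and are not taken from the project code.

```python
import pandas as pd

# Hypothetical sample rows. The columns mirror the class/topic/question/response
# hierarchy described above, but the schema and values are assumptions.
raw = pd.DataFrame(
    {
        "year": [2020, 2020, 2021, 2021],
        "state": ["MA", "MA", "MA", "MA"],
        "class": ["Tobacco Use"] * 4,
        "topic": ["Current Smoker Status"] * 4,
        "question": ["Adults who are current smokers"] * 4,
        "response": ["Yes", "No", "YES ", "no"],  # inconsistent across years
        "sample_size": [120, 880, 100, 900],
    }
)


def standardize_response(series: pd.Series) -> pd.Series:
    """Trim whitespace and normalize case so 'YES ' and 'Yes' compare equal."""
    return series.str.strip().str.title()


def response_share(df: pd.DataFrame, group_cols: list[str]) -> pd.DataFrame:
    """Compute each response's share of respondents within every group."""
    df = df.assign(response=standardize_response(df["response"]))
    # Sum respondents per (group, response) pair.
    counts = df.groupby(group_cols + ["response"], as_index=False)["sample_size"].sum()
    # Total respondents per group, aligned row by row with `counts`.
    totals = counts.groupby(group_cols)["sample_size"].transform("sum")
    return counts.assign(share=counts["sample_size"] / totals)


shares = response_share(raw, ["year", "state", "class", "topic", "question"])
print(shares[["year", "response", "sample_size", "share"]])
```

Because the grouping columns are passed in as a parameter, the same function could be reused for other dimensions, for example by year only or by state only. A result like `shares` could then be stored with `DataFrame.to_parquet`, which requires a Parquet engine such as pyarrow to be installed.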

Results

  • Delivered a complete and reusable end-to-end data science pipeline
  • Enabled systematic analysis of health indicators across multiple demographic dimensions
  • Supported longitudinal trend analysis and regional comparisons
  • Produced interpretable results relevant to real-world public health decision-making

GitHub Repository

The data analysis pipeline is complete; the code will be updated once the frontend visualization dashboard is finished.