
PORTFOLIO

Tennessee K–12 Academic Outcome Analysis

Overview:

Analyzed multi-year education data across Tennessee schools to understand the relationship between funding, student performance, and demographic factors.

What I Did:

  • Integrated 90+ datasets into a relational database

  • Cleaned and standardized multi-year data (2018–2024)

  • Applied statistical models (GAMM, ordinal logistic regression)

  • Conducted comparative and correlation analyses

Tools:

R, SQL, Microsoft Access

Key Findings:

  • Per-pupil spending showed weak and inconsistent relationships with outcomes

  • ACT scores and graduation rates strongly aligned with TVAAS (Tennessee Value-Added Assessment System) performance

  • Demographics were strongly associated with student outcomes

Impact:

This project highlights how data-driven analysis can challenge assumptions about education funding and provide insights for policy and decision-making.

Machine Learning – Fraud Detection

Overview:

This project focuses on detecting fraudulent credit card transactions using machine learning techniques. By analyzing transaction data, the goal was to identify patterns that distinguish fraudulent activity from normal behavior and evaluate model performance in a real-world classification setting.

What I Did:

  • Explored and analyzed a dataset of 10,000 credit card transactions

  • Cleaned and preprocessed data, including normalization and handling class imbalance

  • Performed exploratory data analysis to identify patterns in transaction behavior

  • Built and compared multiple machine learning models:

    • Logistic Regression

    • Decision Tree

  • Evaluated model performance using key metrics such as accuracy, precision, recall, F1-score, and AUC

  • Visualized results using ROC curves and confusion matrices

Tools:

Python, R, Machine Learning Models (Logistic Regression, Decision Trees)

Key Findings:

  • Class imbalance significantly impacted model performance and required preprocessing adjustments

  • Logistic Regression achieved strong overall performance with a high AUC score (~0.94)

  • Decision Trees improved fraud detection after balancing the dataset, increasing recall for fraudulent transactions

  • Accuracy alone was not a reliable metric due to imbalance; recall and AUC provided better evaluation insight
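
The last point can be illustrated with a minimal sketch. The labels below are made up for illustration and are not the project's data: 1,000 transactions with 10 frauds, scored once by a hypothetical baseline that never flags anything and once by a hypothetical model that catches 8 of the 10 frauds at the cost of 20 false alarms.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary labels (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when nothing is predicted or nothing is positive
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 990 legitimate transactions followed by 10 fraudulent ones
y_true = [0] * 990 + [1] * 10

# Baseline that predicts "legitimate" for every transaction
always_legit = [0] * 1000

# Model that raises 20 false alarms and catches 8 of the 10 frauds
flagging = [1] * 20 + [0] * 970 + [1] * 8 + [0] * 2

baseline = classification_metrics(y_true, always_legit)
model = classification_metrics(y_true, flagging)

print(baseline)
print(model)
```

In this toy setup the baseline scores higher on accuracy than the flagging model even though it detects no fraud at all, which is why recall was tracked alongside accuracy.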

Impact:

This project demonstrates how machine learning can be applied to real-world fraud detection problems, highlighting the importance of model evaluation and data preprocessing in building reliable classification systems.

Cardiovascular Disease Risk Prediction Using Machine Learning

Overview:

This project focuses on predicting the risk of cardiovascular disease using large-scale public health data. By analyzing behavioral, clinical, and demographic factors, the goal was to build machine learning models that identify individuals at higher risk and support early intervention strategies.

What I Did:

  • Analyzed large-scale health datasets (300,000+ records) from the CDC's Behavioral Risk Factor Surveillance System (BRFSS)

  • Cleaned and prepared data, including handling missing values and encoding categorical variables

  • Conducted exploratory data analysis to understand relationships between health behaviors and disease risk

  • Built and evaluated multiple machine learning models:

    • Logistic Regression

    • Random Forest

    • Gradient Boosting

  • Applied clustering techniques to identify population groups with similar risk profiles

  • Evaluated model performance using accuracy, precision, recall, F1-score, and AUC

Tools:

Python, R, WEKA, Machine Learning Models (Logistic Regression, Random Forest, Gradient Boosting)

Key Findings:

  • Behavioral factors such as physical activity, smoking, and BMI were strong predictors of cardiovascular risk

  • Machine learning models successfully identified high-risk individuals with strong classification performance

  • Ensemble models (Random Forest, Gradient Boosting) improved predictive accuracy over simpler models

  • Clustering revealed distinct population groups with shared health risk characteristics

Impact:

This project demonstrates how machine learning can be applied to large-scale public health data to support early detection and prevention strategies, helping inform data-driven healthcare decisions.

Bellabeat Smart Device Data Analysis

Overview:

This project analyzes smart device usage data to identify patterns in user behavior and provide data-driven recommendations for Bellabeat, a wellness technology company. The goal was to translate raw activity and sleep data into actionable marketing insights.

What I Did:

  • Analyzed Fitbit user data, including activity levels, sleep patterns, and sedentary behavior

  • Cleaned and prepared data for analysis, addressing inconsistencies and missing values

  • Conducted exploratory data analysis to identify behavioral trends

  • Applied data visualization techniques to communicate insights clearly

  • Used nonlinear analysis (LOESS) to explore relationships between activity and sleep patterns

  • Interpreted findings in a business context to generate recommendations
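
For readers unfamiliar with LOESS, here is a minimal sketch of the idea in plain Python. The project itself used R; the function name `loess`, the `frac` parameter, and the sample values are assumptions made for illustration, not the project's code or data. Each fitted value comes from a straight line fitted only to nearby points, with closer points weighted more heavily.

```python
def loess(xs, ys, frac=0.5):
    """Smooth ys against xs with locally weighted linear regression (LOESS)."""
    n = len(xs)
    k = max(2, int(round(frac * n)))  # number of neighbours used per local fit
    fitted = []
    for x0 in xs:
        # Distance to the k-th nearest neighbour sets the local bandwidth
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12
        # Tricube weights: nearby points count most, points at or beyond h count zero
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        # Evaluate the local weighted line at x0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Hypothetical daily values: active minutes and minutes asleep
active_minutes = [10, 25, 40, 55, 70, 85, 100, 115]
sleep_minutes = [410, 430, 445, 450, 440, 425, 415, 400]

smoothed = loess(active_minutes, sleep_minutes, frac=0.6)
print([round(v, 1) for v in smoothed])
```

Because each local fit only sees part of the data, the smoothed curve can bend where a single straight line cannot.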

Tools:

R, Excel, Microsoft Access, Data Visualization (LOESS, scatter plots, bar charts)

Key Findings:

  • No consistent linear relationship between sleep duration and activity levels

  • Sedentary behavior dominated daily user patterns

  • User activity varied significantly, suggesting the need for personalized engagement strategies

  • Nonlinear trends revealed more accurate insights than simple linear models

Recommendations:

  • Develop personalized activity and wellness recommendations for users

  • Target low-activity users with engagement campaigns

  • Focus on habit-building features within the app

  • Use behavioral insights to guide marketing strategy

Impact:

This project demonstrates the ability to translate data analysis into business insights, bridging the gap between technical work and strategic decision-making.

Coffee Store Database Design and Analysis

Overview:

This project involved designing and implementing a relational database for an online coffee store to manage products, inventory, orders, and staff operations. The goal was to create an efficient data structure that supports real-world business processes and enables meaningful data analysis.

What I Did:

  • Designed a relational database schema for an online coffee store

  • Defined entities such as products, ingredients, orders, staff, and inventory

  • Established relationships between tables, including one-to-many and many-to-many relationships

  • Created primary keys and structured tables to ensure data integrity

  • Developed views to analyze:

    • Daily revenue

    • Sales summaries

    • Staff scheduling

  • Modeled real-world business operations using normalized database design
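
The design steps above can be sketched with Python's built-in `sqlite3` module. This is a reduced illustration, not the project's actual schema: the table names, column names, and sample rows are assumptions. It shows a junction table resolving the many-to-many link between orders and products, plus a daily revenue view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    unit_price REAL NOT NULL
);
CREATE TABLE orders (
    order_id   INTEGER PRIMARY KEY,
    order_date TEXT NOT NULL              -- ISO date string
);
-- Junction table: one row per product on an order (many-to-many)
CREATE TABLE order_items (
    order_id   INTEGER NOT NULL REFERENCES orders(order_id),
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
-- View: total revenue per calendar day
CREATE VIEW daily_revenue AS
SELECT o.order_date, SUM(oi.quantity * p.unit_price) AS revenue
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
JOIN products p ON p.product_id = oi.product_id
GROUP BY o.order_date;
""")

# Hypothetical sample rows
conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                 [(1, "Espresso", 3.0), (2, "Latte", 4.5)])
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "2024-01-01"), (2, "2024-01-01"), (3, "2024-01-02")])
conn.executemany("INSERT INTO order_items VALUES (?, ?, ?)",
                 [(1, 1, 2), (1, 2, 1), (2, 2, 2), (3, 1, 1)])
conn.commit()

rows = conn.execute(
    "SELECT order_date, revenue FROM daily_revenue ORDER BY order_date"
).fetchall()
print(rows)
```

Keeping prices in `products` and quantities in `order_items` means each fact is stored once, and the view derives revenue on demand instead of duplicating it.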

Tools:

SQL, Database Design, ER Modeling

Key Findings:

  • Normalized database structure to reduce redundancy

  • Efficient table relationships using primary and foreign keys

  • Support for operational reporting (sales, staffing, inventory)

  • Scalable design adaptable to real business environments

Impact:

This project demonstrates the ability to design structured data systems that support business operations and analytics, highlighting foundational skills in data engineering and database management.

Student Depression Risk Prediction Using Machine Learning

Overview:

This project focuses on predicting students at high risk of depression using machine learning techniques. By analyzing academic, behavioral, and lifestyle factors, the goal was to identify at-risk individuals and support early intervention strategies in educational settings.

What I Did:

  • Analyzed a large student dataset (27,000+ records) with demographic and behavioral features

  • Cleaned and preprocessed data, including feature selection and class balancing

  • Built and evaluated multiple machine learning models:

    • Logistic Regression

    • k-Nearest Neighbors (k-NN)

    • Decision Tree (J48)

  • Applied cross-validation to ensure model robustness

  • Evaluated performance using accuracy, precision, recall, F1-score, and AUC

  • Compared baseline models with ensemble methods (Bagging, AdaBoost)
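
The cross-validation step can be sketched as follows. The project ran it in WEKA; this plain-Python version only illustrates the idea, and the function name `stratified_k_fold`, the fold count, and the label counts are assumptions. The data is split into k folds that each preserve the class balance, and every fold takes one turn as the held-out test set.

```python
import random


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs with class proportions kept per fold."""
    rng = random.Random(seed)
    # Group row indices by class label
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    # Shuffle within each class, then deal indices round-robin across folds
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, idx in enumerate(indices):
            folds[j % k].append(idx)
    # Each fold is the test set once; the remaining folds form the training set
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


# Hypothetical labels: 80 students not at risk (0) and 20 at risk (1)
labels = [0] * 80 + [1] * 20
splits = list(stratified_k_fold(labels, k=5))

for train, test in splits:
    positives = sum(labels[i] for i in test)
    print(len(train), len(test), positives)
```

Averaging a metric over all folds gives a steadier performance estimate than a single train/test split.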

Tools:

WEKA, Machine Learning Models, Data Preprocessing

Key Findings:

  • Logistic Regression achieved the strongest performance (~84.8% accuracy, AUC ≈ 0.92)

  • Ensemble methods (Bagging) improved generalization and model stability

  • Maintaining full feature sets produced better results than aggressive feature reduction

  • The model achieved high recall, effectively identifying at-risk students

Impact:

This project demonstrates how machine learning can be used to support mental health initiatives by identifying at-risk students early, enabling institutions to take proactive and data-driven action.

Global AI Job Market Dashboard (D3.js)

Overview:

This project involved developing an interactive data visualization dashboard using D3.js to explore trends in the global artificial intelligence job market. The objective was to transform a large dataset of AI-related job postings into an intuitive and interactive dashboard that allowed users to explore salary trends, experience levels, and employment patterns over time.

What I Did:

  • Developed an interactive dashboard using D3.js, JavaScript, HTML, and CSS

  • Cleaned and transformed raw job posting data from the Global AI Job Market dataset

  • Aggregated salary information by month and experience level

  • Created a time-series line chart showing average salary trends over time

  • Implemented dynamic scales, axes, legends, and labeling for improved usability

  • Designed visual elements to support exploratory analysis and data storytelling

  • Collaborated with a team to integrate multiple visualizations into a unified dashboard
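
The aggregation behind the time-series chart can be sketched as below. The dashboard itself was written in JavaScript with D3.js; this Python version only shows the grouping logic. The field names (`posting_date`, `experience_level`, `salary_usd`) and the sample rows are assumptions and may not match the dataset's actual columns.

```python
from collections import defaultdict
from statistics import mean


def average_salary_by_month_and_level(postings):
    """Group postings by (YYYY-MM, experience level) and average the salaries."""
    groups = defaultdict(list)
    for row in postings:
        month = row["posting_date"][:7]  # keep the "YYYY-MM" part of an ISO date
        groups[(month, row["experience_level"])].append(row["salary_usd"])
    # Sort keys so the output is ordered by month, then by level
    return {key: mean(values) for key, values in sorted(groups.items())}


# Hypothetical job postings
postings = [
    {"posting_date": "2024-01-15", "experience_level": "Entry", "salary_usd": 60000},
    {"posting_date": "2024-01-20", "experience_level": "Entry", "salary_usd": 70000},
    {"posting_date": "2024-01-22", "experience_level": "Senior", "salary_usd": 150000},
    {"posting_date": "2024-02-03", "experience_level": "Entry", "salary_usd": 64000},
]

averages = average_salary_by_month_and_level(postings)
for (month, level), salary in averages.items():
    print(month, level, salary)
```

Each (month, level) pair becomes one point on the line chart, with one line per experience level.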

Tools:

D3.js, JavaScript, HTML, CSS, Data Aggregation, Data Visualization, Interactive Dashboard Development

Key Findings:

  • Executive-level positions consistently commanded the highest salaries

  • Clear salary differences existed between Entry, Mid, Senior, and Executive experience levels

  • Time-series visualizations revealed salary trends and fluctuations across the AI job market

  • Interactive visualizations improved the ability to explore complex labor market data

  • Effective data preparation and aggregation were critical for meaningful visualization

Impact:

This project demonstrates the use of data visualization and dashboard development to communicate insights from large datasets. By transforming raw job posting information into interactive visualizations, the dashboard enabled users to identify trends, compare experience levels, and better understand patterns within the rapidly evolving AI workforce.

Dynamic Effects of Dietary Protein Restriction on Behavior (Published Research)

Overview:

Analyzed experimental behavioral and physiological data to investigate how dietary protein restriction influences body weight, food consumption, and protein preference in mouse models.

What I Did:

  • Designed and conducted experimental research using single-case experimental design (SCED)

  • Collected and analyzed longitudinal behavioral and physiological data

  • Applied multilevel linear modeling in R to evaluate repeated-measures data

  • Modeled relationships between diet conditions, weight gain, and food consumption

  • Contributed to interpretation of results and scientific publication

 

Tools:

R (lme4, statistical modeling), Excel, Experimental Design

Key Findings:

  • Protein restriction decreased weight gain but increased food consumption in mice

  • Protein preference increased under low-protein conditions in normal mice

  • Effects were reduced or absent in genetically modified (FGF21-KO) mice

  • Demonstrated the importance of the FGF21 hormone in dietary response

 

Impact:

This research demonstrates how advanced statistical modeling and experimental design can uncover biological mechanisms, contributing to understanding of metabolism and potential applications in health and nutrition.

Effects of Chronic Risperidone on Food Reinforcement (Master’s Thesis)

Overview:

Investigated how acute and chronic risperidone administration influences food reinforcement, consumption behavior, and weight gain using behavioral economic modeling in mouse models.

 

What I Did:

  • Designed and conducted controlled behavioral experiments using operant conditioning paradigms

  • Collected and analyzed longitudinal behavioral and physiological data

  • Applied nonlinear demand curve modeling to assess reinforcement value

  • Used multilevel statistical modeling in R to analyze repeated-measures data

  • Evaluated differences between acute and chronic drug administration effects
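
As an illustration of what a nonlinear demand curve looks like, the sketch below assumes the exponential demand model of Hursh and Silberberg (2008), a common choice in behavioral economics. The page does not state which equation the thesis fitted, so treat the model choice, the function name, and every parameter value here as assumptions. The thesis analysis itself was carried out in R.

```python
import math


def exponential_demand(price, q0, alpha, k):
    """Predicted consumption at a given price under the exponential demand model.

    log10(Q) = log10(q0) + k * (exp(-alpha * q0 * price) - 1)

    q0    : consumption when the commodity is free (price = 0)
    alpha : rate at which consumption falls as price rises; a larger alpha
            means more elastic demand, read as lower reinforcement value
    k     : span of consumption in log10 units
    """
    log_q = math.log10(q0) + k * (math.exp(-alpha * q0 * price) - 1)
    return 10 ** log_q


# Hypothetical parameters; price is the number of responses required per food pellet
prices = [0, 1, 5, 10, 30, 100]
curve = [exponential_demand(p, q0=100, alpha=0.0005, k=2) for p in prices]

for p, q in zip(prices, curve):
    print(p, round(q, 2))
```

Under this model, comparing fitted alpha values between drug conditions is one way to ask whether a treatment changed how strongly food maintains behavior.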

 

Tools:

R (lme4, statistical modeling), Excel, Experimental Design

 

Key Findings:

  • Acute risperidone administration reduced food-reinforced responding across conditions

  • Chronic risperidone did not increase reinforcement value of food

  • Chronic administration increased weight gain without significantly increasing food consumption

  • Results suggest weight gain mechanisms are not driven by increased food reinforcement

 

Impact:

This project demonstrates the application of behavioral economics and advanced statistical modeling to understand drug effects on behavior, contributing to research on antipsychotic side effects and metabolic health.
