PORTFOLIO

Tennessee K–12 Academic Outcome Analysis

Overview:

Analyzed multi-year education data across Tennessee schools to understand the relationship between funding, student performance, and demographic factors.

What I Did:

Integrated 90+ datasets into a relational database
Cleaned and standardized multi-year data (2018–2024)
Applied statistical models, including generalized additive mixed models (GAMMs) and ordinal logistic regression
Conducted comparative and correlation analyses
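
Ordinal logistic regression models an outcome whose categories have a natural order, such as a 1 to 5 performance level. The sketch below is a generic illustration, not the project's fitted model. It shows how a proportional-odds model turns a linear predictor η into category probabilities using P(Y ≤ j) = logistic(θ_j − η), which is the sign convention documented for R's MASS::polr. The five levels, the cutpoints, and the coefficient are made-up placeholders.

```python
import math

def logistic(z: float) -> float:
    """Standard logistic function 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

def ordinal_probs(eta: float, cutpoints: list[float]) -> list[float]:
    """Category probabilities under a proportional-odds (cumulative logit) model.

    P(Y <= j) = logistic(theta_j - eta) for strictly increasing cutpoints theta_j;
    each category's probability is the difference of adjacent cumulative probabilities.
    """
    if any(b <= a for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    cumulative = [logistic(theta - eta) for theta in cutpoints] + [1.0]
    probs, prev = [], 0.0
    for c in cumulative:
        probs.append(c - prev)
        prev = c
    return probs

# Hypothetical example: 5 ordered levels, made-up cutpoints and coefficient.
cutpoints = [-2.0, -0.5, 0.5, 2.0]
beta = 0.8   # hypothetical effect of a standardized predictor
x = 1.0      # hypothetical predictor value
probs = ordinal_probs(beta * x, cutpoints)
print([round(p, 3) for p in probs])
```

A larger η moves probability toward the higher categories. The single β applied across all cutpoints is the "proportional odds" assumption.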

Tools:

R, SQL, Microsoft Access

Key Findings:

Per-pupil spending showed weak and inconsistent relationships with outcomes
ACT scores and graduation rates strongly aligned with TVAAS performance
Demographics were strongly associated with student outcomes

Impact:

This project highlights how data-driven analysis can challenge assumptions about education funding and provide insights for policy and decision-making.

Download Report

Download Presentation

Machine Learning – Fraud Detection

Overview:

This project focuses on detecting fraudulent credit card transactions using machine learning techniques. By analyzing transaction data, the goal was to identify patterns that distinguish fraudulent activity from normal behavior and evaluate model performance in a real-world classification setting.

What I Did:

Explored and analyzed a dataset of 10,000 credit card transactions
Cleaned and preprocessed data, including normalization and handling class imbalance
Performed exploratory data analysis to identify patterns in transaction behavior
Built and compared multiple machine learning models:
- Logistic Regression
- Decision Tree
Evaluated model performance using key metrics such as accuracy, precision, recall, F1-score, and AUC
Visualized results using ROC curves and confusion matrices
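
The write-up does not say which balancing method was used. Random oversampling of the minority class is shown here as one common option; SMOTE and random undersampling are others. This is a minimal sketch, and the toy data are invented.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        members = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(members))
            out_labels.append(cls)
    return out_rows, out_labels

# Toy data: 95 legitimate (0) and 5 fraudulent (1) transactions, one feature each.
rows = [[float(i)] for i in range(100)]
labels = [1 if i >= 95 else 0 for i in range(100)]
bal_rows, bal_labels = random_oversample(rows, labels)
print(Counter(bal_labels))
```

Oversampling should be applied only to the training split. If duplicated fraud rows leak into the test set, evaluation metrics will be inflated.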

Tools:

Python, R, Machine Learning Models (Logistic Regression, Decision Trees)

Key Findings:

Class imbalance significantly impacted model performance and required preprocessing adjustments
Logistic Regression achieved strong overall performance, with an AUC of approximately 0.94
Decision Trees improved fraud detection after balancing the dataset, increasing recall for fraudulent transactions
Accuracy alone was not a reliable metric due to imbalance; recall and AUC provided better evaluation insight
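
A small worked example shows why accuracy misleads under class imbalance. The numbers are invented for illustration and are not the project's results. In this case a model that never flags fraud still scores 99% accuracy while catching none of the fraudulent transactions.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class, from confusion-matrix counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy imbalanced set: 990 legitimate, 10 fraudulent; the "model" predicts legitimate every time.
y_true = [0] * 990 + [1] * 10
y_pred = [0] * 1000
metrics = classification_metrics(y_true, y_pred)
print(metrics)  # accuracy is 0.99, recall is 0.0
```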

Impact:

This project demonstrates how machine learning can be applied to real-world fraud detection problems, highlighting the importance of model evaluation and data preprocessing in building reliable classification systems.

Download Report

Download Presentation

Cardiovascular Disease Risk Prediction Using Machine Learning

Overview:

This project focuses on predicting the risk of cardiovascular disease using large-scale public health data. By analyzing behavioral, clinical, and demographic factors, the goal was to build machine learning models that identify individuals at higher risk and support early intervention strategies.

What I Did:

Analyzed large-scale health data (300,000+ records) from the Behavioral Risk Factor Surveillance System (BRFSS)
Cleaned and prepared data, including handling missing values and encoding categorical variables
Conducted exploratory data analysis to understand relationships between health behaviors and disease risk
Built and evaluated multiple machine learning models:
- Logistic Regression
- Random Forest
- Gradient Boosting
Applied clustering techniques to identify population groups with similar risk profiles
Evaluated model performance using accuracy, precision, recall, F1-score, and AUC
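
AUC can be read as the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case, with ties counting as one half. The minimal sketch below computes it directly from that definition on invented scores. It is not the project's evaluation code.

```python
def auc(y_true, scores):
    """ROC AUC as P(random positive outranks random negative); ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Invented labels (1 = disease) and model scores.
y = [0, 0, 1, 1, 0, 1]
s = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
print(auc(y, s))
```

The pairwise loop costs O(n_pos × n_neg). For 300,000+ records, a sort-based (Mann–Whitney) computation or scikit-learn's `roc_auc_score` is the practical choice.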

Tools:

Python, R, WEKA, Machine Learning Models (Logistic Regression, Random Forest, Gradient Boosting)

Key Findings:

Behavioral factors such as physical activity, smoking, and BMI were strong predictors of cardiovascular risk
Machine learning models successfully identified high-risk individuals with strong classification performance
Ensemble models (Random Forest, Gradient Boosting) improved predictive accuracy over simpler models
Clustering revealed distinct population groups with shared health risk characteristics

Impact:

This project demonstrates how machine learning can be applied to large-scale public health data to support early detection and prevention strategies, helping inform data-driven healthcare decisions.

Download Report

Bellabeat Smart Device Data Analysis

Overview:

This project analyzes smart device usage data to identify patterns in user behavior and provide data-driven recommendations for Bellabeat, a wellness technology company. The goal was to translate raw activity and sleep data into actionable marketing insights.

What I Did:

Analyzed Fitbit user data, including activity levels, sleep patterns, and sedentary behavior
Cleaned and prepared data for analysis, addressing inconsistencies and missing values
Conducted exploratory data analysis to identify behavioral trends
Applied data visualization techniques to communicate insights clearly
Used LOESS (locally weighted regression) smoothing to explore nonlinear relationships between activity and sleep patterns
Interpreted findings in a business context to generate recommendations
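
LOESS fits a small weighted regression around each point, using only nearby observations, so the fitted curve can bend where a single straight line cannot. The sketch below is a simplified version with local linear fits and tricube weights. R's `loess()` defaults to local quadratic fits (`degree = 2`) and offers optional robustness iterations, which are omitted here. The data are synthetic, not Fitbit records.

```python
def loess(xs, ys, frac=0.5):
    """Local linear regression (degree-1 LOESS) with tricube weights, evaluated at each x."""
    n = len(xs)
    k = max(2, int(round(frac * n)))  # number of nearest points that define each window
    fitted = []
    for x0 in xs:
        # Bandwidth = distance to the k-th nearest point (x0 itself is the nearest).
        h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
        ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ybar + slope * (x0 - xbar))  # weighted line evaluated at x0
    return fitted

# Synthetic U-shaped relationship that a single straight line would miss.
xs = [float(i) for i in range(20)]
ys = [(x - 10.0) ** 2 / 10.0 for x in xs]
smooth = loess(xs, ys, frac=0.3)
print([round(v, 2) for v in smooth])
```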

Tools:

R, Excel, Microsoft Access, Data Visualization (LOESS, scatter plots, bar charts)

Key Findings:

No consistent linear relationship between sleep duration and activity levels
Sedentary behavior dominated daily user patterns
User activity varied significantly, suggesting the need for personalized engagement strategies
Nonlinear (LOESS) trends captured patterns that simple linear models missed

Recommendations:

Develop personalized activity and wellness recommendations for users
Target low-activity users with engagement campaigns
Focus on habit-building features within the app
Use behavioral insights to guide marketing strategy

Impact:

This project demonstrates the ability to translate data analysis into business insights, bridging the gap between technical work and strategic decision-making.

Download Report

Coffee Store Database Design and Analysis

Overview:

This project involved designing and implementing a relational database for an online coffee store to manage products, inventory, orders, and staff operations. The goal was to create an efficient data structure that supports real-world business processes and enables meaningful data analysis.

What I Did:

Designed a relational database schema for an online coffee store
Defined entities such as products, ingredients, orders, staff, and inventory
Established relationships between tables, including one-to-many and many-to-many relationships
Created primary keys and structured tables to ensure data integrity
Developed views to analyze:
- Daily revenue
- Sales summaries
- Staff scheduling
Modeled real-world business operations using normalized database design
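
The sketch below shows the many-to-many pattern and a daily revenue view. It uses Python's built-in `sqlite3`, with an `order_items` junction table linking orders and products. The table and column names, the SQLite dialect, and the sample rows are all hypothetical; this is not the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    price      REAL NOT NULL      -- REAL for brevity; integer cents avoid rounding issues
);
CREATE TABLE orders (
    order_id   INTEGER PRIMARY KEY,
    order_date TEXT NOT NULL      -- ISO date, e.g. '2024-03-01'
);
-- Junction table: one order has many products, one product appears in many orders.
CREATE TABLE order_items (
    order_id   INTEGER NOT NULL REFERENCES orders(order_id),
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    quantity   INTEGER NOT NULL CHECK (quantity > 0),
    PRIMARY KEY (order_id, product_id)
);
CREATE VIEW daily_revenue AS
SELECT o.order_date AS day, SUM(oi.quantity * p.price) AS revenue
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
JOIN products p     ON p.product_id = oi.product_id
GROUP BY o.order_date;
""")

conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                 [(1, "Latte", 4.50), (2, "Espresso", 3.00)])
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "2024-03-01"), (2, "2024-03-01"), (3, "2024-03-02")])
conn.executemany("INSERT INTO order_items VALUES (?, ?, ?)",
                 [(1, 1, 2), (1, 2, 1), (2, 2, 3), (3, 1, 1)])

rows = conn.execute("SELECT day, revenue FROM daily_revenue ORDER BY day").fetchall()
print(rows)
```

The composite primary key on `order_items` prevents listing the same product twice in one order. The view keeps the revenue logic in one place, so reports do not repeat the join.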

Tools:

SQL, Database Design, ER Modeling

Key Features:

Normalized database structure to reduce redundancy
Efficient table relationships using primary and foreign keys
Support for operational reporting (sales, staffing, inventory)
Scalable design adaptable to real business environments

Impact:

This project demonstrates the ability to design structured data systems that support business operations and analytics, highlighting foundational skills in data engineering and database management.

Download Report

Download Presentation

Student Depression Risk Prediction Using Machine Learning

Overview:

This project focuses on predicting which students are at high risk of depression using machine learning techniques. By analyzing academic, behavioral, and lifestyle factors, the goal was to identify at-risk individuals and support early intervention strategies in educational settings.

What I Did:

Analyzed a large student dataset (27,000+ records) with demographic and behavioral features
Cleaned and preprocessed data, including feature selection and class balancing
Built and evaluated multiple machine learning models:
- Logistic Regression
- k-Nearest Neighbors (k-NN)
- Decision Tree (J48)
Applied cross-validation to ensure model robustness
Evaluated performance using accuracy, precision, recall, F1-score, and AUC
Compared baseline models with ensemble methods (Bagging, AdaBoost)
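
WEKA performs cross-validation internally. The Python sketch below only illustrates how stratified k-fold splits are formed. Stratification is an assumption here, since the write-up says only "cross-validation". Each fold keeps roughly the overall class proportions, so every test fold contains at-risk students. The labels are invented.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal each class round-robin across folds
    for f in range(k):
        test = sorted(folds[f])
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        yield train, test

labels = [0] * 80 + [1] * 20  # toy labels: 20% positive class
folds = list(stratified_kfold(labels, k=5))
for train, test in folds:
    print(len(train), len(test), sum(labels[i] for i in test))
```

Each model is trained on the training indices and scored on the held-out fold. The k scores are then averaged, which gives a more stable estimate than a single train/test split.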

Tools:

WEKA, Machine Learning Models, Data Preprocessing

Key Findings:

Logistic Regression achieved the strongest performance (~84.8% accuracy, AUC ≈ 0.92)
Ensemble methods (Bagging) improved generalization and model stability
Maintaining full feature sets produced better results than aggressive feature reduction
The model achieved high recall, effectively identifying at-risk students

Impact:

This project demonstrates how machine learning can be used to support mental health initiatives by identifying at-risk students early, enabling institutions to take proactive and data-driven action.

Download Report

Global AI Job Market Dashboard (D3.js)

Overview:

This project involved developing an interactive data visualization dashboard using D3.js to explore trends in the global artificial intelligence job market. The objective was to transform a large dataset of AI-related job postings into an intuitive and interactive dashboard that allowed users to explore salary trends, experience levels, and employment patterns over time.

What I Did:

Developed an interactive dashboard using D3.js, JavaScript, HTML, and CSS
Cleaned and transformed raw job posting data from the Global AI Job Market dataset
Aggregated salary information by month and experience level
Created a time-series line chart showing average salary trends over time
Implemented dynamic scales, axes, legends, and labeling for improved usability
Designed visual elements to support exploratory analysis and data storytelling
Collaborated with a team to integrate multiple visualizations into a unified dashboard
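
The aggregation step behind the salary line chart can be sketched as a group-by on (month, experience level) followed by an average. The tuple layout (posting date, experience level, salary in USD) and the sample rows are assumptions; the dataset's actual column names are not given here. In the browser, `d3.rollup` can perform the same grouping.

```python
from collections import defaultdict
from datetime import date

def avg_salary_by_month_level(postings):
    """Average salary per (YYYY-MM, experience level) from (posting_date, level, salary) tuples."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for posted, level, salary in postings:
        key = (posted.strftime("%Y-%m"), level)
        sums[key] += salary
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}

# Hypothetical postings, not rows from the Global AI Job Market dataset.
postings = [
    (date(2024, 1, 5), "Entry", 60000),
    (date(2024, 1, 20), "Entry", 70000),
    (date(2024, 1, 9), "Senior", 150000),
    (date(2024, 2, 3), "Entry", 65000),
]
result = avg_salary_by_month_level(postings)
print(result)
```

Each experience level then becomes one line in the time-series chart, with months on the x-axis.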

Tools:

D3.js, JavaScript, HTML, CSS, Data Aggregation, Data Visualization, Interactive Dashboard Development

Key Findings:

Executive-level positions consistently commanded the highest salaries
Clear salary differences existed between Entry, Mid, Senior, and Executive experience levels
Time-series visualizations revealed salary trends and fluctuations across the AI job market
Interactive visualizations improved the ability to explore complex labor market data
Effective data preparation and aggregation were critical for meaningful visualization

Impact:

This project demonstrates the use of data visualization and dashboard development to communicate insights from large datasets. By transforming raw job posting information into interactive visualizations, the dashboard enabled users to identify trends, compare experience levels, and better understand patterns within the rapidly evolving AI workforce.

GitHub Page

Download Report

Dynamic Effects of Dietary Protein Restriction on Behavior (Published Research)

Download Report

Overview:

Analyzed experimental behavioral and physiological data to investigate how dietary protein restriction influences body weight, food consumption, and protein preference in mouse models.

What I Did:

Designed and conducted experimental research using single-case experimental design (SCED)
Collected and analyzed longitudinal behavioral and physiological data
Applied multilevel linear modeling in R to evaluate repeated-measures data
Modeled relationships between diet conditions, weight gain, and food consumption
Contributed to interpretation of results and scientific publication
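
As a generic illustration only, and not the published model specification, a two-level repeated-measures model nests measurement occasions t within mice i. Here y_{ti} is an outcome such as body weight, Diet is the diet condition, and u_{0i} is a mouse-specific deviation from the overall intercept. The published analysis may include additional terms such as interactions or random slopes.

```latex
y_{ti} = \beta_0 + \beta_1\,\mathrm{Diet}_{ti} + \beta_2\,\mathrm{Time}_{ti} + u_{0i} + \varepsilon_{ti},
\qquad u_{0i} \sim \mathcal{N}(0, \tau_0^2), \qquad \varepsilon_{ti} \sim \mathcal{N}(0, \sigma^2)
```

In lme4 formula notation this generic form is written `y ~ diet + time + (1 | mouse)`. The random intercept accounts for repeated measurements from the same animal being correlated.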

Tools:

R (lme4, statistical modeling), Excel, Experimental Design

Key Findings:

Protein restriction decreased weight gain but increased food consumption in mice
Protein preference increased under low-protein conditions in normal mice
Effects were reduced or absent in genetically modified (FGF21-KO) mice
Demonstrated the importance of the hormone FGF21 in the response to dietary protein restriction

Impact:

This research demonstrates how advanced statistical modeling and experimental design can uncover biological mechanisms, contributing to the understanding of metabolism and to potential applications in health and nutrition.

Effects of Chronic Risperidone on Food Reinforcement (Master’s Thesis)

Download Report

Overview:

Investigated how acute and chronic risperidone administration influences food reinforcement, consumption behavior, and weight gain using behavioral economic modeling in mouse models.

What I Did:

Designed and conducted controlled behavioral experiments using operant conditioning paradigms
Collected and analyzed longitudinal behavioral and physiological data
Applied nonlinear demand curve modeling to assess reinforcement value
Used multilevel statistical modeling in R to analyze repeated-measures data
Evaluated differences between acute and chronic drug administration effects
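
The write-up does not name the demand equation used. A widely used choice in behavioral economics is the exponential demand equation of Hursh and Silberberg (2008): log10 Q = log10 Q0 + k(e^(−α·Q0·C) − 1). Here Q0 is consumption at zero price, k is the range of consumption in log10 units, α is the rate at which demand declines, and C is the price. The parameter values below are hypothetical, not estimates from the thesis.

```python
import math

def exponential_demand(price, q0, alpha, k):
    """Predicted consumption under log10(Q) = log10(Q0) + k * (exp(-alpha * Q0 * C) - 1)."""
    log_q = math.log10(q0) + k * (math.exp(-alpha * q0 * price) - 1.0)
    return 10 ** log_q

# Hypothetical parameters for illustration only.
q0, alpha, k = 100.0, 0.001, 2.0
prices = [0, 1, 5, 10, 20, 50]
for price in prices:
    print(price, round(exponential_demand(price, q0, alpha, k), 2))
```

Consumption starts at Q0 when food is free and falls toward Q0·10^(−k) as price rises. In this framework a larger α means demand falls off faster with price, which is interpreted as a lower reinforcing value.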

Tools:

R (lme4, statistical modeling), Excel, Experimental Design

Key Findings:

Acute risperidone administration reduced food-reinforced responding across conditions
Chronic risperidone did not increase the reinforcing value of food
Chronic administration increased weight gain without significantly increasing food consumption
Results suggest that risperidone-associated weight gain is not driven by increased food reinforcement

Impact:

This project demonstrates the application of behavioral economics and advanced statistical modeling to understand drug effects on behavior, contributing to research on antipsychotic side effects and metabolic health.