
Heart Disease Risk Factors: Correlations and Predictive Patterns

 

Project Overview:

Heart disease is one of the three leading causes of death across all racial demographics in the US. 49% of Americans exhibit primary risk factors for heart disease. Understanding and addressing these primary risk factors is critical for improving healthcare outcomes.

 

​

Objective and Scope:

Offer insights into correlations and predictive patterns among risk factors for heart disease. By exploring this dataset, we aim to uncover indicators that facilitate early detection and prevention of heart disease.

The dataset covers U.S. residents across all 50 states, the District of Columbia, and three U.S. territories. The variables cover a broad spectrum of health-related factors that directly or indirectly influence heart disease.

​

 

Tools:

Python/Jupyter Notebook, including the pandas, NumPy, Matplotlib, Seaborn, and scikit-learn libraries

Tableau

 

​

​

​

​

Vasculature of the Heart

Data Overview

​

Data Set:

Indicators of Heart Disease (2022 UPDATE)

Source: Kaggle

Data Ownership: Centers for Disease Control and Prevention (CDC)

Data Collection: Telephone surveys conducted annually; this version of the dataset is updated from data collected in 2023

​

 

​

 

Skills and Techniques:

Sourcing Data

Exploratory Analysis

Geospatial Analysis

Time Series Analysis

Regression Analysis

Cluster Analysis

Tableau Dashboard Design


Key Business Questions

Research Question 1:

Are there any correlations or associations between various risk factors that likely cause heart disease?

 

Research Question 2:

How can risk factors such as BMI be categorized to indicate risk for heart disease?

 

Research Question 3:

Do non-modifiable factors such as age and gender have any effect on risk factors such as BMI?

​

Research Question 4:

A: Do lifestyle factors such as alcohol consumption, smoking, and e-cigarette use have any effect on heart disease?

B: Do COVID viral infection, diabetes, angina, and arthritis have any effect on the likelihood of heart disease?

 

Research Question 5:

Are there any geographical patterns in BMI and weight distribution associated with risk for heart disease?

 

Research Question 6:

Which race/ethnicity is more likely to have a heart attack?

​

Research Question 1

 

Are there any correlations or associations between various risk factors that likely cause heart disease?

​

​

Strong positive correlation: BMI and Weight

Correlation coefficient: 0.86

The higher a person's weight, the higher their BMI

 

Other quantitative variables have weak correlations
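
As a minimal sketch of how a correlation coefficient like the 0.86 above is computed, the snippet below implements Pearson's r in plain Python. The weights, heights, and sleep hours are made-up illustration values, not rows from the survey, and the function and variable names are assumptions. On the real dataset, the same figure would more likely come from pandas' DataFrame.corr(), with seaborn's heatmap used for the plot.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical toy records, not survey data.
weights_kg = [55.0, 62.0, 70.0, 78.0, 85.0, 93.0, 104.0, 120.0]
heights_m = [1.60, 1.75, 1.68, 1.80, 1.72, 1.85, 1.78, 1.83]
sleep_hours = [7, 6, 8, 7, 5, 8, 6, 7]

# BMI is weight (kg) divided by height (m) squared.
bmi = [w / h ** 2 for w, h in zip(weights_kg, heights_m)]

r_weight = pearson_r(weights_kg, bmi)
r_sleep = pearson_r(sleep_hours, bmi)
print(f"weight vs BMI: r = {r_weight:.2f}")
print(f"sleep vs BMI:  r = {r_sleep:.2f}")
```

Because BMI is derived from weight, a strong positive r is expected for that pair, while an unrelated column such as sleep hours tends to land much closer to zero.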

Correlation Heat Map


Python: Exploratory Data Analysis


Geospatial Analysis of State-wise BMI Data


Linear Regression

Evaluating BMI Predictions

 

The regression model demonstrates strong performance in predicting BMI values for the majority of data points.

​

 

This R² score indicates a relatively good fit of the regression model to the data.

​

 

Some instances where the model's predictions differed from the actual BMI values may arise from external factors influencing BMI, data bias, or model error.
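
The evaluation described above can be sketched as follows: fit a least-squares line, then score it with R². This assumes BMI was predicted from weight alone, which the source does not state explicitly. The data points are invented for illustration, and `fit_line` and `r_squared` are hypothetical helper names. In the project itself, scikit-learn's LinearRegression and r2_score would likely play these roles.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def r_squared(actual, predicted):
    """Share of the variance in `actual` that `predicted` explains."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Invented toy data (weight in kg, BMI in kg/m^2), not survey rows.
weights_kg = [55.0, 62.0, 70.0, 78.0, 85.0, 93.0, 104.0, 120.0]
bmi_actual = [21.5, 20.2, 24.8, 24.1, 28.7, 27.2, 32.8, 35.8]

slope, intercept = fit_line(weights_kg, bmi_actual)
bmi_predicted = [slope * w + intercept for w in weights_kg]
r2 = r_squared(bmi_actual, bmi_predicted)

# Residuals show where predictions miss; the largest ones flag outliers.
residuals = [a - p for a, p in zip(bmi_actual, bmi_predicted)]
worst = max(range(len(residuals)), key=lambda i: abs(residuals[i]))

print(f"BMI = {slope:.3f} * weight + {intercept:.2f}, R^2 = {r2:.3f}")
print(f"largest miss: weight {weights_kg[worst]} kg, residual {residuals[worst]:+.2f}")
```

This sketch scores the model on the same points it was fitted to; a held-out test split would give a more honest R² on real data.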

BMI and Weight

 

Analysis of Darker Purple Data Points (Clusters 6 to 9):

Lower weight individuals tend to have lower BMI values

Outliers that have low weights but high BMI

 

These outliers suggest potential anomalies or unique characteristics within these cases.

 

Analysis of Pink Data Points (Clusters 0 to 4):

Moderate to high weight individuals tend to have high BMI values

Outliers that have high weights but low BMI values

 

Further research and analysis are warranted to better understand the multifaceted factors influencing BMI and its effect on heart disease
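
One likely contributor to the outliers described above is height, since BMI is weight in kilograms divided by height in metres squared. The sketch below illustrates that formula and is not the project's code: the function names are hypothetical, and the category cutoffs are the standard adult BMI ranges (under 18.5, 18.5 to under 25, 25 to under 30, 30 and above), which the source does not confirm it used.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2

def bmi_category(value):
    """Map a BMI value to a standard adult category (assumed cutoffs)."""
    if value < 18.5:
        return "Underweight"
    if value < 25:
        return "Healthy weight"
    if value < 30:
        return "Overweight"
    return "Obese"

# Same weight, different heights: the shorter person has the higher BMI.
same_weight_kg = 70.0
for height_m in (1.50, 1.70, 1.90):
    value = bmi(same_weight_kg, height_m)
    print(f"{same_weight_kg} kg at {height_m:.2f} m: "
          f"BMI {value:.1f} ({bmi_category(value)})")
```

A modest weight paired with a short stature can therefore produce a high BMI, and a high weight paired with a tall stature can produce a comparatively low one.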


Cluster Analysis
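
The source does not say which clustering algorithm produced the clusters discussed above. Assuming it was k-means, which scikit-learn (listed under Tools) provides as KMeans, the idea can be sketched in plain Python as below. The points are invented (weight in kg, BMI) pairs, k is set to 2 for readability rather than the cluster labels 0 to 9 the analysis mentions, and `k_means` is a hypothetical helper.

```python
from math import dist

def k_means(points, k, iterations=20):
    """Plain k-means: returns (labels, centroids) for a list of 2D tuples."""
    # Deterministic start: the first k points serve as initial centroids.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return labels, centroids

# Invented (weight_kg, bmi) pairs, not survey rows.
points = [(52, 19.5), (55, 20.1), (58, 21.0), (60, 21.4),
          (95, 31.2), (102, 33.0), (110, 35.5), (118, 37.9)]

labels, centroids = k_means(points, k=2)
for label, centroid in enumerate(centroids):
    print(f"cluster {label}: centroid weight {centroid[0]:.1f} kg, "
          f"BMI {centroid[1]:.1f}")
```

The features are left unscaled here, so weight dominates the distance; standardizing both columns first is a common choice when the units differ.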


Research Question 6

 

Which race/ethnicity is more likely to have a heart attack?


White: 80.06%
Black: 6.62%
Hispanic: 6.39%
Other race: 4.40%
Multiracial: 2.53%

​

​

Key Insights:

White-only individuals account for the largest share of heart attacks among the racial groups: 80.06% of total heart attacks
All other racial groups account for significantly smaller shares, ranging from 2.53% to 6.62%
Multiracial individuals account for the smallest share of reported heart attacks, at 2.53% of the total
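
The percentages above are expressed as a proportion of total heart attacks, which is a different quantity from the heart attack rate within each group. The sketch below shows both calculations; `percentage_shares` and `within_group_rates` are hypothetical helpers, and the "Group A" and "Group B" counts are invented purely to illustrate the difference.

```python
def percentage_shares(counts):
    """Each group's count as a percentage of the total across groups."""
    total = sum(counts.values())
    return {group: 100 * n / total for group, n in counts.items()}

def within_group_rates(events, group_sizes):
    """Events per 100 members of each group."""
    return {g: 100 * events[g] / group_sizes[g] for g in events}

# Percentages reported above; they sum to 100, so they are shares of a total.
reported = {"White": 80.06, "Black": 6.62, "Hispanic": 6.39,
            "Other race": 4.40, "Multiracial": 2.53}
reported_total = round(sum(reported.values()), 2)

# Hypothetical counts, chosen only to show that shares and rates can differ.
heart_attacks = {"Group A": 800, "Group B": 200}
respondents = {"Group A": 16000, "Group B": 2500}

shares = percentage_shares(heart_attacks)
rates = within_group_rates(heart_attacks, respondents)
print("reported total:", reported_total)
print("shares:", shares)
print("rates:", rates)
```

In the invented example, Group A contributes most of the heart attacks yet has the lower rate per respondent, which is why a share of the total depends on how many respondents each group has.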

​

Insights gained

 

- BMI and Weight exhibit a strong correlation, while other variables like Height, Physical Health Days, Mental Health Days, and Sleep Hours show weak correlations.

 

- A general trend is observed where BMI increases with Weight, though exceptions exist, suggesting that additional factors may be affecting BMI or that outliers are present.

 

- Age-wise analysis in the US reveals the highest counts of obese and overweight BMI in the 65 to 69 age group, while underweight BMI is predominant in the 18 to 24 age group.

 

- In the US, the highest percentages of obese BMI are found in females (34.84%) and males (41.14%).

 

- Smoking is a significant risk factor for heart disease, with 8.10% of smokers experiencing heart attacks compared to only 3.7% of non-smokers.

 

- Arthritis and Angina show slight correlations with heart disease development, indicating potential indirect connections.

 

- Median BMI varies across states, with Mississippi, West Virginia, Ohio, and Alabama having the highest, and Colorado, Vermont, Massachusetts, and California having the lowest.

 

- White only, non-Hispanic individuals accounted for the largest share of heart attacks among the racial groups, at 80.06% of total heart attacks.
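
To put the smoking figures from the insights above on a common scale, the two quoted rates can be compared as a ratio and as a difference. This is simple arithmetic on the reported percentages, not an adjusted risk estimate, and the variable names are assumptions.

```python
# Rates quoted in the insights (percent of each group reporting a heart attack).
smoker_rate = 8.10
non_smoker_rate = 3.7

relative_risk = smoker_rate / non_smoker_rate    # ratio of the two rates
risk_difference = smoker_rate - non_smoker_rate  # gap in percentage points

print(f"relative risk: {relative_risk:.2f}x")
print(f"risk difference: {risk_difference:.1f} percentage points")
```

On these figures, smokers report heart attacks at roughly 2.2 times the rate of non-smokers, before accounting for age or other confounding factors.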
