Fitness tracker Consumer Survey Analysis

This analysis aimed to explore how fitness wearables impact users’ exercise routines, motivation, and overall well-being. The survey focused on understanding the relationship between usage patterns and perceived health benefits.

Data Source: Kaggle

Analysis Method: Python

Process Overview

  • 1. Initial Inspection & Cleaning

    Evaluated columns such as age, gender, education, occupation, and fitness habits.

    Cleaned data by renaming columns for clarity and filtering out any inconsistent or null entries.

  • 2. Demographic Analysis

    Plotted age and gender distribution using pie charts.

    Found a varied age range and a nearly balanced gender breakdown.

  • 3. Tracker Usage Trends

    Assessed how long respondents had used fitness trackers and how frequently they used or checked them.

    Created categorical orderings for comparison and visualized usage with bar charts and heatmaps.

  • 4. Impact Evaluation

    Standardized and analyzed responses to 9 impact areas, including motivation, engagement, health, sleep, and community connection.

    Responses were reclassified into a unified scale: Very Negative to Very Positive.

    Visualized using facet grids to assess perception across multiple dimensions.

  • 5. Cross Analysis

    Compared enjoyment of exercising with how often users tracked their data using heatmaps.

  • 6. Insights Summary and Ideation

    Identified key patterns such as positive correlation between usage frequency and enjoyment.

    Noted drop-off in long-term engagement and inconsistent user feedback structures.

    Brainstormed product opportunities and UX strategies grounded in the behavioral data.

  1. Initial Inspection & Cleaning

Input

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats

df = pd.read_csv('survey 605.csv')
print(df.head())
print(df.columns)
print(df.count())
print(df.describe())

Output

Index(['Timestamp', 'What is your age?', 'What is your gender?',
       'What is your highest level of education?',
       'What is your current occupation?',
       'How often do you exercise in a week?',
       'How long have you been using a fitness wearable?',
       'How frequently do you use your fitness wearable?',
       'How often do you track fitness data using wearable?',
       'How has the fitness wearable impacted your fitness routine?',
       'Has the fitness wearable helped you stay motivated to exercise?',
       'Do you think that the fitness wearable has made exercising more enjoyable?',
       'How engaged do you feel with your fitness wearable?',
       'Does using a fitness wearable make you feel more connected to the fitness community?',
       'How has the fitness wearable helped you achieve your fitness goals?',
       'How has the fitness wearable impacted your overall health?',
       'Has the fitness wearable improved your sleep patterns?',
       'Do you feel that the fitness wearable has improved your overall well-being?',
       'Has using a fitness wearable influenced your decision? [To exercise more?]',
       'Has using a fitness wearable influenced your decision? [To purchase other fitness-related products?]',
       'Has using a fitness wearable influenced your decision? [To join a gym or fitness class?]',
       'Has using a fitness wearable influenced your decision? [To change your diet?]'],
      dtype='object')
dtype: int64

Thoughts:

  1. The dataset is predominantly surveying wearable/fitness tracker users (no yes/no questions about fitness trackers)

  2. The questions are mainly designed to evaluate fitness trackers' impact on workout routines

  3. Missing important features like location (suburb vs urban), type of fitness trackers, etc

  4. The database is best for evaluating the correlation between health habits and frequency/time use of the tracker

  5. The database has limited data (only 30 entries) so the result should be treated as a reference only

Cleaning needs:

  1. Remove empty or error logs

  2. Filter the ones without trackers

  3. Data types

2. Demographic Analysis

Input

# rename the dataframe for easier calling
df_new = df.rename(columns={
    'Timestamp': 'timestamp',
    'What is your age?': 'age',
    ... (hidden)
}, inplace=False)

# print the new columns
print(df_new.columns)

# start with examining the empty or error data
# print(df_new[df_new.isnull().any(axis=1)])

# examine the age data in this table
colors = sns.color_palette("pastel")

plt.pie(df_new.age.value_counts(), 
        labels=df_new.age.unique(),
        colors = colors,
        autopct='%1.1f%%', 
        startangle=90)
plt.axis('equal')
plt.title('Age information of the survey respondents')
plt.show()

# examine the gender data
colors = sns.color_palette("Paired")

plt.pie(df_new.gender.value_counts(), 
        labels=df_new.gender.unique(),
        colors = colors,
        autopct='%1.1f%%', 
        startangle=90)
plt.axis('equal')
plt.title('Gender information of the survey respondents')
plt.show()

Output

3. Tracker Usage Trend

Input

# examine the tracker_use_time column aka who has trackers
print(df_new.tracker_use_time.unique())

# create a ranking of the categorical description 
order_tracker_use_time = ['Less than 6 months' ,'6-12 months', '1-2 years' ,'More than 2 years']
df_new.tracker_use_time = pd.Categorical(df_new.tracker_use_time, categories=order_tracker_use_time, ordered=True)
print(df_new.tracker_use_time.describe())

# create a box gram to examine the distribution
plt.bar(order_tracker_use_time, df_new.tracker_use_time.value_counts())
plt.xlabel('How long the tracker has been owned')
plt.ylabel('Number of responses')
plt.show()

Input

# tracker owning vs age
plt.scatter(df_new.age, df_new.tracker_use_time)
plt.show()

Input

# inspect frequency column and assign orders
print(df_new.tracker_use_frequency.unique())

order_tracker_use_frequency = ['Rarely','1-2 times a week','3-4 times a week','Daily']
df_new.tracker_use_frequency = pd.Categorical(df_new.tracker_use_frequency, categories=order_tracker_use_frequency, ordered=True)
print(df_new.tracker_use_frequency.describe())

plt.bar(order_tracker_use_frequency, df_new.tracker_use_frequency.value_counts())
plt.xlabel('How frequently do you use the tracker')
plt.ylabel('Number of responses')
plt.show()

Input

# plot usage time vs frequency

plt.clf()
cross_tab = pd.crosstab(df_new.tracker_use_time, df_new.tracker_use_frequency)
sns.heatmap(cross_tab, annot=True, cmap='flare')
plt.gca().invert_yaxis()
plt.xlabel('How frequently the tracker is used')
plt.ylabel('How long the tracker has been owned')

plt.show()

Input

# create jitter 
x_jitter = np.random.normal(0, 0.1, len(df_new.tracker_use_time))
y_jitter = np.random.normal(0, 0.1, len(df_new.tracker_use_frequency))

# add numerical values to categories for plotting
df['usage_num'] = pd.Categorical(df_new.tracker_use_time).codes
df['frequency_num'] = pd.Categorical(df_new.tracker_use_frequency).codes

plt.scatter(df['usage_num'] + x_jitter, df['frequency_num'] + y_jitter)
plt.xticks(ticks=range(len(order_tracker_use_time)), labels=order_tracker_use_time, rotation=45)
plt.yticks(ticks=range(len(order_tracker_use_frequency)), labels=order_tracker_use_frequency)
plt.title('Scatter Plot with Jitter for Categorical Variables')

plt.show()

Input

# examine the responses of data tracking frequency
print(df_new.tracker_data_track_frequency.unique())

# oder the labels
order_data_track_frequency = ['Rarely','Once a month','Once a week','Every other day','Every day']
df_new.tracker_data_track_frequency = pd.Categorical(df_new.tracker_data_track_frequency, categories=order_data_track_frequency, ordered=True)

print(df_new.tracker_data_track_frequency.describe())

# create a graph
plt.bar(order_data_track_frequency, df_new.tracker_data_track_frequency.value_counts())
plt.xlabel('How frequently do you track data')
plt.ylabel('Number of responses')
plt.show()

4. Impact Evaluation

Input

columns = ['tracker_routine_impact',
       'tracker_motivation_impact', 'tracker_enjoyable_impact',
       'tracker_engagement_impact', 'tracker_community_impact',
       'tracker_goal_impact', 'tracker_health_impact', 'tracker_sleep_impact',
       'tracker_wellbeing_impact']

df_evaluation = df_new[columns]
#print(df_evaluation.describe())
#print(df_evaluation.tracker_motivation_impact.unique())


# melt df_evaluation into a bigger dataframe
df_evaluation_melt = df_evaluation.melt(var_name='Question', value_name='Response')
#print(df_evaluation_melt.describe())

# update the responses to be consistent
#print(df_evaluation_melt.Response.unique())
response_update = ['Very negative impact','Negative impact','Neutral impact','Positive impact','Very positive impact','Not sure']

replace = {
    'Negatively impacted my fitness routine':'Negative impact',
    'No impact on my fitness routine':'Neutral impact',
    ...
}

df_evaluation_melt['Response'] = df_evaluation_melt['Response'].replace(replace)
print(df_evaluation_melt.Response.unique())

# create a big graph of all of the questions

graph = sns.FacetGrid(df_evaluation_melt, col="Question", col_wrap=3, sharey=False, sharex=False)
graph.map(sns.countplot, "Response", order=response_update)

#plt.xticks(ticks=range(len(response_update)), labels=response_update, rotation=45)
for ax in graph.axes.flat:
    ax.set_xticks(range(len(response_update)))
    ax.set_xticklabels(response_update, rotation=45, ha='right',fontsize=8)
graph.set_titles("{col_name}")
plt.tight_layout(pad=3)

plt.show()

5. Cross Analysis

Input

plt.clf()
cross_tab = pd.crosstab(df_new.tracker_data_track_frequency, df_new.tracker_enjoyable_impact)
sns.heatmap(cross_tab, annot=True, cmap='flare')
plt.gca().invert_yaxis()
plt.xlabel('How frequently do you track fitness data')
plt.ylabel('Do you think that the fitness wearable has made exercising more enjoyable?')

plt.show()

6. Insights Summary and Ideation

Highlights

  • Most users felt positively about how wearables impacted motivation, goal achievement, and health.

  • Daily users tended to report the most positive outcomes.

  • Usage drops after 1–2 years, though the sample size beyond that is too small to confirm a trend.

  • Wearables were strongly linked to increased fitness tracking and greater enjoyment of workouts.

limitation

  • Only 30 responses, with just 2 users reporting over 2 years of tracker usage.

  • Inconsistent answer formatting made cleaning harder.

  • Survey lacked a control group (non-wearable users) for stronger insights.

Product Strategy Recommendations

  1. Encourage Early Engagement

    Most value and motivation are felt in the first year—design onboarding experiences that reinforce early wins and consistent tracking.

  2. Re-engage Long-Term Users

    Usage dips after 1–2 years. Develop “milestone programs” or advanced challenges to re-spark interest for long-time users.

  3. Simplify UX with Consistent Feedback

    Align product questionnaires, feedback loops, and user check-ins using consistent response types for easier longitudinal analysis.

  4. Design for Habit-Forming Routines

    Users who tracked data daily found workouts more enjoyable. Highlight progress and send nudges that reward consistency.

  5. Add Community-Driven Features

    Many users felt more connected to the fitness community. Expand social sharing, virtual classes, or achievement boards.