Nitay Yacobovitch

28 years Old. Information System Management Graduate (Ben-Gurion University). Big sports & basketball fan.

Awesome Data Visualization - McDonlad's Menu

This post shows an analysis that I’ve made of Mcdonalds Menu - The data i used holds information on all of the dishes from their menu, on top of that i made some analyzing and provide insights with python libraries.

# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/nutrition-facts/menu.csv

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt # draw graphs in Jupyter Notebook
import matplotlib as mpl  # visualization library in Python for 2D plots of arrays.
import seaborn as sns #beautiful statistics and visualization

menu = pd.read_csv('/kaggle/input/nutrition-facts/menu.csv') # read the csv file

menu.isnull().sum() # check for missing values, return the number of missing values

Category                         0
Item                             0
Serving Size                     0
Calories                         0
Calories from Fat                0
Total Fat                        0
Total Fat (% Daily Value)        0
Saturated Fat                    0
Saturated Fat (% Daily Value)    0
Trans Fat                        0
Cholesterol                      0
Cholesterol (% Daily Value)      0
Sodium                           0
Sodium (% Daily Value)           0
Carbohydrates                    0
Carbohydrates (% Daily Value)    0
Dietary Fiber                    0
Dietary Fiber (% Daily Value)    0
Sugars                           0
Protein                          0
Vitamin A (% Daily Value)        0
Vitamin C (% Daily Value)        0
Calcium (% Daily Value)          0
Iron (% Daily Value)             0
dtype: int64

menu.head() # show us the top rows

	Category	Item	Serving Size	Calories	Calories from Fat	Total Fat	Total Fat (% Daily Value)	Saturated Fat	Saturated Fat (% Daily Value)	...	Carbohydrates	Carbohydrates (% Daily Value)	Dietary Fiber	Dietary Fiber (% Daily Value)	Sugars	Protein	Vitamin A (% Daily Value)	Calcium (% Daily Value)	Iron (% Daily Value)
0	Breakfast	Egg McMuffin	4.8 oz (136 g)	300	120	13.0	20	5.0	25	...	31	10	4	17	3	17	10	25	15
1	Breakfast	Egg White Delight	4.8 oz (135 g)	250	70	8.0	12	3.0	15	...	30	10	4	17	3	18	6	25	8
2	Breakfast	Sausage McMuffin	3.9 oz (111 g)	370	200	23.0	35	8.0	42	...	29	10	4	17	2	14	8	25	10
3	Breakfast	Sausage McMuffin with Egg	5.7 oz (161 g)	450	250	28.0	43	10.0	52	...	30	10	4	17	2	21	15	30	15
4	Breakfast	Sausage McMuffin with Egg Whites	5.7 oz (161 g)	400	210	23.0	35	8.0	42	...	30	10	4	17	2	21	6	25	10

5 rows × 24 columns

menu.hist(column=['Calories','Cholesterol','Sugars','Protein','Total Fat'],bins=20, color='steelblue', edgecolor='black', linewidth=1.0,
           xlabelsize=8, ylabelsize=8, grid=False) # historam for each attribute. adjust size and colors
plt.tight_layout(rect=(0, 0, 1.2, 1.2))
plt.ylabel("Counter") # naming the y-axis
plt.xlabel("Values") # naming the x-axis

Text(0.5, 0, 'Values')

png

So first we’ll import some libraries that will help us later to analyze.

sns.set(font_scale=1.4)
menu['Category'].value_counts().plot(kind='bar', figsize=(7, 6)) # makes plot bar for each category
plt.xlabel("Categories", labelpad=5) # title for the x label
plt.ylabel("Count of items", labelpad=5) # title for the y label
plt.title("Count of items by its category", y=1.02); # title for the plot

png

I see that the dishes are divided into categories. Here’s a look of the distribution along the categories. we can see that the biggest category is coffe & Tea, and the category with the lowest variety.

# Histogram
fig = plt.figure(figsize = (6,4)) # adjust plot for size
title = fig.suptitle("Calories of each item", fontsize=14) # title for plot
fig.subplots_adjust(top=0.85, wspace=0.3) # Tune the subplot layout. adjust the placing

ax = fig.add_subplot(1,1, 1) # adds subplot to the figure
ax.set_xlabel("Calories") # naming the x-axis
ax.set_ylabel("Frequency") # naming the y-axis
freq, bins, patches = ax.hist(menu['Calories'], color='steelblue', bins=30,
                                    edgecolor='black', linewidth=1) # histogram for each category of calories
                                    

# Density Plot
fig = plt.figure(figsize = (6, 4)) # adjust plot for size
title = fig.suptitle("Calories of each item", fontsize=14) # title for plot
fig.subplots_adjust(top=0.85, wspace=0.3) # Tune the subplot layout. adjust the placing

ax1 = fig.add_subplot(1,1, 1) # adds the 2nd subplot to the figure
ax1.set_xlabel("Calories") # naming the x-axis
ax1.set_ylabel("Frequency") # naming the y-axis
sns.kdeplot(menu['Calories'], ax=ax1, shade=True, color='steelblue')  # Fit and plot a univariate or bivariate kernel density estimate.

<matplotlib.axes._subplots.AxesSubplot at 0x7fe94fa778d0>

png

We can see that most of the items contain between 250-500 calories. There’s also one giant dish that is really exceptional. let’s check which one is it.

menu[menu['Calories']==menu['Calories'].max()]['Item'] # gets the highest-calorie dish

82    Chicken McNuggets (40 piece)
Name: Item, dtype: object

So we can see that the dish with the biggest calorie values is Chicken McNuggets (40 piece). keep away from it :)

sugars = menu['Sugars'] # sugars values column
calories = menu['Calories'] # calories values column
plt.scatter(calories, sugars) # scatter plot of calories vs sugars
plt.xlabel('calories') # naming the x-axis
plt.ylabel('sugars') naming the y-axis
plt.show()

png

Here’s an interesting scatter plot, which is great to display correlation between two parameters. the x axis shows the calories and the y axis shows the sugars. we can see that there’s clear linear correlation between calories and sugars, as long as we have more calories in the dish, it contains more sugars, at most of the dishes. we can see some exceptionals at dishes between 0-750 calories that have maximum 25 grams of sugars.

menu.pivot_table('Trans Fat', 'Category').plot(kind='bar', stacked=True, color = 'c') # bar plot of each category and the mean of its Trans fat

<matplotlib.axes._subplots.AxesSubplot at 0x7fe94f781410>

png

Well, the dishes with the most Trans Fat are Beef & Pork and Smoothies & Shakes.

Protein = menu['Protein'] # protein column
Calories = menu['Calories'] # calories column
Category = menu['Category'] # category column

df = pd.DataFrame(dict(Calories=Calories, Protein=Protein, Category=Category)) # data frame of the 3 column
sns.lmplot('Calories', 'Protein', data=df, hue='Category', fit_reg=False) # Plot data of Calories vs Protein, based on categories by colors
plt.show()

png

cols = ['Calories','Cholesterol','Trans Fat','Sugars','Dietary Fiber'] 
cm = np.corrcoef(menu[cols].values.T) # Return Pearson product-moment correlation coefficients on all the columns 
sns.set(font_scale = 1.5) # Set aesthetic parameters in one step.
hm = sns.heatmap(cm,cbar = True, annot = True,square = True, fmt = '.2f', annot_kws = {'size':15}, yticklabels = cols, xticklabels = cols) # correlation matrix between some main values

png

correlation matrix between some main values. Calories has positive correlation with Cholesterol and Trans Fat, and Sugars has negative correlation with Dietary Fiber.

def plot(grouped):
    item = grouped["Item"].sum() # group items by names
    item_list = item.sort_index() # sort the items by sugar values
    item_list = item_list[-20:] # keeps only the 20 first items
    plt.figure(figsize=(9,10))
    graph = sns.barplot(item_list.index,item_list.values) #x is the sugar values, y is the item name
    labels = [aj.get_text()[-40:] for aj in graph.get_yticklabels()] # crop text to make it look better
    graph.set_yticklabels(labels)

    sugar = menu.groupby(menu["Sugars"]) # group by sugars
    plot(sugar)

png

TOP 20 items by sugar.

protein = menu.groupby(menu["Protein"])
plot(protein)

png

Top 20 items by Protein. so 40 Nuggets contains alot of Protein, but what with Cholsetrol and Fat? let’s see.

Cholesterol = menu.groupby(menu["Cholesterol"])
plot(Cholesterol)

png

Top 20 items by cholesterol

calories = menu.groupby(menu["Calories"])
plot(calories)

png

Saturated = menu.groupby(menu["Saturated Fat"])
plot(Saturated)

png

Top 20 items by calories.

menu.corr()['Calories'].sort_values()

Vitamin C (% Daily Value)       -0.068747
Vitamin A (% Daily Value)        0.108844
Sugars                           0.259598
Calcium (% Daily Value)          0.428426
Trans Fat                        0.522441
Dietary Fiber                    0.538894
Dietary Fiber (% Daily Value)    0.540014
Cholesterol (% Daily Value)      0.595208
Cholesterol                      0.596399
Iron (% Daily Value)             0.643552
Sodium                           0.712309
Sodium (% Daily Value)           0.713415
Carbohydrates (% Daily Value)    0.781242
Carbohydrates                    0.781539
Protein                          0.787847
Saturated Fat                    0.845564
Saturated Fat (% Daily Value)    0.847631
Total Fat (% Daily Value)        0.904123
Total Fat                        0.904409
Calories from Fat                0.904588
Calories                         1.000000
Name: Calories, dtype: float64

The method corr gives us correlation of specific column with all the other columns. in this example, we can see that calories has strong positive correlation with total fat, saturated fat, and carbohtydrates.

2022 1
2020 3

2022

Data Engineer Course Final Project - MBTA API

This Project was created as the Final Project in the Big Data Engineer course at Naya College By Nitay Yacobovitch, Shoham Gilady and Dor Izmaylov. We used M...

Nitay Yacobovitch

Awesome Data Visualization - McDonlad's Menu

2022

Data Engineer Course Final Project - MBTA API

2020

Solar System Analysis

Build Interactive Excel Dashboards

Awesome Data Visualization - McDonlad’s Menu