Pivot tables in pandas

In [1]:
import pandas as pd
%matplotlib inline
pd.options.display.max_rows = 10
In [2]:
df = pd.read_table('http://bit.ly/chiporders')
In [3]:
df.head()
Out[3]:
order_id quantity item_name choice_description item_price
0 1 1 Chips and Fresh Tomato Salsa NaN $2.39
1 1 1 Izze [Clementine] $3.39
2 1 1 Nantucket Nectar [Apple] $3.39
3 1 1 Chips and Tomatillo-Green Chili Salsa NaN $2.39
4 2 2 Chicken Bowl [Tomatillo-Red Chili Salsa (Hot), [Black Beans... $16.98

Below shows how simple it is to create a pivot table for quantities of item_name based on the table above.

Note that I used the aggfunc='sum' which tells the method that I wish to sum up the values of quantities for each item_name.

You can use pretty much any function for the aggfunc including user defined functions. but some of the more common functions are:

"sum"   for Adding the values
"mean"   for Average
"std"   for Standard Deviation
"var"   for Variance
"min"   for Minimum
"max"   for the maximum
"median"   for median

In [4]:
df.pivot_table(values="quantity", index = "item_name",aggfunc ="sum")
Out[4]:
quantity
item_name
6 Pack Soft Drink 55
Barbacoa Bowl 66
Barbacoa Burrito 91
Barbacoa Crispy Tacos 12
Barbacoa Salad Bowl 10
... ...
Veggie Burrito 97
Veggie Crispy Tacos 1
Veggie Salad 6
Veggie Salad Bowl 18
Veggie Soft Tacos 8

50 rows × 1 columns

Bonus: As you can see in the data we have the item price with the dollar Sign which turns the column into a text value, we fix that by removing the dollar sign and reassigning the column as an integer

In [5]:
#first I will access the string methods via str keyword and then use the replace function to replace $ with nothing
df.item_price = df.item_price.str.replace("$","")

## Now I can set the type of item_price as a float(decimal number)
df.item_price = df.item_price.astype("float")

#to make sure everything did work i use the dtypes method to see the types of my columns
df.dtypes

#Sure enough the item_price is now a float
Out[5]:
order_id                int64
quantity                int64
item_name              object
choice_description     object
item_price            float64
dtype: object
In [6]:
df.head()
Out[6]:
order_id quantity item_name choice_description item_price
0 1 1 Chips and Fresh Tomato Salsa NaN 2.39
1 1 1 Izze [Clementine] 3.39
2 1 1 Nantucket Nectar [Apple] 3.39
3 1 1 Chips and Tomatillo-Green Chili Salsa NaN 2.39
4 2 2 Chicken Bowl [Tomatillo-Red Chili Salsa (Hot), [Black Beans... 16.98

Lets do one more example with both column and row values just like excel

In [7]:
df.pivot_table(values = "item_price", index="choice_description", columns = "quantity", aggfunc = "sum")
Out[7]:
quantity 1 2 3 4
choice_description
[Adobo-Marinated and Grilled Chicken, Pinto Beans, [Sour Cream, Salsa, Cheese, Cilantro-Lime Rice, Guacamole]] 7.40 NaN NaN NaN
[Adobo-Marinated and Grilled Chicken, [Sour Cream, Cheese, Cilantro-Lime Rice]] 7.40 NaN NaN NaN
[Adobo-Marinated and Grilled Chicken] 7.40 NaN NaN NaN
[Adobo-Marinated and Grilled Steak, [Sour Cream, Salsa, Cheese, Cilantro-Lime Rice, Guacamole]] 7.40 NaN NaN NaN
[Adobo-Marinated and Grilled Steak] 7.40 NaN NaN NaN
... ... ... ... ...
[[Tomatillo-Red Chili Salsa (Hot), Tomatillo-Green Chili Salsa (Medium)], [Rice, Black Beans, Cheese, Lettuce]] 16.98 NaN NaN NaN
[[Tomatillo-Red Chili Salsa (Hot), Tomatillo-Green Chili Salsa (Medium)], [Rice, Black Beans, Lettuce]] 8.49 NaN NaN NaN
[[Tomatillo-Red Chili Salsa (Hot), Tomatillo-Green Chili Salsa (Medium)], [Rice, Pinto Beans, Cheese, Lettuce]] 8.49 NaN NaN NaN
[[Tomatillo-Red Chili Salsa (Hot), Tomatillo-Green Chili Salsa (Medium)], [Rice, Pinto Beans, Fajita Veggies, Cheese, Sour Cream, Lettuce]] 8.99 NaN NaN NaN
[[Tomatillo-Red Chili Salsa (Hot), Tomatillo-Green Chili Salsa (Medium)], [Rice, Pinto Beans, Fajita Veggies, Lettuce]] 8.99 NaN NaN NaN

1043 rows × 4 columns

There we have it , we now have a pivot table of our chipolle orders based on the choice description and quantity of the order of course we can do one more thing to make this pivot look a little bit better

In [8]:
df.pivot_table(values = "item_price", index="choice_description", columns = "quantity", aggfunc = "sum").fillna(0)
Out[8]:
quantity 1 2 3 4
choice_description
[Adobo-Marinated and Grilled Chicken, Pinto Beans, [Sour Cream, Salsa, Cheese, Cilantro-Lime Rice, Guacamole]] 7.40 0.0 0.0 0.0
[Adobo-Marinated and Grilled Chicken, [Sour Cream, Cheese, Cilantro-Lime Rice]] 7.40 0.0 0.0 0.0
[Adobo-Marinated and Grilled Chicken] 7.40 0.0 0.0 0.0
[Adobo-Marinated and Grilled Steak, [Sour Cream, Salsa, Cheese, Cilantro-Lime Rice, Guacamole]] 7.40 0.0 0.0 0.0
[Adobo-Marinated and Grilled Steak] 7.40 0.0 0.0 0.0
... ... ... ... ...
[[Tomatillo-Red Chili Salsa (Hot), Tomatillo-Green Chili Salsa (Medium)], [Rice, Black Beans, Cheese, Lettuce]] 16.98 0.0 0.0 0.0
[[Tomatillo-Red Chili Salsa (Hot), Tomatillo-Green Chili Salsa (Medium)], [Rice, Black Beans, Lettuce]] 8.49 0.0 0.0 0.0
[[Tomatillo-Red Chili Salsa (Hot), Tomatillo-Green Chili Salsa (Medium)], [Rice, Pinto Beans, Cheese, Lettuce]] 8.49 0.0 0.0 0.0
[[Tomatillo-Red Chili Salsa (Hot), Tomatillo-Green Chili Salsa (Medium)], [Rice, Pinto Beans, Fajita Veggies, Cheese, Sour Cream, Lettuce]] 8.99 0.0 0.0 0.0
[[Tomatillo-Red Chili Salsa (Hot), Tomatillo-Green Chili Salsa (Medium)], [Rice, Pinto Beans, Fajita Veggies, Lettuce]] 8.99 0.0 0.0 0.0

1043 rows × 4 columns

Plotting and sorting the values of the pivot table

in the method below we first created the pivot table using pivot_table() method we then sort the values of quanitites using sort_values() method and finally used the plot() method to plot the pivot table

In [9]:
df.pivot_table(values="quantity", index = "item_name",aggfunc ="sum").sort_values("quantity").plot(kind = "bar",figsize = (18,5))
Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x15ec16faac8>