Pivot tables in pandas¶

In [1]:

import pandas as pd
%matplotlib inline
pd.options.display.max_rows = 10

In [2]:

df = pd.read_table('http://bit.ly/chiporders')

In [3]:

df.head()

Out[3]:

	order_id	quantity	item_name	choice_description	item_price
0	1	1	Chips and Fresh Tomato Salsa	NaN	$2.39
1	1	1	Izze	[Clementine]	$3.39
2	1	1	Nantucket Nectar	[Apple]	$3.39
3	1	1	Chips and Tomatillo-Green Chili Salsa	NaN	$2.39
4	2	2	Chicken Bowl	[Tomatillo-Red Chili Salsa (Hot), [Black Beans...	$16.98

Below shows how simple it is to create a pivot table for quantities of item_name based on the table above.¶

Note that I used the aggfunc='sum' which tells the method that I wish to sum up the values of quantities for each item_name.

You can use pretty much any function for the aggfunc including user defined functions. but some of the more common functions are:

"sum" for Adding the values
"mean" for Average
"std" for Standard Deviation
"var" for Variance
"min" for Minimum
"max" for the maximum
"median" for median

In [4]:

df.pivot_table(values="quantity", index = "item_name",aggfunc ="sum")

Out[4]:

	quantity
item_name
6 Pack Soft Drink	55
Barbacoa Bowl	66
Barbacoa Burrito	91
Barbacoa Crispy Tacos	12
Barbacoa Salad Bowl	10
...	...
Veggie Burrito	97
Veggie Crispy Tacos	1
Veggie Salad	6
Veggie Salad Bowl	18
Veggie Soft Tacos	8

50 rows × 1 columns

Bonus: As you can see in the data we have the item price with the dollar Sign which turns the column into a text value, we fix that by removing the dollar sign and reassigning the column as an integer¶

In [5]:

#first I will access the string methods via str keyword and then use the replace function to replace $ with nothing
df.item_price = df.item_price.str.replace("$","")

## Now I can set the type of item_price as a float(decimal number)
df.item_price = df.item_price.astype("float")

#to make sure everything did work i use the dtypes method to see the types of my columns
df.dtypes

#Sure enough the item_price is now a float

Out[5]:

order_id                int64
quantity                int64
item_name              object
choice_description     object
item_price            float64
dtype: object

In [6]:

df.head()

Out[6]:

	order_id	quantity	item_name	choice_description	item_price
0	1	1	Chips and Fresh Tomato Salsa	NaN	2.39
1	1	1	Izze	[Clementine]	3.39
2	1	1	Nantucket Nectar	[Apple]	3.39
3	1	1	Chips and Tomatillo-Green Chili Salsa	NaN	2.39
4	2	2	Chicken Bowl	[Tomatillo-Red Chili Salsa (Hot), [Black Beans...	16.98

Lets do one more example with both column and row values just like excel¶

In [7]:

df.pivot_table(values = "item_price", index="choice_description", columns = "quantity", aggfunc = "sum")

Out[7]:

quantity	1	2	3	4
choice_description
[Adobo-Marinated and Grilled Chicken, Pinto Beans, [Sour Cream, Salsa, Cheese, Cilantro-Lime Rice, Guacamole]]	7.40	NaN	NaN	NaN
[Adobo-Marinated and Grilled Chicken, [Sour Cream, Cheese, Cilantro-Lime Rice]]	7.40	NaN	NaN	NaN
[Adobo-Marinated and Grilled Chicken]	7.40	NaN	NaN	NaN
[Adobo-Marinated and Grilled Steak, [Sour Cream, Salsa, Cheese, Cilantro-Lime Rice, Guacamole]]	7.40	NaN	NaN	NaN
[Adobo-Marinated and Grilled Steak]	7.40	NaN	NaN	NaN
...	...	...	...	...
[[Tomatillo-Red Chili Salsa (Hot), Tomatillo-Green Chili Salsa (Medium)], [Rice, Black Beans, Cheese, Lettuce]]	16.98	NaN	NaN	NaN
[[Tomatillo-Red Chili Salsa (Hot), Tomatillo-Green Chili Salsa (Medium)], [Rice, Black Beans, Lettuce]]	8.49	NaN	NaN	NaN
[[Tomatillo-Red Chili Salsa (Hot), Tomatillo-Green Chili Salsa (Medium)], [Rice, Pinto Beans, Cheese, Lettuce]]	8.49	NaN	NaN	NaN
[[Tomatillo-Red Chili Salsa (Hot), Tomatillo-Green Chili Salsa (Medium)], [Rice, Pinto Beans, Fajita Veggies, Cheese, Sour Cream, Lettuce]]	8.99	NaN	NaN	NaN
[[Tomatillo-Red Chili Salsa (Hot), Tomatillo-Green Chili Salsa (Medium)], [Rice, Pinto Beans, Fajita Veggies, Lettuce]]	8.99	NaN	NaN	NaN

1043 rows × 4 columns

There we have it , we now have a pivot table of our chipolle orders based on the choice description and quantity of the order of course we can do one more thing to make this pivot look a little bit better

In [8]:

df.pivot_table(values = "item_price", index="choice_description", columns = "quantity", aggfunc = "sum").fillna(0)

Out[8]:

quantity	1	2	3	4
choice_description
[Adobo-Marinated and Grilled Chicken, Pinto Beans, [Sour Cream, Salsa, Cheese, Cilantro-Lime Rice, Guacamole]]	7.40	0.0	0.0	0.0
[Adobo-Marinated and Grilled Chicken, [Sour Cream, Cheese, Cilantro-Lime Rice]]	7.40	0.0	0.0	0.0
[Adobo-Marinated and Grilled Chicken]	7.40	0.0	0.0	0.0
[Adobo-Marinated and Grilled Steak, [Sour Cream, Salsa, Cheese, Cilantro-Lime Rice, Guacamole]]	7.40	0.0	0.0	0.0
[Adobo-Marinated and Grilled Steak]	7.40	0.0	0.0	0.0
...	...	...	...	...
[[Tomatillo-Red Chili Salsa (Hot), Tomatillo-Green Chili Salsa (Medium)], [Rice, Black Beans, Cheese, Lettuce]]	16.98	0.0	0.0	0.0
[[Tomatillo-Red Chili Salsa (Hot), Tomatillo-Green Chili Salsa (Medium)], [Rice, Black Beans, Lettuce]]	8.49	0.0	0.0	0.0
[[Tomatillo-Red Chili Salsa (Hot), Tomatillo-Green Chili Salsa (Medium)], [Rice, Pinto Beans, Cheese, Lettuce]]	8.49	0.0	0.0	0.0
[[Tomatillo-Red Chili Salsa (Hot), Tomatillo-Green Chili Salsa (Medium)], [Rice, Pinto Beans, Fajita Veggies, Cheese, Sour Cream, Lettuce]]	8.99	0.0	0.0	0.0
[[Tomatillo-Red Chili Salsa (Hot), Tomatillo-Green Chili Salsa (Medium)], [Rice, Pinto Beans, Fajita Veggies, Lettuce]]	8.99	0.0	0.0	0.0

1043 rows × 4 columns

Plotting and sorting the values of the pivot table¶

in the method below we first created the pivot table using pivot_table() method we then sort the values of quanitites using sort_values() method and finally used the plot() method to plot the pivot table

In [9]:

df.pivot_table(values="quantity", index = "item_name",aggfunc ="sum").sort_values("quantity").plot(kind = "bar",figsize = (18,5))

Out[9]:

<matplotlib.axes._subplots.AxesSubplot at 0x15ec16faac8>