In [1]:
import pandas as pd
In [3]:
voters = pd.read_excel("data/vlookup-example.xls",sheet_name="Voters")
In [4]:
party_name = pd.read_excel("data/vlookup-example.xls", sheet_name = "Party Codes")

I have a very basic spread sheet with two sheets one called Voters and one called Party Codes. and I want to look up the party name based on the party code for the voters. lets look at the data to get a better understanding of this.

as you can see below the goal here is to fill in the "Political Party" Column

In [7]:
voters
Out[7]:
First Last Pcode Political Party
0 Smith Fred A NaN
1 Robbins Terry 1 NaN
2 O'Neill Susan B NaN
3 Parker Scott D NaN
4 Perkins Ralph D NaN
5 Talbot Angie 7 NaN
In [8]:
party_name.head()
Out[8]:
PARTY CODE NAME
0 1 Green
1 2 Reform
2 3 Whig
3 4 Islamic Political Party of America
4 5 Rock & Roll

In pandas to mimic the functionality of a vlookup we use a method called merge, as the name suggest merge method merges or combines two or more tables togather

In [13]:
pd.merge(left = voters, right = party_name, left_on ="Pcode", right_on = "PARTY CODE" , how="left")
Out[13]:
First Last Pcode Political Party PARTY CODE NAME
0 Smith Fred A NaN A Democratic
1 Robbins Terry 1 NaN 1 Green
2 O'Neill Susan B NaN B Republican
3 Parker Scott D NaN D American Independent
4 Perkins Ralph D NaN D American Independent
5 Talbot Angie 7 NaN 7 Middle Class Pty

There we go we just joined the two tables together, but it's not neat, I want the NAME column to be in Politcal Party

In [30]:
pd.merge(left = voters,
         right = party_name,
         left_on ="Pcode",
         right_on = "PARTY CODE",
         how="left").loc[:,["First","Last","Pcode","NAME"]].rename(columns={"NAME":"Political Party"})
Out[30]:
First Last Pcode Political Party
0 Smith Fred A Democratic
1 Robbins Terry 1 Green
2 O'Neill Susan B Republican
3 Parker Scott D American Independent
4 Perkins Ralph D American Independent
5 Talbot Angie 7 Middle Class Pty

Lets walk throught the methods above so we can all understand it

  1. we use the merge method with left(left table or first table) being Voters and right(right table or latter table) being party_name where our party codes and names reside
  2. since the names of the code differs in two tables we need to tell pandas what the common column is called in the left table and what it's called in the right table, hence left_on = "Pcode" and right_on = "PARTY_CODE". This is essentialy what we are doing a vlookup based on and thus the _on keyword.
  3. since we are joining two tables, we need to tell pandas how it should join the tables and how to deal with the conflicts of the rows between the two tables, ny default it's an inner join, if you are familiar with SQL it's same concept of inner join. but since we are trying to mimic a vlookup we use the left join in here, I explain more about the tyes of join at the end.
  4. Now that I have completed my merge(vlookup) I want my table to look good so I used the loc which is great filtering tool for boths rows and column to only show "First","Last","Pcode","NAME" columns
  5. and finally I used a rename function to rename column name which I pulled from the second table. There are so many ways to rename columns in pandas so if you are not a fan of the method above, rest assure there are many ways to this and I have covered this on my site.
In [ ]:

Different ways of merging

inner (default): use intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys.

left : use only keys from left frame, similar to a SQL left outer join; preserve key order.

right: use only keys from right frame, similar to a SQL right outer join; preserve key order.

outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically.