import seaborn as sns
data = sns.load_dataset('tips')
import pandas as pd
%matplotlib inline
pd.options.display.max_rows = 10
data
244 rows × 7 columns
To be clear, the general form is data.loc[row_selection, column_selection], and either selection can also be a boolean, which we will cover in more depth with the examples below.
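Before getting to booleans, here is a minimal sketch of the plain "address" form of loc. It uses a hypothetical five-row frame with made-up values and tips-like column names (not the real dataset) so it runs without downloading anything. Note that loc works with labels, and a label slice includes both endpoints.

```python
import pandas as pd

# Hypothetical five-row frame with tips-like columns (values made up for illustration)
data = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
})

# loc[row_selection, column_selection]: both parts are labels, not positions.
# A label slice such as 1:3 includes BOTH endpoints, so rows 1, 2 and 3 come back.
subset = data.loc[1:3, ["total_bill", "tip"]]
print(subset)
```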
I want to filter my data to show only the tips from female customers. As you can see below, we can do this in a single line of code, but let's try to understand exactly what is happening.
data.loc[data.sex == "Female"]
87 rows × 7 columns
Let's break the code above into its components:
loc: When we use loc, we are asking for a location in our dataset, so loc expects an address, if you will: an address for the rows and an address for the columns. But we can also pass in a boolean of True and False values, and loc will return only the rows that are True. We can pass a boolean for the columns too, but if we pass just a single boolean for the rows, loc returns those True rows with all of the columns.
data.sex == "Female": If you are familiar with Python, you know that == is the equality comparison operator: it returns True if both sides are equal and False otherwise. Here we are asking whether the sex column, data.sex, is equal to "Female", and pandas is smart enough to do this for every row in the data, giving back one True or False per row.
Putting it all together, data.loc[data.sex == "Female"]: now that we have a boolean from the second point above, we can use it to filter our data down to the rows where it is True, which are the female customers. That's how we can filter our data to show a specific part of it.
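Here is a minimal sketch of the three pieces above, passing a boolean address for the rows, for the columns, or for the rows only. It assumes a hypothetical five-row frame with made-up values; the names row_mask and col_mask are illustrative, not part of pandas.

```python
import pandas as pd

# Hypothetical mini tips table (made-up values, not the real dataset)
data = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
})

# Row address: a boolean Series, True for the rows we want
row_mask = data.sex == "Female"

# Column address: one boolean per column (total_bill, tip, sex)
col_mask = [True, True, False]

# Booleans on both axes: female rows, only total_bill and tip
both = data.loc[row_mask, col_mask]
print(both)

# A single boolean: the True rows come back with every column
rows_only = data.loc[row_mask]
print(rows_only.shape)  # (2, 3)
```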
data.sex == "Female"
0       True
1      False
2      False
3      False
4       True
       ...
239    False
240     True
241    False
242    False
243     True
Name: sex, Length: 244, dtype: bool
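The output above is a boolean Series with the same index as data. Because True counts as 1, summing it tells you how many rows the filter will keep, which is a quick way to cross-check a row count like the 87 rows earlier. A minimal sketch on a hypothetical five-row frame with made-up values:

```python
import pandas as pd

# Hypothetical mini table (made-up values)
data = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
})

# == is applied to every row at once and returns a boolean Series
mask = data.sex == "Female"
print(mask.tolist())  # [True, False, False, False, True]

# True counts as 1, so the sum is the number of rows the filter keeps
n_kept = int(mask.sum())

females = data.loc[mask]
print(n_kept, len(females))  # 2 2
```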
Now that we understand the concept of a boolean, you should know that we can use the and and or conditions too. As a refresher:
statement1 and statement2 returns True if both are True, and False otherwise. statement1 or statement2 returns True if either statement is True.
# example
1==1 and 2==2
True
1==1 and 2==1
False
1==1 or 2==1
True
Pandas lets us use and and or conditions inside loc as well, but the syntax is slightly different: & represents and, | represents or, and each statement must be wrapped in parentheses () for it to work. Let's look at an example below.
# I want to filter my data for female customers who gave tips during a dinner meal
data.loc[(data.sex == "Female") & (data.time == "Dinner")]
52 rows × 7 columns
To recap what's happening above, we are combining two statements with an and condition: data.sex == "Female" and data.time == "Dinner".
Where both are True, the combined result is True, and loc filters the data down to those rows: female customers at dinner.
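Why can't we just write and inside loc? Python's and asks each side for a single True or False, but a Series holds one value per row, so pandas raises a ValueError rather than guess. The & operator instead combines the two Series row by row. A minimal sketch on a hypothetical four-row frame (made-up values):

```python
import pandas as pd

# Hypothetical four-row table covering every sex/time combination
data = pd.DataFrame({
    "sex": ["Female", "Male", "Female", "Male"],
    "time": ["Dinner", "Dinner", "Lunch", "Lunch"],
})

is_female = data.sex == "Female"
is_dinner = data.time == "Dinner"

# Python's `and` needs one True/False per side; a whole Series is ambiguous,
# so pandas raises ValueError here.
try:
    is_female and is_dinner
    and_error = None
except ValueError as exc:
    and_error = str(exc)
print(and_error)

# & compares row by row, which is what the filter needs
both = is_female & is_dinner
print(both.tolist())  # [True, False, False, False]

# Parentheses: & binds more tightly than ==, so without them the expression
# data.sex == "Female" & data.time == "Dinner" would be parsed as
# data.sex == ("Female" & data.time) == "Dinner", which is not what we mean.
female_dinner = data.loc[(data.sex == "Female") & (data.time == "Dinner")]
```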
Let's use the same example with an or statement, which should return any row with Female as sex or Dinner as time.
Note: the sample method returns a random selection of rows (10 here), so we can check that every row returned is either female or dinner.
# I want to filter my data for female customers OR for dinner meals
data.loc[(data.sex == "Female") | (data.time == "Dinner")].sample(10)
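Because sample draws rows at random, the output changes every time the cell runs. A minimal sketch, assuming a hypothetical five-row frame with made-up values, showing the | filter and passing random_state so the draw is repeatable:

```python
import pandas as pd

# Hypothetical five-row table (made-up values)
data = pd.DataFrame({
    "sex": ["Female", "Male", "Female", "Male", "Male"],
    "time": ["Dinner", "Dinner", "Lunch", "Lunch", "Lunch"],
})

# | keeps a row when at least one of the two conditions is True
either = data.loc[(data.sex == "Female") | (data.time == "Dinner")]
print(either.index.tolist())  # [0, 1, 2]

# sample(n) draws n rows at random without replacement;
# random_state pins the draw so re-running the cell gives the same rows
picked = either.sample(2, random_state=0)
print(picked)
```

Without replace=True, asking sample for more rows than the filtered frame contains raises an error, so the 10 above relies on the real dataset having at least 10 matching rows.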
data.loc[(data.total_bill - data.tip) == 11.0]
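One caveat with the filter above: total_bill and tip are floating-point numbers, and subtracting them can land a tiny bit away from the value you expect, so an exact == 11.0 can silently miss rows. A minimal sketch with two hypothetical bills (made-up values), comparing exact equality with NumPy's np.isclose, which allows a small tolerance:

```python
import numpy as np
import pandas as pd

# Hypothetical bills: on paper, both rows have a pre-tip amount of exactly 11.00
data = pd.DataFrame({"total_bill": [16.99, 14.00], "tip": [5.99, 3.00]})

pre_tip = data.total_bill - data.tip

# Exact equality: in binary floating point, 16.99 - 5.99 is slightly below 11.0
exact = data.loc[pre_tip == 11.0]

# np.isclose returns a boolean array, which loc accepts like a boolean Series
close = data.loc[np.isclose(pre_tip, 11.0)]

print(len(exact), len(close))  # 1 2
```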
data.loc[(data.tip / data.total_bill) > 0.4]
There we have it: it seems people are in a better mood at Sunday dinners. :)
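The tip filter above works because dividing one column by another is done row by row, giving a Series of tip fractions that can be compared with 0.4 just like data.sex was compared with "Female". A minimal sketch on a hypothetical four-row frame with made-up values, this time also passing a column address so only a few columns come back:

```python
import pandas as pd

# Hypothetical four-row table (made-up values)
data = pd.DataFrame({
    "total_bill": [16.99, 7.25, 9.60, 20.00],
    "tip": [1.01, 5.15, 4.00, 3.00],
    "day": ["Sun", "Sun", "Sun", "Sat"],
})

# Element-wise division: one tip fraction per row
tip_pct = data.tip / data.total_bill

# Boolean for the rows, list of labels for the columns
generous = data.loc[tip_pct > 0.4, ["day", "tip"]]
print(generous)
```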
import numpy as np
# rows at positions 0-4, 7 and 9; column position 1 (tip)
data.iloc[np.r_[0:5, 7, 9], [1]]
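Unlike loc, iloc addresses rows and columns purely by integer position. np.r_ is a NumPy helper that concatenates slices and scalars into a single integer array, which is how the last cell mixes a range with individual rows. A minimal sketch on a hypothetical ten-row frame with made-up values:

```python
import numpy as np
import pandas as pd

# np.r_ expands the slice 0:5 and appends 7 and 9
positions = np.r_[0:5, 7, 9]
print(positions)  # [0 1 2 3 4 7 9]

# Hypothetical ten-row table (made-up values)
data = pd.DataFrame({"total_bill": range(10, 20), "tip": range(10)})

# Rows at those positions, column at position 1 ("tip"); the slice end 5 is excluded
picked = data.iloc[positions, [1]]
print(picked)
```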