In [4]:
import pandas as pd
import seaborn as sns
pd.options.display.max_rows = 10
df = sns.load_dataset("titanic")
In [2]:
df.head()
Out[2]:
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone
0 0 3 male 22.0 1 0 7.2500 S Third man True NaN Southampton no False
1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False
2 1 3 female 26.0 0 0 7.9250 S Third woman False NaN Southampton yes True
3 1 1 female 35.0 1 0 53.1000 S First woman False C Southampton yes False
4 0 3 male 35.0 0 0 8.0500 S Third man True NaN Southampton no True

When selecting a single column from your data, ask yourself whether you want to get back a DataFrame or a Series.

Accessing your columns in pandas:

  1. Type in the name of your DataFrame.
  2. Open brackets.
  3. Inside the brackets, pass in your column name either as a string or inside a list.
    If you pass in your column name as a string, you will receive a Series.
    If you pass in your column name inside a list, you will receive a DataFrame.
In [8]:
type(df["survived"])
Out[8]:
pandas.core.series.Series
In [12]:
type(df.survived)
Out[12]:
pandas.core.series.Series
In [7]:
type(df[["survived"]])
Out[7]:
pandas.core.frame.DataFrame

The operation below returns a Series

In [13]:
df["survived"]
Out[13]:
0      0
1      1
2      1
3      1
4      0
      ..
886    0
887    1
888    0
889    1
890    0
Name: survived, Length: 891, dtype: int64

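If you want a Series back, you can also access a column as an attribute, using dot notation, as below:

  • Note: You cannot create a new column using dot notation. Dot access also only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method.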
In [29]:
df.survived
Out[29]:
0      0
1      1
2      1
3      1
4      0
      ..
886    0
887    1
888    0
889    1
890    0
Name: survived, Length: 891, dtype: int64
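
The note about dot notation can be checked with a minimal sketch. The small `toy` DataFrame and the `double_fare` and `count` column names are made up for illustration and stand in for the Titanic data. The sketch also shows a related caveat: a column whose name matches a DataFrame method is shadowed by that method under dot access.

```python
import warnings

import pandas as pd

# Hypothetical stand-in for the Titanic data used in this notebook.
toy = pd.DataFrame({"survived": [0, 1, 1], "fare": [7.25, 71.2833, 7.925]})

# Dot assignment sets a plain Python attribute; pandas warns and adds no column.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    toy.double_fare = toy["fare"] * 2
created_by_dot = "double_fare" in toy.columns

# Bracket assignment is the way to create a column.
toy["double_fare"] = toy["fare"] * 2
created_by_bracket = "double_fare" in toy.columns

print(created_by_dot, created_by_bracket)

# A column named like a DataFrame method is only reachable with brackets.
clash = pd.DataFrame({"count": [1, 2, 3]})
dot_gives_method = callable(clash.count)                      # the method, not the column
bracket_gives_series = isinstance(clash["count"], pd.Series)  # the column
```

For these reasons, bracket access is the safer default whenever a column name comes from data you do not control.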

If I pass "survived" in a list I will get back a DataFrame

In [26]:
df[["survived"]]
Out[26]:
survived
0 0
1 1
2 1
3 1
4 0
... ...
886 0
887 1
888 0
889 1
890 0

891 rows × 1 columns

Selecting two or more columns:

  • You always need to pass your column names inside a list.
  • This makes sense if you think about it: with multiple columns you have a list of column names, so wrap the names in square brackets to form a list.
  • You might be wondering about the outer brackets; those are just the indexing operator on your DataFrame.
In [28]:
df[["survived","fare"]]
Out[28]:
survived fare
0 0 7.2500
1 1 71.2833
2 1 7.9250
3 1 53.1000
4 0 8.0500
... ... ...
886 0 13.0000
887 1 30.0000
888 0 23.4500
889 1 30.0000
890 0 7.7500

891 rows × 2 columns
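
To see why the inner list matters, here is a minimal sketch on a made-up `toy` DataFrame (a stand-in for the Titanic data). Without the inner brackets, Python passes the names as a single tuple, and pandas looks for one column with that tuple as its label.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data used in this notebook.
toy = pd.DataFrame(
    {
        "survived": [0, 1, 1],
        "fare": [7.25, 71.2833, 7.925],
        "sex": ["male", "female", "female"],
    }
)

# A list of names selects those columns and returns a DataFrame.
subset = toy[["survived", "fare"]]

# Two names without a list are treated as the single key ("survived", "fare").
try:
    toy["survived", "fare"]
    raised_key_error = False
except KeyError:
    raised_key_error = True

print(type(subset).__name__, list(subset.columns), raised_key_error)
```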

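The main difference between loc and iloc is that loc selects by label and iloc selects by integer position. One consequence is that a slice in loc includes its end label, while a slice in iloc excludes its end position.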

You can pass a boolean mask to loc to filter your data. iloc does not accept a boolean Series; it only accepts a plain boolean array, such as the mask converted with .to_numpy().
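
A minimal sketch of boolean filtering with both indexers, using a made-up `toy` DataFrame in place of the Titanic data. The exact exception iloc raises for a boolean Series depends on the index type, so the sketch catches both kinds it is known to raise.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data used in this notebook.
toy = pd.DataFrame({"age": [22.0, 38.0, 26.0, 35.0], "fare": [7.25, 71.2833, 7.925, 53.1]})

mask = toy["age"] > 25  # boolean Series aligned on the index

# loc accepts the boolean Series directly.
by_label = toy.loc[mask]

# iloc rejects a boolean Series.
try:
    toy.iloc[mask]
    iloc_accepts_series = True
except (NotImplementedError, ValueError):
    iloc_accepts_series = False

# iloc does accept the same mask as a plain NumPy boolean array.
by_position = toy.iloc[mask.to_numpy()]

print(iloc_accepts_series, by_label.equals(by_position))
```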

Let's try to understand what loc does (also covered in the filtering data tutorials).

  • df.loc[<<< row selection by index label >>>, <<< column selection by column label >>>]
  • df.loc[:, [col1, col2]] will return all rows, and columns col1 and col2.
  • df.loc[1:4, :] will select the rows labeled 1 through 4 (loc slices include the end label) and all columns.
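
The patterns above can be sketched on a made-up `toy` DataFrame (a stand-in for the Titanic data). With the default integer index, the same slice `1:4` returns a different number of rows under loc and iloc, because loc includes the end label.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data used in this notebook.
toy = pd.DataFrame(
    {
        "survived": [0, 1, 1, 1, 0, 0],
        "fare": [7.25, 71.2833, 7.925, 53.1, 8.05, 8.4583],
        "sex": ["male", "female", "female", "female", "male", "male"],
    }
)

by_label = toy.loc[1:4, :]      # labels 1, 2, 3 and 4
by_position = toy.iloc[1:4, :]  # positions 1, 2 and 3

print(by_label.index.tolist())     # [1, 2, 3, 4]
print(by_position.index.tolist())  # [1, 2, 3]

# All rows, two columns chosen by label.
two_cols = toy.loc[:, ["survived", "fare"]]
print(two_cols.shape)              # (6, 2)
```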
In [16]:
df.loc[:,["survived"]]
Out[16]:
survived
0 0
1 1
2 1
3 1
4 0
... ...
886 0
887 1
888 0
889 1
890 0

891 rows × 1 columns

Selecting multiple columns with the loc method.

In [18]:
df.loc[:,["survived","fare","sex"]]
Out[18]:
survived fare sex
0 0 7.2500 male
1 1 71.2833 female
2 1 7.9250 female
3 1 53.1000 female
4 0 8.0500 male
... ... ... ...
886 0 13.0000 male
887 1 30.0000 female
888 0 23.4500 female
889 1 30.0000 male
890 0 7.7500 male

891 rows × 3 columns

Changing the order of the columns with the loc method.

In [19]:
df.loc[:,["sex","survived","fare"]]
Out[19]:
sex survived fare
0 male 0 7.2500
1 female 1 71.2833
2 female 1 7.9250
3 female 1 53.1000
4 male 0 8.0500
... ... ... ...
886 male 0 13.0000
887 female 1 30.0000
888 female 0 23.4500
889 male 1 30.0000
890 male 0 7.7500

891 rows × 3 columns

Let's try to understand what iloc does (also covered in the filtering data tutorials).

  • df.iloc[<<< row selection by row position >>>, <<< column selection by column position >>>]
  • df.iloc[:, [0, 4]] will return all rows, and the first and fifth columns.
  • df.iloc[:, 0:5] will return all rows, and columns 0 through 4.
  • df.iloc[1:4, :] will select rows 1 to 3 and all columns.
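
A minimal sketch of these iloc patterns, using a made-up `toy` DataFrame whose first six column names mirror the Titanic data. Note that a slice such as `0:5` is written directly inside the brackets, not wrapped in a list.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data used in this notebook.
toy = pd.DataFrame(
    {
        "survived": [0, 1, 1, 1, 0],
        "pclass": [3, 1, 3, 1, 3],
        "sex": ["male", "female", "female", "female", "male"],
        "age": [22.0, 38.0, 26.0, 35.0, 35.0],
        "sibsp": [1, 1, 0, 1, 0],
        "parch": [0, 0, 0, 0, 0],
    }
)

first_and_fifth = toy.iloc[:, [0, 4]]  # a list picks individual positions
first_five = toy.iloc[:, 0:5]          # a slice picks positions 0 through 4
rows_1_to_3 = toy.iloc[1:4, :]         # the end position 4 is excluded

print(list(first_and_fifth.columns))   # ['survived', 'sibsp']
print(list(first_five.columns))        # ['survived', 'pclass', 'sex', 'age', 'sibsp']
print(rows_1_to_3.index.tolist())      # [1, 2, 3]
```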
In [31]:
# This will return everything
df.iloc[:, : ]
Out[31]:
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone
0 0 3 male 22.0 1 0 7.2500 S Third man True NaN Southampton no False
1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False
2 1 3 female 26.0 0 0 7.9250 S Third woman False NaN Southampton yes True
3 1 1 female 35.0 1 0 53.1000 S First woman False C Southampton yes False
4 0 3 male 35.0 0 0 8.0500 S Third man True NaN Southampton no True
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
886 0 2 male 27.0 0 0 13.0000 S Second man True NaN Southampton no True
887 1 1 female 19.0 0 0 30.0000 S First woman False B Southampton yes True
888 0 3 female NaN 1 2 23.4500 S Third woman False NaN Southampton no False
889 1 1 male 26.0 0 0 30.0000 C First man True C Cherbourg yes True
890 0 3 male 32.0 0 0 7.7500 Q Third man True NaN Queenstown no True

891 rows × 15 columns

In [32]:
df.iloc[:, 0:3]
Out[32]:
survived pclass sex
0 0 3 male
1 1 1 female
2 1 3 female
3 1 1 female
4 0 3 male
... ... ... ...
886 0 2 male
887 1 1 female
888 0 3 female
889 1 1 male
890 0 3 male

891 rows × 3 columns

We can replicate the iloc operation for columns by indexing the columns attribute by position and passing the resulting labels to the brackets, as shown below:

In [25]:
df[df.columns[[0,5,4]]]
Out[25]:
survived parch sibsp
0 0 0 1
1 1 0 1
2 1 0 0
3 1 0 1
4 0 0 0
... ... ... ...
886 0 0 0
887 1 0 0
888 0 2 1
889 1 0 0
890 0 0 0

891 rows × 3 columns
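
As a quick check of the claim that this replicates iloc, the sketch below compares the two approaches on a made-up `toy` DataFrame (a stand-in for the Titanic data). It assumes unique column names; with duplicated names, the label-based lookup can return more columns than the positional one.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data used in this notebook.
toy = pd.DataFrame(
    {
        "survived": [0, 1, 1],
        "pclass": [3, 1, 3],
        "sex": ["male", "female", "female"],
        "age": [22.0, 38.0, 26.0],
        "sibsp": [1, 1, 0],
        "parch": [0, 0, 0],
    }
)

labels = toy.columns[[0, 5, 4]]    # positions -> column labels
via_labels = toy[labels]           # select by those labels
via_iloc = toy.iloc[:, [0, 5, 4]]  # select by position directly

print(list(labels))                # ['survived', 'parch', 'sibsp']
print(via_labels.equals(via_iloc))
```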