In [4]:

import pandas as pd
import seaborn as sns
pd.options.display.max_rows = 10
df = sns.load_dataset("titanic")

In [2]:

df.head()

Out[2]:

	survived	pclass	sex	age	sibsp	parch	fare	embarked	class	who	adult_male	deck	embark_town	alive	alone
0	0	3	male	22.0	1	0	7.2500	S	Third	man	True	NaN	Southampton	no	False
1	1	1	female	38.0	1	0	71.2833	C	First	woman	False	C	Cherbourg	yes	False
2	1	3	female	26.0	0	0	7.9250	S	Third	woman	False	NaN	Southampton	yes	True
3	1	1	female	35.0	1	0	53.1000	S	First	woman	False	C	Southampton	yes	False
4	0	3	male	35.0	0	0	8.0500	S	Third	man	True	NaN	Southampton	no	True

When selecting a single column from your data, ask yourself whether you want a Series or a DataFrame back.

Accessing your columns in pandas

Type the name of your DataFrame.
Open square brackets.
Inside the brackets, pass your column name either as a string or as a list.
If you pass the column name as a string, you get a Series.
If you pass the column name inside a list, you get a DataFrame.
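A minimal sketch of both forms. It uses a small hand-built DataFrame with illustrative values standing in for the Titanic data, so it runs without downloading the seaborn dataset:

```python
import pandas as pd

# Small stand-in for the Titanic data (illustrative values only)
df = pd.DataFrame({"survived": [0, 1, 1], "fare": [7.25, 71.2833, 7.925]})

as_series = df["survived"]   # string key -> Series
as_frame = df[["survived"]]  # list with one name -> DataFrame

print(type(as_series).__name__)         # Series
print(type(as_frame).__name__)          # DataFrame
print(as_series.shape, as_frame.shape)  # (3,) (3, 1)
```

The Series is one-dimensional, while the DataFrame keeps a second axis holding a single column.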

In [8]:

type(df["survived"])

Out[8]:

pandas.core.series.Series

In [12]:

type(df.survived)

Out[12]:

pandas.core.series.Series

In [7]:

type(df[["survived"]])

Out[7]:

pandas.core.frame.DataFrame

The operation below returns a Series

In [13]:

df["survived"]

Out[13]:

0      0
1      1
2      1
3      1
4      0
      ..
886    0
887    1
888    0
889    1
890    0
Name: survived, Length: 891, dtype: int64

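If you want a Series back, you can also access a column with attribute (dot) notation, as below:

Note: you cannot create a new column with dot notation. Dot access also only works when the column name is a valid Python identifier that does not clash with a keyword or an existing DataFrame attribute; for example, df.class is a syntax error because class is a Python keyword, so use df["class"] instead.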
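A small sketch of why dot notation cannot create a column, assuming a recent pandas version. The column name fare_doubled and the data are hypothetical; pandas emits a UserWarning for this pattern, which is silenced here only to keep the output clean:

```python
import warnings
import pandas as pd

df = pd.DataFrame({"survived": [0, 1, 1], "fare": [7.25, 71.2833, 7.925]})

with warnings.catch_warnings():
    # pandas warns about this pattern; silenced only to keep the output clean
    warnings.simplefilter("ignore")
    # Attribute assignment stores a plain Python attribute on the object;
    # it does NOT add a column
    df.fare_doubled = df["fare"] * 2

created_by_dot = "fare_doubled" in df.columns
print(created_by_dot)  # False

# Bracket assignment is how a new column is created
df["fare_doubled"] = df["fare"] * 2
created_by_brackets = "fare_doubled" in df.columns
print(created_by_brackets)  # True
```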

In [29]:

df.survived

Out[29]:

0      0
1      1
2      1
3      1
4      0
      ..
886    0
887    1
888    0
889    1
890    0
Name: survived, Length: 891, dtype: int64

If I pass "survived" in a list, I get back a DataFrame

In [26]:

df[["survived"]]

Out[26]:

	survived
0	0
1	1
2	1
3	1
4	0
...	...
886	0
887	1
888	0
889	1
890	0

891 rows × 1 columns

Selecting two or more columns

You always need to pass your column names inside a list.
This makes sense if you think about it: with multiple columns you have a list of column names, so wrap the names in a second pair of brackets.
If you are wondering about the outer brackets, those are the indexing operator on the DataFrame itself; the inner brackets create the list.
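A minimal sketch separating the two pairs of brackets (hand-built data with illustrative values). Storing the list in a variable, here the hypothetical name cols, makes it clear which brackets do what:

```python
import pandas as pd

df = pd.DataFrame(
    {"survived": [0, 1], "fare": [7.25, 71.2833], "sex": ["male", "female"]}
)

cols = ["survived", "fare"]  # the inner brackets: an ordinary Python list
subset = df[cols]            # the outer brackets: indexing the DataFrame
print(list(subset.columns))  # ['survived', 'fare']

# Without the inner list, pandas treats ("survived", "fare") as ONE key,
# and no column has that name, so the lookup fails
try:
    df["survived", "fare"]
except KeyError:
    print("KeyError: pass the names inside a list")
```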

In [28]:

df[["survived","fare"]]

Out[28]:

	survived	fare
0	0	7.2500
1	1	71.2833
2	1	7.9250
3	1	53.1000
4	0	8.0500
...	...	...
886	0	13.0000
887	1	30.0000
888	0	23.4500
889	1	30.0000
890	0	7.7500

891 rows × 2 columns

Selecting columns with the loc and iloc indexers (the recommended way to select)

The main difference between loc and iloc is that loc selects by label (index labels and column names), while iloc selects by integer position.

You can pass a boolean Series to loc to filter your data; iloc does not accept a boolean Series, although it does accept a plain boolean array or list.
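A sketch contrasting labels and positions. The index labels 10, 20, 30, 40 are chosen deliberately so they differ from the row positions 0 to 3; the data is illustrative. The exact exception iloc raises for a boolean Series depends on the index type and pandas version, so the last line only notes that it fails:

```python
import pandas as pd

# Index labels deliberately differ from row positions
df = pd.DataFrame(
    {"survived": [0, 1, 1, 0], "fare": [7.25, 71.28, 7.93, 8.05]},
    index=[10, 20, 30, 40],
)

print(df.loc[20, "fare"])  # row label 20, column label "fare" -> 71.28
print(df.iloc[1, 1])       # row position 1, column position 1 -> 71.28

mask = df["survived"] == 1   # boolean Series aligned on the index
print(df.loc[mask, "fare"])  # loc accepts the Series directly

print(df.iloc[mask.to_numpy(), 1])  # iloc accepts a plain boolean array

# df.iloc[mask, 1] raises an error: iloc rejects a boolean Series as a mask
```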

Let's try to understand what loc does (also covered in the filtering data tutorials)

### df.loc[<<< row selection by index label >>> , <<< column selection by column label >>> ]
### df.loc[ : , [col1, col2]] returns all rows and the columns col1 and col2
### df.loc[1:4, :] returns the rows labeled 1 to 4 (loc slices include the end label) and all columns
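A small sketch of the slicing rule, using a hand-built DataFrame with the default 0-based index and illustrative values. It puts loc's inclusive label slices next to iloc's exclusive position slices, for rows and for columns:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "survived": [0, 1, 1, 1, 0, 0],
        "pclass": [3, 1, 3, 1, 3, 2],
        "sex": ["male", "female", "female", "female", "male", "male"],
        "fare": [7.25, 71.28, 7.93, 53.10, 8.05, 13.00],
    }
)

rows_loc = df.loc[1:4, :]    # labels 1, 2, 3, 4 (end label included)
rows_iloc = df.iloc[1:4, :]  # positions 1, 2, 3 (end position excluded)
print(list(rows_loc.index))   # [1, 2, 3, 4]
print(list(rows_iloc.index))  # [1, 2, 3]

# Label slices on columns include the end label as well
print(list(df.loc[:, "survived":"sex"].columns))  # ['survived', 'pclass', 'sex']
```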

In [16]:

df.loc[:,["survived"]]

Out[16]:

	survived
0	0
1	1
2	1
3	1
4	0
...	...
886	0
887	1
888	0
889	1
890	0

891 rows × 1 columns

Selecting multiple columns with loc

In [18]:

df.loc[:,["survived","fare","sex"]]

Out[18]:

	survived	fare	sex
0	0	7.2500	male
1	1	71.2833	female
2	1	7.9250	female
3	1	53.1000	female
4	0	8.0500	male
...	...	...	...
886	0	13.0000	male
887	1	30.0000	female
888	0	23.4500	female
889	1	30.0000	male
890	0	7.7500	male

891 rows × 3 columns

Changing the order of the columns with loc

In [19]:

df.loc[:,["sex","survived","fare"]]

Out[19]:

	sex	survived	fare
0	male	0	7.2500
1	female	1	71.2833
2	female	1	7.9250
3	female	1	53.1000
4	male	0	8.0500
...	...	...	...
886	male	0	13.0000
887	female	1	30.0000
888	female	0	23.4500
889	male	1	30.0000
890	male	0	7.7500

891 rows × 3 columns
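With many columns, typing every name to reorder them gets tedious. A minimal sketch of one common pattern moves a single column to the front by building the list from df.columns; the variable name new_order and the data are illustrative. loc returns a new DataFrame, so the original keeps its order unless you reassign it:

```python
import pandas as pd

df = pd.DataFrame(
    {"survived": [0, 1], "fare": [7.25, 71.2833], "sex": ["male", "female"]}
)

# "sex" first, then every other column in its current order
new_order = ["sex"] + [c for c in df.columns if c != "sex"]
reordered = df.loc[:, new_order]

print(list(reordered.columns))  # ['sex', 'survived', 'fare']
print(list(df.columns))         # unchanged: ['survived', 'fare', 'sex']
```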

Let's try to understand what iloc does (also covered in the filtering data tutorials)

### df.iloc[<<< row selection by row position >>> , <<< column selection by column position >>> ]
### df.iloc[ : , [0, 4] ] returns all rows and the first and fifth columns
### df.iloc[ : , 0:5 ] returns all rows and columns 0 to 4 (iloc slices exclude the end position)
### df.iloc[1:4, : ] returns rows 1 to 3 and all columns
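A minimal sketch of the positional forms above, on a hand-built DataFrame whose first five columns follow the Titanic order (values illustrative). It also shows that iloc follows the same Series-versus-DataFrame rule as plain brackets: a single integer returns a Series, a list returns a DataFrame:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "survived": [0, 1],
        "pclass": [3, 1],
        "sex": ["male", "female"],
        "age": [22.0, 38.0],
        "sibsp": [1, 1],
    }
)

print(list(df.iloc[:, [0, 4]].columns))  # ['survived', 'sibsp'] (1st and 5th)
print(list(df.iloc[:, 0:3].columns))     # ['survived', 'pclass', 'sex']

# A single integer gives a Series, a list gives a DataFrame
print(type(df.iloc[:, -1]).__name__)    # Series (last column)
print(type(df.iloc[:, [-1]]).__name__)  # DataFrame
```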

In [31]:

# This will return everything
df.iloc[:, : ]

Out[31]:

	survived	pclass	sex	age	sibsp	parch	fare	embarked	class	who	adult_male	deck	embark_town	alive	alone
0	0	3	male	22.0	1	0	7.2500	S	Third	man	True	NaN	Southampton	no	False
1	1	1	female	38.0	1	0	71.2833	C	First	woman	False	C	Cherbourg	yes	False
2	1	3	female	26.0	0	0	7.9250	S	Third	woman	False	NaN	Southampton	yes	True
3	1	1	female	35.0	1	0	53.1000	S	First	woman	False	C	Southampton	yes	False
4	0	3	male	35.0	0	0	8.0500	S	Third	man	True	NaN	Southampton	no	True
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
886	0	2	male	27.0	0	0	13.0000	S	Second	man	True	NaN	Southampton	no	True
887	1	1	female	19.0	0	0	30.0000	S	First	woman	False	B	Southampton	yes	True
888	0	3	female	NaN	1	2	23.4500	S	Third	woman	False	NaN	Southampton	no	False
889	1	1	male	26.0	0	0	30.0000	C	First	man	True	C	Cherbourg	yes	True
890	0	3	male	32.0	0	0	7.7500	Q	Third	man	True	NaN	Queenstown	no	True

891 rows × 15 columns

In [32]:

df.iloc[:, 0:3]

Out[32]:

	survived	pclass	sex
0	0	3	male
1	1	1	female
2	1	3	female
3	1	1	female
4	0	3	male
...	...	...	...
886	0	2	male
887	1	1	female
888	0	3	female
889	1	1	male
890	0	3	male

891 rows × 3 columns

We can replicate iloc's positional column selection by indexing the columns attribute with a list of positions and passing the resulting labels back into the brackets, as shown below:

In [25]:

df[df.columns[[0,5,4]]]

Out[25]:

	survived	parch	sibsp
0	0	0	1
1	1	0	1
2	1	0	0
3	1	0	1
4	0	0	0
...	...	...	...
886	0	0	0
887	1	0	0
888	0	2	1
889	1	0	0
890	0	0	0

891 rows × 3 columns
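A short sketch confirming that the two routes select the same data, using a hand-built DataFrame whose first six columns follow the Titanic order (values illustrative). df.columns[[0, 5, 4]] turns positions into labels, and indexing with those labels matches iloc:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "survived": [0, 1],
        "pclass": [3, 1],
        "sex": ["male", "female"],
        "age": [22.0, 38.0],
        "sibsp": [1, 1],
        "parch": [0, 0],
    }
)

labels = df.columns[[0, 5, 4]]  # positions -> column labels
print(list(labels))             # ['survived', 'parch', 'sibsp']

via_columns = df[labels]
via_iloc = df.iloc[:, [0, 5, 4]]
print(via_columns.equals(via_iloc))  # True
```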
