In [1]:
#preliminaries
import seaborn as sns
import pandas as pd
pd.options.display.max_rows = 10
df = sns.load_dataset("titanic")

Adding new columns to a DataFrame (table)

  • We are going to work with the Titanic dataset.
  • Take a look at the data below, which is stored in the df variable.
In [2]:
df
Out[2]:
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone
0 0 3 male 22.0 1 0 7.2500 S Third man True NaN Southampton no False
1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False
2 1 3 female 26.0 0 0 7.9250 S Third woman False NaN Southampton yes True
3 1 1 female 35.0 1 0 53.1000 S First woman False C Southampton yes False
4 0 3 male 35.0 0 0 8.0500 S Third man True NaN Southampton no True
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
886 0 2 male 27.0 0 0 13.0000 S Second man True NaN Southampton no True
887 1 1 female 19.0 0 0 30.0000 S First woman False B Southampton yes True
888 0 3 female NaN 1 2 23.4500 S Third woman False NaN Southampton no False
889 1 1 male 26.0 0 0 30.0000 C First man True C Cherbourg yes True
890 0 3 male 32.0 0 0 7.7500 Q Third man True NaN Queenstown no True

891 rows × 15 columns

To see the current columns, we can access the columns attribute:

In [3]:
df.columns
Out[3]:
Index(['survived', 'pclass', 'sex', 'age', 'sibsp', 'parch', 'fare',
       'embarked', 'class', 'who', 'adult_male', 'deck', 'embark_town',
       'alive', 'alone'],
      dtype='object')

To add a new column, put the label (name) of the new column in square brackets, as below, and assign a value to it:

  • The operation below creates a new column called new_column and fills the column values with the constant value of 1.
In [4]:
df["new_column"] = 1
In [5]:
df.head()
Out[5]:
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone new_column
0 0 3 male 22.0 1 0 7.2500 S Third man True NaN Southampton no False 1
1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False 1
2 1 3 female 26.0 0 0 7.9250 S Third woman False NaN Southampton yes True 1
3 1 1 female 35.0 1 0 53.1000 S First woman False C Southampton yes False 1
4 0 3 male 35.0 0 0 8.0500 S Third man True NaN Southampton no True 1
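
The same bracket assignment also accepts one value per row instead of a single constant. The sketch below uses a small stand-in table (the name toy and its values are made up for illustration, not taken from the Titanic data) to show both cases side by side.

```python
import pandas as pd

# Hypothetical three-row table standing in for df
toy = pd.DataFrame({"fare": [7.25, 71.28, 7.93]})

# A single value is repeated for every row
toy["new_column"] = 1

# A list must contain exactly one value per row
toy["row_label"] = ["a", "b", "c"]

print(toy)
```

If the list length does not match the number of rows, pandas raises a ValueError instead of filling in the gaps.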

You can create multiple columns at once by separating the new columns with a comma, as shown below.

  • If you create multiple columns this way, pass the values in a list; each value is assigned to the column in the same position.
  • In the operation below, new col 1 will have a constant value of 2 and new col 2 will have a constant value of 3.
In [6]:
df["new col 1"], df["new col 2"] = [2,3]
In [7]:
df.head()
Out[7]:
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone new_column new col 1 new col 2
0 0 3 male 22.0 1 0 7.2500 S Third man True NaN Southampton no False 1 2 3
1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False 1 2 3
2 1 3 female 26.0 0 0 7.9250 S Third woman False NaN Southampton yes True 1 2 3
3 1 1 female 35.0 1 0 53.1000 S First woman False C Southampton yes False 1 2 3
4 0 3 male 35.0 0 0 8.0500 S Third man True NaN Southampton no True 1 2 3
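
The comma form above is ordinary Python unpacking, so it behaves like two separate assignments. As an alternative, pandas also provides the assign method, which returns a new DataFrame rather than changing the original. The sketch below uses a made-up table named toy and illustrative column names col_a and col_b.

```python
import pandas as pd

# Hypothetical three-row table standing in for df
toy = pd.DataFrame({"fare": [7.25, 71.28, 7.93]})

# Unpacking: the first value goes to the first column, the second to the second
toy["new col 1"], toy["new col 2"] = [2, 3]

# assign() adds several columns at once and returns a new DataFrame;
# toy itself is left unchanged
toy2 = toy.assign(col_a=2, col_b=3)

print(toy2.columns.tolist())
```

Keyword names passed to assign must be valid Python identifiers, so labels with spaces such as "new col 1" need the bracket form or a dictionary unpacked with **.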

Creating new columns from math or logical operations on other columns

  • Most of the time when you create a new column, you want to do something to another column and store the result in the new column.
  • Let's see how we can do this below:
In [8]:
df["fare_squared"] = df.fare **2
In [9]:
df.head()
Out[9]:
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone new_column new col 1 new col 2 fare_squared
0 0 3 male 22.0 1 0 7.2500 S Third man True NaN Southampton no False 1 2 3 52.562500
1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False 1 2 3 5081.308859
2 1 3 female 26.0 0 0 7.9250 S Third woman False NaN Southampton yes True 1 2 3 62.805625
3 1 1 female 35.0 1 0 53.1000 S First woman False C Southampton yes False 1 2 3 2819.610000
4 0 3 male 35.0 0 0 8.0500 S Third man True NaN Southampton no True 1 2 3 64.802500
In [10]:
df["fare + sibsp"] = df.fare + df.sibsp
In [11]:
df.head()
Out[11]:
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone new_column new col 1 new col 2 fare_squared fare + sibsp
0 0 3 male 22.0 1 0 7.2500 S Third man True NaN Southampton no False 1 2 3 52.562500 8.2500
1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False 1 2 3 5081.308859 72.2833
2 1 3 female 26.0 0 0 7.9250 S Third woman False NaN Southampton yes True 1 2 3 62.805625 7.9250
3 1 1 female 35.0 1 0 53.1000 S First woman False C Southampton yes False 1 2 3 2819.610000 54.1000
4 0 3 male 35.0 0 0 8.0500 S Third man True NaN Southampton no True 1 2 3 64.802500 8.0500
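
The examples above cover math operations. A logical operation works the same way: a comparison on a column produces True/False values that can be stored in a new column. This is a minimal sketch on a made-up table named toy; the threshold of 50 and the column names high_fare and high_fare_with_sibling are assumptions chosen only for illustration.

```python
import pandas as pd

# Hypothetical three-row table with the same column names as df
toy = pd.DataFrame({"fare": [7.25, 71.2833, 7.925], "sibsp": [1, 1, 0]})

# Comparison: True where the fare is above the (arbitrary) threshold
toy["high_fare"] = toy["fare"] > 50

# Combine conditions with & (and) or | (or); each condition needs parentheses
toy["high_fare_with_sibling"] = (toy["fare"] > 50) & (toy["sibsp"] > 0)

print(toy)
```

The sketch uses toy["fare"] rather than toy.fare; both read the same column, but the bracket form also works for labels that contain spaces or symbols, such as "fare + sibsp".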