In [1]:
import seaborn as sns
%matplotlib inline
In [2]:
import pandas as pd
pd.options.display.max_rows = 10
In [5]:
df = sns.load_dataset("exercise")

Columns

Today we are going to explore columns and how to rename them. Before I begin: my data has a column called "Unnamed: 0", which looks like a leftover index column. Since pandas gives us an index every time we read in data, I'm going to delete this column.

Before I do that, I can check my current columns with .columns, which lists every column in the table. We will also use it for renaming, since there are a few ways to rename columns in pandas.

In [9]:
df.columns
Out[9]:
Index(['Unnamed: 0', 'id', 'diet', 'pulse', 'time', 'kind'], dtype='object')
In [12]:
df.drop(columns="Unnamed: 0", inplace=True)

Renaming Columns

We can do this either by using the .rename() method,
or
by accessing the column names through the .columns property and updating them there.

rename method

Suppose I want to change the column "time" to "time_at" in my data.

In [13]:
df.rename(columns = {"time":"time_at"})
Out[13]:
id diet pulse time_at kind
0 1 low fat 85 1 min rest
1 1 low fat 85 15 min rest
2 1 low fat 88 30 min rest
3 2 low fat 90 1 min rest
4 2 low fat 92 15 min rest
... ... ... ... ... ...
85 29 no fat 135 15 min running
86 29 no fat 130 30 min running
87 30 no fat 99 1 min running
88 30 no fat 111 15 min running
89 30 no fat 150 30 min running

90 rows × 5 columns

OK, let's walk through what happened.

  • We first called the rename method.
  • Inside rename we passed the columns argument, which specifies that we are renaming columns. We could also pass index to change row labels.
  • columns takes a dictionary, with each key being a current column name and each value being the new name.
  • We can of course change multiple column names at once by adding an entry to the dictionary for each column.
In [14]:
df.rename(columns = {"time":"time_at","id":"new_id"})
Out[14]:
new_id diet pulse time_at kind
0 1 low fat 85 1 min rest
1 1 low fat 85 15 min rest
2 1 low fat 88 30 min rest
3 2 low fat 90 1 min rest
4 2 low fat 92 15 min rest
... ... ... ... ... ...
85 29 no fat 135 15 min running
86 29 no fat 130 30 min running
87 30 no fat 99 1 min running
88 30 no fat 111 15 min running
89 30 no fat 150 30 min running

90 rows × 5 columns
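
One detail worth seeing in code: rename hands back a new DataFrame and leaves the original alone unless the result is assigned back. The sketch below uses a small made-up frame (demo is a stand-in, not the exercise data) so it runs without downloading anything.

```python
import pandas as pd

# Hypothetical stand-in frame that reuses two column names from the exercise data
demo = pd.DataFrame({"id": [1, 2], "time": ["1 min", "15 min"]})

# rename returns a new DataFrame; demo itself is left untouched
renamed = demo.rename(columns={"time": "time_at"})
print(list(demo.columns))     # ['id', 'time']
print(list(renamed.columns))  # ['id', 'time_at']

# Assign the result back to keep the new names
demo = demo.rename(columns={"time": "time_at"})

# The index argument renames row labels in the same way
relabeled = demo.rename(index={0: "first"})
print(list(relabeled.index))  # ['first', 1]
```

That is also why the Out cells above show the new names while df itself still has the old ones.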

Using the .columns property to change column names

The rename method is good, but my method of choice is working with the .columns property, which returns an Index of your column names (it behaves much like a list), and you can do all sorts of fun stuff with it.

In [35]:
df.columns = ['id', 'diet', 'pulse', 'new_time', 'kind']
df.columns
Out[35]:
Index(['id', 'diet', 'pulse', 'new_time', 'kind'], dtype='object')
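
A caveat with this approach, sketched on a small made-up frame (demo is hypothetical): assigning a list replaces every name by position, so the list needs exactly one entry per column. A list comprehension is one way to change a single name without retyping the rest.

```python
import pandas as pd

# Hypothetical two-column frame for illustration
demo = pd.DataFrame({"id": [1, 2], "time": ["1 min", "15 min"]})

# A list with the wrong number of names is rejected with a ValueError
rejected = False
try:
    demo.columns = ["id"]
except ValueError:
    rejected = True
print(rejected)  # True

# Change one name and keep the others as they are
demo.columns = ["time_at" if name == "time" else name for name in demo.columns]
print(list(demo.columns))  # ['id', 'time_at']
```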

With df.columns I tend to be more creative than with the rename method. Let's say I wanted to make all my column names upper case.

In [37]:
df.columns = df.columns.str.upper()
df.columns
Out[37]:
Index(['ID', 'DIET', 'PULSE', 'NEW_TIME', 'KIND'], dtype='object')

Since columns exposes .str, the string-method accessor, we can do all kinds of fancy selection and transformation with methods like contains, upper, lower, title and many more.

And just like .loc, the columns Index accepts a boolean array (Trues and Falses) to filter the columns, so if you have certain criteria you can filter your columns based on them. For example, let's say you had data with 100 columns and some of them contained the word sale.
Then you could do something like df.columns.str.contains("sale"), and this would return a boolean array that is True for every column containing sale.
Now we can pass that boolean array back to columns to keep only the columns that contain the word sale:

df.columns[df.columns.str.contains("sale")]
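
Since the exercise data has no sale columns, here is a minimal runnable sketch of the same idea on a made-up frame. The names sales, region, sale_q1, sale_q2 and cost_q1 are invented for illustration.

```python
import pandas as pd

# Hypothetical empty frame; only the column names matter here
sales = pd.DataFrame(columns=["region", "sale_q1", "sale_q2", "cost_q1"])

# Boolean array: True where the column name contains "sale"
mask = sales.columns.str.contains("sale")
print(mask)  # [False  True  True False]

# Filter the column names themselves
sale_cols = sales.columns[mask]
print(list(sale_cols))  # ['sale_q1', 'sale_q2']

# The same mask also selects the matching columns of the frame through .loc
subset = sales.loc[:, mask]
print(list(subset.columns))  # ['sale_q1', 'sale_q2']
```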

Adding prefix and suffix to the column names

Since column names are strings, we can use the usual Python way of adding a string to a string, and pandas will apply it to all the column names. Of course, as mentioned above, you can do this for a subset of columns using the boolean filter we explained above.

In [42]:
df.columns + " suffix"
Out[42]:
Index(['ID suffix', 'DIET suffix', 'PULSE suffix', 'NEW_TIME suffix',
       'KIND suffix'],
      dtype='object')
In [43]:
"prefix " + df.columns
Out[43]:
Index(['prefix ID', 'prefix DIET', 'prefix PULSE', 'prefix NEW_TIME',
       'prefix KIND'],
      dtype='object')
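
The subset case mentioned above is not shown in the cells, so here is one possible sketch on a made-up frame. The frame demo and the "x_" prefix are assumptions for illustration. For the all-columns case pandas also offers the add_prefix and add_suffix methods, which return a new DataFrame.

```python
import pandas as pd

# Hypothetical frame that reuses some of the exercise column names
demo = pd.DataFrame(columns=["id", "diet", "pulse", "kind"])

# True for names containing a lower-case "d": id, diet, kind
mask = demo.columns.str.contains("d")

# Prefix only the matching names and leave the others unchanged
demo.columns = [
    f"x_{name}" if hit else name
    for name, hit in zip(demo.columns, mask)
]
print(list(demo.columns))  # ['x_id', 'x_diet', 'pulse', 'x_kind']

# Built-in alternative when every column should get the same prefix
prefixed = demo.add_prefix("raw_")
print(list(prefixed.columns))  # ['raw_x_id', 'raw_x_diet', 'raw_pulse', 'raw_x_kind']
```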

String methods for the columns

In [44]:
df.columns.str.upper()
Out[44]:
Index(['ID', 'DIET', 'PULSE', 'NEW_TIME', 'KIND'], dtype='object')
In [46]:
# To keep these as the permanent column names, assign the result back to df.columns
df.columns = df.columns.str.upper()
df.columns
Out[46]:
Index(['ID', 'DIET', 'PULSE', 'NEW_TIME', 'KIND'], dtype='object')
In [47]:
df.columns = df.columns.str.lower()
df.columns
Out[47]:
Index(['id', 'diet', 'pulse', 'new_time', 'kind'], dtype='object')
In [49]:
df.columns = df.columns.str.title()
df.columns
Out[49]:
Index(['Id', 'Diet', 'Pulse', 'New_Time', 'Kind'], dtype='object')

You can do some really advanced stuff with the .str accessor on columns. The example below finds the columns that contain the letter d, in either upper or lower case. contains treats its pattern as a regular expression by default, so if you are familiar with regular expressions you can pass those patterns in too.

In [53]:
df.columns.str.contains("d|D")
Out[53]:
array([ True,  True, False, False,  True])
In [54]:
df.columns[df.columns.str.contains("d|D")]
Out[54]:
Index(['Id', 'Diet', 'Kind'], dtype='object')
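
As a variation on the "d|D" pattern, contains also takes a case argument, and because the pattern is a regular expression, anchors such as ^ work too. A small sketch on a standalone Index (cols is a stand-in built from the names above, not df itself):

```python
import pandas as pd

# Stand-in Index with the title-cased names from the cells above
cols = pd.Index(["Id", "Diet", "Pulse", "New_Time", "Kind"])

# case=False matches "d" and "D" without spelling out both in the pattern
any_d = cols.str.contains("d", case=False)
print(list(cols[any_d]))  # ['Id', 'Diet', 'Kind']

# ^ anchors the match to the start of the name
starts_with_d = cols.str.contains(r"^d", case=False)
print(list(cols[starts_with_d]))  # ['Diet']
```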