In [1]:

import seaborn as sns
%matplotlib inline

In [2]:

import pandas as pd
pd.options.display.max_rows = 10

In [5]:

df = sns.load_dataset("exercise")

Columns¶

Today we are going to explore columns and how to rename them. Before I begin: my data has a column called "Unnamed: 0", which seems to be a leftover index column. Since pandas gives us an index every time we read in data, I'm going to delete this column.

Before I do that, I can check my current columns with .columns, which shows all the columns in the table. We will also use it when renaming columns, since there are a few ways to rename columns in pandas.

In [9]:

df.columns

Out[9]:

Index(['Unnamed: 0', 'id', 'diet', 'pulse', 'time', 'kind'], dtype='object')

In [12]:

df.drop(columns="Unnamed: 0", inplace=True)
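
The drop step can be tried in isolation on a small made-up frame. This is only a sketch: the `toy` DataFrame and its values are assumptions for illustration, not the exercise data.

```python
import pandas as pd

# Hypothetical frame that mimics data read back with its old index saved as a column.
toy = pd.DataFrame({"Unnamed: 0": [0, 1], "id": [1, 2], "pulse": [85, 90]})

# Without inplace=True, drop() returns a new DataFrame and leaves the original untouched.
cleaned = toy.drop(columns="Unnamed: 0")

# errors="ignore" makes the call safe to repeat when the column is already gone.
cleaned_again = cleaned.drop(columns="Unnamed: 0", errors="ignore")

print(list(cleaned.columns))
```

Passing the label through the `columns=` keyword avoids having to remember which axis number means columns.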

Renaming Columns¶

We can do this either by using the .rename() method¶

or¶

by accessing the column names through the .columns property and updating them there.¶

rename method¶

Suppose I want to change the column "time" to "time_at" in my data:

In [13]:

df.rename(columns = {"time":"time_at"})

Out[13]:

	id	diet	pulse	time_at	kind
0	1	low fat	85	1 min	rest
1	1	low fat	85	15 min	rest
2	1	low fat	88	30 min	rest
3	2	low fat	90	1 min	rest
4	2	low fat	92	15 min	rest
...	...	...	...	...	...
85	29	no fat	135	15 min	running
86	29	no fat	130	30 min	running
87	30	no fat	99	1 min	running
88	30	no fat	111	15 min	running
89	30	no fat	150	30 min	running

90 rows × 5 columns

OK, let's walk through what happened.

We first called the rename method.
Inside rename we passed columns as an argument, which specifies that we are going to rename columns. We could also pass index to change row labels.
columns takes a dictionary, with each key being a current name and each value being the new name of that column.
Note that rename returns a new DataFrame; df itself is unchanged unless we assign the result back.
We can of course change multiple column names at the same time by adding an entry for each column to the dictionary.

In [14]:

df.rename(columns = {"time":"time_at","id":"new_id"})

Out[14]:

	new_id	diet	pulse	time_at	kind
0	1	low fat	85	1 min	rest
1	1	low fat	85	15 min	rest
2	1	low fat	88	30 min	rest
3	2	low fat	90	1 min	rest
4	2	low fat	92	15 min	rest
...	...	...	...	...	...
85	29	no fat	135	15 min	running
86	29	no fat	130	30 min	running
87	30	no fat	99	1 min	running
88	30	no fat	111	15 min	running
89	30	no fat	150	30 min	running

90 rows × 5 columns
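
To check two points from the walkthrough above, here is a minimal sketch on a made-up frame (`toy` and its values are hypothetical). It shows that rename hands back a new DataFrame, and that the index argument relabels rows the same way columns relabels columns.

```python
import pandas as pd

# Hypothetical two-row frame standing in for the exercise data.
toy = pd.DataFrame({"id": [1, 2], "time": ["1 min", "15 min"]})

# columns= maps current column names to new ones and returns a new DataFrame.
renamed = toy.rename(columns={"time": "time_at"})

# index= does the same for row labels: row 0 becomes "first".
relabeled = toy.rename(index={0: "first"})

print(list(toy.columns))      # the original frame keeps its old names
print(list(renamed.columns))
print(list(relabeled.index))
```

To keep the new names, assign the result back, for example `toy = toy.rename(columns={"time": "time_at"})`.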

using the .columns property to change column names¶

The rename method is good, but my method of choice is to assign to the .columns property. It returns an Index holding your column names, and you can do all sorts of fun stuff with it.

In [35]:

df.columns = ['id', 'diet', 'pulse', 'new_time', 'kind']
df.columns

Out[35]:

Index(['id', 'diet', 'pulse', 'new_time', 'kind'], dtype='object')
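
One thing to keep in mind when assigning to .columns is that the new list has to supply exactly one name per existing column. A minimal sketch on a made-up frame (`toy` is hypothetical) showing that pandas rejects a list of the wrong length:

```python
import pandas as pd

# Hypothetical frame with three columns.
toy = pd.DataFrame({"id": [1], "diet": ["low fat"], "pulse": [85]})

raised = False
try:
    # Only two names for three columns, so the assignment is refused.
    toy.columns = ["id", "diet"]
except ValueError as err:
    raised = True
    print(err)

# The failed assignment leaves the original names in place.
print(list(toy.columns))
```

This is the trade-off against rename: with .columns you spell out every name in order, while rename only needs the names you want to change.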

With df.columns I can be more creative than with the rename method. Let's say I wanted to make all my column names upper case:

In [37]:

df.columns = df.columns.str.upper()
df.columns

Out[37]:

Index(['ID', 'DIET', 'PULSE', 'NEW_TIME', 'KIND'], dtype='object')

Since .columns exposes the .str accessor (str stands for string methods), we can do all kinds of fancy selection and transformation, like contains, upper, lower, title and many more.

And just like .loc, df.columns accepts a boolean array (Trues and Falses) to filter the columns, so if you have certain criteria you can filter your columns based on them. For example, let's say you had data with 100 columns and some of them contained the word sale.
Then you could do something like df.columns.str.contains("sale"), and this would return a boolean array with True for every column whose name contains sale.
Now we can use that boolean array to index df.columns and keep only the column names that contain the word sale:

df.columns[df.columns.str.contains("sale")]
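
The exercise data has no sale columns, so here is a small sketch with a made-up frame (`sales` and its column names are assumptions for illustration) that runs the same filter end to end and also uses the mask to pull out the matching data with .loc.

```python
import pandas as pd

# Hypothetical frame where some column names contain the word "sale".
sales = pd.DataFrame(
    {"sale_2019": [10], "region": ["north"], "sale_2020": [12], "cost": [7]}
)

# One True/False per column name.
mask = sales.columns.str.contains("sale")

# Index the column names with the mask to keep only the matches.
sale_columns = sales.columns[mask]

# The same mask selects the matching columns of the data itself.
sale_data = sales.loc[:, mask]

print(list(sale_columns))
print(sale_data.shape)
```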

Adding prefix and suffix to the column names¶

Since column names are strings, we can use the Pythonic way of adding a string to a string, and pandas will apply it to all the column names. Of course, as mentioned above, you can do this to a subset of columns using the boolean filter explained above.

In [42]:

df.columns + " suffix"

Out[42]:

Index(['ID suffix', 'DIET suffix', 'PULSE suffix', 'NEW_TIME suffix',
       'KIND suffix'],
      dtype='object')

In [43]:

"prefix " + df.columns

Out[43]:

Index(['prefix ID', 'prefix DIET', 'prefix PULSE', 'prefix NEW_TIME',
       'prefix KIND'],
      dtype='object')
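
The cells above only display the new names. A rough sketch of how a suffix could be applied to just a subset of columns and then kept, using a made-up frame (`toy`, the "_raw" suffix and the "x_" prefix are all illustrative assumptions). It also shows the built-in add_prefix method, which covers the all-columns case.

```python
import pandas as pd

# Hypothetical frame with upper-case names like the ones above.
toy = pd.DataFrame({"ID": [1], "PULSE": [85], "NEW_TIME": ["1 min"]})

# Boolean mask marking the columns that should receive the suffix.
mask = toy.columns.str.contains("TIME")

# Add the suffix only where the mask is True, keep the other names as they are.
toy.columns = [
    name + "_raw" if hit else name for name, hit in zip(toy.columns, mask)
]

# add_prefix returns a new DataFrame with the prefix on every column.
prefixed = toy.add_prefix("x_")

print(list(toy.columns))
print(list(prefixed.columns))
```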

String methods for the columns¶

In [44]:

df.columns.str.upper()

Out[44]:

Index(['ID', 'DIET', 'PULSE', 'NEW_TIME', 'KIND'], dtype='object')

In [46]:

# To make these the permanent column names, assign the new values back to df.columns
df.columns = df.columns.str.upper()
df.columns

Out[46]:

Index(['ID', 'DIET', 'PULSE', 'NEW_TIME', 'KIND'], dtype='object')

In [47]:

df.columns = df.columns.str.lower()
df.columns

Out[47]:

Index(['id', 'diet', 'pulse', 'new_time', 'kind'], dtype='object')

In [49]:

df.columns = df.columns.str.title()
df.columns

Out[49]:

Index(['Id', 'Diet', 'Pulse', 'New_Time', 'Kind'], dtype='object')

You can do some really advanced stuff with the .str accessor on columns. For example, the cell below finds the columns that contain the letter d, in either upper or lower case. If you are familiar with regular expressions, you can pass regular expression patterns to contains too.

In [53]:

df.columns.str.contains("d|D")

Out[53]:

array([ True,  True, False, False,  True])

In [54]:

df.columns[df.columns.str.contains("d|D")]

Out[54]:

Index(['Id', 'Diet', 'Kind'], dtype='object')
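
As a sketch of two variations on the search above: the case=False argument gives the same case-insensitive match without spelling out "d|D", and a regular expression with an anchor can match on where the text sits in the name. The second pattern is just an example I picked for illustration.

```python
import pandas as pd

# The same title-case names as above, built directly as an Index.
columns = pd.Index(["Id", "Diet", "Pulse", "New_Time", "Kind"])

# case=False matches "d" and "D" alike.
has_d = columns.str.contains("d", case=False)

# Regular expression: names that start with K or P.
starts_with_k_or_p = columns.str.contains(r"^[KP]")

print(list(columns[has_d]))
print(list(columns[starts_with_k_or_p]))
```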
