import seaborn as sns
%matplotlib inline
import pandas as pd
pd.options.display.max_rows = 10
df = sns.load_dataset("exercise")
Today we are going to explore columns and how to rename them. Before I begin: my data has a column called "Unnamed: 0", which appears to be a leftover index column. Since pandas gives us an index every time we read in data, I'm going to delete it.
Before I do that, I can check my current columns with .columns, which lists every column in the table. We will use it for renaming too, because there are a few ways to rename columns in pandas.
df.columns
Index(['Unnamed: 0', 'id', 'diet', 'pulse', 'time', 'kind'], dtype='object')
df.drop(columns="Unnamed: 0", inplace=True)
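One thing that can bite you in a notebook is running the drop cell twice: the second run fails because the column is already gone. Here is a minimal sketch of a rerun-safe version, assuming a small hand-made frame (hypothetical data standing in for the exercise dataset) and the errors="ignore" option of drop.

```python
import pandas as pd

# Hypothetical stand-in for the exercise data, including the stray index column.
df = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "id": [1, 1, 1],
    "diet": ["low fat", "low fat", "low fat"],
    "pulse": [85, 85, 88],
    "time": ["1 min", "15 min", "30 min"],
    "kind": ["rest", "rest", "rest"],
})

# columns= names the axis explicitly; errors="ignore" skips labels that are missing.
df = df.drop(columns="Unnamed: 0", errors="ignore")

# Running the same line again does nothing instead of raising a KeyError.
df = df.drop(columns="Unnamed: 0", errors="ignore")

print(list(df.columns))
```

I assign the result back here rather than using inplace=True; both leave df without the extra column.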
Suppose I want to change the column "time" to "time_at" in my data:
df.rename(columns={"time": "time_at"})
90 rows × 5 columns
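OK, let's walk through what happened. The columns argument takes a dictionary that maps each old name to its new name, and rename returns a new DataFrame; df itself stays unchanged unless you assign the result back or pass inplace=True. You can rename several columns at once by adding more pairs to the dictionary:
df.rename(columns={"time": "time_at", "id": "new_id"})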
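To see that rename hands back a new frame rather than changing the original, here is a small sketch on a hypothetical three-column frame (made-up values, not the real dataset). It also shows that, by default, dictionary keys that match no column are quietly skipped.

```python
import pandas as pd

# Hypothetical miniature of the exercise data.
df = pd.DataFrame({"id": [1, 2], "pulse": [85, 90], "time": ["1 min", "15 min"]})

# rename returns a new DataFrame with the mapped names.
renamed = df.rename(columns={"time": "time_at", "id": "new_id"})

print(list(df.columns))       # the original frame keeps its names
print(list(renamed.columns))  # the copy carries the new names

# A key that matches no column is ignored by default, so a typo goes unnoticed.
unchanged = df.rename(columns={"no_such_column": "x"})
print(list(unchanged.columns))
```

To keep the new names, assign the result back, for example df = df.rename(columns={...}).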
The rename method is good, but my method of choice is working with the .columns property directly. It returns an Index holding your column names, you can assign a new list of names to it, and you can do all sorts of fun stuff with it.
df.columns = ['id', 'diet', 'pulse', 'new_time', 'kind']
df.columns
Index(['id', 'diet', 'pulse', 'new_time', 'kind'], dtype='object')
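A caveat with assigning to .columns: the new list has to supply one name per column, in the existing column order. A minimal sketch, using a hypothetical one-row frame, of what happens when the list is one name short:

```python
import pandas as pd

# Hypothetical one-row stand-in for the exercise data.
df = pd.DataFrame({
    "id": [1], "diet": ["low fat"], "pulse": [85], "time": ["1 min"], "kind": ["rest"],
})

rejected = False
try:
    # Only four names for five columns, so pandas refuses the assignment.
    df.columns = ["id", "diet", "pulse", "new_time"]
except ValueError as exc:
    rejected = True
    print("rejected:", exc)

# With one name per column the assignment goes through.
df.columns = ["id", "diet", "pulse", "new_time", "kind"]
print(list(df.columns))
```

Because names are matched purely by position, it is worth printing df.columns first so the new list lines up with the old one.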
With df.columns I tend to be more creative than with the rename method. Let's say I wanted to make all my column names upper case:
df.columns = df.columns.str.upper()
df.columns
Index(['ID', 'DIET', 'PULSE', 'NEW_TIME', 'KIND'], dtype='object')
Since .columns exposes the .str accessor (str stands for string methods), we can do all kinds of fancy selection and transformation with methods such as contains, upper, lower, title and many more.
Just like .loc, the columns Index accepts a Boolean array (Trues and Falses) as a filter, so you can select columns that meet a certain criterion. For example, say you had data with 100 columns and some of them contained the word "sale". You could run
df.columns.str.contains("sale")
which returns a Boolean array with True for every column whose name contains "sale". Passing that array back to the columns Index keeps only those names:
df.columns[df.columns.str.contains("sale")]
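The exercise data has no "sale" columns, so here is a sketch of the idea on a hypothetical frame whose column names are invented for illustration. The same mask works both on the column Index and with .loc to pull out the matching data.

```python
import pandas as pd

# Hypothetical sales table; the column names are made up for this example.
df = pd.DataFrame({
    "store": ["A", "B"],
    "sale_q1": [10, 20],
    "cost_q1": [4, 9],
    "sale_q2": [12, 18],
})

# One True/False per column name.
mask = df.columns.str.contains("sale")

# Filter the names themselves...
sale_cols = df.columns[mask]

# ...or use the same mask with .loc to select the matching columns of data.
sales = df.loc[:, mask]

print(mask)
print(list(sale_cols))
print(sales)
```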
Since column names are strings, we can add a string to them the usual Python way with +, and pandas will apply it to every column name. As mentioned above, you can also do this to a subset of columns using the Boolean filter.
df.columns + " suffix"
Index(['ID suffix', 'DIET suffix', 'PULSE suffix', 'NEW_TIME suffix', 'KIND suffix'], dtype='object')
"prfix " + df.columns
Index(['prfix ID', 'prfix DIET', 'prfix PULSE', 'prfix NEW_TIME', 'prfix KIND'], dtype='object')
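Adding a suffix to only some of the columns can be sketched by pairing each name with its entry in the Boolean filter. This is one possible way to do it, shown on a hypothetical frame with invented column names and an invented "_usd" suffix.

```python
import pandas as pd

# Hypothetical sales table; names and suffix are made up for this example.
df = pd.DataFrame({
    "store": ["A"],
    "sale_q1": [10],
    "cost_q1": [4],
    "sale_q2": [12],
})

# True for the columns that should receive the suffix.
mask = df.columns.str.contains("sale")

# Walk names and flags together; only flagged names get the suffix.
df.columns = [
    name + "_usd" if flagged else name
    for name, flagged in zip(df.columns, mask)
]

print(list(df.columns))
```

When every column should get the same prefix or suffix, DataFrame.add_prefix and DataFrame.add_suffix return a renamed copy in one call.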
df.columns.str.upper()
If I wanted to make these the permanent column names, I would just assign the new values to df.columns:
df.columns = df.columns.str.upper()
df.columns
df.columns = df.columns.str.lower()
df.columns
df.columns = df.columns.str.title()
df.columns
Index(['Id', 'Diet', 'Pulse', 'New_Time', 'Kind'], dtype='object')
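The .str methods can also be chained, which is handy for tidying messy headers in one go. A small sketch, assuming a hypothetical frame whose names have stray spaces and mixed case:

```python
import pandas as pd

# Hypothetical frame with untidy column names.
df = pd.DataFrame({" Pulse Rate ": [85], "Diet Type": ["low fat"], "ID": [1]})

# Trim outer whitespace, lower-case everything, then swap inner spaces for underscores.
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")

print(list(df.columns))
```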
You can do some really advanced things with the .str accessor and columns. The example below finds the columns whose names contain the letter d, in either upper or lower case. The contains method treats its pattern as a regular expression by default, so if you are familiar with regular expressions you can pass those patterns in too.
df.columns.str.contains("d|D")
array([ True, True, False, False, True])
df.columns[df.columns.str.contains("d|D")]
Index(['Id', 'Diet', 'Kind'], dtype='object')
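Instead of spelling out both cases in the pattern, contains also takes case=False, and it accepts regular expression flags. A short sketch on the same five column names, rebuilt here as a standalone Index so the snippet runs on its own; the second pattern is just an illustrative choice.

```python
import re

import pandas as pd

# The column names from above, as a standalone Index.
cols = pd.Index(["Id", "Diet", "Pulse", "New_Time", "Kind"])

# case=False matches "d" and "D" without writing "d|D".
has_d = cols.str.contains("d", case=False)
print(list(cols[has_d]))

# A regular expression with a flag: names that start with d or k, in any case.
starts_dk = cols.str.contains(r"^[dk]", flags=re.IGNORECASE)
print(list(cols[starts_dk]))
```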