Missing values¶

Removing Missing Values and filling Missing values¶

Let's import pandas and seaborn, seaborn is used to load sample data for this example.
Take a look at the data below, we can already see some "NaN" values in the deck column so we know we have missing values.

However we do need a more robust way of identifying missing values.

In [1]:

import seaborn as sns
import pandas as pd
pd.options.display.max_rows = 15
%matplotlib inline

df = sns.load_dataset("titanic")
df.head()

Out[1]:

	survived	pclass	sex	age	sibsp	fare	embarked	class	who	adult_male	deck	embark_town	alive	alone
0	0	3	male	22.0	1	7.2500	S	Third	man	True	NaN	Southampton	no	False
1	1	1	female	38.0	1	71.2833	C	First	woman	False	C	Cherbourg	yes	False
2	1	3	female	26.0	0	7.9250	S	Third	woman	False	NaN	Southampton	yes	True
3	1	1	female	35.0	1	53.1000	S	First	woman	False	C	Southampton	yes	False
4	0	3	male	35.0	0	8.0500	S	Third	man	True	NaN	Southampton	no	True

We can either useisna() method or isnull() method, they both do the same thing, however I preffer the isna() method, since it's more inline with the dropna() method which we will look at down the line.
but it all comes down to your preferences, just know you have these two method to identify missing or null values.

isna()¶

when you call isna to a DataFrame it will go over all your columns and return True for the cell which has a missing value.
Of course this boolean is used so you can filter your data and see the missing values.
But if you wanted some quic number of missing values for each of your columns, you can chain the sum method to your isna() and that will return the total number of missing values in each column.

In [2]:

df.isna()

Out[2]:

	survived	pclass	sex	age	sibsp	parch	fare	embarked	class	who	adult_male	deck	embark_town	alive	alone
0	False	False	False	False	False	False	False	False	False	False	False	True	False	False	False
1	False	False	False	False	False	False	False	False	False	False	False	False	False	False	False
2	False	False	False	False	False	False	False	False	False	False	False	True	False	False	False
3	False	False	False	False	False	False	False	False	False	False	False	False	False	False	False
4	False	False	False	False	False	False	False	False	False	False	False	True	False	False	False
5	False	False	False	True	False	False	False	False	False	False	False	True	False	False	False
6	False	False	False	False	False	False	False	False	False	False	False	False	False	False	False
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
884	False	False	False	False	False	False	False	False	False	False	False	True	False	False	False
885	False	False	False	False	False	False	False	False	False	False	False	True	False	False	False
886	False	False	False	False	False	False	False	False	False	False	False	True	False	False	False
887	False	False	False	False	False	False	False	False	False	False	False	False	False	False	False
888	False	False	False	True	False	False	False	False	False	False	False	True	False	False	False
889	False	False	False	False	False	False	False	False	False	False	False	False	False	False	False
890	False	False	False	False	False	False	False	False	False	False	False	True	False	False	False

891 rows × 15 columns

The sum method for isna can be very handy, I now know, the age column has 177 rows of missing values and the deck column has 688 rows of missing values etc...¶

In [3]:

df.isna().sum()

Out[3]:

survived         0
pclass           0
sex              0
age            177
sibsp            0
parch            0
fare             0
embarked         2
class            0
who              0
adult_male       0
deck           688
embark_town      2
alive            0
alone            0
dtype: int64

Just to prove a point both isnull and isna do the same thing.

In [4]:

df.isnull()

Out[4]:

	survived	pclass	sex	age	sibsp	parch	fare	embarked	class	who	adult_male	deck	embark_town	alive	alone
0	False	False	False	False	False	False	False	False	False	False	False	True	False	False	False
1	False	False	False	False	False	False	False	False	False	False	False	False	False	False	False
2	False	False	False	False	False	False	False	False	False	False	False	True	False	False	False
3	False	False	False	False	False	False	False	False	False	False	False	False	False	False	False
4	False	False	False	False	False	False	False	False	False	False	False	True	False	False	False
5	False	False	False	True	False	False	False	False	False	False	False	True	False	False	False
6	False	False	False	False	False	False	False	False	False	False	False	False	False	False	False
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
884	False	False	False	False	False	False	False	False	False	False	False	True	False	False	False
885	False	False	False	False	False	False	False	False	False	False	False	True	False	False	False
886	False	False	False	False	False	False	False	False	False	False	False	True	False	False	False
887	False	False	False	False	False	False	False	False	False	False	False	False	False	False	False
888	False	False	False	True	False	False	False	False	False	False	False	True	False	False	False
889	False	False	False	False	False	False	False	False	False	False	False	False	False	False	False
890	False	False	False	False	False	False	False	False	False	False	False	True	False	False	False

891 rows × 15 columns

and this one liner is a solid proof for this.

In [5]:

(df.isna() == df.isnull()).all().all()

Out[5]:

True

Great, now we know we have missing values, what should we do with them?¶

We can either remove them but keep in mind we might lose valuable information.
We can fill them with some value, this can be any value but it's worth thinking about this value and how it could impact your final data, using some of the more common methods of filling missing values, such as the mean of the column or median could be a good idea but it all depends on the probel you are trying to solve.

Let's take a look at filling missing values...

In the example below I first accessed my age column using the dot notation.
I then used the fillna() method, and this method will fill the missing values with whatever value you pass into it.
I used 0 in this example to fill the missing values in the age column with 0.

In [6]:

df.age.fillna(0)

Out[6]:

0      22.0
1      38.0
2      26.0
3      35.0
4      35.0
5       0.0
6      54.0
       ... 
884    25.0
885    39.0
886    27.0
887    19.0
888     0.0
889    26.0
890    32.0
Name: age, Length: 891, dtype: float64

Note:
By default this does not over write the age column.
To overwrite the age column you can do the one of the followings:

df.age.fillna(0 , inplace=True)
or have:
df.age = df.age.fillna(0)

I did briefly mention that we could use a statistical integer to fill the missing values, this is specially usefull in Machine Learning Models where you could benefit from these missing values, rather than removing them or filling them with random values.

In the exmaple below I filled the missing values of age with the mean of age column

In [7]:

df.age.fillna(df.age.mean())

Out[7]:

0      22.000000
1      38.000000
2      26.000000
3      35.000000
4      35.000000
5      29.699118
6      54.000000
         ...    
884    25.000000
885    39.000000
886    27.000000
887    19.000000
888    29.699118
889    26.000000
890    32.000000
Name: age, Length: 891, dtype: float64

Column Data Types and fillna() method.¶

It's worth remembering that when you do use fillna() you can pretty much pass in any value or symbol and it will replace the missing values with the value you passed in, *****except for Categorical columns*.¶

However you should be carefull because:¶

Let's say your column was of type Integer and you passed in a value of String this will convert that column into an object type which is used for String Columns, thus every value in that column will become of type object.

Categorical types and fillna():¶

Take a look at the example below, I want to fill the column deck's missing values with the value missing

In [8]:

df.deck.fillna("missing")

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-42e213f52202> in <module>()
----> 1 df.deck.fillna("missing")

~\Anaconda3\lib\site-packages\pandas\core\series.py in fillna(self, value, method, axis, inplace, limit, downcast, **kwargs)
   2687                                           axis=axis, inplace=inplace,
   2688                                           limit=limit, downcast=downcast,
-> 2689                                           **kwargs)
   2690 
   2691     @Appender(generic._shared_docs['shift'] % _shared_doc_kwargs)

~\Anaconda3\lib\site-packages\pandas\core\generic.py in fillna(self, value, method, axis, inplace, limit, downcast)
   4327                 new_data = self._data.fillna(value=value, limit=limit,
   4328                                              inplace=inplace,
-> 4329                                              downcast=downcast)
   4330 
   4331             elif isinstance(value, (dict, ABCSeries)):

~\Anaconda3\lib\site-packages\pandas\core\internals.py in fillna(self, **kwargs)
   3454 
   3455     def fillna(self, **kwargs):
-> 3456         return self.apply('fillna', **kwargs)
   3457 
   3458     def downcast(self, **kwargs):

~\Anaconda3\lib\site-packages\pandas\core\internals.py in apply(self, f, axes, filter, do_integrity_check, consolidate, **kwargs)
   3327 
   3328             kwargs['mgr'] = self
-> 3329             applied = getattr(b, f)(**kwargs)
   3330             result_blocks = _extend_blocks(applied, result_blocks)
   3331 

~\Anaconda3\lib\site-packages\pandas\core\internals.py in fillna(self, value, limit, inplace, downcast, mgr)
   2379         values = self.values if inplace else self.values.copy()
   2380         values = self._try_coerce_result(values.fillna(value=value,
-> 2381                                                        limit=limit))
   2382         return [self.make_block(values=values)]
   2383 

~\Anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
    116                 else:
    117                     kwargs[new_arg_name] = new_arg_value
--> 118             return func(*args, **kwargs)
    119         return wrapper
    120     return _deprecate_kwarg

~\Anaconda3\lib\site-packages\pandas\core\categorical.py in fillna(self, value, method, limit)
   1680 
   1681             if not isna(value) and value not in self.categories:
-> 1682                 raise ValueError("fill value must be in categories")
   1683 
   1684             mask = values == -1

ValueError: fill value must be in categories

The error i'm getting is fill value must be in categories.
This means because we have a categorical column, we can't pass in any value we want,
the replacement of the missing value must be a category so we need to add it as a category fist.

below I used the add_categories to add "missing" as a category.
Now that missing is a category observed by the deck column I can replace the missing values with missing.

In [9]:

df.deck = df.deck.cat.add_categories("missing")
df.deck.fillna("missing")

Out[9]:

0      missing
1            C
2      missing
3            C
4      missing
5      missing
6            E
        ...   
884    missing
885    missing
886    missing
887          B
888    missing
889          C
890    missing
Name: deck, Length: 891, dtype: category
Categories (8, object): [A, B, C, D, E, F, G, missing]

Data Types and missing values example:¶

Take a look at the age column Data Type.
It's a "float64" type. which is a decimal number.

In [10]:

df.age.dtype

Out[10]:

dtype('float64')

I am now going to fill the missing values of age column with a Text which says filling_missing.
I want to overwrite the age column so I use inplace = True.

In [11]:

df.age.fillna("filling_missing",inplace = True)

If I look at the Column type for age now, what do you think is going to show?

In [12]:

df.age.dtype

Out[12]:

dtype('O')

it is of type 'O' which stands for Object, so it's always good to think about what your filling your missing values with before facing this problem.

#### You should:
fill the missing values of a float column with a float.
fill the missing values of a text column with text.
fill the missing values of a categorical column with a category which we showed above.

Deleting Missing Values¶

To delete missing values, we will make use of dropna() method.

dropna() Parameters:

axis: 0 or 'index, 1 or 'columns'
when you have axis = 0 or axis = 'index' dropna() will remove the rows that contains Missing Values.
When you have axis = 1 or axis = 'columns' dropna() will remove Columns that contains Missing Values.
how : 'any' or 'all'
When you have how = "any" it will drop the corresponding row/column if there is any missing value in the row/column.
When you have how = "all" it will drop the corresponding row/column if every value is a missing value in the row/column.
subset:
by default dropna() will look at all the columns/rows when discovering Missing values, you can use subset to pass in a column name or a list of column names, and it will only focus on those columns or rows.
thresh: a minimum threshold of missing values. for example let's say I had 3 rows of data and they all contained missing values, but I wanted to only select the row that had more than 10 missing value
I could then use thresh = 10
inplace: True or False, by default it's set to False, if you set this to True it will overwrite the Column with the updated results.

Examples of dropna()¶

if you just call dropna it will use the default parameters, which is how = "any" and axis = 0 or 'index'

In [13]:

df.dropna()

Out[13]:

	survived	pclass	sex	age	sibsp	parch	fare	embarked	class	who	adult_male	deck	embark_town	alive	alone
1	1	1	female	38	1	0	71.2833	C	First	woman	False	C	Cherbourg	yes	False
3	1	1	female	35	1	0	53.1000	S	First	woman	False	C	Southampton	yes	False
6	0	1	male	54	0	0	51.8625	S	First	man	True	E	Southampton	no	True
10	1	3	female	4	1	1	16.7000	S	Third	child	False	G	Southampton	yes	False
11	1	1	female	58	0	0	26.5500	S	First	woman	False	C	Southampton	yes	True
21	1	2	male	34	0	0	13.0000	S	Second	man	True	D	Southampton	yes	True
23	1	1	male	28	0	0	35.5000	S	First	man	True	A	Southampton	yes	True
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
862	1	1	female	48	0	0	25.9292	S	First	woman	False	D	Southampton	yes	True
867	0	1	male	31	0	0	50.4958	S	First	man	True	A	Southampton	no	True
871	1	1	female	47	1	1	52.5542	S	First	woman	False	D	Southampton	yes	False
872	0	1	male	33	0	0	5.0000	S	First	man	True	B	Southampton	no	True
879	1	1	female	56	0	1	83.1583	C	First	woman	False	C	Cherbourg	yes	False
887	1	1	female	19	0	0	30.0000	S	First	woman	False	B	Southampton	yes	True
889	1	1	male	26	0	0	30.0000	C	First	man	True	C	Cherbourg	yes	True

201 rows × 15 columns

dropna(1) will drop the columns that contain any missing value.

In [14]:

df.dropna(1)

Out[14]:

	survived	pclass	sex	age	sibsp	parch	fare	class	who	adult_male	alive	alone
0	0	3	male	22	1	0	7.2500	Third	man	True	no	False
1	1	1	female	38	1	0	71.2833	First	woman	False	yes	False
2	1	3	female	26	0	0	7.9250	Third	woman	False	yes	True
3	1	1	female	35	1	0	53.1000	First	woman	False	yes	False
4	0	3	male	35	0	0	8.0500	Third	man	True	no	True
5	0	3	male	filling_missing	0	0	8.4583	Third	man	True	no	True
6	0	1	male	54	0	0	51.8625	First	man	True	no	True
...	...	...	...	...	...	...	...	...	...	...	...	...
884	0	3	male	25	0	0	7.0500	Third	man	True	no	True
885	0	3	female	39	0	5	29.1250	Third	woman	False	no	False
886	0	2	male	27	0	0	13.0000	Second	man	True	no	True
887	1	1	female	19	0	0	30.0000	First	woman	False	yes	True
888	0	3	female	filling_missing	1	2	23.4500	Third	woman	False	no	False
889	1	1	male	26	0	0	30.0000	First	man	True	yes	True
890	0	3	male	32	0	0	7.7500	Third	man	True	no	True

891 rows × 12 columns

dropna(how = "all") will drop the rows which have all their records as null or missing value.

In [15]:

df.dropna(how = "all")

Out[15]:

	survived	pclass	sex	age	sibsp	parch	fare	embarked	class	who	adult_male	deck	embark_town	alive	alone
0	0	3	male	22	1	0	7.2500	S	Third	man	True	NaN	Southampton	no	False
1	1	1	female	38	1	0	71.2833	C	First	woman	False	C	Cherbourg	yes	False
2	1	3	female	26	0	0	7.9250	S	Third	woman	False	NaN	Southampton	yes	True
3	1	1	female	35	1	0	53.1000	S	First	woman	False	C	Southampton	yes	False
4	0	3	male	35	0	0	8.0500	S	Third	man	True	NaN	Southampton	no	True
5	0	3	male	filling_missing	0	0	8.4583	Q	Third	man	True	NaN	Queenstown	no	True
6	0	1	male	54	0	0	51.8625	S	First	man	True	E	Southampton	no	True
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
884	0	3	male	25	0	0	7.0500	S	Third	man	True	NaN	Southampton	no	True
885	0	3	female	39	0	5	29.1250	Q	Third	woman	False	NaN	Queenstown	no	False
886	0	2	male	27	0	0	13.0000	S	Second	man	True	NaN	Southampton	no	True
887	1	1	female	19	0	0	30.0000	S	First	woman	False	B	Southampton	yes	True
888	0	3	female	filling_missing	1	2	23.4500	S	Third	woman	False	NaN	Southampton	no	False
889	1	1	male	26	0	0	30.0000	C	First	man	True	C	Cherbourg	yes	True
890	0	3	male	32	0	0	7.7500	Q	Third	man	True	NaN	Queenstown	no	True

891 rows × 15 columns

you can use subset to pass in which column(s) you wish to look at when discovering missing values. wheather you are using one column or number of columns you should always pass it as a list.

In [16]:

df.dropna(subset = ["age"])

Out[16]:

	survived	pclass	sex	age	sibsp	parch	fare	embarked	class	who	adult_male	deck	embark_town	alive	alone
0	0	3	male	22	1	0	7.2500	S	Third	man	True	NaN	Southampton	no	False
1	1	1	female	38	1	0	71.2833	C	First	woman	False	C	Cherbourg	yes	False
2	1	3	female	26	0	0	7.9250	S	Third	woman	False	NaN	Southampton	yes	True
3	1	1	female	35	1	0	53.1000	S	First	woman	False	C	Southampton	yes	False
4	0	3	male	35	0	0	8.0500	S	Third	man	True	NaN	Southampton	no	True
5	0	3	male	filling_missing	0	0	8.4583	Q	Third	man	True	NaN	Queenstown	no	True
6	0	1	male	54	0	0	51.8625	S	First	man	True	E	Southampton	no	True
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
884	0	3	male	25	0	0	7.0500	S	Third	man	True	NaN	Southampton	no	True
885	0	3	female	39	0	5	29.1250	Q	Third	woman	False	NaN	Queenstown	no	False
886	0	2	male	27	0	0	13.0000	S	Second	man	True	NaN	Southampton	no	True
887	1	1	female	19	0	0	30.0000	S	First	woman	False	B	Southampton	yes	True
888	0	3	female	filling_missing	1	2	23.4500	S	Third	woman	False	NaN	Southampton	no	False
889	1	1	male	26	0	0	30.0000	C	First	man	True	C	Cherbourg	yes	True
890	0	3	male	32	0	0	7.7500	Q	Third	man	True	NaN	Queenstown	no	True

891 rows × 15 columns

In [17]:

df.dropna(subset = ["age","deck"], how = "all")

Out[17]:

	survived	pclass	sex	age	sibsp	parch	fare	embarked	class	who	adult_male	deck	embark_town	alive	alone
0	0	3	male	22	1	0	7.2500	S	Third	man	True	NaN	Southampton	no	False
1	1	1	female	38	1	0	71.2833	C	First	woman	False	C	Cherbourg	yes	False
2	1	3	female	26	0	0	7.9250	S	Third	woman	False	NaN	Southampton	yes	True
3	1	1	female	35	1	0	53.1000	S	First	woman	False	C	Southampton	yes	False
4	0	3	male	35	0	0	8.0500	S	Third	man	True	NaN	Southampton	no	True
5	0	3	male	filling_missing	0	0	8.4583	Q	Third	man	True	NaN	Queenstown	no	True
6	0	1	male	54	0	0	51.8625	S	First	man	True	E	Southampton	no	True
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
884	0	3	male	25	0	0	7.0500	S	Third	man	True	NaN	Southampton	no	True
885	0	3	female	39	0	5	29.1250	Q	Third	woman	False	NaN	Queenstown	no	False
886	0	2	male	27	0	0	13.0000	S	Second	man	True	NaN	Southampton	no	True
887	1	1	female	19	0	0	30.0000	S	First	woman	False	B	Southampton	yes	True
888	0	3	female	filling_missing	1	2	23.4500	S	Third	woman	False	NaN	Southampton	no	False
889	1	1	male	26	0	0	30.0000	C	First	man	True	C	Cherbourg	yes	True
890	0	3	male	32	0	0	7.7500	Q	Third	man	True	NaN	Queenstown	no	True

891 rows × 15 columns

Not Null Values.¶

We learnt how to discover missing values using isna() and isnull() methods, both of these return a boolean.
If you are familiar with Python, you know that it's very easy to reverse a boolan value to show the opposite. you would simply use not key word, however since we are dealing with columns operation we can't use not isna().

in pandas to reverse a boolean or a method which produces a boolean that we wish to use as a filter we can use ~ as a prefix.

Example:

In [19]:

df[~df.age.isna()] # or df.loc[~df.age.isna()]]

Out[19]:

	survived	pclass	sex	age	sibsp	parch	fare	embarked	class	who	adult_male	deck	embark_town	alive	alone
0	0	3	male	22	1	0	7.2500	S	Third	man	True	NaN	Southampton	no	False
1	1	1	female	38	1	0	71.2833	C	First	woman	False	C	Cherbourg	yes	False
2	1	3	female	26	0	0	7.9250	S	Third	woman	False	NaN	Southampton	yes	True
3	1	1	female	35	1	0	53.1000	S	First	woman	False	C	Southampton	yes	False
4	0	3	male	35	0	0	8.0500	S	Third	man	True	NaN	Southampton	no	True
5	0	3	male	filling_missing	0	0	8.4583	Q	Third	man	True	NaN	Queenstown	no	True
6	0	1	male	54	0	0	51.8625	S	First	man	True	E	Southampton	no	True
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
884	0	3	male	25	0	0	7.0500	S	Third	man	True	NaN	Southampton	no	True
885	0	3	female	39	0	5	29.1250	Q	Third	woman	False	NaN	Queenstown	no	False
886	0	2	male	27	0	0	13.0000	S	Second	man	True	NaN	Southampton	no	True
887	1	1	female	19	0	0	30.0000	S	First	woman	False	B	Southampton	yes	True
888	0	3	female	filling_missing	1	2	23.4500	S	Third	woman	False	NaN	Southampton	no	False
889	1	1	male	26	0	0	30.0000	C	First	man	True	C	Cherbourg	yes	True
890	0	3	male	32	0	0	7.7500	Q	Third	man	True	NaN	Queenstown	no	True

891 rows × 15 columns

The function above will return all the non null values

But if that's too much work for you Pandas does provide notnull() and notna() methods.¶

In [20]:

df.age.notna()

Out[20]:

0      True
1      True
2      True
3      True
4      True
5      True
6      True
       ... 
884    True
885    True
886    True
887    True
888    True
889    True
890    True
Name: age, Length: 891, dtype: bool

In [21]:

df[df.age.notna()]

Out[21]:

	survived	pclass	sex	age	sibsp	parch	fare	embarked	class	who	adult_male	deck	embark_town	alive	alone
0	0	3	male	22	1	0	7.2500	S	Third	man	True	NaN	Southampton	no	False
1	1	1	female	38	1	0	71.2833	C	First	woman	False	C	Cherbourg	yes	False
2	1	3	female	26	0	0	7.9250	S	Third	woman	False	NaN	Southampton	yes	True
3	1	1	female	35	1	0	53.1000	S	First	woman	False	C	Southampton	yes	False
4	0	3	male	35	0	0	8.0500	S	Third	man	True	NaN	Southampton	no	True
5	0	3	male	filling_missing	0	0	8.4583	Q	Third	man	True	NaN	Queenstown	no	True
6	0	1	male	54	0	0	51.8625	S	First	man	True	E	Southampton	no	True
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
884	0	3	male	25	0	0	7.0500	S	Third	man	True	NaN	Southampton	no	True
885	0	3	female	39	0	5	29.1250	Q	Third	woman	False	NaN	Queenstown	no	False
886	0	2	male	27	0	0	13.0000	S	Second	man	True	NaN	Southampton	no	True
887	1	1	female	19	0	0	30.0000	S	First	woman	False	B	Southampton	yes	True
888	0	3	female	filling_missing	1	2	23.4500	S	Third	woman	False	NaN	Southampton	no	False
889	1	1	male	26	0	0	30.0000	C	First	man	True	C	Cherbourg	yes	True
890	0	3	male	32	0	0	7.7500	Q	Third	man	True	NaN	Queenstown	no	True

891 rows × 15 columns

Missing values¶

Removing Missing Values and filling Missing values¶

isna()¶

The sum method for isna can be very handy, I now know, the age column has 177 rows of missing values and the deck column has 688 rows of missing values etc...¶

Great, now we know we have missing values, what should we do with them?¶

Column Data Types and fillna() method.¶

It's worth remembering that when you do use fillna() you can pretty much pass in any value or symbol and it will replace the missing values with the value you passed in, ***except for Categorical columns***.¶

However you should be carefull because:¶

Categorical types and fillna():¶

Data Types and missing values example:¶

Deleting Missing Values¶

Examples of dropna()¶

Not Null Values.¶

But if that's too much work for you Pandas does provide notnull() and notna() methods.¶

It's worth remembering that when you do use fillna() you can pretty much pass in any value or symbol and it will replace the missing values with the value you passed in, *****except for Categorical columns*.¶