Reading CSV files into a DataFrame

In [1]:
#import pandas
import pandas as pd
In [2]:
data = pd.read_csv("mushrooms.csv")
In [3]:
## now we have our data
data.head()
Out[3]:
class cap-shape cap-surface cap-color bruises odor gill-attachment gill-spacing gill-size gill-color ... stalk-surface-below-ring stalk-color-above-ring stalk-color-below-ring veil-type veil-color ring-number ring-type spore-print-color population habitat
0 p x s n t p f c n k ... s w w p w o p k s u
1 e x s y t a f c b k ... s w w p w o p n n g
2 e b s w t l f c b n ... s w w p w o p n n m
3 p x y w t p f c n n ... s w w p w o p k s u
4 e x s g f n f w b k ... s w w p w o e n a g

5 rows × 23 columns

Note: If your file is not in your current working directory, you will need to pass the file's full path as the argument to read_csv.

In [10]:
# Example where file is in a different directory
data = pd.read_csv(r"C:\Users\Mostafa\My Files\Python\Kaggle\Mushrooms\mushrooms.csv")
In [11]:
data.head()
Out[11]:
class cap-shape cap-surface cap-color bruises odor gill-attachment gill-spacing gill-size gill-color ... stalk-surface-below-ring stalk-color-above-ring stalk-color-below-ring veil-type veil-color ring-number ring-type spore-print-color population habitat
0 p x s n t p f c n k ... s w w p w o p k s u
1 e x s y t a f c b k ... s w w p w o p n n g
2 e b s w t l f c b n ... s w w p w o p n n m
3 p x y w t p f c n n ... s w w p w o p k s u
4 e x s g f n f w b k ... s w w p w o e n a g

5 rows × 23 columns
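As a small sketch of the note above: read_csv also accepts a pathlib.Path object, which avoids having to escape backslashes by hand. The temporary directory, the file name example.csv and its two-column contents are made up for illustration and are not part of the mushrooms dataset.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Create a throwaway directory and CSV file (hypothetical names and contents)
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "example.csv"
    csv_path.write_text("class,odor\np,p\ne,a\n")

    # Pass the full path to read_csv, just like the raw-string path above
    data_from_path = pd.read_csv(csv_path)

print(data_from_path.head())
```

On Windows, the r"..." prefix used in the cell above serves the same purpose: it stops Python from treating the backslashes in the path as escape sequences.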

Headers

Your data may or may not have headers. By default, read_csv assumes that your data does have headers (column names) and that they are in the very first row of the file.

Of course, you can adjust this. Let's look at some examples.

In the example below I pass header=None, which tells read_csv that the file has no header row. Because this file actually does have one, the column names end up as the first row of data (row 0), and pandas assigns integer labels (0, 1, 2, ...) as the column names.

In [13]:
pd.read_csv("mushrooms.csv", header=None)
Out[13]:
0 1 2 3 4 5 6 7 8 9 ... 13 14 15 16 17 18 19 20 21 22
0 class cap-shape cap-surface cap-color bruises odor gill-attachment gill-spacing gill-size gill-color ... stalk-surface-below-ring stalk-color-above-ring stalk-color-below-ring veil-type veil-color ring-number ring-type spore-print-color population habitat
1 p x s n t p f c n k ... s w w p w o p k s u
2 e x s y t a f c b k ... s w w p w o p n n g
3 e b s w t l f c b n ... s w w p w o p n n m
4 p x y w t p f c n n ... s w w p w o p k s u
5 e x s g f n f w b k ... s w w p w o e n a g
6 e x y y t a f c b n ... s w w p w o p k n g
7 e b s w t a f c b g ... s w w p w o p k n m
8 e b y w t l f c b n ... s w w p w o p n s m
9 p x y w t p f c n p ... s w w p w o p k v g
10 e b s y t a f c b g ... s w w p w o p k s m
11 e x y y t l f c b g ... s w w p w o p n n g
12 e x y y t a f c b n ... s w w p w o p k s m
13 e b s y t a f c b w ... s w w p w o p n s g
14 p x y w t p f c n k ... s w w p w o p n v u
15 e x f n f n f w b n ... f w w p w o e k a g
16 e s f g f n f c n k ... s w w p w o p n y u
17 e f f w f n f w b k ... s w w p w o e n a g
18 p x s n t p f c n n ... s w w p w o p k s g
19 p x y w t p f c n n ... s w w p w o p n s u
20 p x s n t p f c n k ... s w w p w o p n s u
21 e b s y t a f c b k ... s w w p w o p n s m
22 p x y n t p f c n n ... s w w p w o p n v g
23 e b y y t l f c b k ... s w w p w o p n s m
24 e b y w t a f c b w ... s w w p w o p n n m
25 e b s w t l f c b g ... s w w p w o p k s m
26 p f s w t p f c n n ... s w w p w o p n v g
27 e x y y t a f c b n ... s w w p w o p n n m
28 e x y w t l f c b w ... s w w p w o p n n m
29 e f f n f n f c n k ... s w w p w o p k y u
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
8095 e b s g f n f w b g ... s w w p w t p w n g
8096 p x y c f m f c b y ... y c c p w n n w c d
8097 e k f w f n f w b w ... s w w p w t p w n g
8098 p k y n f s f c n b ... k p p p w o e w v l
8099 p k s e f y f c n b ... k w p p w o e w v d
8100 e k f w f n f w b w ... k w w p w t p w s g
8101 e f s n f n a c b o ... s o o p n o p b v l
8102 p k s e f s f c n b ... s p w p w o e w v p
8103 e x s n f n a c b y ... s o o p n o p n c l
8104 e k s n f n a c b y ... s o o p n o p o c l
8105 e k s n f n a c b y ... s o o p o o p n v l
8106 e k s n f n a c b y ... s o o p n o p y v l
8107 e k s n f n a c b o ... s o o p o o p n v l
8108 e x s n f n a c b y ... s o o p o o p n c l
8109 p k y e f y f c n b ... s p w p w o e w v l
8110 e b s w f n f w b w ... s w w p w t p w n g
8111 e x s n f n a c b o ... s o o p o o p n v l
8112 e k s w f n f w b p ... s w w p w t p w n g
8113 e k s n f n a c b o ... s o o p n o p b v l
8114 p k y e f y f c n b ... k p p p w o e w v d
8115 p f y c f m a c b y ... y c c p w n n w c d
8116 e x s n f n a c b y ... s o o p o o p o v l
8117 p k y n f s f c n b ... k p w p w o e w v l
8118 p k s e f y f c n b ... s p w p w o e w v d
8119 p k y n f f f c n b ... s p w p w o e w v d
8120 e k s n f n a c b y ... s o o p o o p b c l
8121 e x s n f n a c b y ... s o o p n o p b v l
8122 e f s n f n a c b n ... s o o p o o p b c l
8123 p k y n f y f c n b ... k w w p w o e w v l
8124 e x s n f n a c b y ... s o o p o o p o c l

8125 rows × 23 columns
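The same behaviour can be seen on a much smaller input. The sketch below uses a made-up two-column CSV held in memory (via io.StringIO) rather than mushrooms.csv, and also shows the names parameter, which is one way to supply column names when a file really has no header row. The sample text and variable names are assumptions for illustration only.

```python
import io

import pandas as pd

# Hypothetical CSV text with a header row
csv_text = "class,odor\np,p\ne,a\n"

# Default behaviour: the first row becomes the column names
with_header = pd.read_csv(io.StringIO(csv_text))

# header=None: the first row is kept as data and columns are labelled 0, 1, ...
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

# Hypothetical CSV text without a header row: supply the names yourself
headerless_text = "p,p\ne,a\n"
named = pd.read_csv(io.StringIO(headerless_text), header=None, names=["class", "odor"])

print(with_header.shape, no_header.shape, named.shape)
```

Note that no_header has one more row than with_header, which matches the 8125 rows seen above for a file that has 8124 data rows plus one header row.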

Separators whilst reading the file

A CSV file is separated by commas, so you don't need to use the sep parameter when you're reading a CSV file. But if your data is separated by something else, such as a semicolon (;), a tab, or any other character, you can still use the read_csv() method and pass sep so that it knows how to delimit your dataset.

In [ ]:
# Example: values separated by semicolons
data = pd.read_csv("my_data.txt", sep=";")

# If the values are separated by tabs, use the tab escape sequence "\t"
# (a run of spaces is not the same thing as a tab character)
data = pd.read_csv("my_data.txt", sep="\t")

# If the values are separated by some other symbol, such as ^, pass that symbol
data = pd.read_csv("my_data.txt", sep="^")
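To see the effect of sep without needing a my_data.txt file on disk, here is a minimal sketch that reads made-up semicolon- and tab-separated text from memory. The sample strings and variable names are assumptions for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data, once with semicolons and once with tabs
semicolon_text = "class;odor\np;p\ne;a\n"
tab_text = "class\todor\np\tp\ne\ta\n"

# Tell read_csv which delimiter each input uses
semi = pd.read_csv(io.StringIO(semicolon_text), sep=";")
tabbed = pd.read_csv(io.StringIO(tab_text), sep="\t")

# Without sep, the semicolon data is read as a single column per line
wrong = pd.read_csv(io.StringIO(semicolon_text))

print(semi.shape, tabbed.shape, wrong.shape)
```

If a DataFrame unexpectedly comes back with a single column, a wrong or missing sep is a likely cause.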