Wednesday, 8 January 2025

How to locate, import, and load a dataset into a pandas DataFrame, then preprocess, format, and normalize it for data science.



Step 1: Install and import all dependencies.
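As a minimal sketch of this step, the snippet below assumes the tutorial relies on pandas and NumPy, installed with pip. The exact package list used in the original post is not known, so treat the install line as an assumption.

```python
# Install once from a shell, not from inside this script (assumed package list):
#   pip install pandas numpy kagglehub
import os

import numpy as np
import pandas as pd

# Printing the versions confirms that the imports succeeded.
print("pandas:", pd.__version__)
print("numpy:", np.__version__)
print("working directory:", os.getcwd())
```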



Step 2: Search for a dataset on Kaggle.

 

 

Step 3: Click on All Datasets. This lists all the datasets available on Kaggle.

 

 

Step 4: Click on the dataset you want to analyze. On the page that opens, click the Download button. Kaggle then shows a code snippet that gives us the dataset directory path.

 

 

Step 5: Below is the code that shows the path of the dataset directory mentioned in Step 4. Here we are locating the open-source data on the web.
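The snippet Kaggle generates is typically built on the kagglehub package. The sketch below wraps that call in a small helper; the helper name download_dataset is hypothetical, and the dataset handle is left as a parameter because the post does not name the dataset.

```python
def download_dataset(handle: str) -> str:
    """Download a Kaggle dataset and return the local directory holding its files.

    `handle` is the "owner/dataset-name" string shown on the dataset page.
    """
    # Imported here so the helper can be defined before kagglehub is installed.
    import kagglehub

    # dataset_download fetches the files (network access needed) and returns their local path.
    return kagglehub.dataset_download(handle)
```

Calling the helper with the handle copied from the dataset page, then printing the result, shows the directory path that the next step builds on.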

 

 

Step 6: Here we store the path in the dataset_dir variable, join it with the file name data.csv to build the data_file variable, and then display the complete path to the dataset file.
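A minimal sketch of the path handling, using the variable names from this step. The directory value is a hypothetical placeholder; in practice it is the path obtained in Step 5.

```python
import os

# Hypothetical directory; replace it with the path returned by the download step.
dataset_dir = os.path.join("datasets", "example-dataset")

# Join the directory with the CSV file name to get the full file path.
data_file = os.path.join(dataset_dir, "data.csv")

print(data_file)
# Checking for the file first gives a clearer error than a failed read later.
print("File exists:", os.path.exists(data_file))
```

Using os.path.join instead of string concatenation keeps the path separator correct on every operating system.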

 

 

Step 7: Now we load the dataset into a pandas DataFrame df by passing the path to the function read_csv(data_file). After that, we display the first five rows of the dataset.
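To keep the following sketch runnable without a Kaggle download, it first writes a tiny made-up CSV into a temporary folder. The column names and values are illustrative assumptions, not the dataset from the post.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the downloaded dataset: a small CSV with made-up values.
dataset_dir = tempfile.mkdtemp()
data_file = os.path.join(dataset_dir, "data.csv")
with open(data_file, "w") as f:
    f.write(
        "age,income,gender\n"
        "25,30000,Male\n"
        "32,45000,Female\n"
        "47,,Female\n"
        "51,62000,Male\n"
        "38,52000,Female\n"
        "29,41000,Male\n"
    )

# Load the CSV into a DataFrame and show the first five rows.
df = pd.read_csv(data_file)
print(df.head())
```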

 

 

Step 8: Check for missing values in the dataset, cell by cell.
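One common way to do this check is DataFrame.isnull(), which returns True wherever a value is missing. The small DataFrame below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value in each column.
df = pd.DataFrame({
    "age": [25, 32, np.nan, 51],
    "income": [30000.0, np.nan, 52000.0, 62000.0],
    "gender": ["Male", "Female", "Female", None],
})

# Same shape as df: True marks a missing cell, False a present one.
missing_mask = df.isnull()
print(missing_mask)
```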


Step 9: Summarize the missing values. If the dataset has thousands of rows, we cannot check it row by row, so we summarize the missing values as follows.
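A typical way to summarize is to chain isnull() with sum(), which counts the missing values in each column. This is a sketch on made-up data, not necessarily the exact call used in the original post.

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value in each column.
df = pd.DataFrame({
    "age": [25, 32, np.nan, 51],
    "income": [30000.0, np.nan, 52000.0, 62000.0],
    "gender": ["Male", "Female", "Female", None],
})

# Count missing values per column, then across the whole DataFrame.
missing_counts = df.isnull().sum()
total_missing = int(missing_counts.sum())

print(missing_counts)
print("Total missing values:", total_missing)
```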


Step 10: To get some initial statistics of the dataset, we use the describe() function.
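The sketch below calls describe() on a small made-up DataFrame. By default it reports the count, mean, standard deviation, minimum, quartiles, and maximum for numeric columns only.

```python
import pandas as pd

# Illustrative data: two numeric columns and one text column.
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [30000.0, 45000.0, 52000.0, 62000.0],
    "gender": ["Male", "Female", "Female", "Male"],
})

# Summary statistics; the text column is left out by default.
summary = df.describe()
print(summary)
```

Passing include="all" to describe() adds the non-numeric columns to the summary as well.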


Step 11: Describe the variables. Summarize the types of variables by checking the data type of each column in the dataset.
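A short sketch of the type check on made-up data, using the dtypes attribute and the info() method.

```python
import pandas as pd

# Illustrative data with integer, float, and text columns.
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [30000.0, 45000.0, 52000.0, 62000.0],
    "gender": ["Male", "Female", "Female", "Male"],
})

# One data type per column; text columns usually appear as object or a string dtype.
column_types = df.dtypes
print(column_types)

# info() adds the non-null count per column and the memory usage.
df.info()
```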



Step 12: Check the dimensions of the DataFrame. This gives the number of rows and columns in the DataFrame.
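The shape attribute returns a (rows, columns) tuple. The DataFrame below is made up for illustration.

```python
import pandas as pd

# Illustrative data: four rows and three columns.
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [30000.0, 45000.0, 52000.0, 62000.0],
    "gender": ["Male", "Female", "Female", "Male"],
})

# shape is an attribute, not a method, so there are no parentheses.
n_rows, n_cols = df.shape
print("Rows:", n_rows, "Columns:", n_cols)
```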

 
 
Step 13: If variables are not in the correct data type, apply the proper type conversion.
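Two common conversions are sketched below on made-up data: astype() for columns that are clean, and pd.to_numeric() with errors="coerce" for columns that may contain invalid entries. Which columns need converting depends on the dataset, so the column names here are assumptions.

```python
import pandas as pd

# Illustrative data: numbers that were read in as text.
df = pd.DataFrame({
    "age": ["25", "32", "47", "51"],
    "income": ["30000", "45000", "unknown", "62000"],
})

# Every value is a valid integer, so a direct cast is enough.
df["age"] = df["age"].astype("int64")

# "unknown" cannot be parsed; errors="coerce" turns it into NaN instead of raising.
df["income"] = pd.to_numeric(df["income"], errors="coerce")

print(df.dtypes)
print(df)
```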
 
 
 
Step 14: How to add a new column to an existing DataFrame, find the length of each column, and fill the newly added column with random values.
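A possible sketch of this step follows. The new column name score and the value range 0 to 99 are hypothetical choices, and NumPy's random generator is one of several ways to produce the values.

```python
import numpy as np
import pandas as pd

# Illustrative data.
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [30000.0, 45000.0, 52000.0, 62000.0],
})

# Length of each column; in a DataFrame every column has the same length.
column_lengths = {col: len(df[col]) for col in df.columns}
print(column_lengths)

# The new column needs exactly one value per row.
n_rows = len(df)
rng = np.random.default_rng(seed=42)  # fixed seed makes the run repeatable
df["score"] = rng.integers(low=0, high=100, size=n_rows)  # high is exclusive

print(df)
```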

 
 
 
Step 15: How to replace the values in a column with 0 and 1.
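One way to do this is Series.map() with a dictionary. The sketch assumes a two-category text column called gender; the post does not say which column it recodes, so both the column and the mapping are illustrative.

```python
import pandas as pd

# Illustrative data with a two-category text column.
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "gender": ["Male", "Female", "Female", "Male"],
})

# Assumed mapping; any value missing from the dictionary would become NaN.
gender_codes = {"Male": 0, "Female": 1}
df["gender"] = df["gender"].map(gender_codes)

print(df)
```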

 
 
Step 16: How to count the number of 0s and 1s in the respective column.
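The value_counts() method counts how often each distinct value occurs. The column below is made up for illustration.

```python
import pandas as pd

# Illustrative binary column.
df = pd.DataFrame({"gender": [0, 1, 1, 0, 1]})

# Number of rows for each distinct value, most frequent first.
counts = df["gender"].value_counts()
print(counts)

# Look up a single count by its value.
print("Zeros:", counts.loc[0], "Ones:", counts.loc[1])
```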

 
 
Step 17: Turn categorical variables into quantitative variables in Python.
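Two common approaches are sketched here on made-up data: one-hot encoding with pd.get_dummies(), and integer codes through the category dtype. The post does not say which one it uses, so both are shown as options.

```python
import pandas as pd

# Illustrative data with one categorical column.
df = pd.DataFrame({
    "size": [1, 2, 3, 4],
    "color": ["red", "green", "blue", "green"],
})

# Option 1: one-hot encoding, one 0/1 column per category.
encoded = pd.get_dummies(df, columns=["color"], dtype=int)
print(encoded)

# Option 2: integer codes, assigned in sorted category order.
df["color_code"] = df["color"].astype("category").cat.codes
print(df)
```

One-hot encoding avoids implying an order between categories, while integer codes keep the data in a single column.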

data structures and algorithms Web Developer
