Wednesday, 8 January 2025
How to locate, import, and load a dataset into a pandas DataFrame for preprocessing, formatting, and normalization in data science.
Step 1: Install and import all dependencies.
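As a minimal sketch of step 1, the snippet below checks which dependencies are still missing before anything is imported. The package list is an assumption: pandas is named later in this post, while kagglehub is only a guess at what the Kaggle download code relies on.

```python
import importlib.util

# Assumed package list: pandas is used later in this post; kagglehub is a
# guess at what the Kaggle download code needs.
REQUIRED_PACKAGES = ["pandas", "kagglehub"]


def missing_packages(names):
    """Return the package names that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED_PACKAGES)
if missing:
    # Print the install command for whatever is not available yet.
    print("Install first: pip install " + " ".join(missing))
else:
    print("All dependencies are installed.")
```

Once nothing is reported as missing, the usual imports such as import pandas as pd should work.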
Step 2: Search for a dataset on Kaggle.
Step 3: Click on All Datasets. It shows all the datasets available there.
Step 4: Click on the dataset you want to analyze. On the dataset page that opens, click the Download button. You will get the code that gives the dataset directory path.
Step 5: The code from step 4 returns the path of the dataset directory. Here we are locating open-source data from the web.
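The exact code from the Download button is not reproduced in this post, so the following is only a sketch of what step 5 might look like, assuming the kagglehub package and its dataset_download function. The helper name locate_dataset and the handle check are hypothetical, added for illustration.

```python
def locate_dataset(handle):
    """Download a Kaggle dataset (or reuse a cached copy) and return its directory.

    `handle` is the "owner/dataset-name" string shown on the dataset page.
    """
    if "/" not in handle:
        raise ValueError("expected a handle of the form 'owner/dataset-name'")
    # Imported lazily so the function can be defined without the package;
    # assumes `pip install kagglehub` has been run.
    import kagglehub

    # Returns the local directory that holds the dataset files.
    return kagglehub.dataset_download(handle)
```

Calling locate_dataset with the handle of the dataset chosen in step 4 needs network access and may ask for Kaggle credentials, depending on the dataset.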
Step 6: Here we store the path in the dataset_dir variable. We then join that path with the data.csv file name, store the result in the data_file variable, and display the complete path of the dataset file.
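Step 6 can be sketched with os.path.join. The value of dataset_dir below is a hypothetical placeholder; in practice it would be the directory path obtained in step 5.

```python
import os

# Hypothetical placeholder; in practice this is the directory path from step 5.
dataset_dir = os.path.join("path", "to", "dataset")

# Join the directory with the file name to get the full path of the CSV file.
data_file = os.path.join(dataset_dir, "data.csv")
print(data_file)
```

Using os.path.join instead of string concatenation keeps the path separator correct on every operating system.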
Step 7: Now we load the dataset into a pandas DataFrame df by passing the path to the read_csv function, as read_csv(data_file). After that we display the first five rows of the dataset.
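A small sketch of step 7 follows. To keep it runnable without the downloaded file, an in-memory CSV stands in for data.csv; its columns and values are made up for illustration. With the real dataset, the call would be pd.read_csv(data_file).

```python
import io

import pandas as pd

# Stand-in for data.csv (hypothetical columns and values, illustration only).
sample_csv = io.StringIO(
    "id,age,city\n"
    "1,34,Pune\n"
    "2,29,Delhi\n"
    "3,41,Mumbai\n"
    "4,25,Chennai\n"
    "5,38,Kolkata\n"
    "6,30,Jaipur\n"
)

# read_csv accepts a file path or a file-like object.
df = pd.read_csv(sample_csv)

# head() returns the first five rows by default.
print(df.head())
```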
Step 8: Check for missing values in the dataset.
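The post does not name the function used for step 8; one common choice, assumed here, is DataFrame.isnull, which marks every cell as missing or not. The tiny DataFrame is made-up sample data.

```python
import numpy as np
import pandas as pd

# Made-up sample data with one missing value in each column.
df = pd.DataFrame(
    {
        "age": [34, np.nan, 29],
        "city": ["Pune", "Delhi", None],
    }
)

# isnull() returns a DataFrame of the same shape with True where a value is missing.
missing_mask = df.isnull()
print(missing_mask)
```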
Step 9: If the dataset has thousands of rows, we cannot check for missing values row by row, so we summarize the missing values over the whole dataset instead.
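One way to produce such a summary, assumed here because the post does not show its code, is to chain isnull() with sum(), which counts the missing values in each column. The sample data is made up.

```python
import numpy as np
import pandas as pd

# Made-up sample data: one missing age and two missing cities.
df = pd.DataFrame(
    {
        "age": [34, np.nan, 29, 41],
        "city": ["Pune", None, None, "Mumbai"],
    }
)

# True counts as 1 when summed, so this gives the missing count per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# A single number for the whole dataset.
total_missing = int(missing_per_column.sum())
print("Total missing values:", total_missing)
```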
Step 10: To get some initial statistics of the dataset, we use the describe function.
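As a short illustration of step 10, the sketch below calls describe on made-up numeric data. For numeric columns it reports the count, mean, standard deviation, minimum, quartiles, and maximum.

```python
import pandas as pd

# Made-up numeric sample data.
df = pd.DataFrame(
    {
        "age": [20, 30, 40],
        "score": [1.0, 2.0, 3.0],
    }
)

# describe() summarizes the numeric columns by default.
stats = df.describe()
print(stats)

# Individual statistics can be read by row label and column name.
print("Mean age:", stats.loc["mean", "age"])
```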
Step 11: Provide variable descriptions, such as the types of the variables. Summarize the types of variables by checking the data types of the columns in the dataset.
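A possible sketch of step 11 uses the dtypes attribute together with select_dtypes to separate numeric columns from the rest. The sample columns are made up, and the split into numeric and other columns is just one way to summarize the variable types.

```python
import pandas as pd

# Made-up sample data with an integer, a float, and a text column.
df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "height": [1.7, 1.6, 1.8],
        "city": ["Pune", "Delhi", "Mumbai"],
    }
)

# dtypes lists the data type of every column.
print(df.dtypes)

# Group the columns into numeric and non-numeric variables.
numeric_columns = df.select_dtypes(include="number").columns.tolist()
other_columns = df.select_dtypes(exclude="number").columns.tolist()
print("Numeric variables:", numeric_columns)
print("Other variables:", other_columns)
```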