As mentioned in my previous article, Introduction to Univariate, Bivariate and Multivariate Analysis, this article dives a bit deeper into the different analyses. We will use a Kaggle dataset, Rain in Australia, to conduct our analysis.
Univariate analysis examines only one variable at a time. The most common methods are to check the central tendency of numerical variables and the frequency distribution of categorical variables.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# changing display options to increase the number of columns and rows viewable
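As a rough illustration of the two univariate checks described above, the sketch below computes central tendency for a numerical column and a frequency distribution for a categorical column. The tiny DataFrame and its column names (`MinTemp`, `RainToday`) are made-up stand-ins, not rows loaded from the Rain in Australia dataset.

```python
import pandas as pd

# Hypothetical sample; values are invented for illustration only
df = pd.DataFrame({
    "MinTemp": [13.4, 7.4, 12.9, 9.2, 17.5, 14.6],
    "RainToday": ["No", "No", "Yes", "No", "Yes", "No"],
})

# Central tendency for a numerical variable
mean_temp = df["MinTemp"].mean()
median_temp = df["MinTemp"].median()

# Frequency distribution for a categorical variable (counts and proportions)
rain_counts = df["RainToday"].value_counts()
rain_share = df["RainToday"].value_counts(normalize=True)

print(f"mean={mean_temp:.2f}, median={median_temp:.2f}")
print(rain_counts)
```

Comparing the mean with the median is a quick way to spot skew in a numerical column, while `normalize=True` turns raw counts into proportions that are easier to compare across categories.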
In the field of data, there is nothing more important than understanding the data that you are trying to analyze. To understand the data, it is important to understand the purpose of the analysis, because this will save you time and dictate how you go about analyzing the data.
There are lots of different tools, techniques and methods that can be used to conduct your analysis. You could use software libraries, visualization tools and statistical testing methods. However, this article will be an introduction to univariate, bivariate and multivariate analysis.
In the upcoming articles, I will…
Everyone knows Python is loved for its simple syntax. On top of that, there are a lot of modules built around Python that make it even simpler to use. Modules are basically Python code that someone else has already written and tested for you. Knowing which module to use, and when to use it, makes programming simpler because you do not have to write as many lines of code to accomplish the same task. Nonetheless, you do need to know the logic behind the methods to implement them correctly.
One of the most popular modules is the Collections module. Collections is a set of tools that…
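To make the point about saving lines of code concrete, here is a small sketch that tallies words first with a hand-written loop and then with `Counter` from the standard-library `collections` module. The word list is invented for illustration.

```python
from collections import Counter, defaultdict

words = ["rain", "sun", "rain", "wind", "sun", "rain"]

# Manual approach: build the tally yourself
manual_counts = {}
for word in words:
    manual_counts[word] = manual_counts.get(word, 0) + 1

# Counter produces the same tally in one call
counts = Counter(words)
top_two = counts.most_common(2)

# defaultdict removes the need to check whether a key exists before appending
by_length = defaultdict(list)
for word in words:
    by_length[len(word)].append(word)

print(top_two)
```

Both approaches give the same tally. The module version is shorter, but it still helps to understand the manual loop so you know what `Counter` is doing for you.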
Regardless of what we are learning or doing, we start with the basics, the foundation, the core. Then we learn more tools and techniques, skills and shortcuts — we learn to do things better and make it more “advanced”.
Today, we will be looking at another tool to “advance” our data visualization skills.
Plotly is a free and open-source graphing library that makes amazing interactive graphs. It does everything that Seaborn and Matplotlib do, but makes it more fun for the viewer.
Data: All the articles and author data are extracted directly from Towards Data Science from 1/1/2020 until…
At some point, the majority of data scientists have used or become familiar with the box plot. It is a simple graph that visually highlights outliers for data cleaning and EDA. However, sometimes you may want a more informative graph without sacrificing the simplicity of a box plot.
Well, fear not, there is a solution.
Box plots are great to a certain extent. As mentioned earlier, if you want to quickly see the number of outliers, a box plot is easy to implement.
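For reference, the points a box plot draws as outliers can also be found numerically. The sketch below assumes the common 1.5 × IQR whisker rule (the default in Matplotlib and Seaborn). The bill amounts are invented and are not taken from the Tips dataset.

```python
import pandas as pd

# Hypothetical bill amounts; invented for illustration only
total_bill = pd.Series([10.3, 12.5, 14.8, 16.0, 17.9, 18.4, 20.7, 21.0, 24.6, 50.8])

# Quartiles and interquartile range (the height of the box)
q1 = total_bill.quantile(0.25)
q3 = total_bill.quantile(0.75)
iqr = q3 - q1

# Values beyond these fences are the ones drawn as individual points
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = total_bill[(total_bill < lower_fence) | (total_bill > upper_fence)]
print(f"{len(outliers)} outlier(s): {outliers.tolist()}")
```

This yields the outlier count directly, which is the number a reader would otherwise have to estimate by eye from the plot.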
In this article, I will be using the Tips dataset from Kaggle.com.
Here are some box…
To talk about big data, we must first talk about data. With rapid technological advancements, data comes in all shapes and sizes. There is structured data, unstructured data, and partially structured data, also known as semi-structured data. Structured data is fairly easy to manage, search and filter for data analysis because of its row-and-column structure, and it requires less storage. Unstructured data, by contrast, is complex, comes in many different file formats, is hard to manage and requires more storage.
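A minimal sketch of the difference, using only the Python standard library: the CSV text stands in for structured data with fixed rows and columns, and the JSON text stands in for semi-structured data whose records need not share the same fields. Both pieces of data, including the field names, are invented for illustration.

```python
import csv
import io
import json

# Structured: every row has the same columns, so filtering is direct
csv_text = "city,rainfall_mm\nSydney,12.0\nPerth,0.0\nDarwin,30.5\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
rainy = [row["city"] for row in rows if float(row["rainfall_mm"]) > 0]

# Semi-structured: records carry their own keys, and fields may be missing
json_text = (
    '[{"city": "Sydney", "rainfall_mm": 12.0},'
    ' {"city": "Perth", "notes": "no gauge reading"}]'
)
records = json.loads(json_text)
rainy_json = [rec["city"] for rec in records if rec.get("rainfall_mm", 0) > 0]

print(rainy, rainy_json)
```

The structured case can rely on every row having a `rainfall_mm` column, while the semi-structured case has to guard against missing fields with `.get()`.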
Python is so popular among data scientists because of all the libraries built for it.
In data science, effective data visualizations are key to communicating your findings. After a series of data cleaning and data analysis steps, one has to communicate the findings from that analysis, which is usually done through visual aids: graphs and charts.
“Visualizing information can give us a very quick solution to problems. We can get clarity or the answer to a simple problem very quickly.” — David McCandless