Graphs are powerful tools that can be used to display information in an easy-to-digest graphic without scaring people with extensive data and numbers. They are used in every aspect of life to support points and ideas, but deciding which graph to use can be difficult. We are going to look at just a few of the graphs that can be easily created with Seaborn and discuss how different graphs can lead to different interpretations of the same data.
Seaborn is useful and extensive, and we will only be scratching the surface of what is possible with it. To learn more about all Seaborn can offer, check out the official documentation.
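First we will import Seaborn, along with Matplotlib's pyplot, which we will need later to create the subplots.
import seaborn as sns
import matplotlib.pyplot as plt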
Next we will set our figure's style and theme. We will be using the default, ‘darkgrid’. Then we will import the dataset we will be using. Seaborn gives access to many different example datasets, which load_dataset downloads on first use. To see more of the sets available with Seaborn, check out their data repository on GitHub.
sns.set()
mpg = sns.load_dataset('mpg')
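The style can also be named explicitly rather than relying on the default. The following is a minimal sketch, assuming Seaborn 0.11 or later, where set_theme is available. It sets the style and then reads back two of the parameters now in effect, which is a quick way to confirm what the theme changed.

```python
import seaborn as sns

# Name the style explicitly; 'darkgrid' is the style sns.set() applies by default.
sns.set_theme(style='darkgrid')

# axes_style() with no arguments returns the style parameters currently in effect.
current_style = sns.axes_style()
print(current_style['axes.grid'])       # whether grid lines are drawn
print(current_style['axes.facecolor'])  # background color of the plotting area
```

Other built-in style names, such as 'whitegrid' or 'ticks', can be passed the same way.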
We will take a quick look at the dataset to make sure it loaded in correctly and to see what we will be working with.
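One way to take that quick look is sketched below. To keep the snippet runnable without a download, it uses a tiny hand-made stand-in with a few of the mpg column names; the values are made up for illustration, and quick_look is a hypothetical helper name, not part of Seaborn or pandas. With the real data, you would pass mpg instead of sample.

```python
import pandas as pd

# Hypothetical stand-in rows (made-up values) using a few of the mpg column names,
# so the snippet runs offline; in the article, mpg comes from sns.load_dataset('mpg').
sample = pd.DataFrame({
    'mpg': [18.0, 24.0, 27.0, 31.5],
    'weight': [3504, 2372, 2130, 1990],
    'horsepower': [130.0, None, 88.0, 71.0],
    'origin': ['usa', 'japan', 'japan', 'europe'],
})

def quick_look(df, n=3):
    """Print the first n rows, the column dtypes, and the missing-value counts."""
    print(df.head(n))
    print(df.dtypes)
    missing = df.isna().sum()
    print(missing)
    return df.shape, missing

shape, missing = quick_look(sample)
```

Checking the dtypes and missing values up front helps explain later surprises, such as a column being treated as categorical or rows being dropped from a plot.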
Now that we have our dataset loaded in, let’s start looking at how we can represent this data using various graphs.
# Create a subplot so we can see the four graphs in one figure
fig, axs = plt.subplots(figsize=(12,12), ncols=2, nrows=2)
# Create the graphs using Seaborn, setting a title so we know which graph is which
sns.boxplot(x= 'origin', y='mpg', data=mpg, ax=axs[0,0]).set_title('Box Plot')
sns.violinplot(x= 'origin', y='mpg', data=mpg, ax=axs[0,1]).set_title('Violin Plot')
sns.barplot(x= 'origin', y='mpg', data=mpg, ax=axs[1,0]).set_title('Bar Plot')
sns.countplot(x= 'origin', data=mpg, ax=axs[1,1]).set_title('Count Plot')
These first few plots are good for comparing a quantitative variable across the categories of a qualitative one. In our example we examined miles per gallon (quantitative) versus country of origin (qualitative). Looking at the box plot, it is pretty clear that American cars tend to have the lowest mpg. However, if we look at the Count Plot, we can see that the USA has significantly more data points than the other origins, so the groups are unevenly represented and the comparison should be read with some caution. This is one reason it is important to look at various graphs before deciding what to present, to ensure the best representation of the data.
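The sample-size concern can also be checked numerically before settling on a plot. Below is a minimal sketch, assuming a DataFrame with origin and mpg columns like the one used here. The rows are made up so the snippet runs on its own, and summarize_by_group is a hypothetical helper name, not part of Seaborn or pandas.

```python
import pandas as pd

# Hypothetical stand-in rows (made-up values); swap in the real mpg DataFrame.
sample = pd.DataFrame({
    'origin': ['usa', 'usa', 'usa', 'usa', 'japan', 'japan', 'europe'],
    'mpg': [15.0, 18.0, 20.0, 23.0, 30.0, 32.0, 27.0],
})

def summarize_by_group(df, group_col='origin', value_col='mpg'):
    """Return the row count, mean, and median of value_col for each group."""
    return df.groupby(group_col)[value_col].agg(['count', 'mean', 'median'])

summary = summarize_by_group(sample)
print(summary)
```

Seeing the count next to the mean and median makes it easier to judge how much weight each group's average deserves.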
We will now look at a different set of graphs that focus more on quantitative variables.
# Create a subplot so we can see the graphs side by side
fig, axs = plt.subplots(figsize=(15,12), ncols=2, nrows=2)
# Use Seaborn to create the graphs
sns.scatterplot(x= 'weight', y='mpg', data=mpg, ax=axs[0,0]).set_title('Scatter')
sns.regplot(x= 'weight', y='mpg', data=mpg, ax=axs[0,1]).set_title('Linear Regression')
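# distplot is deprecated in recent Seaborn releases; histplot with kde=True draws a comparable histogram with a density curve
sns.histplot(x='mpg', data=mpg, kde=True, stat='density', ax=axs[1,0]).set_title('Distribution of mpg')
# Recent Seaborn releases expect x and y as keyword arguments for a bivariate KDE
sns.kdeplot(x='weight', y='mpg', data=mpg, ax=axs[1,1]).set_title('kdeplot')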
The graphs above show us a bit more detail about