For this exercise, we are going to use a Macbeth transcript from Project Gutenberg to learn how to group common traits within a string and create a plot that shows the 20 most used words in Macbeth. This will involve learning how to count and sort using Python.

We will pull the manuscript using the Python package requests. We will also be using numpy and matplotlib.pyplot so we will import those as well.

Now that we have imported the package, we can pull the manuscript and save it…
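
The counting and sorting step described above can be sketched as follows. This is only an illustration, not the post's own code: it uses the standard library (`re` and `collections.Counter`) instead of requests, numpy, and matplotlib, and the function name `top_words` and the short sample line are assumptions made for the example.

```python
import re
from collections import Counter


def top_words(text, n=20):
    """Return the n most common words in text as (word, count) pairs."""
    # Lowercase the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    # Counter tallies each word; most_common sorts by count, highest first.
    return Counter(words).most_common(n)


# A short sample line stands in for the full transcript (assumption).
sample = "Tomorrow, and tomorrow, and tomorrow, creeps in this petty pace from day to day"
print(top_words(sample, 3))
```

With the full transcript in place of `sample`, the resulting pairs could be passed to a bar plot to show the 20 most used words.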

In this blog, we are going to look at preprocessing data in order to get it ready for modeling. For this example we are going to use the Chicago Traffic Crash Database.

2 Load the Data/Filtering for Chosen Zip-codes

First we import libraries we will need to load, process, and plot our data.

We load the data, ensure it loaded correctly, and take a quick look at .info() to see what we will be working with in this dataframe.

We will start by getting the Ames Housing dataset and saving it as ames.csv so we can load it into a pandas dataframe using pandas read_csv().

Save it as a dataframe to be used throughout this notebook.

Now that we have it loaded and saved as a dataframe, let’s look at the first five rows to see what we will be working with:

5 rows × 81 columns

As we can see, there are more columns than what will show up…

Podcasts are all the rage, especially in these weird times when many people have more free time than they did in the past. Today we are going to look at some data gathered from one particular podcast, Freaky Franchise, where they “unmask horror movies based on quantity over quality.” I strongly suggest checking Freaky Franchise out if you are into horror movies.

In the first part of the episode, the two hosts have a friendly competition where they guess the Rotten Tomatoes scores of the movies they are discussing, with the loser having to sum the movie up in under a minute…

Dealing with missing data is an important step. Missing data can cause issues when trying to run models, visualize the data, calculate summary statistics, or even convert the data type.

In order to deal with missing data, we first need to know how to find it. There are a few different approaches to dealing with the missing data.

Null Values

NaN (short for “not a number”) and other null values are probably the easiest missing values to detect. Pandas has built-in ways to check for NaNs.
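
The post's own snippet is not included in this preview. As a minimal sketch, assuming a small made-up dataframe (the column names `title` and `score` are hypothetical), a check with pandas' `isna()` might look like this:

```python
import numpy as np
import pandas as pd

# A tiny example dataframe with one missing value per column (made-up data).
df = pd.DataFrame({
    "title": ["A", "B", None],
    "score": [90.0, np.nan, 75.0],
})

# Summing the boolean mask column-wise counts the missing values per column.
null_counts = df.isna().sum()

# isna() marks each cell True if it is missing and False otherwise.
null_mask = df.isna()
print(null_mask)
```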

The code above returns a matrix of boolean values where if a…

Graphs are a powerful tool for displaying information in an easy-to-digest graphic without scaring people with extensive data and numbers. They are used in every aspect of life to support points and ideas, but deciding which graph to use can be difficult. We are going to look at just a few of the graphs that can be easily called with Seaborn and discuss how different graphs can lead to different interpretations of the same data.

Seaborn is useful and extensive. We will just be scratching the surface of what is possible with Seaborn. …

In this post we will go through using the TMDB API to gather film information and save it in a dataframe. For more information check out the documentation here.

The first thing needed is to obtain an API key. Anyone with a TMDB account can obtain a key from their account settings page.

Now that you have an API key, it is time to download tmdbsimple. Tmdbsimple is a wrapper that simplifies the code needed to access the information in the API. To learn more about tmdbsimple check out the GitHub repo here. …

Over the last year, I have become a huge fan of functionalizing code for ease of access and less repetition. In this post we are going to go over the basic principles of functions in Python. To learn more about functions you can visit W3Schools here.

A function is defined in Python with the “def” keyword followed by the name of the function, parentheses, and a colon. Just as when you create loops and if statements, the lines within your function are indented.

Below is a very simple function that prints a statement.
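
The post's own example is not included in this preview. A minimal sketch of such a function, with a hypothetical name `greet` and a made-up message, could look like this:

```python
# "def", the function name, parentheses, and a colon start the definition.
def greet():
    # The indented lines form the body of the function.
    print("Hello from inside a function!")


# Calling the function by name runs the body.
greet()
```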

Now if we ever want to run…


Data analyst with experience in web scraping, SQL, data modeling, and machine learning.
