A Look into Freaky Franchise’s Rotten Tomatoes Competition

Jul 4, 2021

Podcasts are all the rage, especially in these weird times when many people have more free time than they did in the past. Today we are going to look at some data gathered from one particular podcast, Freaky Franchise, where they “unmask horror movies based on quantity over quality.” I strongly suggest checking Freaky Franchise out if you are into horror movies.

In the first part of each episode, the two hosts have a friendly competition where they guess the Rotten Tomatoes scores of the movies they are discussing, with the loser having to sum the movie up in under a minute. In this post, we are going to look at the data surrounding this competition and clean the data set so that it can be used in a model in the future.

First, we need to load in the data set and see what we are working with.

import pandas as pd
ff_data = pd.read_csv('Freaky_Franchise_data.csv')
ff_data

Now let’s check whether there is any missing data.

ff_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 75 entries, 0 to 74
Data columns (total 10 columns):
#                       72 non-null object
Episode Title           75 non-null object
Date Aired              72 non-null object
Cordie                  63 non-null object
Theo                    63 non-null object
Difference in scores    62 non-null object
RT Score                62 non-null object
Goes First              60 non-null object
Winner                  60 non-null object
Notes                   9 non-null object
dtypes: object(10)
memory usage: 6.0+ KB

From this summary we can see a few things we will have to do to the DataFrame before we start using it to compute statistics.

First, we can set the index to the episode number (#).
It seems like there is a second table at the bottom that we should remove before continuing.
Cordie, Theo, Difference in scores, and RT Score are listed as objects, while we will need them as floats or integers.
In the same vein, we may want to convert Date Aired to datetime.
There are also some null values that we will have to deal with.
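
The date conversion mentioned in the list is not carried out later in this post. As a minimal sketch of how it might look, here is a small made-up frame; the titles, the date strings, and their month/day/year format are all assumptions, since the real file's date format is not shown here.

```python
import pandas as pd

# Toy stand-in for the real file; titles, dates, and the date format are assumptions.
toy = pd.DataFrame({
    "Episode Title": ["Episode A", "Episode B", "Episode C"],
    "Date Aired": ["10/01/2019", "10/08/2019", "not listed"],
})

# errors="coerce" turns anything that cannot be parsed into NaT instead of raising.
toy["Date Aired"] = pd.to_datetime(
    toy["Date Aired"], format="%m/%d/%Y", errors="coerce"
)

print(toy.dtypes)
print(toy)
```

With the column stored as datetime, sorting by air date or grouping by year becomes straightforward, and any unparseable entries show up as nulls that can be handled like the others.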

Scrubbing the data for modeling

First we are going to drop the extra table on the bottom.

# Drop the rows without an index (episode number) by keeping only the rows
# where the episode number is not empty.
ff_data = ff_data[ff_data['#'].notna()]
ff_data.tail()
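
For comparison, the same filter can also be written with dropna and its subset argument. The sketch below uses a made-up frame (the "Totals" and "Averages" rows are hypothetical stand-ins for the extra table) to check that both spellings keep the same rows.

```python
import numpy as np
import pandas as pd

# Toy frame: the last two rows imitate a summary table with no episode number.
toy = pd.DataFrame({
    "#": ["1", "2", np.nan, np.nan],
    "Episode Title": ["Episode A", "Episode B", "Totals", "Averages"],
})

# Boolean-mask version, as used in this post.
by_mask = toy[toy["#"].notna()]

# Equivalent version: drop rows only when the "#" column is null.
by_dropna = toy.dropna(subset=["#"])

print(by_mask.equals(by_dropna))
```

Either form works; the mask version makes the condition explicit, while dropna reads a little closer to the intent.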

Next, we can set the index to the episode number.

ff_data.set_index("#", inplace=True)
ff_data

Let’s look at the null values and decide what to do with them.

ff_data.isnull().sum()
Episode Title            0
Date Aired               0
Cordie                  12
Theo                    12
Difference in scores    12
RT Score                12
Goes First              12
Winner                  12
Notes                   63
dtype: int64

Six of these columns have the same number of null values (12). This could be a coincidence, or the null values could all fall on the same rows. We should look deeper into that, since it could help us decide how to deal with the null values.

# First we are going to just look at rows that have null values
ff_data[ff_data.isnull().any(axis=1)]

# This produces more rows than we want, because most rows have no notes.
# We want to see whether the 12 nulls in the other columns fall on the same rows.
# To check this, we create a new DataFrame without the Notes column.
no_notes = ff_data.copy()
no_notes.drop(labels='Notes', axis=1, inplace=True)
no_notes.head()

Run the same code again on no_notes to see all rows with null values.

no_notes[no_notes.isnull().any(axis=1)]

We can see that, as we suspected, the 12 null values all fall on the same rows. These episodes are mostly retrospectives and specials, which we can guess (and I can confirm from listening to them) did not include the competition. Since the main thing we are looking at in this post is the Rotten Tomatoes competition, we can safely drop these rows without losing any data we need.
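
Eyeballing the filtered rows works for a table this small, but the same check can be done programmatically. This is a minimal sketch on a made-up frame (the values are hypothetical and only three of the competition columns are included): if the "any null" mask and the "all null" mask match, every null in those columns sits on a row where the whole competition is missing.

```python
import numpy as np
import pandas as pd

# Toy frame: rows 1 and 3 imitate episodes with no competition.
toy = pd.DataFrame({
    "Cordie": [50, np.nan, 20, np.nan],
    "Theo": [40, np.nan, 35, np.nan],
    "Winner": ["Cordie", np.nan, "Theo", np.nan],
    "Notes": [np.nan, "Retrospective", np.nan, np.nan],
})

competition_cols = ["Cordie", "Theo", "Winner"]
nulls = toy[competition_cols].isnull()

# Rows where at least one competition column is null.
any_null = nulls.any(axis=1)
# Rows where every competition column is null.
all_null = nulls.all(axis=1)

# True only if the nulls never appear in a partially filled row.
same_rows = any_null.equals(all_null)
print(same_rows, int(any_null.sum()))
```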

We can use the same method we used to remove the extra table. Since the null values fall on the same rows, we only need to filter on one column.

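# .copy() gives us an independent DataFrame, so the column assignments
# further down do not trigger pandas' SettingWithCopyWarning.
ff_data = ff_data[ff_data['Cordie'].notna()].copy()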
ff_data.head()

Let’s look at ff_data.info() again to check that it worked.

ff_data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 60 entries, 1 to 71
Data columns (total 9 columns):
Episode Title           60 non-null object
Date Aired              60 non-null object
Cordie                  60 non-null object
Theo                    60 non-null object
Difference in scores    60 non-null object
RT Score                60 non-null object
Goes First              60 non-null object
Winner                  60 non-null object
Notes                   9 non-null object
dtypes: object(9)
memory usage: 4.7+ KB

Now that we have the data we will be working with, we need to convert it into a format we can work with.

Using a for loop, we will convert the score columns to numeric types. First, create a list of the column names that we need to convert.

columns = ['Cordie', 'Theo', 'Difference in scores', 'RT Score']

# Use a for loop to loop through columns to convert any columns that can be into floats
for x in columns:
    ff_data[x] = pd.to_numeric(ff_data[x], errors='coerce')

ff_data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 60 entries, 1 to 71
Data columns (total 9 columns):
Episode Title           60 non-null object
Date Aired              60 non-null object
Cordie                  60 non-null int64
Theo                    59 non-null float64
Difference in scores    59 non-null float64
RT Score                59 non-null float64
Goes First              60 non-null object
Winner                  60 non-null object
Notes                   9 non-null object
dtypes: float64(3), int64(1), object(5)
memory usage: 4.7+ KB

Here we can see that ‘Theo’, ‘Difference in scores’, and ‘RT Score’ each have one fewer non-null value than before. That most likely means there was a non-numeric filler value, which was converted to a null when we coerced the errors. Seeing that, we will need to check for null values again and decide what to do with them.
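
To see exactly which raw entries get turned into nulls by errors='coerce', one option is to compare the column before and after conversion. The sketch below uses a made-up Series; the "?" filler is hypothetical, since the actual filler value in the data set is not shown in this post.

```python
import pandas as pd

# Toy column of guesses stored as strings; "?" is a made-up filler value.
raw = pd.Series(["45", "60", "?", "12"], name="Theo")

# Unparseable entries become NaN instead of raising an error.
converted = pd.to_numeric(raw, errors="coerce")

# Entries that were present before conversion but are null afterwards.
coerced = raw[raw.notna() & converted.isna()]
print(coerced)
```

Running this kind of check on the real columns before overwriting them shows the original text of each coerced cell, which helps in deciding how to fill it.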

Checking again for nulls using the same method as above:

no_notes = ff_data.copy()
no_notes.drop(labels='Notes', axis=1, inplace=True)
no_notes[no_notes.isnull().any(axis=1)]

It looks like there is one episode where Theo’s guess is not listed, and thus the difference is not listed either, and another episode where no Rotten Tomatoes score is listed. Both of these episodes have winners, so we shouldn’t drop them outright. Since it is just three null values, we are going to replace them with probable answers using the other data we have.

# Cordie won the Sleepaway Camp IV episode with a guess of zero, and a simple search
# shows that the movie does not have an RT score, so we replace the null with 0.
ff_data['RT Score'] = ff_data['RT Score'].fillna(0)

For Jason Lives, we know Theo wins, so we will fill his missing guess with the RT Score. Then we fill the missing difference with the difference between that value and Cordie’s guess.

ff_data['Theo'] = ff_data['Theo'].fillna(ff_data['RT Score'])
ff_data['Difference in scores'] = ff_data['Difference in scores'].fillna(
                                   abs(ff_data['Cordie'] - ff_data['Theo']))
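
Passing a Series to fillna fills each null with the value from the same row of that Series, so rows that already have a value are left alone. A small made-up frame (all numbers are hypothetical) illustrates the two fills above:

```python
import numpy as np
import pandas as pd

# Toy frame: "Episode B" is missing Theo's guess and the difference.
toy = pd.DataFrame(
    {
        "Cordie": [50, 30],
        "Theo": [40.0, np.nan],
        "Difference in scores": [10.0, np.nan],
        "RT Score": [45.0, 52.0],
    },
    index=["Episode A", "Episode B"],
)

# Fill the missing guess with that row's RT Score.
toy["Theo"] = toy["Theo"].fillna(toy["RT Score"])

# Fill the missing difference with |Cordie - Theo| for that row only.
toy["Difference in scores"] = toy["Difference in scores"].fillna(
    (toy["Cordie"] - toy["Theo"]).abs()
)

print(toy)
```

Note that this treats the missing guess as an exact hit, which is a convenient assumption rather than a recorded value.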

Check for nulls once again.

ff_data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 60 entries, 1 to 71
Data columns (total 9 columns):
Episode Title           60 non-null object
Date Aired              60 non-null object
Cordie                  60 non-null int64
Theo                    60 non-null float64
Difference in scores    60 non-null float64
RT Score                60 non-null float64
Goes First              60 non-null object
Winner                  60 non-null object
Notes                   9 non-null object
dtypes: float64(3), int64(1), object(5)
memory usage: 4.7+ KB

One last thing we will do before we start running tests and models is create Boolean columns for who went first and who won, using one-hot encoding. To prepare for that, we first replace the spaces in the column names with underscores.

ff_data.columns = ff_data.columns.str.replace(' ', '_')
ff_data.head()

We are just going to keep the columns with data that will affect the model, then one-hot encode the remaining text columns.

feats = ['Cordie','Theo','Difference_in_scores','RT_Score','Goes_First', 'Winner']
ff_data = ff_data[feats]
ff_data = pd.get_dummies(ff_data, drop_first=True)
ff_data
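
To make the effect of drop_first=True concrete, here is a small sketch on made-up rows, assuming the two text columns only ever contain the hosts' names. With two possible values per column, get_dummies keeps a single indicator column for each and drops the first category alphabetically.

```python
import pandas as pd

# Toy frame with hypothetical values for the two categorical columns.
toy = pd.DataFrame({
    "RT_Score": [45.0, 52.0, 10.0],
    "Goes_First": ["Cordie", "Theo", "Cordie"],
    "Winner": ["Theo", "Theo", "Cordie"],
})

# Numeric columns pass through; text columns become indicator columns.
encoded = pd.get_dummies(toy, drop_first=True)

print(encoded.columns.tolist())
print(encoded)
```

Here Winner_Theo being true means Theo won and false means Cordie won, so no information is lost by dropping the Cordie columns. If the real data contains a third value (a tie, for example), it would get its own indicator column.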

Now the data is cleaned and ready to have models run on it. We will tackle running various models on it at a later date.