Proposal

2021-10-06

Predicting Movie Success using Analytics

Team Members

Rochisman Datta, Joe Palumbo, Parisa Davoodi, Manqiu Liu, Nicholas Gonzalez

Table of Contents
  1. Background
  2. Problem Definition
  3. Method
  4. Potential Results and Discussions
  5. Proposal Timeline
  6. Link to Proposal Video
  7. References

Background

Companies in the content distribution and creation space are always on the lookout for the next big hit. What if we told you that House of Cards, which made Netflix a household name, was informed by data analytics? Data suggested the pairing of David Fincher (director) and Kevin Spacey (actor) would “bring in big audiences” [1].

A model that could predict the probability of success for a TV show, movie, or game would be invaluable to any content creation company.

Problem Definition

The choice of movie components – director, genre, cast, run-time – determines the success of a movie. An improper combination of these factors could lead to a box office flop. Why did Blade Runner 2049 fall short of public expectations despite its highly anticipated release [2]?

We believe that a combination of features could be optimized to yield the highest movie rating and box office revenue. These choices could help drive the movie-making process.

Method

We will use Principal Component Analysis (PCA), an unsupervised technique, to investigate which features matter most. We then intend to use K-Means clustering or Gaussian Mixture Models to group similar movies and pinpoint any underlying patterns.
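As a rough illustration of this step, the sketch below standardizes a feature matrix, reduces it with PCA, and clusters the result with both K-Means and a Gaussian Mixture Model using scikit-learn. The data is synthetic, and the column meanings, the two-component projection, and the choice of two clusters are assumptions made for the example rather than decisions taken in this proposal.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for numeric movie features. The columns are hypothetical
# (run-time, vote count, popularity, release year); this is not real data.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[95, 200, 5, 1995], scale=[10, 50, 1, 5], size=(50, 4))
group_b = rng.normal(loc=[140, 3000, 40, 2012], scale=[15, 400, 5, 4], size=(50, 4))
X = np.vstack([group_a, group_b])

# Standardize so that large-scale columns (e.g. vote count) do not dominate PCA.
X_scaled = StandardScaler().fit_transform(X)

# Project onto the two directions of highest variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

# Hard cluster assignments with K-Means (k=2 is an arbitrary choice here).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans_labels = kmeans.fit_predict(X_reduced)

# Cluster assignments from a Gaussian Mixture Model for comparison.
gmm = GaussianMixture(n_components=2, random_state=0)
gmm_labels = gmm.fit_predict(X_reduced)

print("explained variance ratio:", pca.explained_variance_ratio_)
print("K-Means cluster sizes:", np.bincount(kmeans_labels))
print("GMM cluster sizes:", np.bincount(gmm_labels))
```

On the real dataset, categorical fields such as genre or director would need to be encoded numerically before this pipeline could be applied.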

Then, we will build a predictive model that estimates the likelihood of success for a movie, using classifiers such as Support Vector Machines (SVM) and Decision Trees.
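A minimal sketch of such a comparison is shown below, assuming a binary "successful / not successful" label. The data comes from scikit-learn's `make_classification`, so the features, labels, tree depth, and train/test split are placeholders and not results or settings from this project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for movie features
# and a hypothetical success label (1 = successful, 0 = not).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    # SVMs are sensitive to feature scale, so scaling is part of the pipeline.
    # probability=True enables predict_proba on the fitted SVC.
    "svm": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    # A shallow tree keeps the model easy to inspect.
    "tree": DecisionTreeClassifier(max_depth=4, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on held-out data
    first_proba = model.predict_proba(X_test[:1])[0, 1]
    print(f"{name}: accuracy={scores[name]:.3f}, P(success) for first test row={first_proba:.3f}")
```

Both models expose `predict_proba`, which matches the goal of reporting a likelihood of success rather than only a yes/no label.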

The dataset is a list of 5,000 movies from IMDb. Its fields include Cast, Director, Run-time, Genre, Release Date, Spoken Languages, Overview, Revenue, Vote Count, Popularity, etc.

Potential Results and Discussions

Unsupervised Model

We anticipate that the unsupervised models will output clusters that we can analyze for similarities. We expect some clusters to contain movies that are obviously similar (e.g., common actors or genres), and others that lack clear commonalities and require further exploration. We intend to use PCA for dimensionality reduction and keep only the components that explain most of the variance [3].
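One possible way to decide how many components to keep is to set a cumulative explained-variance threshold and then inspect which original features load most heavily on each retained component. The sketch below does this on synthetic data; the 90% threshold and the feature names are assumptions for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical feature names; the values below are synthetic, not real data.
feature_names = ["run_time", "vote_count", "popularity", "revenue"]
rng = np.random.default_rng(1)
shared = rng.normal(size=(200, 1))  # common factor behind three of the columns
X = np.hstack([
    rng.normal(size=(200, 1)),               # run_time: independent noise
    shared + 0.1 * rng.normal(size=(200, 1)),  # vote_count
    shared + 0.1 * rng.normal(size=(200, 1)),  # popularity
    shared + 0.3 * rng.normal(size=(200, 1)),  # revenue
])

X_scaled = StandardScaler().fit_transform(X)

# A float n_components keeps the fewest components whose cumulative
# explained variance exceeds that fraction (here 90%).
pca = PCA(n_components=0.90, svd_solver="full").fit(X_scaled)
cumulative = np.cumsum(pca.explained_variance_ratio_)
print("components kept:", pca.n_components_)
print("cumulative explained variance:", cumulative)

# For each kept component, report the feature with the largest absolute loading.
for i, component in enumerate(pca.components_):
    top_feature = feature_names[int(np.argmax(np.abs(component)))]
    print(f"component {i}: strongest loading on {top_feature}")
```

Note that each principal component is a weighted combination of the original features, so the loadings indicate influence rather than selecting individual features outright.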

Supervised Model

We expect the classification model to predict the likelihood of success for a movie within a cluster, given its feature set. Success will be measured by IMDb score and box office revenue. Based on our findings, we could experiment with different combinations (e.g., changing the director or an actor) to increase the likelihood of success. The model could therefore be used to optimize future projects, for example by maximizing the likelihood of success under a fixed budget by casting similar actors.
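The "what-if" experiment described above could look roughly like the following sketch: fit a classifier on categorical features, then score every candidate combination and rank them by predicted probability of success. The records, names such as `director_a` and `actor_x`, and the labels are all made up for illustration; a decision tree is used here only as one of the candidate models.

```python
from itertools import product

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy records: (director, lead actor, genre, success label).
rows = [
    ("director_a", "actor_x", "drama", 1),
    ("director_a", "actor_x", "thriller", 1),
    ("director_a", "actor_y", "drama", 1),
    ("director_a", "actor_y", "thriller", 0),
    ("director_b", "actor_x", "drama", 1),
    ("director_b", "actor_x", "thriller", 0),
    ("director_b", "actor_y", "drama", 0),
    ("director_b", "actor_y", "thriller", 0),
]
X = [list(r[:3]) for r in rows]
y = [r[3] for r in rows]

# One-hot encode the categorical columns, then fit a shallow decision tree.
model = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),
    DecisionTreeClassifier(max_depth=3, random_state=0),
)
model.fit(X, y)

# Enumerate candidate director/actor pairings for a fixed genre.
candidates = [
    list(c)
    for c in product(["director_a", "director_b"], ["actor_x", "actor_y"], ["drama"])
]
probs = model.predict_proba(candidates)[:, 1]  # column 1 = P(label == 1)

# Rank the combinations from most to least likely to succeed.
ranked = sorted(zip(candidates, probs.tolist()), key=lambda pair: pair[1], reverse=True)
for combo, p in ranked:
    print(combo, f"P(success)={p:.2f}")
```

A budget constraint could be added by filtering `candidates` before scoring, assuming a cost estimate is available for each combination.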

Proposal Timeline

  • Video for Proposal – Oct 3rd

    • PowerPoint – Parisa, Nick

    • Voice-over – Joe and Manqiu

    • Video Editing – Rochisman

  • Submit proposal – Oct 7th

    • Data Cleaning (Everyone) – Oct 15th

    • Feature Engineering (Parisa, Manqiu) – Oct 22nd

    • Model Creation and Implementation (Joe, Rochisman, Nick) – Oct 29th

    • First Draft of Mid-term report (Everyone) – Nov 7th

    • Final Draft of Mid-term report (Everyone) – Nov 14th

  • Mid-point report – Nov 16th

    • Tweaking and optimizing the models (Parisa, Nick, Joe) – Nov 23rd

    • Drawing conclusions based on findings (Rochisman, Manqiu) – Dec 3rd

    • First draft of final report (Everyone) – Dec 5th

  • Final Project – Dec 7th

Link to Proposal Video

Follow this link to view our team’s proposal video.

References

  1. Davies, Aran. “How Big Data Helped Netflix Series House of Cards Become a Blockbuster?” Sofy.tv - Blog, 6 Sept. 2019, https://sofy.tv/blog/big-data-helped-netflix-series-house-cards-become-blockbuster/
  2. Brueggemann, Tom. “‘Blade Runner’ Box Office Deja Vu as ‘2049’ Starring Ryan Gosling Falls Short.” IndieWire, 8 Oct. 2017, https://www.indiewire.com/2017/10/box-office-blade-runner-2049-ryan-gosling-1201884827/
  3. Jolliffe, Ian T., and Jorge Cadima. “Principal Component Analysis: A Review and Recent Developments.” Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences, vol. 374, no. 2065, 2016, 20150202. doi:10.1098/rsta.2015.0202