Ever wondered how production houses decide which movie to invest in? What comes to your mind? Is it the cast or the director? No!! Let us tell you, it is the SCRIPT that is the deciding factor. Sounds fair, doesn't it? Now think of all those production houses that have to evaluate multiple scripts on a daily basis, not only reading the storyline but also checking whether it draws inspiration from another movie. Sigh!!
Screenplay, the roles and behavior of characters, locations, interactions, concepts and actions define the similarity between movies – A DS team at InnovatorsBay
We designed, scrapped, redesigned, iterated, re-iterated and EUREKA!!
At InnovatorsBay, we have made this job easy with our Document Mixture algorithm, which is based on advanced analytical methods including machine learning, text analysis, statistical techniques and graph theory. This ensemble algorithm, which combines mathematics, business and technology, has been developed and verified on Hollywood movie scripts.
The initial methodology was simple: standard NLP techniques and semantic analytics working together to solve the problem for us. But processing 15 movies proved that solution wrong and demanded a much more complex one.
This failure led to further discussions and intensive research on movie scripts, which included understanding both the script and the screenplay. We watched multiple movies to get deeper insight into how our brain understands similarity, and manually tagged them for inspirations from another movie or a set of movies. The study of scripts, a lot of EDA, and visual analysis in Gephi and Cytoscape helped us build the factor map and hypothesis. Multiple iterations on various pre-processing techniques, word vector representations and the mathematics behind the comparison logic landed us on an optimized algorithm that was successfully tested on Hollywood movie scripts.
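To make "standard NLP techniques" and "comparison logic" a little more concrete, here is a minimal sketch of the kind of simple baseline a team might start from: TF-IDF word weights compared with cosine similarity. This is not the Document Mixture algorithm, whose details are not described here. The function names (`tokenize`, `tfidf_vectors`, `cosine_similarity`) and the three one-line "scripts" are made up purely for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(documents):
    """Build one sparse TF-IDF vector (a dict of term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy "scripts", invented for this example only.
scripts = [
    "A crew of thieves plans a casino heist in Las Vegas.",
    "A gang of thieves plans a bank heist in Madrid.",
    "Two strangers fall in love on a train to Vienna.",
]

vectors = tfidf_vectors(scripts)
for i in range(len(scripts)):
    for j in range(i + 1, len(scripts)):
        score = cosine_similarity(vectors[i], vectors[j])
        print(f"script {i} vs script {j}: {score:.3f}")
```

A baseline like this only sees shared words. It has no notion of characters, locations or interactions, which may be one reason a word-level approach struggles on real scripts and a richer method is needed.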