Tiago Silva
November 29, 2020
Linear regressions are one of the most basic and effective tools to produce forecasts. All one needs is a set of historical data as a base to identify trends or patterns that lead to a formula that can be later used to forecast or predict values. For instance, a very simple example of linear regression is the relationship between height and age in children. It is possible to build a very accurate forecast of how tall a boy or a girl will be at a given age based on historical data. The process to build this forecast is simple, if historical data about age and height is available all that is needed is to find a line that minimizes the distance between these data points and the line itself.
As seen on the graph, multiple samples of children’s age and respective height were drawn in a scatter plot, then a pattern is visible - children are taller as they grow up. The dotted line represents that pattern. It’s a line that covers most cases minimizing the distance between each data point and the line itself. The line called regression line can be transformed into an equation. With that, a prediction can be made: if a child is X months old, he or she should be Y feet tall (within a margin of error) according to the historical data available.
In this case, the regression line is a straight line, but if we stretch the graph to include not only children but also adults, maybe the best representation is a curve instead of a straight line as people stop growing at some point, however the exercise of finding a pattern and trying to describe it with an equation is still the same.
Linear regressions like this are very useful to create forecasts, prediction models or simply see trends and patterns and are used by organizations around the world. Most companies and organizations are interested in knowing the future impacts of their actions and the Hollywood movie industry is not an exception. All major studios are interested to know how well a movie will perform to make critical decisions such as how much money should be invested in advertising aimed for the DVD and Blu-ray market or how much should be invested in merchandise. A very simple predictor can be made by relating box office sales or theatre tickets sold with DVD and Blu-ray unit sales of previous movies, very much like the height/age regression. This simple approach worked very well until the recent past for companies like Disney, where a very strong relationship between tickets sold and DVD and Blu-Ray units sold existed, making this simple linear regression a very accurate predictor of the market.
But nowadays maybe relating just box office sales and units sold is not enough to build a good predictor. Especially now that the market and technology are moving faster offering more options than DVD and Blu-ray in home entertainment. Streaming, video-on-demand and renting on platforms like iTunes and Amazon Prime are now into play with more options appearing each year, making even more difficult to use historical data, because in some cases it simply doesn’t exist. Moreover, other factors that are not directly measurable may now have a bigger role as part of a sales forecast. Variables like user opinion, movie genre and ratings, target-audience are increasingly more important nowadays when compared to just a few years ago.
Social media and platforms like IMDb have opened the door to different variables that heavily influence sales forecasts in ways that were not possible in the recent past and in non-quantifiable magnitudes. Although a movie may have done well in terms of box office revenue, a social media personality interested in giving movie reviews may think and say otherwise and influence a big part of the potential market. Professional critics may have mixed reviews about a movie that is in exhibition but the general opinion of movie goers expressed on the web is unanimous saying the movie is worth seeing. These dynamics can create true phenomena like the movie “The Room” of 2003 that had a very poor box office performance but a huge success in the subsequent years in the home market and even theater re-runs many years after its premier, completely unpredictable by any kind of regression formula solely based on ticket sales. “The Room” may be probably the most famous outlier, but there are other examples that confirm the importance of the role of these non-quantifiable variables. “Frozen” is also a good example of a movie that had a great performance in box office and an even better performance in merchandise sales very much influenced by difficult to quantify dynamics in social media and consumer behavior. In both cases, using a simple linear regression based on box office results would fail to predict future sales of DVD, Blu-ray, rentals and merchandise.
Although not directly measurable, these variables can be transformed and indirectly calculated to be used as any other numeric value in forecast models that can be a bit more complex than a linear regression but based on the same principles of using historical data to predict the future. IMDb compiles all the user reviews about a movie into a single numeric rate. The reach of a social media post about a movie can also be calculated and even differentiated by the kind of public reached. Trending topics in Twitter about a movie title can be also calculated. In any case, there is always a way of transforming these non-numeric values into numbers, scales or ratings.
Major studios already take into account non-conventional variables in order to build forecasts for home market sales and merchandise sales. Also, the decision about launching and producing new movies are more and more influenced by non-conventional variables and are a target of these forecasting models. New and hard to quantify variables like word-of-mouth recommendations or the influence that children have on the purchase patterns of their parents are added into the mix more frequently, building not only richer and more accurate prediction models but also adding non-conventional variables into models that not so long ago used only direct numerical variables expressed in easy to understand units like items sold or money. These new approaches reflect the changing dynamics of the market and consumer behavior that are hardly represented by a simple linear regression based on two numeric variables and the importance that consumers' dynamics and interests have for movie studios in order to deliver better experiences to the public.