The least squares method with Despacito

stats con chris
2022-03-27

I. Introduction: Data

In the article "Linear extrapolation to predict the future of Despacito," we predicted the date when Despacito becomes the most viewed video on YouTube. Here we deepen that study by sharing all the mathematical calculations, which are based on the least squares method. Understanding where this method comes from requires differential calculus. If you don't know calculus, I suggest you jump to page 4, where we apply the method directly. The numerical steps were programmed in Python (Jupyter Notebook). To reproduce the results, you can download the file from my GitHub repository.

In the article we show the results for the 5 most viewed songs on YouTube. In this tutorial, to simplify the analysis, we focus on Despacito and See You Again. As a first step, we collect the number of YouTube views for each song. We record it every 24 hours from July 4th to July 9th, which gives 6 values per song, as shown in the following table:

Table I: Total number of views (millions).

| Day of July ($x$) | Despacito ($y$) | See You Again ($z$) |
|---|---|---|
| 4 | 2310 | 2872.8 |
| 5 | 2328 | 2875.9 |
| 6 | 2348 | 2879.0 |
| 7 | 2369 | 2882.2 |
| 8 | 2393 | 2885.7 |
| 9 | 2415 | 2889.0 |
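Before fitting anything, it can help to check how close to linear the data in Table I already is. The following is a minimal sketch, not the code from the notebook: the variable names and the `daily_increments` helper are assumptions introduced here for illustration. It stores the table and computes the day-to-day change in views.

```python
# Table I: total views in millions, recorded daily from July 4th to July 9th.
days = [4, 5, 6, 7, 8, 9]
despacito = [2310, 2328, 2348, 2369, 2393, 2415]
see_you_again = [2872.8, 2875.9, 2879.0, 2882.2, 2885.7, 2889.0]


def daily_increments(views):
    """Return the change in views between consecutive days (millions per day).

    Hypothetical helper; results are rounded to one decimal to hide
    floating-point noise in the subtraction.
    """
    return [round(later - earlier, 1) for earlier, later in zip(views, views[1:])]


despacito_gain = daily_increments(despacito)
see_you_again_gain = daily_increments(see_you_again)

print("Despacito:", despacito_gain)
print("See You Again:", see_you_again_gain)
```

Despacito gains between 18 and 24 million views per day, while See You Again gains a little over 3 million. The increments are not constant, but they vary within a narrow band, which is why a straight line is a reasonable model over this short window.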

Our goal is to find the equation of the line that best fits the values given in Table I. Let's focus first on Despacito (subscript $d$). The equation of the Despacito line is given by

$$y_{d} = m_d x + b_d. \tag 1$$

This line will not necessarily pass through all 6 values given in Table I, because the real data do not fall on a perfect line, so there will be an error. If $y$ is the real data and $y_d$ is the value obtained from the linear equation, then the error at each point is the difference $y_d-y$. However, this error can be positive or negative, and errors of opposite sign would cancel each other when added up. It is therefore convenient to square each error before summing, i.e.,

$$\begin{align}\xi &=\sum_{i=1}^{n} (y_{d_i}-y_i)^2, \\ & = \sum_{i=1}^{n} (m_{d} x_i+ b_{d} - y_i)^2. \tag{2}\end{align}$$
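To make Eq. (2) concrete, here is a minimal sketch that evaluates the total squared error for two candidate lines on the Despacito data. The function name `total_squared_error` and both candidate lines are assumptions chosen for illustration; neither line is the least squares solution.

```python
# Despacito data from Table I (views in millions).
days = [4, 5, 6, 7, 8, 9]
despacito = [2310, 2328, 2348, 2369, 2393, 2415]


def total_squared_error(m, b, xs, ys):
    """Eq. (2): sum over all points of (m * x_i + b - y_i) ** 2."""
    return sum((m * x + b - y) ** 2 for x, y in zip(xs, ys))


# Candidate A (hypothetical): the line through the first and last points,
# m = (2415 - 2310) / 5 = 21 and b = 2310 - 21 * 4 = 2226.
xi_a = total_squared_error(21.0, 2226.0, days, despacito)

# Candidate B (hypothetical): the line through the first two points,
# m = 2328 - 2310 = 18 and b = 2310 - 18 * 4 = 2238.
xi_b = total_squared_error(18.0, 2238.0, days, despacito)

print(xi_a, xi_b)  # 42.0 375.0
```

Candidate A has a much smaller total error than candidate B, so it fits the data better. The least squares method looks for the pair $(m_d, b_d)$ for which this number is as small as it can be.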

$\xi$ represents the total error. It is obtained by summing the squared error over all the points, that is, $n=6$ for the case given in Table I. Ideally we would want $\xi$ to be zero, but as we just said, that is not possible. What we look for instead is the minimum error, and since this error is a sum of squares, the method is called the method of least squares. In our case the line is defined by the variables $m_d$ and $b_d$, so we have to find the values of these variables that give the smallest possible value of $\xi$. Using differential calculus, these values are obtained by taking the first partial derivatives of $\xi$ and setting them equal to zero, i.e.,

$$\begin{align}\frac{\partial \xi}{\partial m_{d}}=0, ~~~ \frac{\partial \xi}{\partial b_{d}}=0.\tag 3 \end{align}$$

The solution of Eq. (3) is described on the next page.
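Before solving Eq. (3) analytically, the condition can be checked numerically. The sketch below makes two assumptions that are not part of the original text. First, the expressions for the partial derivatives are the ones obtained by differentiating Eq. (2) with the chain rule: $\partial \xi/\partial m_d = 2\sum_i (m_d x_i + b_d - y_i)\,x_i$ and $\partial \xi/\partial b_d = 2\sum_i (m_d x_i + b_d - y_i)$. Second, NumPy's `np.polyfit` is used as a stand-in least squares solver; it is not necessarily what the notebook uses.

```python
import numpy as np

# Despacito data from Table I (views in millions).
days = np.array([4, 5, 6, 7, 8, 9], dtype=float)
despacito = np.array([2310, 2328, 2348, 2369, 2393, 2415], dtype=float)


def gradient_of_xi(m, b, x, y):
    """Partial derivatives of xi from Eq. (2) with respect to m and b."""
    residuals = m * x + b - y
    d_xi_dm = 2.0 * np.sum(residuals * x)
    d_xi_db = 2.0 * np.sum(residuals)
    return d_xi_dm, d_xi_db


# Degree-1 polynomial fit: returns the slope first, then the intercept.
m_fit, b_fit = np.polyfit(days, despacito, deg=1)

# At the fitted line both derivatives should vanish up to rounding error.
grad_fit = gradient_of_xi(m_fit, b_fit, days, despacito)

# At a hand-picked line (through the first and last points) they do not.
grad_guess = gradient_of_xi(21.0, 2226.0, days, despacito)

print("fitted m, b:", m_fit, b_fit)
print("gradient at fit:", grad_fit)
print("gradient at guess:", grad_guess)
```

Both derivatives are numerically zero at the fitted line and clearly nonzero at the hand-picked one, which is exactly the condition stated in Eq. (3).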

2022 © nepy
