Kernel density estimation (KDE) approximates the probability density function of a random variable in a non-parametric way. For the Spotify data-set, the components of the fitted GMM are multivariate normal distributions, because the data-set contains several features. The fitted GMM is then evaluated on the songs to calculate their individual density scores. An outlier in this model has a low density score, meaning that the song is unlikely to fit into any of the clusters formed by the GMM. The songs with the lowest density scores are illustrated in the bar chart in Figure~\ref{GMchart}.
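As a sketch of what the density score refers to, assume a mixture with $M$ components, weights $w_m$, means $\mu_m$ and covariances $\Sigma_m$ (symbols introduced here for illustration only). The density of a song with feature vector $\mathbf{x}$ can then be written as
\[
p(\mathbf{x}) = \sum_{m=1}^{M} w_m \, \mathcal{N}(\mathbf{x} \mid \mu_m, \Sigma_m), \qquad \sum_{m=1}^{M} w_m = 1 .
\]
A Gaussian kernel density estimate is the special case with one equally weighted component centred on each observation and a shared kernel width.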
\begin{figure}[H]
\centering
\includegraphics[width=\linewidth]{out_KDE}
\caption{text}
\label{GMchart}
\end{figure}\noindent
The k-neighbor estimation detects which objects deviate from normal behavior. First, a KNN model with $K$ neighbors is fitted to the data. Then, the inverse distance density estimate is calculated through the following expression,
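A standard form of this density is sketched here, assuming a generic distance measure $d(\cdot,\cdot)$ such as the Euclidean distance; the symbol $d$ and the name $\mathrm{density}$ are introduced for illustration and are not fixed by the text:
\[
\mathrm{density}(\mathbf{x}_i, K) = \frac{1}{\frac{1}{K} \sum_{\mathbf{x}^\prime \in N_{\mathbf{x} \backslash i}(\mathbf{x}_i, K)} d(\mathbf{x}_i, \mathbf{x}^\prime)}
\]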
where $\mathbf{x}^\prime \in N_{\mathbf{x} \backslash i}(\mathbf{x}_i, K)$ are the $K$ observations nearest to $\mathbf{x}_i$, excluding $\mathbf{x}_i$ itself.
% TODO find a real K in an appropriate way.
The lower the inverse distance density score of a song, the more likely the song is to be an outlier. Therefore, the songs with the lowest inverse distance density estimates are illustrated in the bar chart in Figure~\ref{invest}.
\begin{figure}[H]
\centering
\includegraphics[width=\linewidth]{out_KDE}
\caption{text}
\label{invest}
\end{figure}\noindent
Another anomaly detection tool is the relative density. It uses the same KNN model, fitted to the data with $K=9$. The relative density is calculated with the following expression,
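Assuming the inverse distance density estimate described above, written here as $\mathrm{density}(\mathbf{x}_i, K)$ (a name introduced for illustration), a standard form of the average relative density can be sketched as
\[
\mathrm{ard}(\mathbf{x}_i, K) = \frac{\mathrm{density}(\mathbf{x}_i, K)}{\frac{1}{K} \sum_{\mathbf{x}_j \in N_{\mathbf{x} \backslash i}(\mathbf{x}_i, K)} \mathrm{density}(\mathbf{x}_j, K)}
\]
Under this form, the score compares the density at a song with the average density at its $K$ nearest neighbors.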
If $\mathrm{ard} < 1$, the song is likely to be an outlier. The songs with the lowest relative density scores are illustrated in Figure~\ref{relden}.