standardo

Commit 615d64ea authored 5 years ago by sorenmulli
--- a/docs/report/tex/report3.tex
+++ b/docs/report/tex/report3.tex
@@ -40,7 +40,7 @@ The data set consists of 2017 Spotify songs downloaded from Spotify's API by a K
 	Tempo,
 	Valence,
 \\}
-In the first half of the report, \textit{tempo} is used as a target variable such that only nine variables are considered.
+In the first half of the report, \textit{tempo} is used as a target variable such that only nine variables are considered. These nine variables are standardized by subtracting the mean and dividing by the standard deviation.

 \section{Clustering: Are songs grouped by their tempo?}
 Before working with the clustering, tempo is taken out as this was shown to be a variable which explained a relatively high amount of variance in the first report and thus is suspected to account for some clustering in the data. The songs are then thresholded into four groups: Those with a tempo under 90 bpm, those between 90-100 bpm, those between 100-110 bpm and those with a tempo over 110 bpm.
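The two preprocessing steps in this hunk (standardizing the nine attributes and thresholding tempo into four groups) can be sketched as below. This is a minimal sketch assuming Python with NumPy, not the report's own code. The function names `standardize` and `tempo_groups` are hypothetical. Using the sample standard deviation (`ddof=1`) is an assumption, as is placing songs that fall exactly on 90, 100 or 110 bpm in the upper group, since the text does not say how boundaries are handled.

```python
import numpy as np


def standardize(X):
    """Z-score every column: subtract the column mean, divide by the column standard deviation."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=1)  # assumption: sample standard deviation (ddof=1)
    return (X - mu) / sigma


def tempo_groups(tempo, edges=(90.0, 100.0, 110.0)):
    """Map tempo values (bpm) to group indices 0-3.

    0: tempo < 90, 1: 90 <= tempo < 100, 2: 100 <= tempo < 110, 3: tempo >= 110.
    Putting boundary values in the upper group is an assumption.
    """
    return np.digitize(np.asarray(tempo, dtype=float), edges)


# Usage on synthetic placeholder data (not the Spotify songs).
rng = np.random.default_rng(0)
Z = standardize(rng.normal(loc=5.0, scale=3.0, size=(100, 9)))
print(Z.shape)                                   # (100, 9)
print(tempo_groups([85.0, 95.0, 105.0, 120.0]))  # [0 1 2 3]
```

`np.digitize` with increasing edges returns, for each value, the number of edges it is greater than or equal to, which yields the four group indices directly.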
@@ -135,7 +135,7 @@ This suggest that the clusters of the GMM and Hierachical models are mutually mo

 \section{Outlier Detection}

-To understand whether some songs in this 2017 large data set comprised of an arbitrary selection of songs stand out from the others, three methods for outlier detection are implemented.
+To understand whether some songs in this large data set of 2017 arbitrarily selected songs stand out from the others, three methods for outlier detection are implemented. The target variable \textit{tempo} is reintroduced, and all ten attributes are standardized before the analysis.
 \subsection{Ranking songs after typicality}
 %\textit{Rank the observations in terms of leave-one-out Gaussian Kernel Density, KNN Density %and KNN Average Relative Density}
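The three typicality scores named in the commented-out task description (leave-one-out Gaussian kernel density, KNN density and KNN average relative density) can be sketched as below. This is a minimal sketch assuming Python with NumPy and the common textbook definitions: the KNN density is the inverse of the mean distance to the K nearest neighbours, and the average relative density divides a point's KNN density by the mean KNN density of those neighbours. The function names, the kernel width `width=1.0` and `k=5` are hypothetical choices, not values taken from the report. Observations with the lowest scores are ranked as the most atypical songs.

```python
import numpy as np


def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X."""
    sq = (X ** 2).sum(axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.maximum(D2, 0.0)  # clip tiny negative values caused by rounding


def loo_gaussian_kde(X, width=1.0):
    """Leave-one-out Gaussian kernel density at every observation."""
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    K = np.exp(-pairwise_sq_dists(X) / (2.0 * width ** 2))
    K /= (2.0 * np.pi * width ** 2) ** (m / 2.0)
    np.fill_diagonal(K, 0.0)  # observation i does not contribute to its own density
    return K.sum(axis=1) / (n - 1)


def knn_density(X, k=5):
    """Inverse mean distance to the k nearest neighbours (the point itself excluded)."""
    D = np.sqrt(pairwise_sq_dists(np.asarray(X, dtype=float)))
    np.fill_diagonal(D, np.inf)
    idx = np.argsort(D, axis=1)[:, :k]  # indices of the k nearest neighbours
    mean_dist = np.take_along_axis(D, idx, axis=1).mean(axis=1)
    return 1.0 / np.maximum(mean_dist, 1e-12), idx  # guard against zero distances between identical rows


def knn_ard(X, k=5):
    """Average relative density: own KNN density divided by the neighbours' mean KNN density."""
    dens, idx = knn_density(X, k)
    return dens / dens[idx].mean(axis=1)


# Usage on synthetic data: 50 standard-normal points plus one far-away point (index 50).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(50, 10)), np.full((1, 10), 8.0)])
for name, score in [("KDE", loo_gaussian_kde(X)), ("KNN", knn_density(X)[0]), ("ARD", knn_ard(X))]:
    ranking = np.argsort(score)  # lowest density first = most atypical
    print(name, ranking[:3])
```

Computing distances from squared norms and one matrix product avoids building an n-by-n-by-m array, which keeps memory modest for roughly 2000 songs with ten attributes.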