\usepackage{texfiles/SpeedyGonzales}
\usepackage{texfiles/MediocreMike}
\usepackage{booktabs}
% \geometry{top=1cm}
\title{Methods of Clustering and Outlier detection}
\author{Oskar Eiler Wiese Christensen s183917, Anders Henriksen s183904, Søren Winkel Holm s183911}
\date{\today}
\pagestyle{plain}
\fancyhf{}
\rfoot{Page \thepage{} of \pageref{LastPage}}
In this section, a clustering is performed using the Gaussian Mixture Model and hierarchical clustering.
\centering
\includegraphics[width=\linewidth]{clusterfuck23}
\end{figure}
%[23487.56607812 18127.38559714 14899.35424052 14243.28471954
%13453.47497572 13109.79453539 12855.24139837 12576.10894894
%12178.15906456 12210.34021972 12243.54429977]
\noindent From the illustration, it is evident that the most appropriate number of clusters is 9, since the negative log-likelihood increases when the number of clusters is either increased or decreased from 9, which is to be expected.
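As a quick check, the chosen number of clusters can be read directly off the cross-validation curve. A minimal sketch, using the negative log-likelihood values from the commented-out output above:

```python
import numpy as np

# Negative log-likelihoods for K = 1..11 clusters
# (copied from the commented-out output above).
nll = np.array([23487.57, 18127.39, 14899.35, 14243.28,
                13453.47, 13109.79, 12855.24, 12576.11,
                12178.16, 12210.34, 12243.54])

best_k = int(np.argmin(nll)) + 1  # index 0 corresponds to K = 1
print(best_k)  # -> 9
```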
\subsection{Evaluation of GMM and Hierarchical Clustering}
%\textit{Evaluate the quality of the clusterings using GMM label information and for hierarchical clustering with the same number of clusters as in the GMM.}
To evaluate whether the clusterings resemble the predefined clusters derived from the tempo attribute, three different similarity measures are used: the Rand index, the Jaccard index, and normalized mutual information (NMI). The Rand index is typically very high when there are many clusters, intuitively because most pairs of observations are then placed in different clusters in both clusterings rather than in the same cluster, pushing the Rand index close to one. Therefore, the Jaccard index is also used, as it disregards the pairs of observations placed in different clusters. The third measure, NMI, is similar in purpose to the Jaccard and Rand indices but has a more theoretical background in information theory: it quantifies the amount of information one clustering provides about the other. The evaluation of the GMM and hierarchical clusterings is shown in the following table.
\begin{table}[H]
	\centering
	\begin{tabular}{l r r}
		\toprule
		Similarity & GMM & Hierarchical \\ \midrule
		Rand & 0.5231 & 0.5328 \\
		Jaccard & 0.1237 & 0.1923 \\
		NMI & 0.0181 & 0.0223 \\
		\bottomrule
	\end{tabular}
\end{table} \noindent
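For reference, the pair-counting definitions of the Rand and Jaccard indices used above can be written out directly. A minimal $O(n^2)$ sketch in plain Python; the label lists shown are hypothetical toy inputs, not the report's data:

```python
from itertools import combinations

def pair_counts(a, b):
    """Count observation pairs: f11 = same cluster in both labelings,
    f00 = different in both, f10/f01 = same in only one."""
    f11 = f10 = f01 = f00 = 0
    for i, j in combinations(range(len(a)), 2):
        same_a, same_b = a[i] == a[j], b[i] == b[j]
        if same_a and same_b:
            f11 += 1
        elif same_a:
            f10 += 1
        elif same_b:
            f01 += 1
        else:
            f00 += 1
    return f11, f10, f01, f00

def rand_index(a, b):
    f11, f10, f01, f00 = pair_counts(a, b)
    return (f11 + f00) / (f11 + f10 + f01 + f00)

def jaccard_index(a, b):
    # Ignores f00: pairs separated in both clusterings do not count,
    # which is why Jaccard avoids the many-clusters inflation of Rand.
    f11, f10, f01, f00 = pair_counts(a, b)
    return f11 / (f11 + f10 + f01)

print(rand_index([0, 0, 1, 1], [0, 0, 1, 1]))  # identical labelings -> 1.0
```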
\section{Outlier Detection/Anomaly Detection}
In this part of the report, the data has to be binarized in order to use the Apriori algorithm.
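As an illustration of such a binarization, a quartile-based encoding (suggested by the \texttt{\_q1}/\texttt{\_q3} item names in the commented output further down) could look like the sketch below. The helper function and the attribute values are hypothetical:

```python
import numpy as np

def binarize_quartiles(x, name):
    # Hypothetical helper: encode a continuous attribute as two binary
    # items, "<name>_low" for the bottom quartile and "<name>_high"
    # for the top quartile.
    q1, q3 = np.percentile(x, [25, 75])
    return {f"{name}_low": x <= q1, f"{name}_high": x >= q3}

energy = np.array([0.1, 0.2, 0.5, 0.8, 0.9])
items = binarize_quartiles(energy, "energy")
# The two lowest values are flagged as energy_low,
# the two highest as energy_high.
```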
\subsection{Apriori Algorithm for Frequent Itemsets and Association Rules}
\textit{Find the frequent itemsets and the association rules with high confidence based on the results of the Apriori algorithm.} \\
The Apriori algorithm is run with the following parameters:
\[
	\texttt{minsup} = 0.11\qquad \texttt{minconf} = 0.6
\]
\begin{table}[H]
\centering
\begin{tabular}{l l l r r}
\toprule
Association Rule & & & Support & Confidence \\ \midrule
\{energy\_low\} & $\rightarrow$ & \{loudness\_low\} & 0.226 & 0.674 \\
\{loudness\_low\} & $\rightarrow$ & \{energy\_low\} & 0.226 & 0.676 \\
\{energy\_high\} & $\rightarrow$ & \{loudness\_high\} & 0.210 & 0.634 \\
\{loudness\_high\} & $\rightarrow$ & \{energy\_high\} & 0.210 & 0.631 \\ \addlinespace
\{acousticness\_high, energy\_low\} & $\rightarrow$ & \{loudness\_low\} & 0.142 & 0.740\\
\{loudness\_low, acousticness\_high\} & $\rightarrow$ & \{energy\_low\} & 0.142 & 0.842 \\
\{loudness\_low, energy\_low\} & $\rightarrow$ & \{acousticness\_high\} & 0.142 & 0.631 \\ \addlinespace
\{valence\_low, energy\_low\} & $\rightarrow$ & \{loudness\_low\} & 0.112 & 0.708 \\
\{loudness\_low, valence\_low\} & $\rightarrow$ & \{energy\_low\} & 0.112 & 0.833 \\ \bottomrule
\end{tabular}
\end{table}
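The core idea behind the Apriori algorithm (grow candidate itemsets level by level, prune by \texttt{minsup}, then derive rules filtered by \texttt{minconf}) can be sketched as follows. This is a minimal illustration on hypothetical toy transactions, not the report's actual pipeline:

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Minimal Apriori sketch: map each frequent itemset to its support."""
    n = len(transactions)
    support = lambda s: sum(s <= t for t in transactions) / n
    freq = {}
    # Level 1: frequent single items.
    level = {frozenset([i]) for t in transactions for i in t}
    level = {s for s in level if support(s) >= minsup}
    while level:
        freq.update({s: support(s) for s in level})
        # Join frequent k-sets into (k+1)-candidates, then prune by support.
        cands = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
        level = {c for c in cands if support(c) >= minsup}
    return freq

def rules(freq, minconf):
    """All rules A -> B with conf = supp(A u B) / supp(A) >= minconf."""
    out = []
    for s, sup in freq.items():
        for r in range(1, len(s)):
            for ante in map(frozenset, combinations(sorted(s), r)):
                conf = sup / freq[ante]  # subsets of frequent sets are frequent
                if conf >= minconf:
                    out.append((set(ante), set(s - ante), sup, conf))
    return out

freq = apriori([{"energy_low", "loudness_low"},
                {"energy_low", "loudness_low"},
                {"energy_high", "loudness_high"},
                {"loudness_low"}], minsup=0.5)
print(len(rules(freq, minconf=0.6)))  # -> 2
```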
%{energy_q1} -> {loudness_q1} (supp: 0.226, conf: 0.674)
%{loudness_q1} -> {energy_q1} (supp: 0.226, conf: 0.676)
%{energy_q3} -> {loudness_q3} (supp: 0.210, conf: 0.634)