\usepackage{texfiles/SpeedyGonzales}
\usepackage{texfiles/MediocreMike}
\usepackage{booktabs}
% \geometry{top=1cm}
\title{Methods of Clustering and Outlier detection}
\author{Oskar Eiler Wiese Christensen s183917, Anders Henriksen s183904, Søren Winkel Holm s183911}
\date{\today}
\pagestyle{plain}
\fancyhf{}
\rfoot{Page \thepage{} of \pageref{LastPage}}
In this section, a clustering is performed using the Gaussian Mixture Model and hierarchical clustering.
\centering
\includegraphics[width=\linewidth]{clusterfuck23}
\end{figure}
%[23487.56607812 18127.38559714 14899.35424052 14243.28471954
%13453.47497572 13109.79453539 12855.24139837 12576.10894894
%12178.15906456 12210.34021972 12243.54429977]
\noindent From the illustration, it is evident that the most appropriate number of clusters is 9, since the negative log-likelihood increases when the number of clusters is either increased or decreased from 9, which is to be expected.
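As a quick check, the chosen number of clusters can be read directly off the cross-validation curve. A minimal sketch, using the negative log-likelihood values from the commented-out output above:

```python
import numpy as np

# Negative log-likelihoods for K = 1..11 clusters
# (copied from the commented-out output above).
nll = np.array([23487.57, 18127.39, 14899.35, 14243.28,
                13453.47, 13109.79, 12855.24, 12576.11,
                12178.16, 12210.34, 12243.54])

best_k = int(np.argmin(nll)) + 1  # index 0 corresponds to K = 1
print(best_k)  # -> 9
```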
\subsection{Evaluation of GMM and Hierarchical Clustering}
%\textit{Evaluate the quality of the clusterings using GMM label information and for hierarchical clustering with the same number of clusters as in the GMM.}
To evaluate whether the clusterings resemble the predefined clusters derived from the tempo attribute, three different similarity measures are used: the Rand index, the Jaccard index, and normalized mutual information (NMI). The Rand index is typically very high when there are many clusters, intuitively because most pairs of observations are then placed in different clusters in both clusterings rather than in the same cluster, pushing the Rand index close to one. Therefore, the Jaccard index is also used, as it disregards the pairs of observations placed in different clusters. The third measure, NMI, is similar in purpose to the Jaccard and Rand indices but has a more theoretical background in information theory: it quantifies the amount of information one clustering provides about the other. The evaluation of the GMM and hierarchical clusterings is shown in the following table.
\begin{table}[H]
	\centering
	\begin{tabular}{l r r}
		\toprule
		Similarity & GMM & Hierarchical \\ \midrule
		Rand & 0.5231 & 0.5328 \\
		Jaccard & 0.1237 & 0.1923 \\
		NMI & 0.0181 & 0.0223 \\
		\bottomrule
	\end{tabular}
\end{table} \noindent
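For reference, the pair-counting definitions of the Rand and Jaccard indices used above can be written out directly. A minimal $O(n^2)$ sketch in plain Python; the label lists shown are hypothetical toy inputs, not the report's data:

```python
from itertools import combinations

def pair_counts(a, b):
    """Count observation pairs: f11 = same cluster in both labelings,
    f00 = different in both, f10/f01 = same in only one."""
    f11 = f10 = f01 = f00 = 0
    for i, j in combinations(range(len(a)), 2):
        same_a, same_b = a[i] == a[j], b[i] == b[j]
        if same_a and same_b:
            f11 += 1
        elif same_a:
            f10 += 1
        elif same_b:
            f01 += 1
        else:
            f00 += 1
    return f11, f10, f01, f00

def rand_index(a, b):
    f11, f10, f01, f00 = pair_counts(a, b)
    return (f11 + f00) / (f11 + f10 + f01 + f00)

def jaccard_index(a, b):
    # Ignores f00: pairs separated in both clusterings do not count,
    # which is why Jaccard avoids the many-clusters inflation of Rand.
    f11, f10, f01, f00 = pair_counts(a, b)
    return f11 / (f11 + f10 + f01)

print(rand_index([0, 0, 1, 1], [0, 0, 1, 1]))  # identical labelings -> 1.0
```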
\section{Outlier Detection/Anomaly Detection}
In this part of the report, the data has to be binarized in order to use the Apriori algorithm.
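As an illustration of such a binarization, a quartile-based encoding (suggested by the \texttt{\_q1}/\texttt{\_q3} item names in the commented output further down) could look like the sketch below. The helper function and the attribute values are hypothetical:

```python
import numpy as np

def binarize_quartiles(x, name):
    # Hypothetical helper: encode a continuous attribute as two binary
    # items, "<name>_low" for the bottom quartile and "<name>_high"
    # for the top quartile.
    q1, q3 = np.percentile(x, [25, 75])
    return {f"{name}_low": x <= q1, f"{name}_high": x >= q3}

energy = np.array([0.1, 0.2, 0.5, 0.8, 0.9])
items = binarize_quartiles(energy, "energy")
# The two lowest values are flagged as energy_low,
# the two highest as energy_high.
```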
\subsection{Apriori Algorithm for Frequent Itemsets and Association Rules}
\textit{Find the frequent itemsets and the association rules with high confidence based on the results of the Apriori algorithm.} \\
The Apriori algorithm is run with the following parameters:
\[
	\texttt{minsup} = 0.11\qquad \texttt{minconf} = 0.6
\]
\begin{table}[H]
\centering
\begin{tabular}{l l l r r}
\toprule
Association Rule & & & Support & Confidence \\ \midrule
\{energy\_low\} & $\rightarrow$ & \{loudness\_low\} & 0.226 & 0.674 \\
\{loudness\_low\} & $\rightarrow$ & \{energy\_low\} & 0.226 & 0.676 \\
\{energy\_high\} & $\rightarrow$ & \{loudness\_high\} & 0.210 & 0.634 \\
\{loudness\_high\} & $\rightarrow$ & \{energy\_high\} & 0.210 & 0.631 \\ \addlinespace
\{acousticness\_high, energy\_low\} & $\rightarrow$ & \{loudness\_low\} & 0.142 & 0.740\\
\{loudness\_low, acousticness\_high\} & $\rightarrow$ & \{energy\_low\} & 0.142 & 0.842 \\
\{loudness\_low, energy\_low\} & $\rightarrow$ & \{acousticness\_high\} & 0.142 & 0.631 \\ \addlinespace
\{valence\_low, energy\_low\} & $\rightarrow$ & \{loudness\_low\} & 0.112 & 0.708 \\
\{loudness\_low, valence\_low\} & $\rightarrow$ & \{energy\_low\} & 0.112 & 0.833 \\ \bottomrule
\end{tabular}
\end{table}
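The core idea behind the Apriori algorithm (grow candidate itemsets level by level, prune by \texttt{minsup}, then derive rules filtered by \texttt{minconf}) can be sketched as follows. This is a minimal illustration on hypothetical toy transactions, not the report's actual pipeline:

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Minimal Apriori sketch: map each frequent itemset to its support."""
    n = len(transactions)
    support = lambda s: sum(s <= t for t in transactions) / n
    freq = {}
    # Level 1: frequent single items.
    level = {frozenset([i]) for t in transactions for i in t}
    level = {s for s in level if support(s) >= minsup}
    while level:
        freq.update({s: support(s) for s in level})
        # Join frequent k-sets into (k+1)-candidates, then prune by support.
        cands = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
        level = {c for c in cands if support(c) >= minsup}
    return freq

def rules(freq, minconf):
    """All rules A -> B with conf = supp(A u B) / supp(A) >= minconf."""
    out = []
    for s, sup in freq.items():
        for r in range(1, len(s)):
            for ante in map(frozenset, combinations(sorted(s), r)):
                conf = sup / freq[ante]  # subsets of frequent sets are frequent
                if conf >= minconf:
                    out.append((set(ante), set(s - ante), sup, conf))
    return out

freq = apriori([{"energy_low", "loudness_low"},
                {"energy_low", "loudness_low"},
                {"energy_high", "loudness_high"},
                {"loudness_low"}], minsup=0.5)
print(len(rules(freq, minconf=0.6)))  # -> 2
```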
%{energy_q1} -> {loudness_q1} (supp: 0.226, conf: 0.674)
%{loudness_q1} -> {energy_q1} (supp: 0.226, conf: 0.676)
%{energy_q3} -> {loudness_q3} (supp: 0.210, conf: 0.634)