\noindent From the illustration, it is clear that the most appropriate number of clusters is 9. The negative log-likelihood increases when the number of clusters is either increased or decreased from 9, which is to be expected.
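The quantity being minimized here is presumably the cross-validated negative log-likelihood of the fitted GMM (the exact criterion is an assumption, since the plot itself is not reproduced in this excerpt), i.e.
\[
E(K)=-\sum_{i\in\mathcal{D}_{\text{test}}}\log\Big(\sum_{k=1}^{K}\pi_k\,\mathcal{N}\big(\mathbf{x}_i\mid\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k\big)\Big),
\]
with the mixture parameters estimated on the training folds. Too few components underfit the data, while too many overfit the training folds, which explains a minimum around $K=9$.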
\subsection{Evaluation of GMM and Hierarchical Clustering}
\textit{Evaluate the quality of the clusterings using GMM label information and for hierarchical clustering with the same number of clusters as in the GMM.}
To evaluate whether the clusterings are similar to the predefined grouping given by the tempo attribute, three different similarity measures are used: the Rand index, the Jaccard index and the normalized mutual information (NMI). The Rand index will typically be very high when there are many clusters, because most pairs of observations are then placed in different clusters in both partitions, which pushes the Rand index towards one. The Jaccard index is therefore also used, as it disregards the pairs of observations that are in different clusters in both partitions. The third measure, the normalized mutual information, serves the same purpose but has a stronger theoretical foundation in information theory: it quantifies how much information one clustering provides about the other. The evaluation of the GMM and hierarchical clusterings is shown in the following table.
\begin{table}[H]
\centering
\begin{tabular}{l r r}
\toprule
Similarity & GMM & Hierarchical \\\midrule
Rand & 0.5231 & 0.5328 \\
Jaccard & 0.1237 & 0.1923 \\
NMI & 0.0181 & 0.0223 \\
\bottomrule
\end{tabular}
\end{table}\noindent
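The values in the table can be computed directly from the two label vectors. Below is a minimal Python sketch of such a computation (not the script actually used for this report; the label vectors \texttt{y} and \texttt{c} are placeholders, and the NMI normalization in \texttt{scikit-learn} may differ slightly from the one used above). It counts pairs of observations via the contingency table to obtain the Rand and Jaccard indices and uses \texttt{scikit-learn} for the NMI.
\begin{verbatim}
import numpy as np
from scipy.special import comb
from sklearn.metrics import normalized_mutual_info_score
from sklearn.metrics.cluster import contingency_matrix

# placeholder labelings; in the report, y would be the tempo classes and
# c the cluster assignments from the GMM or the hierarchical clustering
y = np.array([0, 0, 1, 1, 2, 2, 0, 1])
c = np.array([0, 1, 1, 1, 2, 2, 0, 0])

def pair_counting_similarities(y, c):
    """Rand and Jaccard indices between two labelings via pair counting."""
    n = len(y)
    M = contingency_matrix(y, c)            # n_ij: observations shared by clusters i and j
    total = comb(n, 2)                      # all pairs of observations
    same_both = comb(M, 2).sum()            # pairs placed together in both labelings
    same_y = comb(M.sum(axis=1), 2).sum()   # pairs placed together in y
    same_c = comb(M.sum(axis=0), 2).sum()   # pairs placed together in c
    diff_both = total - same_y - same_c + same_both  # pairs separated in both
    rand = (same_both + diff_both) / total
    jaccard = same_both / (total - diff_both)
    return rand, jaccard

rand, jaccard = pair_counting_similarities(y, c)
nmi = normalized_mutual_info_score(y, c)
print(f"Rand={rand:.4f}  Jaccard={jaccard:.4f}  NMI={nmi:.4f}")
\end{verbatim}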
\section{Outlier Detection/Anomaly Detection}
...
In this part of the report, the data has to be binarized in order to use the Apriori algorithm.
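The concrete binarization scheme is not shown in this excerpt, so the following is only a sketch of one common approach (an assumption, not necessarily the method used for the report): split each numeric attribute at chosen quantiles and one-hot encode the resulting bins into a 0/1 item matrix.
\begin{verbatim}
import pandas as pd

# toy numeric attributes; in the report this would be the full data set
df = pd.DataFrame({"tempo":    [90, 120, 150, 100, 180],
                   "loudness": [-5.0, -7.2, -3.1, -9.8, -4.4]})

# split every attribute at its median and one-hot encode the two bins,
# giving the 0/1 item matrix expected by the Apriori algorithm
binary_items = pd.concat(
    [pd.get_dummies(pd.qcut(df[col], q=2, labels=[col + "_low", col + "_high"]))
     for col in df.columns],
    axis=1,
).astype(int)
print(binary_items)
\end{verbatim}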
\subsection{Apriori Algorithm for Frequent Itemsets and Association Rules}
\textit{Find the frequent itemsets and the association rules with high confidence based on the results of the Apriori algorithm.}\\
The Apriori algorithm is run with the following support and confidence thresholds:
\[
\texttt{minsup}=0.11\qquad\texttt{minconf}=0.6
\]
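Here the support of a rule $X\rightarrow Y$ is the fraction of observations containing all items in $X\cup Y$, and the confidence is $\mathrm{conf}(X\rightarrow Y)=\mathrm{supp}(X\cup Y)/\mathrm{supp}(X)$. As a minimal sketch of how such rules can be mined (assuming the \texttt{mlxtend} library and a toy item matrix; this is not necessarily the implementation used for the report), the frequent itemsets can be found with \texttt{apriori} and the confidences computed directly from their supports:
\begin{verbatim}
from itertools import combinations
import pandas as pd
from mlxtend.frequent_patterns import apriori

# toy 0/1 item matrix; in the report this would be the binarized data set
binary_items = pd.DataFrame(
    {"tempo_high": [1, 0, 1, 1, 0],
     "tempo_low":  [0, 1, 0, 0, 1],
     "loud_high":  [1, 0, 1, 1, 1]},
    dtype=bool,
)

# frequent itemsets with support >= minsup = 0.11
frequent = apriori(binary_items, min_support=0.11, use_colnames=True)
support = {items: s for items, s in zip(frequent["itemsets"], frequent["support"])}

# association rules X -> Y with confidence = supp(X u Y) / supp(X) >= minconf = 0.6
for itemset, supp_xy in support.items():
    for r in range(1, len(itemset)):
        for antecedent in combinations(itemset, r):
            antecedent = frozenset(antecedent)
            confidence = supp_xy / support[antecedent]
            if confidence >= 0.6:
                consequent = itemset - antecedent
                print(f"{set(antecedent)} -> {set(consequent)}: "
                      f"support={supp_xy:.2f}, confidence={confidence:.2f}")
\end{verbatim}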
\begin{table}[H]
\centering
\begin{tabular}{l l l r r}
\toprule
Association Rule &&& Support & Confidence \\\midrule