README.md

    jonf authored e65c4f31

    INSTRUCTIONS

    This readme contains information on how to use the scripts.

    Installing

    Windows, Mac and Linux can all generate synthetic data using "GenerateData.py". The recommended installation is Anaconda (a distribution that bundles Python with many important packages).

    Only Linux can run auto-sklearn; more information on its installation can be found here.
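Because auto-sklearn runs only on Linux, a script that optionally uses it can check the platform before trying to import it. This is a minimal sketch, not part of the repository: the function name `autosklearn_available` is hypothetical, and it only tests that the `autosklearn` package is installed, not that it works.

```python
# Hypothetical helper: decide whether auto-sklearn can be used on this machine.
import importlib.util
import platform


def autosklearn_available():
    """Return True only on Linux with the autosklearn package installed."""
    if platform.system() != "Linux":
        return False  # per this README, only Linux can run auto-sklearn
    # find_spec returns None when the top-level package is not installed.
    return importlib.util.find_spec("autosklearn") is not None


if __name__ == "__main__":
    if autosklearn_available():
        print("auto-sklearn found")
    else:
        print("auto-sklearn unavailable on this machine")
```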

    Matlab requires a user license (more info here).

    Generating synthetic data from a shell:

    • Creating data with the default settings (defined inside the script):
    python GenerateData.py
    • Creating data sets while varying one property in a loop:
    for k in $(seq -20 5 20); do python GenerateData.py xSNR=$k ; done
    for k in $(seq 0 0.05 1); do python GenerateData.py theta=$k ; done
    • Creating data sets while setting several properties in one or more loops:
    for k in $(seq 0 0.05 1); do python GenerateData.py theta=$k S=3 ; done
    for k in $(seq -20 5 20); do for j in $(seq 0 0.05 1); do python GenerateData.py xSNR=$k theta=$j S=3 ; done ; done
    • Shell code used to generate synthetic (regression) data:
    for k in $(seq 0 0.1 1); do python GenerateData.py theta=$k xSNR=-7 S=10 zModel=cluster clsreg=regression; done
    for k in $(seq 0 0.1 1); do python GenerateData.py theta=$k xSNR=-7 S=10 zModel=spectrum clsreg=regression; done
    for k in $(seq -20 5 20); do python GenerateData.py xSNR=$k S=10 zModel=cluster clsreg=regression; done
    for k in $(seq -20 5 20); do python GenerateData.py xSNR=$k S=10 zModel=spectrum clsreg=regression; done

    Classification data are generated the same way, replacing "clsreg=regression" with "clsreg=classification".
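The sweeps above can also be driven from Python, which makes it easy to list the planned runs before starting them. This is a minimal sketch, not part of the repository: the helpers `frange`, `fmt`, `sweep`, `readme_sweeps` and `run` are hypothetical, and it assumes GenerateData.py sits in the working directory and accepts key=value arguments exactly as in the shell examples. Values are printed with Python's `g` format (e.g. `theta=0.1`, `theta=1`), which may differ cosmetically from the strings `seq` prints.

```python
# Sketch only: hypothetical helpers that rebuild the GenerateData.py sweeps.
# Assumes GenerateData.py is in the working directory and takes key=value args.
import itertools
import subprocess
import sys


def frange(start, stop, step):
    """Inclusive range like `seq start step stop` (seq puts the step in the middle)."""
    n = int(round((stop - start) / step)) + 1
    # Rounding removes float noise such as 0.30000000000000004.
    return [round(start + i * step, 10) for i in range(n)]


def fmt(value):
    """Format a property value for the command line (0.1 -> '0.1', 1.0 -> '1')."""
    return f"{value:g}" if isinstance(value, float) else str(value)


def sweep(fixed, **varied):
    """Yield one argument list per combination of the varied properties.

    Properties are looped like nested shell for-loops, the first keyword
    being the outermost loop; `fixed` holds properties that stay constant.
    """
    names = list(varied)
    for combo in itertools.product(*(varied[name] for name in names)):
        args = [f"{name}={fmt(v)}" for name, v in zip(names, combo)]
        args += [f"{key}={fmt(v)}" for key, v in fixed.items()]
        yield ["GenerateData.py"] + args


def readme_sweeps(clsreg):
    """The four sweeps listed above, for clsreg='regression' or 'classification'."""
    for z_model in ("cluster", "spectrum"):
        fixed = {"xSNR": -7, "S": 10, "zModel": z_model, "clsreg": clsreg}
        yield from sweep(fixed, theta=frange(0, 1, 0.1))
    for z_model in ("cluster", "spectrum"):
        fixed = {"S": 10, "zModel": z_model, "clsreg": clsreg}
        yield from sweep(fixed, xSNR=frange(-20, 20, 5))


def run(commands, dry_run=True):
    """Print the commands (dry run) or execute them one after another."""
    for args in commands:
        cmd = [sys.executable] + args
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for clsreg in ("regression", "classification"):
        run(readme_sweeps(clsreg))
```

Nested shell loops map onto `itertools.product`: `sweep({"S": 3}, xSNR=frange(-20, 20, 5), theta=frange(0, 1, 0.05))` yields the same 9 × 21 = 189 runs as the two-level loop, with xSNR as the outer loop. Passing `dry_run=False` to `run` actually executes them.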

    Python pipeline

    • Example of shell code used to model the synthetic data on the DTU Compute cluster. The loop goes through every file whose name starts with "X---" and analyzes it.
    for filename in data/regression/X---*; do qsub -q hpc -v file="$filename",tlftt="tlftt=60",es="es=1" ModelSyntheticICR.sh;done
    for filename in data/classification/X---*; do qsub -q hpc -v file="$filename",tlftt="tlftt=60",es="es=1" ModelSyntheticTRS.sh;done
    • Example of shell code used to model the synthetic data on a local machine. The loop goes through every file whose name starts with "X---" and analyzes it.
    for filename in data/regression/X---*; do python ModelSyntheticICR.py file="$filename" tlftt=60 es=1;done
    for filename in data/classification/X---*; do python ModelSyntheticTRS.py file="$filename" tlftt=60 es=1;done
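The modelling loops can likewise be written in Python. This is a minimal sketch, not part of the repository: `model_commands` and `run` are hypothetical names, and it assumes the scripts and the data/regression and data/classification layout shown above. With `cluster=True` it builds the same qsub call as the cluster loop (queue hpc, variables passed with -v to the .sh wrapper); otherwise it builds the local python call.

```python
# Sketch only: hypothetical driver for the modelling loops shown above.
import glob
import os
import subprocess
import sys

# Task folder -> modelling script (without extension), as in the README loops.
SCRIPTS = {
    "regression": "ModelSyntheticICR",
    "classification": "ModelSyntheticTRS",
}


def model_commands(data_dir="data", tlftt=60, es=1, cluster=False):
    """Return one command per data file whose name starts with 'X---'.

    cluster=False mirrors the local python loop; cluster=True mirrors the
    qsub loop, passing the same variables to the .sh wrapper via -v.
    """
    cmds = []
    for task, script in SCRIPTS.items():
        pattern = os.path.join(data_dir, task, "X---*")
        for filename in sorted(glob.glob(pattern)):
            if cluster:
                # Same value the shell builds from file="$filename",tlftt="tlftt=60",es="es=1"
                env = f"file={filename},tlftt=tlftt={tlftt},es=es={es}"
                cmds.append(["qsub", "-q", "hpc", "-v", env, f"{script}.sh"])
            else:
                cmds.append([sys.executable, f"{script}.py", f"file={filename}",
                             f"tlftt={tlftt}", f"es={es}"])
    return cmds


def run(cmds):
    """Execute the commands one by one, stopping at the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: print what would be executed instead of starting any jobs.
    for cmd in model_commands():
        print(" ".join(cmd))
```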

    Matlab pipeline

    • The functions RegressionModels.m and ClassificationModels.m train and test multiple supervised regression/classification models on the provided data. Missing data entries are imputed with median imputation (impute_median.m) or probabilistic principal component analysis (impute_ppca.m). The performance of the models is evaluated and visualized with calc_performance_regression.m / calc_performance_classification.m.
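impute_median.m itself is not reproduced here. To illustrate the technique, the sketch below performs column-wise median imputation in Python with NumPy (included in Anaconda). It is a hypothetical stand-in, not a port: whether impute_median.m computes medians per column, or from the training data only, is an assumption.

```python
# Hypothetical illustration of median imputation; not a port of impute_median.m.
import numpy as np


def impute_median(X, reference=None):
    """Fill NaN entries column by column with the median of observed values.

    reference: optional array whose column medians are used instead of X's own,
    e.g. the training set when imputing a test set.
    """
    X = np.array(X, dtype=float)          # copy, so the caller's array is untouched
    ref = X if reference is None else np.asarray(reference, dtype=float)
    medians = np.nanmedian(ref, axis=0)   # per-column median, ignoring NaNs
    rows, cols = np.nonzero(np.isnan(X))  # coordinates of the missing entries
    X[rows, cols] = medians[cols]
    return X


X_train = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 8.0]])
X_test = np.array([[np.nan, np.nan]])
print(impute_median(X_train))
print(impute_median(X_test, reference=X_train))
```

Passing the training set as `reference` fills test-set gaps with training medians, so no information from the test set leaks into the imputation.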