jonf authored
INSTRUCTIONS
This readme contains information on how to use the scripts.
Installing
Windows, Mac and Linux can all generate synthetic data using "GenerateData.py". The recommended installation is Anaconda (a distribution containing Python as well as many important packages).
Only Linux can run auto-sklearn, and more information on installation can be found here.
Matlab requires a user license (more info here).
Generating synthetic data from a shell:
- Creating data with default settings (the defaults can be seen inside the script):
python GenerateData.py
- Creating data while changing one property in a loop:
for k in $(seq -20 5 20); do python GenerateData.py xSNR=$k ; done
for k in $(seq 0 0.05 1); do python GenerateData.py theta=$k ; done
- Creating data set(s) while setting several properties in loop(s):
for k in $(seq 0 0.05 1); do python GenerateData.py theta=$k S=3 ; done
for k in $(seq -20 5 20); do for j in $(seq 0 0.05 1); do python GenerateData.py xSNR=$k theta=$j S=3 ; done ; done
- Shell code used to generate synthetic (regression) data:
for k in $(seq 0 0.1 1); do python GenerateData.py theta=$k xSNR=-7 S=10 zModel=cluster clsreg=regression; done
for k in $(seq 0 0.1 1); do python GenerateData.py theta=$k xSNR=-7 S=10 zModel=spectrum clsreg=regression; done
for k in $(seq -20 5 20); do python GenerateData.py xSNR=$k S=10 zModel=cluster clsreg=regression; done
for k in $(seq -20 5 20); do python GenerateData.py xSNR=$k S=10 zModel=spectrum clsreg=regression; done
The same is done for classification by changing "clsreg=regression" to "clsreg=classification".
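The loops above, together with their classification counterparts, can also be generated from one place. The following is a minimal Python sketch, assuming only the key=value command-line convention shown above. The helpers `seq` and `build_commands` are hypothetical names and not part of the repository. The sketch only builds and prints the command lines; it does not run them.

```python
import shlex
from decimal import Decimal


def seq(first, step, last):
    """Inclusive range similar to the shell's `seq first step last`.

    Hypothetical helper; assumes a positive step. Decimal arithmetic avoids
    the rounding drift of repeated float addition. The number formatting may
    differ slightly from `seq` (e.g. "0" instead of "0.0").
    """
    value, step, last = (Decimal(str(x)) for x in (first, step, last))
    values = []
    while value <= last:
        values.append(str(value))
        value += step
    return values


def build_commands(task):
    """Build GenerateData.py command lines for "regression" or "classification"."""
    base = ["python", "GenerateData.py"]
    commands = []
    # Sweep theta at a fixed xSNR of -7, once per zModel.
    for z_model in ("cluster", "spectrum"):
        for theta in seq(0, "0.1", 1):
            commands.append(base + [f"theta={theta}", "xSNR=-7", "S=10",
                                    f"zModel={z_model}", f"clsreg={task}"])
    # Sweep xSNR from -20 to 20 in steps of 5, once per zModel.
    for z_model in ("cluster", "spectrum"):
        for snr in seq(-20, 5, 20):
            commands.append(base + [f"xSNR={snr}", "S=10",
                                    f"zModel={z_model}", f"clsreg={task}"])
    return commands


if __name__ == "__main__":
    for task in ("regression", "classification"):
        for command in build_commands(task):
            print(shlex.join(command))
```

To execute the commands instead of printing them, each list could be passed to `subprocess.run(command, check=True)`.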
Python pipeline
- Example of shell code used to model synthetic data on the DTU Compute cluster. All files starting with "X---" are looped over and analyzed.
for filename in data/regression/X---*; do qsub -q hpc -v file="$filename",tlftt="tlftt=60",es="es=1" ModelSyntheticICR.sh;done
for filename in data/classification/X---*; do qsub -q hpc -v file="$filename",tlftt="tlftt=60",es="es=1" ModelSyntheticTRS.sh;done
- Example of shell code used to model synthetic data on a local machine. All files starting with "X---" are looped over and analyzed.
for filename in data/regression/X---*; do python ModelSyntheticICR.py file="$filename" tlftt=60 es=1;done
for filename in data/classification/X---*; do python ModelSyntheticTRS.py file="$filename" tlftt=60 es=1;done
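The local loops above can also be driven from Python, which may be convenient on systems without a POSIX shell. This is a minimal sketch under the assumption that the scripts accept exactly the `file=`, `tlftt=` and `es=` arguments shown above. The helper names `model_commands` and `run_all` are hypothetical, and the default is a dry run that only prints the commands.

```python
import glob
import os
import subprocess
import sys


def model_commands(data_dir, script, tlftt=60, es=1):
    """Build one command per file in data_dir whose name starts with "X---"."""
    pattern = os.path.join(data_dir, "X---*")
    return [
        [sys.executable, script, f"file={path}", f"tlftt={tlftt}", f"es={es}"]
        for path in sorted(glob.glob(pattern))  # sorted for a reproducible order
    ]


def run_all(commands, dry_run=True):
    """Print the commands (dry run) or execute them one after another."""
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops the loop at the first failing run.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_all(model_commands("data/regression", "ModelSyntheticICR.py"))
    run_all(model_commands("data/classification", "ModelSyntheticTRS.py"))
```

Passing `dry_run=False` to `run_all` executes the commands sequentially, which mirrors the behavior of the shell loops.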
Matlab pipeline
- The functions RegressionModels.m and ClassificationModels.m train and test multiple supervised regression/classification models on the provided data. Missing data entries are imputed with median imputation (impute_median.m) or probabilistic principal component analysis (impute_ppca.m). The performance of the models is evaluated and visualized with calc_performance_regression.m / calc_performance_classification.m.
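To illustrate the idea behind median imputation, here is a minimal Python sketch. It is not the code of impute_median.m. It assumes that missing entries are encoded as NaN and that the median is taken per column (feature); both are assumptions, and the function name `impute_median_columns` is hypothetical.

```python
import math
from statistics import median


def impute_median_columns(rows):
    """Replace NaN entries by the median of the observed values in their column."""
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        # A column without any observed value cannot be imputed; keep NaN.
        medians.append(median(observed) if observed else math.nan)
    return [
        [medians[j] if math.isnan(value) else value for j, value in enumerate(row)]
        for row in rows
    ]


if __name__ == "__main__":
    data = [[1.0, math.nan], [3.0, 4.0], [math.nan, 6.0]]
    print(impute_median_columns(data))
```

The input rows are left unmodified; a new list of rows is returned.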