Reinforcement learning of causal variables using mediation analysis
Installation & setup
First, install pipenv in your preferred way (see documentation).
Then create a virtual environment for the project and install the required packages into it by running pipenv install in the project root folder.
Some scripts log data using Neptune. This requires either setting up a free account or, for testing purposes, using the anonymous API token.
To use the anonymous API token, set the environment variable NEPTUNE_API_TOKEN='ANONYMOUS' and pass --neptune_project 'shared/onboarding' as an argument to the script.
If you instead set up a Neptune account, set the API token to your personal key and pass your own project name to the script. When a script runs, it prints the URL of the online Neptune UI to standard output.
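The Neptune setup above can also be prepared from Python before launching a script. The following is a minimal sketch, not part of the repository: the helper names neptune_env and neptune_args are hypothetical, and only the variable NEPTUNE_API_TOKEN, the value 'ANONYMOUS', and the argument --neptune_project 'shared/onboarding' come from the instructions above.

```python
import os


def neptune_env(api_token="ANONYMOUS", base_env=None):
    """Return a copy of the environment with NEPTUNE_API_TOKEN set.

    Hypothetical helper: pass your personal key as api_token if you
    created a Neptune account; the default is the anonymous token.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["NEPTUNE_API_TOKEN"] = api_token
    return env


def neptune_args(project="shared/onboarding"):
    """Return the command-line arguments that select the Neptune project."""
    return ["--neptune_project", project]
```

The returned dictionary can be passed as the env argument of subprocess.run, so the current shell environment is left unchanged.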
Running scripts for the twostage experiment
It is recommended to start with this experiment because its training time is relatively short.
The main way to interact with the code is through the scripts in the scripts/ folder. To run a script in the pipenv environment created earlier, run pipenv run python -m scripts.experiment --arg val, where experiment is the name of the script and arguments are passed as --arg val.
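For launching scripts programmatically, the command pattern above can be assembled as an argument list. This is a sketch under assumptions: build_command and run_script are hypothetical helpers that do not exist in the repository, and experiment, arg, and val are the same placeholders used in the text.

```python
import subprocess


def build_command(script, **args):
    """Build: pipenv run python -m scripts.<script> --name value ..."""
    cmd = ["pipenv", "run", "python", "-m", f"scripts.{script}"]
    for name, value in args.items():
        cmd += [f"--{name}", str(value)]
    return cmd


def run_script(script, env=None, **args):
    """Run the script from the project root; raises if it exits non-zero."""
    return subprocess.run(build_command(script, **args), env=env, check=True)
```

For example, build_command("experiment", arg="val") yields the list form of pipenv run python -m scripts.experiment --arg val.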
Running scripts for the doorkey experiment
The doorkey experiment is divided into three phases:
- Train the policy (scripts/doorkey_pia)
- Train the causal variable (scripts/doorkey_phi)
- Train (scripts/doorkey_pib)
In addition, it is possible to specify a cross-entropy target for the second script so as to obtain
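As an illustration of the three phases, the sketch below lists the command for each script in the order given above. It assumes the phases are run in that order and that each script is launched with the same pipenv pattern as the other scripts; doorkey_commands is a hypothetical helper, and no script-specific arguments (such as the cross-entropy target) are shown because their names are not given here.

```python
import shlex

# Script names as listed above; the order is assumed to be the run order.
DOORKEY_PHASES = ["doorkey_pia", "doorkey_phi", "doorkey_pib"]


def doorkey_commands(extra_args=()):
    """Return one shell command line per phase, in order."""
    return [
        shlex.join(["pipenv", "run", "python", "-m", f"scripts.{phase}", *extra_args])
        for phase in DOORKEY_PHASES
    ]
```

Printing each entry of doorkey_commands() gives the lines to run from the project root.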
The scripts were run in parallel on a cluster. Since that implementation is specific to the cluster (machine names, directories, etc.), we will include a simplified version of the scripts that executes locally. The workflow is similar:
- Load a resource dictionary
- Load saved models/recorded statistics from disk
- Run the model for a number of episodes (typically ~1000). This step avoids tying up the same machine for an excessive time but is not necessary
- Save model and exit
- Repeat
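The load, run, save, repeat cycle above can be sketched as a checkpointed loop. This is an illustrative outline only, not the repository's implementation: the state layout, the pickle checkpoint file, and the run_episode callback are all assumptions, and the resource dictionary is reduced to the two function arguments path and episodes_per_chunk.

```python
import pickle
from pathlib import Path


def load_state(path):
    """Load the saved model and statistics, or start fresh if none exist."""
    path = Path(path)
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    return {"model": None, "stats": [], "episodes_done": 0}


def save_state(state, path):
    """Write to a temporary file first so an interrupted save keeps the old one."""
    path = Path(path)
    tmp = path.with_suffix(path.suffix + ".tmp")
    with tmp.open("wb") as f:
        pickle.dump(state, f)
    tmp.replace(path)


def run_chunk(path, run_episode, episodes_per_chunk=1000):
    """One pass of the workflow: load, run a batch of episodes, save, return.

    run_episode is a hypothetical callback that takes the current model
    and returns (updated_model, episode_statistics).
    """
    state = load_state(path)
    for _ in range(episodes_per_chunk):
        state["model"], episode_stats = run_episode(state["model"])
        state["stats"].append(episode_stats)
        state["episodes_done"] += 1
    save_state(state, path)
    return state
```

Calling run_chunk repeatedly with the same path resumes from the last saved state, which is what allows each invocation to exit after a bounded number of episodes.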
Citing
@article{herlau20,
title={Reinforcement Learning of Causal Variables using Mediation Analysis},
author={Herlau, Tue and Larsen, Rasmus},
journal={36th AAAI conference on artificial intelligence},
year={2022}
}