Experiment Tracking

Currently, we support MLFlow as a backend for experiment tracking, but we provide a uniform interface so that additional backends, such as Chariot training v2 when it's ready, can be integrated easily. We adopt MLFlow terminology, which has the following notions:

Run: a single execution of a training script
Experiment: a collection of runs
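
To make the idea of a uniform interface more concrete, here is a minimal sketch of what a backend-agnostic run interface could look like. This is an illustration only: `TrackingRun` and `InMemoryRun` are hypothetical names that are not part of radops, and the method names are simply borrowed from the MLFlow-backed calls shown in the Usage section.

```python
import uuid
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional, Tuple


class TrackingRun(ABC):
    """Hypothetical backend-agnostic interface for a single run."""

    @abstractmethod
    def log_param(self, name: str, value: Any) -> None:
        """Record one parameter (e.g. a hyperparameter)."""

    @abstractmethod
    def log_metric(self, name: str, value: float, step: int) -> None:
        """Record one metric value at a given step."""

    @abstractmethod
    def end(self) -> None:
        """Mark the run as finished."""

    # Bulk helpers are written once in terms of the single-item methods,
    # so a new backend only has to implement the three methods above.
    def log_params(self, params: Dict[str, Any]) -> None:
        for name, value in params.items():
            self.log_param(name, value)

    def log_metrics(self, metrics: Dict[str, float], step: int) -> None:
        for name, value in metrics.items():
            self.log_metric(name, value, step)


class InMemoryRun(TrackingRun):
    """Toy backend (an assumption for this sketch) that keeps everything in memory."""

    def __init__(self, experiment_name: str, name: Optional[str] = None) -> None:
        self.experiment_name = experiment_name
        # Generate a random name when none is given
        self.name = name or f"run-{uuid.uuid4().hex[:8]}"
        self.params: Dict[str, Any] = {}
        self.metrics: Dict[str, List[Tuple[int, float]]] = {}
        self.finished = False

    def log_param(self, name: str, value: Any) -> None:
        self.params[name] = value

    def log_metric(self, name: str, value: float, step: int) -> None:
        self.metrics.setdefault(name, []).append((step, value))

    def end(self) -> None:
        self.finished = True


# An experiment is a collection of runs: here, one run of "my experiment"
run = InMemoryRun(experiment_name="my experiment", name="run name")
run.log_params({"lr": 0.01, "batch_size": 32})
run.log_metric("loss", 0.5, step=3)
run.end()
```

Under this design, code that only depends on the abstract interface would not need to change when a different backend is swapped in.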

Setup

Using radops config setup, you can configure the experiment tracking backend by providing the following information:

MLFlow server URL
MLFlow username
MLFlow password

Usage

The basic usage is as follows:

from radops.tracking.mlflow import MLFlowRun

# creates a run (or gets it, if it already exists) associated with this instance.
# `name` can be omitted, in which case a random name will be generated
run = MLFlowRun(experiment_name="my experiment", name="run name")

# log parameters (e.g. hyperparameters) one at a time or in bulk
run.log_param("param name", "param value")
run.log_params({"param name 1": "param value 1", "param name 2": "param value 2"})

# log metrics associated with a step, one at a time or in bulk
run.log_metric("metric name", 0.5, step=3)
run.log_metrics({"metric name 1": 0.5, "metric name 2": 0.6}, step=7)

# mark the run as finished
run.end()

# retrieve parameters
run.get_params()

# retrieve metrics
run.get_metrics()
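
In practice, these calls usually sit inside a training loop. The following is a minimal sketch of that pattern, not a prescribed workflow. It assumes only the `log_params`, `log_metric`, and `end` methods shown above; `StubRun` and `train` are hypothetical names introduced here so that the example runs without an MLFlow server. With a configured backend, an `MLFlowRun` instance would be passed in instead of the stub.

```python
from typing import Any, Dict, List, Tuple


class StubRun:
    """Hypothetical stand-in that records calls instead of contacting a server."""

    def __init__(self) -> None:
        self.params: Dict[str, Any] = {}
        self.metrics: List[Tuple[str, float, int]] = []
        self.ended = False

    def log_params(self, params: Dict[str, Any]) -> None:
        self.params.update(params)

    def log_metric(self, name: str, value: float, step: int) -> None:
        self.metrics.append((name, value, step))

    def end(self) -> None:
        self.ended = True


def train(run, learning_rate: float = 0.1, num_steps: int = 5) -> float:
    """Minimise (w - 3)^2 by gradient descent, logging the loss at every step."""
    # Log hyperparameters once, before training starts
    run.log_params({"learning_rate": learning_rate, "num_steps": num_steps})
    w = 0.0
    try:
        for step in range(num_steps):
            loss = (w - 3.0) ** 2
            run.log_metric("loss", loss, step=step)
            # Gradient of (w - 3)^2 with respect to w is 2 * (w - 3)
            w -= learning_rate * 2.0 * (w - 3.0)
    finally:
        # End the run even if training raises an exception
        run.end()
    return w


stub = StubRun()
final_w = train(stub)
```

Calling `end()` in a `finally` block is a design choice for this sketch: it keeps a run from being left open if the loop fails partway through.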