Strong RL


Strong-RL’s labs were built to enable simpler and faster experimentation with reinforcement learning-based applications.

Traditionally, reinforcement learning algorithms are developed outside of the framework in which they will be deployed. Researchers build an experimental dataset, test various approaches and algorithms, and then a team of engineers “productionizes” those algorithms in a new software application.

This traditional process is error-prone. On the one hand, researchers may develop algorithms that exploit data that won’t be available in the production environment, or make assumptions that won’t scale or generalize appropriately. On the other hand, engineers may make mistakes when translating the researchers’ algorithms or when re-creating the data and transformations the researchers had available to them, introducing bugs that can only be found by actually training and evaluating a reinforcement learning agent.

To avoid these pitfalls, Strong-RL encourages development and experimentation in the same application by allowing for the easy configuration of “environments” in which to run your agent (see Environments) and through the use of these “labs” tools, described below. In short, the labs allow you to create frozen, offline datasets generated exactly as your data will be generated in production, and to rapidly iterate agents over those datasets exactly as your agents will learn and recommend actions in production.

If you are successful in the labs, you will be successful in production.

Creating a Labs Dataset

To create a labs dataset, you should have your application wired to each of its components (including a Datalog, Datamodeler, Targeter, and Actor) and all of your events and models fully specified. Ultimately, you will have a single object holding the entirety of your configured application (traditionally, in a variable called app).

With your app in hand, you simply import create_dataset and pass it this application, along with the interval by which you want to advance time and the end_date to which you want to advance. (The start date is determined by your app.config.earliest_date setting.)
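The time-stepping behavior described above can be pictured with a small sketch (an illustration only; `create_dataset`'s internals are assumed, and `earliest_date` / `end_date` stand in for your app's configuration):

```python
import datetime

def iter_intervals(earliest_date, end_date, interval):
    """Yield (interval_start, interval_end) pairs, advancing time by
    `interval` from `earliest_date` until `end_date` is reached."""
    current = earliest_date
    while current < end_date:
        interval_end = min(current + interval, end_date)
        yield current, interval_end
        current = interval_end

# e.g. daily steps across one week
steps = list(iter_intervals(datetime.date(2020, 1, 1),
                            datetime.date(2020, 1, 8),
                            datetime.timedelta(days=1)))
```

Each yielded pair corresponds to one batch of data in the resulting dataset.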

A complete example of creating a labs dataset:

import datetime
from import create_dataset

from import build_app

app = build_app(mode='live')

# create a callback that saves our dataset incrementally
ds_saver = lambda dataset: dataset.save('~/path/to/data.rld.gz')

# create and save a frozen dataset from daily data
dataset = create_dataset(app=app, interval=datetime.timedelta(days=1),
                         end_date=datetime.date(2020, 1, 1),  # example end date
                         callbacks=(ds_saver,))

# iterate through dataset in batch-wise fashion
for interval_end, t, a, t1 in dataset.batches():
    print("Number of targets in batch: {}".format(len(t)))

This dataset includes target and observed action data from each interval, as batched by the actor. By default, it will save these data in a single file when .save() is called on the dataset; however, if in_memory=False, data will be saved incrementally in separate files (one per batch) to avoid having to maintain the entire dataset in memory.
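The two saving modes can be illustrated with a minimal sketch (names here are illustrative, not Strong-RL's actual internals): with in-memory saving, all batches are written as a single file; with incremental saving, each batch is written to its own file as it is produced.

```python
import json
import os
import tempfile

def save_batches(batches, path, in_memory=True):
    """Write `batches` to disk: one file if `in_memory`,
    else one part-file per batch. Returns the paths written."""
    if in_memory:
        with open(path, "w") as f:
            json.dump(batches, f)
        return [path]
    paths = []
    for i, batch in enumerate(batches):
        batch_path = "{}.part{}".format(path, i)
        with open(batch_path, "w") as f:
            json.dump(batch, f)
        paths.append(batch_path)
    return paths

outdir = tempfile.mkdtemp()
# incremental mode: two batches produce two part-files
files = save_batches([[1, 2], [3, 4]], os.path.join(outdir, "data.json"),
                     in_memory=False)
```

The incremental mode trades a single convenient artifact for a bounded memory footprint, which matters when a dataset spans many intervals.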

Simulating an Agent on a Dataset

With a dataset in hand, you can now take any agent and simulate it over this dataset to build what we call a labs resultset (strong_rl.labs.simulation.SimulationResultset).

Like datasets, resultsets store data from each target at each interval along with their respective observed actions. Moreover, resultsets store agent recommendations, observed reward, and target-prime data — allowing for a complete analysis and introspection of your agent.
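A resultset row can be pictured as a record like the following (a hypothetical structure built from the fields described above, not Strong-RL's actual classes):

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class SimulationResult:
    interval_end: Any         # end of the interval this batch covers
    target: Any               # target state at the start of the interval
    recommended_action: Any   # what the agent would have recommended
    observed_action: Any      # what actually happened historically
    reward: float             # observed reward over the interval
    target_prime: Any         # target state after the interval

row = SimulationResult("2020-01-02", "t0", "a_rec", "a_obs", 1.0, "t1")
```

Storing both the observed action and the agent's recommendation side by side is what makes it possible to introspect where (and how far) the agent's policy diverges from historical behavior.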

A complete example of simulating an agent over a dataset:

import datetime
from import load_dataset
from strong_rl.labs.simulation import simulate_historically

from import build_app

app = build_app(mode='live')

# load a saved dataset:
dataset = load_dataset('~/path/to/data.rld.gz', app=app)

# simulate and gather results from the agent(s) in our actor's agentset,
# with a callback showing the size of the agent's memory buffer
buffer_size = lambda agent, resultset: print("Buffer size: {}".format(len(agent.memory.buffer)))
resultset = simulate_historically(dataset=dataset, callbacks=(buffer_size,))
resultset.save('~/path/to/simdata.rlsd.gz')

# gather the resultset to a Pandas dataframe for analysis
df = resultset.dataframe()

# iterate through your resultset to analyze in batch-wise fashion:
for interval_end, t, ra, a, t1, data in resultset.results():
    print("Number of targets in batch: {}".format(len(t)))
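If you prefer to analyze the batch-wise iteration yourself rather than via the dataframe, a small sketch (assuming the six-tuple shape shown above; the data here are fabricated for illustration):

```python
def summarize(results):
    """Collect per-interval batch sizes from a resultset-style iterator
    yielding (interval_end, targets, recommended_actions,
    observed_actions, target_primes, data) tuples."""
    return {interval_end: len(t)
            for interval_end, t, ra, a, t1, data in results}

# fabricated batches standing in for resultset.results()
fake_results = [
    ("2020-01-02", ["t1", "t2"], ["r"] * 2, ["a"] * 2, ["p"] * 2, {}),
    ("2020-01-03", ["t3"], ["r"], ["a"], ["p"], {}),
]
sizes = summarize(fake_results)
```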