Strong RL

Action Spaces

Action Spaces (implemented in strong_rl.actions.actionspace.ActionSpace) define the set of actions that can be recommended to a target. At minimum, an action space stores and yields the actions that are possible for every target. In most cases, however, it will also have a .constrain() method that receives a target and removes certain actions from the action space, restricting the agent's choices.

For example, we might define an action space of potential coupons to send:

class Coupon(Action):
    name = "Coupon"

    amount = DataField(FloatType(), False)

class CouponActionSpace(ActionSpace):
    actions = [
        (NullAction(),),
        (Coupon(amount=10),),
        (Coupon(amount=25),),
        (Coupon(amount=50),),
    ]

    def constrain(self, target):
        # restrict to coupons whose amount is less than the target's lifetime revenue
        # (each element of self.actions is a single-action bundle, hence bundle[0])
        self.actions = [
            bundle for bundle in self.actions
            if bundle[0].null or bundle[0].amount < target.lifetime_revenue
        ]

We have 4 potential actions that can be taken: 3 coupons of increasing amounts or a NullAction(), i.e., not doing anything at all. (In Strong-RL, choosing not to act is explicitly encoded as its own action; see Null Actions below.)

Note that actions in an action space are grouped into tuples, allowing the action space to yield “bundles” of actions that must be selected together. (In this case, each bundle contains only a single action.)
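For example, a hypothetical action space might bundle each coupon with a matching notification action so the two are always recommended together. (The Email action, its subject field, and StringType below are illustrative names only, not part of the example above.)

class Email(Action):
    # hypothetical action used only to illustrate multi-action bundles
    name = "Email"

    # StringType is assumed here by analogy with FloatType
    subject = DataField(StringType(), False)

class CouponWithEmailActionSpace(ActionSpace):
    actions = [
        (NullAction(),),
        (Coupon(amount=10), Email(subject="$10 off your next order")),
        (Coupon(amount=25), Email(subject="$25 off your next order")),
    ]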

In this simple case, the action space is constrained by a business rule: we don't want to send someone a coupon whose amount exceeds their lifetime spending with the company.
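As a quick illustration, assuming the classes above are in scope and that targets expose a lifetime_revenue attribute (as the constrain() method expects), constraining the space for a low-revenue target might look like this:

from collections import namedtuple

# minimal stand-in for a real target record; only lifetime_revenue is needed here
Target = namedtuple("Target", ["lifetime_revenue"])

space = CouponActionSpace()
space.constrain(Target(lifetime_revenue=30))

# the $50 bundle has been removed; the null action and the
# $10 and $25 coupons remain available to the agent
remaining = space.actions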

Actions

As can be seen above, actions inherit from the Action class (implemented in strong_rl.actions.action.Action) and specify the name of the action as well as the possible properties of the action in an action-specific schema.

Specific actions are instances of this class, with values for those properties passed as kwargs. When written to the datalog, an action's properties are serialized as a JSON object, so all properties must have JSON-serializable types (e.g., floats, integers, strings, lists, structs).
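Conceptually, the properties of Coupon(amount=25) end up in the datalog as a JSON object along the lines of the sketch below. The exact serialization hook and payload shape are internal to Strong-RL; the field names other than amount are assumptions for illustration.

import json

# illustrative payload only: field names beyond "amount" are assumptions
payload = json.dumps({"name": "Coupon", "amount": 25.0})
# '{"name": "Coupon", "amount": 25.0}'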

Null Actions

When an agent chooses to do nothing in Strong-RL, it explicitly records this decision by recommending a NullAction(). A null action does not have any properties. It simply says: we didn’t do anything.

We record null actions (as opposed to recording nothing at all) for two reasons: it gives us an explicit record of every agent decision, and, in many real-world applications of reinforcement learning, we want to learn when doing nothing is actually the best thing to do.

Action Constraints

Action constraints are typically implemented via the action space (see the example above) and are a means of expanding or restricting the action space based on various business rules or other logic.

For simple constraints, it is typical to modify the action space using simple iteration and set logic in standard Python. However, in high-performance contexts with many constraints and a large action space, this can become a bottleneck. In those cases, we recommend conceptualizing the action space as a fixed-order, 1-dimensional numpy array in which 1 represents a possible action and 0 an impossible one. Each constraint can then mask indices of this vector via element-wise multiplication, which numpy performs significantly faster than equivalent operations on standard Python data structures.
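A minimal sketch of this approach, assuming the four-bundle coupon space above in a fixed order (null, $10, $25, $50) and two hypothetical constraints:

import numpy as np

# fixed-order view of the action space: index i corresponds to bundle i
action_mask = np.ones(4, dtype=np.int8)

# each constraint produces its own 0/1 mask over the same indices;
# here a revenue constraint rules out the $50 coupon and a
# (hypothetical) frequency cap rules out the $10 coupon
revenue_mask = np.array([1, 1, 1, 0], dtype=np.int8)
frequency_mask = np.array([1, 0, 1, 1], dtype=np.int8)

# combining all constraints is a single element-wise multiplication
allowed = action_mask * revenue_mask * frequency_mask  # -> [1, 0, 1, 0]

# indices of the actions the agent may still choose from
allowed_indices = np.flatnonzero(allowed)  # -> [0, 2]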