Action Spaces (implemented in `strong_rl.actions.actionspace.ActionSpace`) define the set of actions that can be recommended to a target. At minimum, an action space will store and yield the possible actions for every target. In most cases, however, it will also define a `constrain` method that receives a target and removes certain actions from the action space so that the agent is restricted in its choice.
For example, we might define an action space of potential coupons to send:
```python
class Coupon(Action):
    name = "Coupon"
    amount = DataField(FloatType(), False)


class CouponActionSpace(ActionSpace):
    actions = [
        (NullAction(),),
        (Coupon(amount=10),),
        (Coupon(amount=25),),
        (Coupon(amount=50),),
    ]

    def constrain(self, target):
        # Restrict to coupons that are less than the target's lifetime revenue.
        # Each bundle is a 1-tuple, so we inspect its first (and only) action.
        self.actions = [
            c for c in self.actions
            if c[0].null or c[0].amount < target.lifetime_revenue
        ]
```
We have four potential actions that can be taken: three coupons of increasing amounts or a `NullAction()`, i.e., not doing anything at all. (In Strong-RL, we specifically encode not doing something as doing nothing; see below.)
Note that actions in the action space are combined into tuples, allowing the action space to yield “bundles” of actions that must be selected together. (In this case, each bundle contains only a single action.)
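To sketch the bundle idea, the snippet below uses plain dataclasses as stand-ins for Strong-RL action classes (the `Email` action and its fields are hypothetical, invented purely for illustration). Because each bundle is a tuple, selecting one index yields all of its actions together:

```python
from dataclasses import dataclass


# Illustrative stand-ins, not the real strong_rl classes.
@dataclass
class Coupon:
    amount: float


@dataclass
class Email:
    subject: str


# Each bundle pairs a coupon with the email that delivers it; the two
# actions can only be recommended together.
actions = [
    (Coupon(amount=10), Email(subject="10% off!")),
    (Coupon(amount=25), Email(subject="25% off!")),
]

# Selecting index 0 yields both actions of that bundle at once.
coupon, email = actions[0]
```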
In this simple case, the action space is constrained by a business rule: we don’t want to send someone a coupon whose amount exceeds their lifetime spending with the company.
As can be seen above, actions inherit from the `Action` class and specify the name of the action as well as its possible properties in an action-specific schema. Specific actions are instances of this class, with values for those properties passed as kwargs. When written to the datalog, properties are written as JSON objects, so all properties must have JSON-serializable types (e.g., floats, integers, strings, lists, structs).
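As an illustration of what this serialization might look like, the sketch below uses the standard `json` module on a plain dict standing in for an action’s properties (the field names here are hypothetical, not part of the Strong-RL schema):

```python
import json

# Hypothetical properties for a coupon-style action; every value is a
# JSON-serializable type (float, string, list, nested struct).
properties = {
    "amount": 25.0,
    "codes": ["SPRING", "VIP"],
    "meta": {"campaign": "q2"},
}

# Serialized form, as it might be written to the datalog.
row = json.dumps(properties, sort_keys=True)

# Because only JSON-serializable types are used, the record round-trips cleanly.
restored = json.loads(row)
```

A non-serializable value (e.g., a datetime or a custom object) would raise a `TypeError` at write time, which is why the schema restricts property types up front.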
When an agent chooses to do nothing in Strong-RL, it explicitly records this decision by recommending a `NullAction`. A null action has no properties. It simply says: we didn’t do anything.
We record null actions (as opposed to recording nothing at all) for two reasons: it gives us an explicit record of every agent decision, and, in many real-world applications of reinforcement learning, we want to learn when doing nothing is actually the best thing to do.
Action constraints are typically implemented via the action space (see the example above) and are a means of expanding or restricting the action space based on various business rules or other logic.
For simple constraints, it is typical to modify the action space using ordinary iteration and set logic in standard Python. However, in high-performance contexts with many constraints and a larger action space, this can cause performance issues. In those cases, we recommend conceptualizing the action space as a fixed-order, 1-dimensional `numpy` array, where 1 represents a possible action and 0 represents an impossible action. Constraints can then mask certain indices of this vector via fast elementwise multiplication, at significantly higher speeds in `numpy` than with any standard Python data structure.
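A minimal sketch of this masking approach, assuming the four-action coupon space above and a hypothetical lifetime-revenue value (the array layout and variable names are illustrative, not part of the Strong-RL API):

```python
import numpy as np

# Fixed-order action space: index i always refers to the same action bundle.
# Index 0 is the NullAction; indices 1-3 are coupons of 10, 25, and 50.
n_actions = 4
feasible = np.ones(n_actions, dtype=np.int8)  # 1 = possible, 0 = impossible

# Each constraint is a binary mask over the same indices. Here, the
# lifetime-revenue rule masks out coupons the target cannot "afford".
coupon_amounts = np.array([0.0, 10.0, 25.0, 50.0])
lifetime_revenue = 30.0  # hypothetical target attribute
affordable = (coupon_amounts < lifetime_revenue).astype(np.int8)
affordable[0] = 1  # the NullAction is always allowed

# Applying a constraint is a single elementwise multiply; stacking more
# constraints is just more multiplies over the same fixed-length vector.
feasible = feasible * affordable
```

Because every constraint reduces to a vectorized comparison plus a multiply, adding constraints scales far better than re-filtering Python lists per target.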