4 Player Tic Tac Toe

colosseumrl.envs.tictactoe.tictactoe_4p_env.PLAYER_NUM_TO_STRING = {-1: '.', 0: 'X', 1: 'O', 2: 'Y', 3: 'Z'}
colosseumrl.envs.tictactoe.tictactoe_4p_env.State

alias of builtins.object

class colosseumrl.envs.tictactoe.tictactoe_4p_env.TicTacToe4PlayerEnv(config: str = '')[source]

Bases: colosseumrl.BaseEnvironment.BaseEnvironment

Full TicTacToe 4Player environment class with access to the actual game state.

current_rewards(state: object) → List[float][source]

Returns the current reward for each player (in absolute order, not relative to any specific player).

Parameters

state (object) – The current state to calculate rewards from

Returns

rewards – A vector containing the current rewards for each player

Return type

List[float]
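Because the rewards come back in absolute player order, a caller who wants them from one player's point of view has to reorder them. The following is a minimal sketch, assuming a simple rotation so that the chosen player lands at index 0; `rewards_relative_to` is a hypothetical helper, not part of colosseumrl.

```python
from typing import List


def rewards_relative_to(rewards: List[float], player: int) -> List[float]:
    """Rotate an absolute-order reward vector so `player` sits at index 0.

    Hypothetical helper: the rotation convention is an assumption,
    not something the environment documents.
    """
    return rewards[player:] + rewards[:player]
```

With the environment, the input would be `env.current_rewards(state)`.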

static deserialize_state(serialized_state: bytearray) → object[source]

Convert a serialized bytearray back into a game state.

Parameters

serialized_state (bytearray) – state bytearray to be deserialized

Returns

deserialized_state – deserialized state

Return type

object

is_valid_action(state: object, player_num: int, action: str) → bool[source]

Returns True if an action is valid for a specific player and state.

Parameters
  • state (object) – The current state to execute a game step from.

  • player_num (int) – The player that would be executing the action.

  • action (str) – The action in question

Returns

is_action_valid – whether this action is valid

Return type

bool

Notes

This method does not keep track of whose turn it is. That is up to the user. If a piece may be physically placed at the location suggested by the action, this method returns True, regardless of who just executed their turn or who should be going now.
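Since turn order is left to the caller, one option is to combine your own turn bookkeeping with the placement check. This is a minimal sketch: `is_legal_move` and `current_players` are hypothetical names, and `env` is assumed to be any object exposing `is_valid_action` as documented above.

```python
from typing import Any, List


def is_legal_move(env: Any, state: Any, current_players: List[int],
                  player_num: int, action: str) -> bool:
    """Return True only if it is the player's turn AND the placement is valid.

    `current_players` is the caller's own record of whose turn it is,
    for example the player list returned by the previous game step.
    """
    if player_num not in current_players:
        return False  # out-of-turn move, which the environment would not reject
    return env.is_valid_action(state, player_num, action)
```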

property max_players

Property holding the max number of players present for a game.

(Always 4)

property min_players

Property holding the minimum number of players required to play the game.

(Always 4)

new_state(self) → object[source]

Create a fresh TicTacToe 4Player board state for a new game.

Returns

  • new_state (object) – A state for the new game.

  • new_players (List[int]) – List of players whose turn it is in this new state.

Notes

States are arbitrary internal game logic types. In a normal use case, there is no need to access or modify individual data in a state.

States are not in a format intended to be consumed by a reinforcement learning agent. Reinforcement learning agents are intended to take observations as input, and state_to_observation() can be used to convert states into observations.

next_state(state: object, players: List[int], actions: List[str]) → Tuple[object, List[int], List[float], bool, Optional[List[int]]][source]

Perform a game step from a given state.

Parameters
  • state (object) – The current state to execute a game step from.

  • players (List[int]) – The players whose turn it is and who are executing actions. For TicTacToe, only one player should ever be passed in this list at a time.

  • actions (List[str]) – The actions to be executed by the players whose turn it is. For TicTacToe, only one action should ever be passed in this list at a time.

Returns

  • next_state (object) – The new state resulting after the game step.

  • next_players (List[int]) – The new players whose turn it is after the game step. For TicTacToe, this will always only be one player.

  • rewards (List[float]) – Rewards for the players whose turn it was. For TicTacToe, this will always only be one reward for the single player that executed the action.

  • terminal (bool) – Whether the game is now over.

  • winners (Union[List[int], None]) – The players that won the game if it is over, else None.

Notes

States are arbitrary internal game logic types. In a normal use case, there is no need to access or modify individual data in a state.

States are not in a format intended to be consumed by a reinforcement learning agent. Reinforcement learning agents are intended to take observations as input, and state_to_observation() can be used to convert states into observations.
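The step contract above can be exercised with a simple random-play loop. This is a sketch under stated assumptions: `play_random_game` is a hypothetical helper, `env` is any object exposing the documented methods, and `new_state()` is assumed to return the `(state, players)` pair listed under its Returns section.

```python
import random
from typing import Any, List, Optional, Tuple


def play_random_game(env: Any, rng: random.Random,
                     max_steps: int = 1000) -> Tuple[Any, Optional[List[int]]]:
    """Play one game with uniformly random moves; return (final_state, winners)."""
    state, players = env.new_state()  # assumed to be a (state, players) pair
    winners = None
    for _ in range(max_steps):  # safety bound for this sketch
        player = players[0]  # only one player acts per step in TicTacToe
        actions = env.valid_actions(state, player)
        # The empty string is the documented no-op when nothing is playable.
        action = rng.choice(list(actions)) if actions else ""
        state, players, rewards, terminal, winners = env.next_state(
            state, [player], [action]
        )
        if terminal:
            break
    return state, winners
```

Passing a seeded `random.Random` keeps a game reproducible, which helps when debugging an agent against fixed opponents.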

static observation_names()[source]

Get the names for each key in an observation dictionary.

Returns

observation_names

Return type

List[str]

property observation_shape

Property holding the numpy array shapes for each value in an observation dictionary.

static serializable() → bool[source]

Whether or not this class supports state serialization.

(This always returns True for TicTacToe)

Returns

is_serializable – True

Return type

bool

static serialize_state(state: object) → bytearray[source]

Serialize a game state and convert it to a bytearray to be saved or sent over a network.

Parameters

state (object) – state to be serialized

Returns

serialized_state – serialized state

Return type

bytearray
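Serialization and deserialization are meant to be used as a pair, for instance to copy a state or ship it to another process. A minimal round-trip sketch follows; `clone_state` is a hypothetical helper, and `env_cls` is assumed to be a class exposing the three static methods documented here.

```python
from typing import Any


def clone_state(env_cls: Any, state: Any) -> Any:
    """Copy a state by serializing it and deserializing the resulting bytes."""
    if not env_cls.serializable():
        raise ValueError("environment does not support state serialization")
    payload = env_cls.serialize_state(state)  # bytearray, safe to save or send
    return env_cls.deserialize_state(payload)
```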

state_to_observation(state: object, player: int) → Dict[str, numpy.ndarray][source]

Convert the raw game state to a consumable observation for a specific player agent.

Parameters
  • state (object) – The state to create an observation for

  • player (int) – The player who is intended to view the observation

Returns

observation – The observation for the player RL agent to view

Return type

Dict[str, np.ndarray]

Notes

Observations are specific to individual players. Every observation is presented as if the player intended to receive it were actually player 0. This is done so that an RL agent only has to learn to perform moves that make player 0 win and other players lose.
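The documentation does not say how player labels are remapped for the viewing player. One plausible scheme, shown purely as an assumption, is to rotate player numbers so that the viewer becomes 0 while empty cells (-1) stay unchanged. Both functions below are hypothetical and not part of colosseumrl.

```python
from typing import List


def relative_player(absolute: int, viewer: int, num_players: int = 4) -> int:
    """Map an absolute player number to the viewer's frame (viewer becomes 0).

    Assumption: labels are rotated; the real environment may differ.
    """
    if absolute < 0:
        return absolute  # -1 marks an empty cell and is left alone
    return (absolute - viewer) % num_players


def relabel_board(board: List[List[int]], viewer: int) -> List[List[int]]:
    """Apply relative_player to every cell of a 2D grid of player numbers."""
    return [[relative_player(cell, viewer) for cell in row] for row in board]
```

Under this scheme a single policy trained to make "player 0" win can be reused for any seat at the table.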

valid_actions(state: object, player: int) → List[str][source]

Valid actions for a specific state and player. If there are no valid actions, an empty string is given to represent a no-op.

Parameters
  • state (object) – The current state to execute a game step from.

  • player (int) – The player for which valid actions will be returned.

Returns

valid_actions – A list of valid action strings which the player may execute.

Return type

List[str]

Notes

Players must always choose actions included in this list. If no actions are valid for a player, this function returns an empty string. When it is a player’s turn, if the player has no valid actions, it must pass an empty string as its action for next_state() for the game to continue.

This method does not keep track of whose turn it is. That is up to the user. If the specified player can physically place a piece at a location, it will be returned as a valid action.
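An agent's proposed move can be checked against this list before it is submitted. The sketch below uses a hypothetical `safe_action` helper that falls back to the first valid action, or to the empty-string no-op when nothing is playable; the fallback policy is an illustrative choice, not library behaviour.

```python
from typing import Any


def safe_action(env: Any, state: Any, player: int, proposed: str) -> str:
    """Return `proposed` if it is valid, otherwise a valid fallback action."""
    valid = env.valid_actions(state, player)
    if not valid:
        return ""  # documented no-op when the player has no valid actions
    if proposed in valid:
        return proposed
    return valid[0]  # arbitrary fallback chosen for this sketch
```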

colosseumrl.envs.tictactoe.tictactoe_4p_env.WINNING_SHAPES = [array([[[1]], [[1]], [[1]]], dtype=int8), array([[[1], [1], [1]]], dtype=int8), array([[[1, 1, 1]]], dtype=int8), array([[[1], [0], [0]], [[0], [1], [0]], [[0], [0], [1]]], dtype=int8), array([[[0], [0], [1]], [[0], [1], [0]], [[1], [0], [0]]], dtype=int8), array([[[1, 0, 0], [0, 1, 0], [0, 0, 1]]], dtype=int8), array([[[0, 0, 1], [0, 1, 0], [1, 0, 0]]], dtype=int8), array([[[1, 0, 0]], [[0, 1, 0]], [[0, 0, 1]]], dtype=int8), array([[[0, 0, 1]], [[0, 1, 0]], [[1, 0, 0]]], dtype=int8), array([[[1., 0., 0.], [0., 0., 0.], [0., 0., 0.]], [[0., 0., 0.], [0., 1., 0.], [0., 0., 0.]], [[0., 0., 0.], [0., 0., 0.], [0., 0., 1.]]]), array([[[0., 0., 0.], [0., 0., 0.], [0., 0., 1.]], [[0., 0., 0.], [0., 1., 0.], [0., 0., 0.]], [[1., 0., 0.], [0., 0., 0.], [0., 0., 0.]]]), array([[[0., 0., 0.], [0., 0., 0.], [1., 0., 0.]], [[0., 0., 0.], [0., 1., 0.], [0., 0., 0.]], [[0., 0., 1.], [0., 0., 0.], [0., 0., 0.]]]), array([[[0., 0., 1.], [0., 0., 0.], [0., 0., 0.]], [[0., 0., 0.], [0., 1., 0.], [0., 0., 0.]], [[0., 0., 0.], [0., 0., 0.], [1., 0., 0.]]])]
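The source does not describe how WINNING_SHAPES is used. Judging from the array shapes, each entry looks like a 3D mask marking three cells in a line, so one plausible use is to slide each mask over a per-player occupancy grid. The sketch below illustrates that idea with plain nested lists; `dims` and `contains_shape` are hypothetical helpers, and this may not match the library's actual win check.

```python
from itertools import product
from typing import Sequence, Tuple

Grid3D = Sequence[Sequence[Sequence[float]]]


def dims(grid: Grid3D) -> Tuple[int, int, int]:
    """Return the size of a rectangular 3D nested list along each axis."""
    return len(grid), len(grid[0]), len(grid[0][0])


def contains_shape(owned: Grid3D, shape: Grid3D) -> bool:
    """True if every marked cell of `shape` is owned at some offset in `owned`.

    `owned` holds truthy values where one player has a piece.
    """
    board_dims, shape_dims = dims(owned), dims(shape)
    # Coordinates of the cells the mask requires.
    cells = [(i, j, k)
             for i, j, k in product(*(range(n) for n in shape_dims))
             if shape[i][j][k]]
    # Every offset at which the mask fits entirely inside the board.
    offsets = product(*(range(b - s + 1) for b, s in zip(board_dims, shape_dims)))
    for di, dj, dk in offsets:
        if all(owned[di + i][dj + j][dk + k] for i, j, k in cells):
            return True
    return False
```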
colosseumrl.envs.tictactoe.tictactoe_4p_env.action_to_string(index: Tuple[int, int]) → str[source]

Convert an action index into a formatted action string.

Parameters

index (Tuple[int, int]) – The location where the piece will be placed in the action.

Returns

action_string

Return type

str

colosseumrl.envs.tictactoe.tictactoe_4p_env.print_board(state: object)[source]

Print board to console

Parameters

state (object) – The state to render

Notes

X marks player 0. O marks player 1. Y marks player 2. Z marks player 3.
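The marker convention can be illustrated with a small renderer. This sketch copies the PLAYER_NUM_TO_STRING mapping from the top of this page and formats a 2D grid of player numbers; `render_grid` is a hypothetical helper, and the spacing is illustrative rather than the actual output of print_board.

```python
from typing import List

# Copied from the module constant documented on this page.
PLAYER_NUM_TO_STRING = {-1: ".", 0: "X", 1: "O", 2: "Y", 3: "Z"}


def render_grid(grid: List[List[int]]) -> str:
    """Render a 2D grid of player numbers, one text row per board row."""
    return "\n".join(
        " ".join(PLAYER_NUM_TO_STRING[cell] for cell in row) for row in grid
    )
```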

colosseumrl.envs.tictactoe.tictactoe_4p_env.string_to_action(action_str: str) → Optional[Tuple[int, int]][source]

Convert a formatted action string into an index.

Parameters

action_str (str) – The action in string format

Returns

index – The location where the piece will be placed in the action.

Return type

Optional[Tuple[int, int]]