# Dec-MCTS

Python implementation of [Dec-MCTS](https://journals.sagepub.com/doi/pdf/10.1177/0278364918755924)

WARNING: This repo is incomplete; see the TODO at the bottom of this file for ways to contribute.

### Installation
```bash
pip install git+https://code.research.uts.edu.au/bigprint/pydecmcts.git
```

### Usage

```python
from DecMCTS import Tree
from copy import deepcopy

# Object stored at each node
class State:
    def __init__(self, act_seq, cum_sum):
        self.action_seq = act_seq
        self.cumulative_sum = cum_sum

# Calculates the object stored at a node, given the parent node's state and the action taken
def state_storer(data, parent_state, action):

    # Root node edge case
    if parent_state is None:
        # This state is also used as the null action when calculating the local reward
        return State([], 0)

    state = deepcopy(parent_state)  # Make sure to deepcopy before editing the parent state
    state.action_seq.append(action)
    state.cumulative_sum = state.cumulative_sum + action
    return state

# data can be anything required to calculate your
# global reward and available actions
# It can be in any format (only used by your reward and avail_actions functions)
data = {}

# Create an available actions function
# This returns a list of possible actions to take from a given state (from state_storer)
def avail_actions(data, state, robot_id):
    # This example simply maximises a sum,
    # so the options are the same regardless of state
    return [1, 2, 3, 4, 5]

# Create a reward function. This is the global reward given a list of
# actions taken by the current robot, and every other robot
# states is a dictionary with keys being robot IDs, and values
# being the objects returned by the state_storer function you provide
def reward(data, states):
    each_robot_sum = [states[robot].cumulative_sum for robot in states]
    return sum(each_robot_sum)

# Number of Action Sequences to communicate
comm_n = 5

# Create instances for each robot
tree1 = Tree(data, reward, avail_actions, state_storer, comm_n, 1) # Robot ID is 1
tree2 = Tree(data, reward, avail_actions, state_storer, comm_n, 2) # Robot ID is 2

for i in range(350):
    tree1.grow()
    tree2.grow()
    tree1.receive_comms(tree2.send_comms(), 2) # send_comms' message doesn't include the sender's ID
    tree2.receive_comms(tree1.send_comms(), 1)
```
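
The two-robot loop above generalises to any number of robots with an all-to-all exchange at each iteration. Here is a minimal sketch using only the `Tree` methods shown above; the robot count and iteration count are arbitrary:

```python
# Sketch: the same loop generalised to N robots, using only the API shown above
n_robots = 4
trees = {rid: Tree(data, reward, avail_actions, state_storer, comm_n, rid)
         for rid in range(1, n_robots + 1)}

for i in range(350):
    for tree in trees.values():
        tree.grow()
    # All-to-all exchange; the sender's ID is supplied by the caller
    messages = {rid: tree.send_comms() for rid, tree in trees.items()}
    for rid, tree in trees.items():
        for sender_id, msg in messages.items():
            if sender_id != rid:
                tree.receive_comms(msg, sender_id)
```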

### TODO

- Simulation:
    - Simulation has not been implemented, since it was not required for bigprint. Currently the repo simply evaluates the reward function at each node immediately.
    - Desired features: simulation with a depth limit, simulation with a cost limit, and simulation until the reward function can be calculated (i.e. simulation to completion). A rollout sketch is given below this list.
- Action Sequence Distribution:
    - Currently the probability distribution over the communicated action sequences is not calculated as described in the paper; it is simply set proportional to the local reward function. A sketch of a paper-style update appears at the end of this file.
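
As a starting point for contributors, here is a minimal sketch of what a depth-limited random rollout could look like. It reuses the `avail_actions` and `state_storer` callback signatures from the usage example; the `rollout` function itself is hypothetical and not part of the current API.

```python
import random

# Hypothetical sketch: a depth-limited random rollout. Not part of the current
# API; the callback signatures match the usage example above.
def rollout(data, state, robot_id, avail_actions, state_storer, depth_limit=10):
    for _ in range(depth_limit):
        actions = avail_actions(data, state, robot_id)
        if not actions:
            break  # terminal state reached before the depth limit
        action = random.choice(actions)  # uniform random rollout policy
        state = state_storer(data, state, action)
    return state  # evaluate the reward function on this rolled-out state
```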
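
For the distribution update, the paper optimises each robot's distribution over its candidate action sequences with a mean-field / probability-collectives style rule, roughly of the form q_n(x_n) ∝ exp(E[f | x_n] / β), where the expectation is taken over the other robots' communicated distributions. Below is a rough, hypothetical sketch of one such update; all names are illustrative, and `reward` follows the signature used above.

```python
import math
import random

# Hypothetical sketch: one mean-field style update of a robot's distribution
# over its comm_n candidate action sequences. The expected global reward for
# each candidate is estimated by sampling the other robots' sequences from
# their communicated distributions. All names here are illustrative.
def update_distribution(data, my_id, my_states, other_dists, reward,
                        beta=0.1, n_samples=20):
    # my_states: list of candidate State objects for this robot
    # other_dists: {robot_id: (list of State objects, list of probabilities)}
    expected = []
    for candidate in my_states:
        total = 0.0
        for _ in range(n_samples):
            # Sample one action sequence per other robot, then add our candidate
            joint = {rid: random.choices(states, weights=probs)[0]
                     for rid, (states, probs) in other_dists.items()}
            joint[my_id] = candidate
            total += reward(data, joint)
        expected.append(total / n_samples)

    # Softmax with temperature beta: q(x_n) proportional to exp(E[f | x_n] / beta)
    m = max(expected)
    weights = [math.exp((e - m) / beta) for e in expected]
    z = sum(weights)
    return [w / z for w in weights]
```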