Fig. 1 | Applied Network Science

From: Selective network discovery via deep reinforcement learning on embedded spaces

Illustration of the estimation of the cumulative reward of the current state \(s=s_0\) over a horizon of length \(h=3\) with discount factor \(\gamma =0.5\). The current state s comprises three types of nodes: unknown (grey), target nodes (red), and non-target nodes (black); red nodes represent the node type we would like to discover. The figure shows one instantiation of policy \(\pi\), starting at state s and corresponding to a path of length \(h=3\). Taking action \(a_1\) at \(t=0\) and following the highlighted path, the cumulative discounted reward of state s is \(Q(s,a_1)=\gamma^{0}\, r_{1}+\gamma^{1}\, r_{2}+\gamma^{2}\, r_{3}= 1\cdot 0+\tfrac{1}{2}\cdot 0+\tfrac{1}{4}\cdot 1=\tfrac{1}{4}\)
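As a quick check of the arithmetic above, here is a minimal sketch (not from the paper; the function name discounted_return is illustrative) that computes the discounted return of the highlighted path, assuming a reward of 1 for discovering a target node and 0 otherwise:

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * rewards[t] over the horizon of the sampled path."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

# Rewards observed along the highlighted path of length h = 3:
# r_1 = 0 (non-target), r_2 = 0 (non-target), r_3 = 1 (target node found).
rewards = [0, 0, 1]
print(discounted_return(rewards, gamma=0.5))  # 0.25, i.e. 1/4
```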
