algorithm - Continuous action-state space and tiling


After getting used to the Q-learning algorithm in a discrete action-state space, I would like to expand it to continuous spaces. To do so, I read the chapter on on-policy control with approximation in Sutton's Introduction. There, differentiable functions such as a linear function or an ANN are recommended to solve the problem of a continuous action-state space. Nevertheless, Sutton also describes the tiling method, which maps the continuous variables onto a discrete representation. Is this necessary?
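For reference, here is a minimal sketch of the tiling idea the question mentions: several overlapping grids ("tilings"), each shifted by a fraction of a tile, turn a continuous 2-D state into a handful of active tile indices, and Q(s, a) for one action is the sum of that action's weights over those tiles. The state bounds are assumed from the standard hill-climbing (mountain) car setup, and this uniform-offset grid is only an illustration, not the tile-coding software that accompanies the book.

```python
import math

# Assumed state bounds for the hill-climbing car: position and velocity.
POS_MIN, POS_MAX = -1.2, 0.5
VEL_MIN, VEL_MAX = -0.07, 0.07

def active_tiles(position, velocity, n_tilings=8, tiles_per_dim=8):
    """Return one active tile index per tiling (a sparse binary feature vector).

    Each tiling is a uniform grid over the 2-D state space, shifted diagonally
    by t / n_tilings of a tile width so that the tilings overlap.
    """
    # Rescale each state variable to [0, tiles_per_dim].
    x = (position - POS_MIN) / (POS_MAX - POS_MIN) * tiles_per_dim
    y = (velocity - VEL_MIN) / (VEL_MAX - VEL_MIN) * tiles_per_dim
    side = tiles_per_dim + 1          # one extra row/column absorbs the shift
    tiles_per_tiling = side * side
    indices = []
    for t in range(n_tilings):
        offset = t / n_tilings        # fraction of a tile width, in [0, 1)
        col = int(math.floor(x + offset))
        row = int(math.floor(y + offset))
        indices.append(t * tiles_per_tiling + row * side + col)
    return indices

def q_value(weights, position, velocity):
    """Q(s, a) for one action: sum of that action's weights over the active tiles."""
    return sum(weights[i] for i in active_tiles(position, velocity))

if __name__ == "__main__":
    print(active_tiles(-0.5, 0.0))
```

Nearby states share most of their tiles, which is what lets learning generalize; states far apart share none.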

To understand these methods, I tried to implement the hill-climbing car example from the book without the tiling method, using a linear function for Q. As the state space is two-dimensional and the action is one-dimensional, I used a three-dimensional weight vector w in this equation:

$$\hat{q}(s, a, \mathbf{w}) = w_0 s_0 + w_1 s_1 + w_2 a$$

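When I try to choose the action that maximizes the output, the obvious answer is a = 1 if w_2 > 0, regardless of the state. Therefore, the weight w_2 converges to a positive value and the agent does not learn anything useful. Since Sutton was able to solve the problem using tiling, I am wondering whether the problem is caused by the absence of the tiling method or whether I am doing something else wrong. So: is tiling necessary?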
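The problem described in the question can be reproduced in a few lines. This sketch assumes the model has the form Q(s, a) = w0*s0 + w1*s1 + w2*a and that the actions are {-1, 0, 1}; the weights are hypothetical. The state terms are identical for every action, so they cancel when actions are compared, and the greedy choice depends only on the sign of w2.

```python
ACTIONS = (-1, 0, 1)  # assumed action set: reverse, coast, forward

def q_linear(state, action, w):
    """Q(s, a) = w0*s0 + w1*s1 + w2*a (assumed form of the question's model)."""
    s0, s1 = state
    return w[0] * s0 + w[1] * s1 + w[2] * action

def greedy(state, w):
    """Pick the action with the highest Q-value in the given state."""
    return max(ACTIONS, key=lambda a: q_linear(state, a, w))

if __name__ == "__main__":
    w = (0.3, -2.0, 0.5)  # hypothetical weights with w2 > 0
    states = [(-1.0, -0.05), (-0.5, 0.0), (0.4, 0.06)]
    # Only w2 * a differs between actions, so every state gets the same action.
    print([greedy(s, w) for s in states])  # [1, 1, 1]
```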

Regarding your main question about tiling, the answer is no: tiling is not necessary.

As you tried, it is a good idea to implement an easy example such as the hill-climbing car in order to understand the concepts. Here, however, you are misunderstanding something important. When the book talks about linear methods, it is referring to methods that are linear in the parameters, which means that you can extract a set of (non-linear) features and combine them linearly. This kind of approximator can represent functions much more complex than standard linear regression on the raw state variables.
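To make "linear in the parameters" concrete, here is a small sketch with a hypothetical hand-picked polynomial feature map for a state (p, v). The approximator is the dot product w · φ(s), so its gradient with respect to w is simply φ(s), yet as a function of the state it can be curved. The feature choice is only an illustration.

```python
def features(state):
    """Hypothetical non-linear feature map phi(s) for a 2-D state (p, v)."""
    p, v = state
    return [1.0, p, v, p * v, p * p, v * v]

def q_hat(state, w):
    """Linear in the parameters w, but non-linear in the state."""
    return sum(wi * fi for wi, fi in zip(w, features(state)))

def grad_w(state, w):
    """Gradient of q_hat with respect to w: just phi(s), independent of w."""
    return features(state)

if __name__ == "__main__":
    w = [0, 0, 0, 0, 1, 0]  # selects the p^2 feature only
    # Doubling p quadruples the value, which no model linear in s can do.
    print(q_hat((0.5, 0.0), w), q_hat((1.0, 0.0), w))  # 0.25 1.0
```

Gradient-based updates such as semi-gradient Q-learning only need the gradient with respect to w, which stays this simple no matter how non-linear the features are.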

The parametrization you have proposed is not able to represent a non-linear Q-function, and in the hill-climbing car problem you want to learn Q-functions like this one:

[Figure: example of a non-linear Q-function for the hill-climbing car problem]

You need a more powerful function approximator. An easy solution is to use a radial basis function (RBF) network. In this case, you use a set of features (or basis functions, for example Gaussian functions) to map the state space:

[Figure: Gaussian basis functions covering the state space]
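A minimal sketch of such a Gaussian feature map, assuming a uniform 5 x 5 grid of centres, a common width sigma, and the standard hill-climbing car bounds; all of these are illustrative choices, not values from the answer. Each state variable is rescaled to [0, 1] first so that a single width makes sense in both dimensions, since velocity spans a much smaller range than position.

```python
import math

# Assumed state bounds for the hill-climbing car.
POS_MIN, POS_MAX = -1.2, 0.5
VEL_MIN, VEL_MAX = -0.07, 0.07

def make_centers(n_per_dim=5):
    """Place RBF centres on a uniform grid over [0, 1]^2 (requires n_per_dim >= 2)."""
    step = 1.0 / (n_per_dim - 1)
    return [(i * step, j * step) for i in range(n_per_dim) for j in range(n_per_dim)]

def normalise(state):
    """Rescale (position, velocity) to [0, 1] x [0, 1]."""
    p, v = state
    return ((p - POS_MIN) / (POS_MAX - POS_MIN), (v - VEL_MIN) / (VEL_MAX - VEL_MIN))

def rbf_features(state, centers, sigma=0.25):
    """phi_i(s) = exp(-||s - c_i||^2 / (2 sigma^2)), computed on the normalised state."""
    x, y = normalise(state)
    return [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
            for cx, cy in centers]

if __name__ == "__main__":
    centers = make_centers()
    print([round(f, 3) for f in rbf_features((-0.5, 0.0), centers)])
```

Q for one action is then a weighted sum of these 25 features, which is still linear in the weights but smooth and non-linear over the state space.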

Additionally, if the action space is discrete and small, the easiest solution is to maintain an independent RBF network for each action. When selecting an action, compute the Q-value of each action and select the one with the highest value. In this way you avoid the (complex) optimization problem of selecting the best action from a continuous function.
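The per-action scheme can be sketched as follows: one weight vector over the same Gaussian features for each discrete action, greedy (or epsilon-greedy) selection by comparing the Q-values, and a semi-gradient Q-learning update that only changes the weights of the action that was taken. The feature grid, width, step size alpha, discount gamma, and the action set {-1, 0, 1} are assumptions for illustration, and the class name is hypothetical.

```python
import math
import random

ACTIONS = (-1, 0, 1)                    # assumed discrete actions
POS_MIN, POS_MAX = -1.2, 0.5            # assumed state bounds
VEL_MIN, VEL_MAX = -0.07, 0.07
N = 5                                   # assumed 5 x 5 grid of RBF centres
CENTERS = [(i / (N - 1), j / (N - 1)) for i in range(N) for j in range(N)]
SIGMA = 0.25

def phi(state):
    """Gaussian RBF features of the state, rescaled to [0, 1]^2 first."""
    p, v = state
    x = (p - POS_MIN) / (POS_MAX - POS_MIN)
    y = (v - VEL_MIN) / (VEL_MAX - VEL_MIN)
    return [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * SIGMA ** 2))
            for cx, cy in CENTERS]

class PerActionRBFQ:
    """One independent linear RBF model per discrete action (hypothetical name)."""

    def __init__(self):
        self.w = {a: [0.0] * len(CENTERS) for a in ACTIONS}

    def _q(self, feats, action):
        return sum(wi * fi for wi, fi in zip(self.w[action], feats))

    def q(self, state, action):
        return self._q(phi(state), action)

    def greedy(self, state):
        """Evaluate every action's model and take the best one."""
        feats = phi(state)
        return max(ACTIONS, key=lambda a: self._q(feats, a))

    def epsilon_greedy(self, state, epsilon, rng):
        if rng.random() < epsilon:
            return rng.choice(ACTIONS)
        return self.greedy(state)

    def update(self, state, action, reward, next_state, done, alpha=0.1, gamma=1.0):
        """Semi-gradient Q-learning step; returns the TD error."""
        feats = phi(state)
        target = reward
        if not done:
            next_feats = phi(next_state)
            target += gamma * max(self._q(next_feats, a) for a in ACTIONS)
        td_error = target - self._q(feats, action)
        # The gradient of the chosen action's Q with respect to its weights is phi(s).
        self.w[action] = [wi + alpha * td_error * fi
                          for wi, fi in zip(self.w[action], feats)]
        return td_error
```

Because each action has its own weights, picking the best action is just a maximum over three numbers; with a continuous action you would instead have to maximise Q over a.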

You can find a more detailed explanation in the book by Busoniu et al., Reinforcement Learning and Dynamic Programming Using Function Approximators, pages 49-51. It is available for free.

