python - TensorFlow: How to get gradients per instance in a batch?
I'm looking at the policy gradients sample in this notebook: https://github.com/ageron/handson-ml/blob/master/16_reinforcement_learning.ipynb
The relevant code is here:
    import tensorflow as tf  # TensorFlow 1.x API

    x = tf.placeholder(tf.float32, shape=[None, n_inputs])
    hidden = tf.layers.dense(x, n_hidden, activation=tf.nn.elu, kernel_initializer=initializer)
    logits = tf.layers.dense(hidden, n_outputs)
    outputs = tf.nn.sigmoid(logits)  # probability of action 0 (left)
    p_left_and_right = tf.concat(axis=1, values=[outputs, 1 - outputs])
    action = tf.multinomial(tf.log(p_left_and_right), num_samples=1)

    y = 1. - tf.to_float(action)
    cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits)
    optimizer = tf.train.AdamOptimizer(learning_rate)
    grads_and_vars = optimizer.compute_gradients(cross_entropy)
    gradients = [grad for grad, variable in grads_and_vars]

    # one placeholder per gradient, so processed gradients can be fed back in
    gradient_placeholders = []
    grads_and_vars_feed = []
    for grad, variable in grads_and_vars:
        gradient_placeholder = tf.placeholder(tf.float32, shape=grad.get_shape())
        gradient_placeholders.append(gradient_placeholder)
        grads_and_vars_feed.append((gradient_placeholder, variable))
    training_op = optimizer.apply_gradients(grads_and_vars_feed)
    ...
    # run training on a bunch of instances of inputs
    for step in range(n_max_steps):
        action_val, gradients_val = sess.run([action, gradients], feed_dict={x: obs.reshape(1, n_inputs)})
    ...
    # weight each gradient by the action values, average, and feed them to training_op using apply_gradients()

The above works fine, as each run() returns different gradients.
I'd like to batch all this and feed an array of inputs into run() instead of one input at a time (my environment is different from the one in the sample, so it makes sense for me to batch and improve performance), i.e.:
    action_val, gradients_val = sess.run([action, gradients], feed_dict={x: obs_array})

where obs_array has shape [n_instances, n_inputs].
The problem is that optimizer.compute_gradients(cross_entropy) seems to return a single gradient, even though cross_entropy is a 1D tensor of shape [None, 1]. action_val does return a 1D tensor of actions, as expected, with one action per instance in the batch.
Is there any way for me to get an array of gradients, one per instance in the batch?
"The problem is that optimizer.compute_gradients(cross_entropy) seems to return a single gradient, even though cross_entropy is a 1D tensor of shape [None, 1]."
That happens by design, as the gradient terms for each tensor are automatically aggregated. Gradient computation operations such as optimizer.compute_gradients, and the low-level primitive tf.gradients, sum all of the gradient contributions, according to the default AddN aggregation method. This is fine for most cases of stochastic gradient descent.
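To illustrate why only one gradient comes back, here is a minimal sketch in plain Python (no TensorFlow) of what summed aggregation does for a sigmoid cross-entropy loss. It assumes a toy model with a single linear layer, logit = w . x; the names per_instance_gradient and aggregated_gradient and the toy data are hypothetical and only serve the illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def per_instance_gradient(w, x, y):
    """Gradient of sigmoid cross-entropy w.r.t. w for one instance (logit = w . x)."""
    logit = sum(wi * xi for wi, xi in zip(w, x))
    error = sigmoid(logit) - y  # d(loss)/d(logit) for sigmoid cross-entropy
    return [error * xi for xi in x]

def aggregated_gradient(w, batch_x, batch_y):
    """Sum of the per-instance gradients, as a summed aggregation would produce."""
    total = [0.0] * len(w)
    for x, y in zip(batch_x, batch_y):
        g = per_instance_gradient(w, x, y)
        total = [t + gi for t, gi in zip(total, g)]
    return total

# Hypothetical toy data: 3 instances with 2 inputs each.
w = [0.5, -0.25]
batch_x = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
batch_y = [1.0, 0.0, 1.0]

per_instance = [per_instance_gradient(w, x, y) for x, y in zip(batch_x, batch_y)]
aggregated = aggregated_gradient(w, batch_x, batch_y)

print(len(per_instance))  # one gradient per instance in the batch
print(len(aggregated))    # a single gradient with the same shape as w
```

Once the per-instance terms have been summed, the individual contributions cannot be recovered from the result, which is why the batched run() yields one gradient per variable rather than one per instance.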
In the end, unfortunately, gradient computation will have to be made on a single batch, unless a custom gradient function is built or the TensorFlow API is extended to provide gradient computation without full aggregation. Changing the implementation of tf.gradients to do this does not seem to be trivial.
One trick that you might wish to employ for your reinforcement learning model is to perform multiple session runs in parallel. According to the FAQ, the Session API supports multiple concurrent steps, and will take advantage of existing resources for parallel computation. The question "Asynchronous computation in TensorFlow" shows how to do this.
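As a rough sketch of that idea, the per-instance steps could be issued from a thread pool. The function run_single_instance below is a hypothetical stand-in for a call such as sess.run([action, gradients], feed_dict={x: obs.reshape(1, n_inputs)}); it returns fake values so that the example runs without TensorFlow, and the observations are made up.

```python
from concurrent.futures import ThreadPoolExecutor

def run_single_instance(obs):
    """Hypothetical stand-in for one per-instance session step.

    In the real model this would call sess.run([action, gradients], ...)
    for a single observation; here it returns a fake (action, gradients) pair.
    """
    action_val = 0 if sum(obs) >= 0 else 1
    gradients_val = [2.0 * v for v in obs]
    return action_val, gradients_val

# Hypothetical observations, one per environment instance.
obs_array = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2], [0.0, 0.9]]

# Each worker thread issues its own step; map() returns results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_single_instance, obs_array))

action_vals = [a for a, _ in results]
gradients_vals = [g for _, g in results]  # one gradient list per instance
```

Each call still computes the gradient for a single instance, so the per-instance gradients stay separate; whether this actually improves throughput depends on the model size and the available hardware.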