python - How to properly create a Deep LSTM Stack for Many to One Learning in Keras without memory issues
I have written a function that creates an LSTM stack in Keras using the functional API and a loop. It has two parameters: the number of units in each LSTM, and the number of times to loop (how many LSTMs are in the stack). I do not understand how to optimize this, nor how to ensure that the model learns a sentence embedding from a list of word vectors.
My input has shape [max_time_step, 300], where max_time_step is the maximum sentence length in the batch (all sentences are padded to that length) and 300 is the dimension of the word vectors.
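To make the input layout concrete, here is a minimal sketch of the padding step described above. The helper name `pad_batch` and the random stand-in word vectors are assumptions for illustration only; they are not taken from the project code.

```python
import numpy as np

def pad_batch(sentences, dim=300):
    """Zero-pad a list of [length, dim] arrays into one [batch, max_time_step, dim] array."""
    max_time_step = max(s.shape[0] for s in sentences)
    batch = np.zeros((len(sentences), max_time_step, dim), dtype=np.float32)
    for i, s in enumerate(sentences):
        # Copy the real word vectors; the remaining time steps stay zero.
        batch[i, :s.shape[0], :] = s
    return batch

# Two hypothetical sentences of 3 and 5 words with random stand-in word vectors.
rng = np.random.default_rng(0)
sentences = [rng.standard_normal((3, 300)), rng.standard_normal((5, 300))]
batch = pad_batch(sentences)
print(batch.shape)  # (2, 5, 300)
```

Each sample in the batch then has the shape [max_time_step, 300] described above.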
I have done some reading and found (probably inaccurate) information that told me the number of units in a Keras LSTM should equal the dimensionality of the input (I assumed that in this case that means 300, not the time steps). Also, since I have read in various places that the time step needs to be applied in some way, it made sense to me to make a deep stack of LSTMs the size of the max time_steps. I thought this would allow the model to learn the usage of the individual word vectors in determining the sentence embedding.
In my task, I embed two sentences with the above method and compare them via concatenation and a perceptron. When I use one LSTM I achieve a Pearson correlation of 0.40. That single LSTM has units matching the dimension of the input (300). When I stack multiple of these LSTMs, one per time step (please correct me if this is wrong practice), absolutely no learning is accomplished (it performs worse than a basic perceptron). I tried flipping the parameters, so that units = time_step and stack size = 300, which results in the (perhaps obvious) machine running out of memory. When it runs out of memory it locks up and needs to be force rebooted (I am SSH-ing into the machine). When I used 5 LSTMs in the stack (5 time_steps), I got this error:
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[300,49,1200]
[[Node: gradients_7/lstm_hidden_2_5_7/transpose_grad/transpose = Transpose[T=DT_FLOAT, Tperm=DT_INT32, _class=["loc:@lstm_hidden_2_5_7/transpose"], _device="/job:localhost/replica:0/task:0/gpu:0"](gradients_7/lstm_hidden_2_5_7/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3_grad/TensorArrayGatherV3, gradients_7/lstm_hidden_2_5_7/transpose_grad/InvertPermutation)]]
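As a rough sanity check, the size of the single tensor named in the error can be worked out by hand. The sketch below assumes 4 bytes per element, which is what DT_FLOAT (32-bit float) implies; the helper name `tensor_bytes` is made up for illustration.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Bytes needed for one dense tensor of the given shape (float32 by default)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element

# Shape taken from the error message above.
size = tensor_bytes([300, 49, 1200])
print(size, "bytes")
print(round(size / 2**20, 2), "MiB")
```

On its own this tensor is well under 100 MiB, so the failure presumably comes from many such tensors (one set per layer, kept for the backward pass) being held on the 4 GB GPU at the same time, not from this one allocation.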
My question is: how can I stack Keras LSTMs in a general model without causing a memory issue?
LSTM model:
    from keras.layers import Input, LSTM, Dense, Concatenate
    from keras.models import Model
    from keras.optimizers import RMSprop

    # hidden_nodes, time_step and flags are defined elsewhere in the file (see the gist).

    def lstm_hidden(x, units=hidden_nodes, time_step=time_step,
                    sequences=flags.sequences, identifier=""):
        """
        Easy function call for creating multiple LSTMs in a stack or sequence.
        """
        # Every layer except the last returns the full sequence for the next layer.
        for i in range(0, time_step - 1):
            x = LSTM(units, return_sequences=True, stateful=flags.stateful,
                     activation='elu',
                     name="lstm_hidden_" + identifier + "_" + str(i))(x)
        # The last layer returns only its final output (many to one).
        last = LSTM(units, return_sequences=False, stateful=flags.stateful,
                    activation='elu',
                    name="lstm_hidden_" + identifier + "_" + str(time_step))
        x = last(x)
        print("x_lstm_last shape = ", x.get_shape().as_list())
        print("last.output_shape ", last.output_shape)
        return x

    def lstm(params):
        """
        Basic LSTM model that receives 2 sentences, embeds them as words,
        and learns their relation.
        """
        # Python 3 removed tuple parameters, so the single tuple argument is unpacked here.
        input_shape, embed_models_tup = params
        input1 = Input(shape=input_shape)
        input2 = Input(shape=input_shape)
        (embed_model1, embed_model2) = embed_models_tup
        # unpack word embeds
        emb1 = embed_model1(input1)
        emb2 = embed_model2(input2)
        # One LSTM stack per sentence: units = input_shape[-1], depth = input_shape[0].
        sent_emb1 = lstm_hidden(emb1, input_shape[-1], input_shape[0], identifier="1")
        sent_emb2 = lstm_hidden(emb2, input_shape[-1], input_shape[0], identifier="2")
        concat = Concatenate()
        combine = concat([sent_emb1, sent_emb2])
        dense = Dense(input_shape[0], activation='elu',
                      kernel_initializer='he_normal')(combine)
        predictions = Dense(1, activation='linear', name="single_dense")(dense)
        model = Model([input1, input2], predictions)
        opt = RMSprop(lr=flags.learning_rate)
        model.compile(optimizer=opt,  # 'rmsprop',
                      loss='mean_squared_error',
                      metrics=['accuracy', 'mean_squared_error'])
        return model
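To get a feel for how the stack depth drives the model size, the weight count of such a stack can be estimated without building it. This is only a sketch: it uses the standard LSTM layout (four gates, each with an input kernel, a recurrent kernel, and a bias), the function names are hypothetical, and the depth of 49 is an assumed sentence length, not a value confirmed by the code above.

```python
def lstm_param_count(input_dim, units):
    """Trainable weights in one standard LSTM layer:
    4 gates x (input kernel + recurrent kernel + bias)."""
    return 4 * (units * input_dim + units * units + units)

def stack_param_count(input_dim, units, num_layers):
    """Total weights in a stack where each layer feeds the next."""
    total = 0
    layer_input = input_dim
    for _ in range(num_layers):
        total += lstm_param_count(layer_input, units)
        layer_input = units  # later layers consume the previous layer's output
    return total

# Assumed values: 300-d word vectors, 300 units, hypothetical depth of 49.
one_layer = stack_param_count(300, 300, 1)
deep_stack = stack_param_count(300, 300, 49)
print(one_layer)
print(deep_stack)
# float32 weights for both sentence stacks, in MiB (weights only).
print(round(2 * deep_stack * 4 / 2**20, 1))
```

The formula also shows that `units` and the input dimension enter separately, so the two do not have to be equal. The figure covers weights only. Optimizer state and the per-time-step activations kept for backpropagation come on top of it and grow with both depth and sequence length.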
If required, the whole file is in this gist. You will not be able to run it without the sentence embedding code, which I can add if necessary (this is part of a larger project). It should not be necessary, since the LSTM model above is what is in question.
System info: GeForce 1050 Ti, 4 GB GPU RAM, 16 GB system RAM, 8 CPU threads