gcloud ml-engine predict is very slow on inference
I'm testing a segmentation model on gcloud, and inference is incredibly slow: it takes 3 minutes to get a result (averaged over 5 runs). The same model runs in ~2.5 s on my laptop when served through TF Serving. Is this normal? I didn't find any mention in the documentation of how to define the instance type, and it seems impossible to run inference on a GPU. The steps I'm using are straightforward and follow the examples and tutorials:
    gcloud ml-engine models create "seg_model"

    gcloud ml-engine versions create v1 \
        --model "seg_model" \
        --origin $deployment_source \
        --runtime-version 1.2 \
        --staging-bucket gs://$bucket_name

    gcloud ml-engine predict --model ${model_name} --version v1 --json-instances request.json
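For reference, a minimal sketch of how a request file like this can be built for a single image. The input alias image_bytes is just an assumption for illustration: Cloud ML Engine treats aliases ending in _bytes as binary and expects base64 data wrapped in a {"b64": ...} object, but the real alias is defined by your SavedModel's serving signature.

    # Hypothetical request builder: wraps one PNG as a base64-encoded instance.
    # "image_bytes" is a placeholder input alias; check the real one with
    # saved_model_cli show --dir $deployment_source --all
    # --json-instances expects one JSON object per line.
    python -c 'import base64, json; img = base64.b64encode(open("input.png", "rb").read()).decode("ascii"); print(json.dumps({"image_bytes": {"b64": img}}))' > request.json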
UPD: After running more experiments I found that redirecting the output to a file gets inference time down to 27 s. The model output size is 512x512, which apparently causes the delay on the client side. Although much lower than 3 min, it is still an order of magnitude slower than TF Serving.
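Concretely, the two measurements above come from something like the following, using the shell's time builtin (exact numbers will of course vary):

    # Response printed to the terminal: ~3 min for me; rendering the
    # 512x512 output in the terminal dominates the wall-clock time.
    time gcloud ml-engine predict --model ${model_name} --version v1 --json-instances request.json

    # Same call with the response redirected to a file: ~27 s for me.
    time gcloud ml-engine predict --model ${model_name} --version v1 --json-instances request.json > response.json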