I am using TensorFlow Serving to run models locally with the `tensorflow/serving:latest-gpu` nvidia-docker image, with the following configuration:
- GPU: GeForce GTX 1080 with 12 GB
- Driver Version: 418.88
- CUDA Version: 10.1
But inference takes almost double the time when I run the same model on an AWS EC2 instance with the following configuration:
- GPU: Tesla K80 with 12 GB
- Driver Version: 418.40.04
- CUDA Version: 10.1
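
For reference, this is roughly the kind of latency probe that could be run identically on both machines to make the comparison concrete. It assumes the model is exposed over TensorFlow Serving's REST API on the default port 8501; the model name `my_model` and the input shape are placeholders, not my actual setup:

```python
import time

import numpy as np
import requests

# Hypothetical endpoint and model name; adjust to the actual serving setup.
URL = "http://localhost:8501/v1/models/my_model:predict"

# Dummy input matching the model's expected shape (assumed 1x224x224x3 here).
payload = {"instances": np.random.rand(1, 224, 224, 3).tolist()}

# Warm-up request so one-time initialization cost isn't counted.
requests.post(URL, json=payload).raise_for_status()

# Time a batch of requests and report the mean latency.
n = 100
start = time.perf_counter()
for _ in range(n):
    requests.post(URL, json=payload).raise_for_status()
elapsed = time.perf_counter() - start
print(f"mean latency: {elapsed / n * 1000:.1f} ms over {n} requests")
```
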
Why is this happening?