Channel: Active questions tagged amazon-ec2 - Stack Overflow

Deep learning-based GAN model error on AWS GPU


I'm training a deep learning GAN model on an AWS EC2 instance and getting the error "Nan in summary histogram". I searched and found several suggestions, such as adjusting the learning rate and batch size, but these seem to be only temporary fixes: with the same parameters, the model sometimes runs for about 100 epochs and the next time fails before completing 50. There is no particular point at which the error appears; one run may reach 200 epochs and the next may fail after 20. So my question is: why am I getting this error, and how can I get rid of it?

The error message is as follows:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Nan in summary histogram for: discriminator/layer_4/batch_normalization/gamma/values
     [[{{node discriminator/layer_4/batch_normalization/gamma/values}}]]
     [[generator/encoder_3/batch_normalization/beta/read/_658]]
  (1) Invalid argument: Nan in summary histogram for: discriminator/layer_4/batch_normalization/gamma/values
     [[{{node discriminator/layer_4/batch_normalization/gamma/values}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "pix2pix.py", line 803, in <module>
    main()
  File "pix2pix.py", line 769, in main
    results = sess.run(fetches, options=options, run_metadata=run_metadata)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Nan in summary histogram for: discriminator/layer_4/batch_normalization/gamma/values
     [[node discriminator/layer_4/batch_normalization/gamma/values (defined at /home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
     [[generator/encoder_3/batch_normalization/beta/read/_658]]
  (1) Invalid argument: Nan in summary histogram for: discriminator/layer_4/batch_normalization/gamma/values
     [[node discriminator/layer_4/batch_normalization/gamma/values (defined at /home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'discriminator/layer_4/batch_normalization/gamma/values':
  File "pix2pix.py", line 803, in <module>
    main()
  File "pix2pix.py", line 700, in main
    tf.summary.histogram(var.op.name +"/values", var)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/summary/summary.py", line 179, in histogram
    tag=tag, values=values, name=scope)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_logging_ops.py", line 329, in histogram_summary
    "HistogramSummary", tag=tag, values=values, name=name)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()
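
As far as the trace shows, the histogram summary created at pix2pix.py line 700 is rejecting the variable discriminator/layer_4/batch_normalization/gamma because it already contains NaN when the summary is evaluated. A minimal sketch of the kind of check that might help narrow down which variable goes non-finite first is given here in plain Python. The function name find_non_finite and the snapshot dictionary are hypothetical and not part of pix2pix.py, and the sketch assumes the variable values have already been fetched from the session and flattened into lists of floats.

```python
import math


def find_non_finite(named_values):
    """Return the names of variables whose values contain NaN or +/-inf.

    named_values: dict mapping a variable name to a flat list of floats.
    This input format is an assumption made for illustration; in the
    training script the values would have to be fetched from the session.
    """
    bad = []
    for name, values in named_values.items():
        # math.isfinite is False for NaN, +inf and -inf.
        if any(not math.isfinite(v) for v in values):
            bad.append(name)
    return bad


# Illustrative values only, not taken from the failing run.
snapshot = {
    "discriminator/layer_4/batch_normalization/gamma": [1.0, float("nan")],
    "generator/encoder_3/batch_normalization/beta": [0.0, 0.1],
}
print(find_non_finite(snapshot))
```

Running a check like this on every step, before the summaries are fetched, would record the step and the variable at which the first non-finite value appears, instead of failing only when the histogram is written.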

Please let me know how to tackle this and what the reasons behind it are.

Thanks in advance.

