I'm training a deep learning GAN model on an AWS EC2 instance and I'm getting the error "Nan in summary histogram". I searched and found several answers that suggest changing the learning rate and batch size, but that appears to be only a temporary fix: with the same parameters, a run sometimes lasts for, say, 100 epochs, and the next time it does not even complete 50. There is no particular point at which the model throws the error; it may run for 200 epochs once and fail after 20 the next time. So my question is: why am I getting this error, and how do I get rid of it?
The error message is as follows:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Nan in summary histogram for: discriminator/layer_4/batch_normalization/gamma/values
     [[{{node discriminator/layer_4/batch_normalization/gamma/values}}]]
     [[generator/encoder_3/batch_normalization/beta/read/_658]]
  (1) Invalid argument: Nan in summary histogram for: discriminator/layer_4/batch_normalization/gamma/values
     [[{{node discriminator/layer_4/batch_normalization/gamma/values}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "pix2pix.py", line 803, in <module>
    main()
  File "pix2pix.py", line 769, in main
    results = sess.run(fetches, options=options, run_metadata=run_metadata)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Nan in summary histogram for: discriminator/layer_4/batch_normalization/gamma/values
     [[node discriminator/layer_4/batch_normalization/gamma/values (defined at /home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
     [[generator/encoder_3/batch_normalization/beta/read/_658]]
  (1) Invalid argument: Nan in summary histogram for: discriminator/layer_4/batch_normalization/gamma/values
     [[node discriminator/layer_4/batch_normalization/gamma/values (defined at /home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'discriminator/layer_4/batch_normalization/gamma/values':
  File "pix2pix.py", line 803, in <module>
    main()
  File "pix2pix.py", line 700, in main
    tf.summary.histogram(var.op.name +"/values", var)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/summary/summary.py", line 179, in histogram
    tag=tag, values=values, name=scope)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_logging_ops.py", line 329, in histogram_summary
    "HistogramSummary", tag=tag, values=values, name=name)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()
Please let me know how to tackle this and what the reasons behind it are.
Thanks in advance.
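In case it helps narrow this down: the traceback shows the histogram summary op rejecting a variable that already contains NaN, so one possible diagnostic is to check the fetched variable values for non-finite entries at each step and note which name goes bad first. The following is only a minimal sketch of that check, not something pix2pix.py does. The helper names (iter_scalars, find_non_finite) and the snapshot values are hypothetical and made up for illustration, and it assumes each value has been converted to a nested Python list (for example with ndarray.tolist()).

```python
import math


def iter_scalars(value):
    """Yield every scalar in a possibly nested list/tuple of numbers."""
    if isinstance(value, (list, tuple)):
        for item in value:
            yield from iter_scalars(item)
    else:
        yield value


def find_non_finite(named_values):
    """Return the names whose values contain NaN or +/-inf.

    named_values is a hypothetical dict mapping a variable name to its
    fetched value as a (nested) list of floats.
    """
    return [
        name
        for name, value in named_values.items()
        if any(not math.isfinite(x) for x in iter_scalars(value))
    ]


# Made-up snapshot for illustration only; real values would come from the
# training session at a given step.
snapshot = {
    "discriminator/layer_4/batch_normalization/gamma": [1.0, float("nan"), 0.9],
    "generator/encoder_3/batch_normalization/beta": [[0.0, 0.1], [0.2, 0.3]],
}

bad = find_non_finite(snapshot)
print(bad)
```

Running a check like this every step, and stopping at the first step where the list is non-empty, would at least show whether the NaN appears in the discriminator's batch-normalization variables first or is propagated from elsewhere.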