"loss Is Nan" Error In Keras, How To Debug?
Solution 1:
Over the last few days I've learned some relevant things:
- This is a known issue with PlaidML, apparently still unresolved. Here is the discussion on GitHub.
- While I read in many places that TensorFlow's GPU support requires an Nvidia GPU, I found on the Intel website that TensorFlow will run on the Intel CPU in a Mac.
- The pip install commands I found here and there for TensorFlow did not work for me, but conda did: "conda install tensorflow" or "conda install tensorflow -c anaconda".
- Since I now have both TensorFlow's Keras and PlaidML's Keras installed, I can import with either "from tensorflow.keras import XXX" (TensorFlow, CPU) or "from keras import XXX" (PlaidML, GPU). That's useful; see the sketch just after this list.
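Here is a minimal sketch of what that switch looks like in practice. The flag, model, and layer choices are just placeholders of mine, and I'm assuming PlaidML has already been configured (via plaidml-setup) so that plain keras picks up the GPU backend:

    USE_PLAIDML = False  # flip to True to route through PlaidML's Keras (GPU)

    if USE_PLAIDML:
        # Standalone Keras, which PlaidML hooks into once configured.
        from keras.models import Sequential
        from keras.layers import Dense
    else:
        # TensorFlow's bundled Keras, running on the Intel CPU.
        from tensorflow.keras.models import Sequential
        from tensorflow.keras.layers import Dense

    # Tiny placeholder model; either import path is used the same way.
    model = Sequential([Dense(64, activation="relu", input_shape=(4,)),
                        Dense(1)])
    model.compile(optimizer="adam", loss="mse")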
Not yet investigated: According to many websites, if a Mac has a Thunderbolt 3 port (one of mine does, one does not), it is possible to attach an external Nvidia eGPU. Whether that will actually work here is unknown. Gamer discussion boards seem to say it doesn't work for their purposes, but Keras-related websites say that TensorFlow will use an eGPU just fine.
At any rate, I can proceed with my projects using tensorflow.keras on the CPU. It's not ideal, but it works. On the test code above I was seeing about 100 μs/sample and 6 s/epoch.
Bottom line for people experiencing PlaidML-Keras issues: it's not specific to your setup; PlaidML is currently broken for a lot of GPUs. For now, use TensorFlow on the Intel CPU, and keep an eye on Issue #168 for a fix.
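As a quick sanity check that the TensorFlow backend is the one actually in use (assuming a TensorFlow 2.x install from conda), you can print the devices it sees:

    import tensorflow as tf

    print(tf.__version__)
    # On these Macs this should list a CPU device and no GPU.
    print(tf.config.list_physical_devices())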