
Using tensorflow.keras (2.0-alpha0 with GPU support), I get extremely long initialization times with tf.keras.Model.fit(), both on newly compiled models and on models previously saved and reloaded.

I believe this happens after the tf.data.Dataset objects have already been loaded and preprocessed, so I don't understand what is taking so long, and there is no output from TF/Keras during this period:

2019-04-19 23:29:18.109067: tensorflow/core/common_runtime/gpu/gpu_device.cc:1149] Created TensorFlow device
Resizing images and creating data sets with num_parallel_calls=8
Loading existing model to continue training.
Starting model.fit()
Epoch 1/100
2019-04-19 23:32:22.934394: tensorflow/core/kernels/data/shuffle_dataset_op.cc:150] Shuffle buffer filled.
2019-04-19 23:38:52.374924: tensorflow/core/common_runtime/bfc_allocator.cc:230] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.62GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.

3 minutes to load the model and fill the shuffle buffer, and 6 minutes for ... what? And how can this mysterious work be optimized? (5 GHz 8700K, 32 GB RAM, NVMe SSD, 1080 Ti with 11 GB GDDR5X; Task Manager shows 100% single-thread CPU use, moderate disk access, RAM usage slowly growing to ~28 GB, and zero GPU usage during this period.)

Is there any way to serialize or store the models in a more efficient way such that they can be started and stopped regularly without the 10 minutes of overhead?
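For reference, the save/reload workflow I'm describing uses the two standard Keras serialization options. This is only a sketch with a placeholder model, not my actual code:

import tensorflow as tf

# Placeholder model; the real architecture isn't shown here.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Full model: architecture + weights + optimizer state.
model.save("model.h5")
model = tf.keras.models.load_model("model.h5")

# Weights only, which is lighter if the architecture is rebuilt in code each run.
model.save_weights("weights.h5")
model.load_weights("weights.h5")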

Is TF/Keras somehow lazy-loading the data sets and preprocessing them in this period?

1 Answer


It looks like an issue with using multiple workers for the tf.data.Dataset pipeline rather than with the model itself. The log messages show that you're preprocessing with num_parallel_calls=8, i.e. eight parallel calls, which would explain the high CPU and RAM usage you're seeing.

To my knowledge, the first pass over a Dataset is expected to be fairly slow; it gets faster once the data has been cached.

If the model.fit() call still starts very slowly, you can turn the number of parallel calls down to 4 or 2. That might increase your training time, since the input pipeline will read data from the SSD more slowly. A sketch of a pipeline with these knobs is shown below.
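For illustration, here is a minimal sketch of an image pipeline with the two knobs discussed here: caching and a reduced num_parallel_calls. The file pattern, image size, and batch/buffer sizes are placeholders rather than values from the question.

import tensorflow as tf

IMG_SIZE = 224  # placeholder target size

def load_and_resize(path):
    # Decode one JPEG and resize it to the target shape.
    image = tf.io.read_file(path)
    image = tf.image.decode_jpeg(image, channels=3)
    return tf.image.resize(image, [IMG_SIZE, IMG_SIZE])

dataset = (
    tf.data.Dataset.list_files("images/*.jpg")    # placeholder glob
    .map(load_and_resize, num_parallel_calls=2)   # try 2 or 4 instead of 8
    .cache()                                      # reuse decoded images after the first pass
    .shuffle(1000)
    .batch(32)
    .prefetch(tf.data.experimental.AUTOTUNE)      # overlap preprocessing with training
)

# model.fit(dataset, epochs=100)

With .cache() placed after the expensive map, only the first epoch pays the full preprocessing cost; later epochs read from the in-memory cache.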
