I ran into an apparent circular dependency while trying to get TensorBoard log data out of a hyper-parameter search done with Keras Tuner, for a model built with TF2. The typical setup for Keras Tuner passes the TensorBoard callback to the tuner's search() method, which wraps the model's fit() method.

from kerastuner.tuners import RandomSearch

tuner = RandomSearch(build_model,  # this function builds the model
                     hyperparameters=hp,
                     objective='val_accuracy',
                     max_trials=10)  # required by the API; value only for illustration
tuner.search(x=train_x, y=train_y,
             validation_data=(val_x, val_y),
             callbacks=[tensorboard_cb])

In practice, the tensorboard_cb callback needs to be given the directory where data will be logged, and this directory has to be unique to each trial. A common way to do this is to name the directory after the current timestamp, with code like below.

import time
from tensorflow.keras.callbacks import TensorBoard
log_dir = time.strftime('trial_%Y_%m_%d-%H_%M_%S')
tensorboard_cb = TensorBoard(log_dir)

This works when training a single model with known hyper-parameters. However, when doing a hyper-parameter search, I have to define the TensorBoard callback before invoking tuner.search(). This is the problem: tuner.search() will invoke build_model() multiple times, and each of these trials should have its own TensorBoard directory. Ideally log_dir would be defined inside build_model(), but the Keras Tuner search API forces the TensorBoard callback to be defined outside of that function.

TL;DR: TensorBoard gets its data through a callback and needs one log directory per trial, but Keras Tuner requires the callback to be defined once for the entire search, before performing it, not once per trial. How can unique directories per trial be defined in this case?
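
What I would like is, in effect, the following per-trial loop, sketched here as purely illustrative pseudocode (n_trials is a made-up name, and this is not a working Keras Tuner pattern, since the search API does not expose such a hook):

# hypothetical sketch of the desired behaviour, not valid Keras Tuner usage:
# each trial would get a freshly timestamped TensorBoard directory
for trial in range(n_trials):
    log_dir = time.strftime('trial_%Y_%m_%d-%H_%M_%S')
    tensorboard_cb = TensorBoard(log_dir)
    model = build_model(hp)  # build the model with this trial's hyper-parameters
    model.fit(train_x, train_y,
              validation_data=(val_x, val_y),
              callbacks=[tensorboard_cb])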

1 Answer

Keras Tuner creates a subdirectory for each run (this statement is probably version dependent).
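
One way to check this on a given setup is to list the log directory after a short search, for example with a snippet like the following (assuming the "logs/" root used further down):

import os

# print every subdirectory under the log root, to see whether each
# trial/execution ended up in its own folder
for root, dirs, files in os.walk('logs/'):
    print(root)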

I guess finding the right mix of versions is important.

Here is how it works for me, in JupyterLab.

Prerequisites:

  1. pip requirements
    keras-tuner==1.0.1
    tensorboard==2.1.1
    tensorflow==2.1.0
    Keras==2.2.4
    jupyterlab==1.1.4

  2. JupyterLab installed, built and running [standard build arguments: production:minimize]

Here is the actual code. First I define the log folder and the callbacks:

import datetime
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping

# run parameter
log_dir = "logs/" + datetime.datetime.now().strftime("%m%d-%H%M")

# training meta
stop_callback = EarlyStopping(
    monitor='loss', patience=1, verbose=0, mode='auto')

hist_callback = tf.keras.callbacks.TensorBoard(
    log_dir=log_dir,
    histogram_freq=1,
    embeddings_freq=1,
    write_graph=True,
    update_freq='batch')

print("log_dir", log_dir)

Then I define my hypermodel, which I do not want to disclose (a placeholder stand-in is sketched below). Afterwards I set up the hyper-parameter search:

from kerastuner.tuners import Hyperband

hypermodel = get_my_hypermodel()

tuner = Hyperband(
    hypermodel,
    max_epochs=40,
    objective='loss',
    executions_per_trial=5,
    directory=log_dir,
    project_name='test'
)
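
Since I do not want to disclose the real hypermodel, here is a minimal stand-in just to make the example self-contained; the layer sizes, input shape and hyper-parameter ranges are arbitrary placeholders, not the model used above:

from kerastuner import HyperModel
from tensorflow import keras

class PlaceholderHyperModel(HyperModel):
    # stand-in hypermodel: a small dense binary classifier whose width and
    # learning rate are tuned; the input shape (10,) is arbitrary
    def build(self, hp):
        model = keras.Sequential([
            keras.layers.Dense(hp.Int('units', 32, 256, step=32),
                               activation='relu', input_shape=(10,)),
            keras.layers.Dense(1, activation='sigmoid'),
        ])
        model.compile(
            optimizer=keras.optimizers.Adam(
                hp.Choice('learning_rate', [1e-2, 1e-3, 1e-4])),
            loss='binary_crossentropy',
            metrics=['accuracy'])
        return model

def get_my_hypermodel():
    return PlaceholderHyperModel()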

I then execute the search:

tuner.search(
    train_data,
    labels,
    epochs=10,
    validation_data=(val_data, val_labels),
    callbacks=[hist_callback],
    use_multiprocessing=True)

tuner.search_space_summary()

While the notebook with this code searches for adequate hyper-parameters, I monitor the loss in another notebook. Since TF v2, TensorBoard can be called via a magic function:

Cell 1

import tensorboard

Cell 2

%load_ext tensorboard

Cell 3

%tensorboard --logdir 'logs/'

Side note: Since I run JupyterLab in a Docker container, I have to specify the appropriate address and port for TensorBoard and also forward this port in the Dockerfile.
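
For reference, the last cell then looks something like this; 6006 is only an example port and has to match whatever the container actually publishes (e.g. docker run -p 6006:6006 ...):

%tensorboard --logdir 'logs/' --host 0.0.0.0 --port 6006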

The result is not really predictable for me... I do not yet understand when I can expect histograms and distributions in TensorBoard. For some runs the loading time seems really excessive, so have patience.

Under Scalars I find a list of the runs, named as follows:

"logdir"/"model_has"/execution[iter]/[train/validation]

E.g.

0101-1010/bb7981e03d05b05106d8a35923353ec46570e4b6/execution0/train
0101-1010/bb7981e03d05b05106d8a35923353ec46570e4b6/execution0/validation
