
I am following this tutorial for binary classification. The model is defined as follows, and the tutorial states:

Apply a tf.keras.layers.Dense layer to convert these features into a single prediction per image. You don't need an activation function here because this prediction will be treated as a logit, or a raw prediction value. Positive numbers predict class 1, negative numbers predict class 0.

model = tf.keras.Sequential([
  base_model,
  tf.keras.layers.GlobalAveragePooling2D(),
  tf.keras.layers.Dense(1)
])

and then it is compiled as

base_learning_rate = 0.0001
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=base_learning_rate),
              loss='binary_crossentropy',
              metrics=['accuracy'])

I have seen a similar model definition here as follows:

model = tf.keras.Sequential([
  mobile_net,
  tf.keras.layers.GlobalAveragePooling2D(),
  tf.keras.layers.Dense(len(label_names))])

model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.sparse_categorical_crossentropy,
              metrics=["accuracy"])

In both of the above cases, where no activation function is used, I observed that the predicted values can take any real value (they are not restricted to the range [0, 1]); in the example below there is not a single negative value:

model = tf.keras.Sequential([
  mobile_net,
  tf.keras.layers.GlobalAveragePooling2D(),
  tf.keras.layers.Dense(1)])

base_learning_rate = 0.0001
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=base_learning_rate),
              loss='binary_crossentropy',
              metrics=['accuracy'])

np.squeeze(model.predict(test_ds, steps=test_steps_per_epoch))

# array([0.8656062 , 1.1738479 , 1.3243774 , 0.43144074, 1.3459874 ,
#        0.8830215 , 0.27673364, 0.61824167, 0.6811296 , 0.31660053,
#        0.66832197, 0.9944696 , 1.1472682 , 0.643435  , 1.6108004 ,
#        0.46332538, 1.0919437 , 0.9578197 , 1.176657  , 1.1019497 ,
#        1.2280573 , 1.3852577 , 1.0576394 , 0.89174306, 0.75531614,
#        0.77309614, 0.2964771 , 1.4851328 , 0.52786475, 0.8349319 ,
#        0.6725186 , 0.850648  , 1.5454502 , 1.5105858 , 0.8132403 ,
#        0.8769205 , 0.8270436 , 0.5637488 , 1.0141921 , 1.7030811 ,
#        1.4353518 , 1.4161562 , 1.378978  , 0.501247  , 0.6213258 ,
#        0.9437766 , 2.429086  , 1.2481798 , 0.6229276 , 0.37893608,
#        1.3877648 , 1.0904361 , 1.0879816 , 0.42403704, 0.79637295,
#        2.8160148 , 0.8214861 , 0.8503458 , 0.80563146, 1.4901325 ,
#        1.0303755 , 0.77981436, 1.088749  , 0.71522933, 1.3340217 ,
#        2.0090134 , 1.0075089 , 0.8950774 , 0.6173111 , 0.7857665 ,
#        1.7411164 , 1.3057053 , 0.33380216, 0.76223296, 1.5859761 ,
#        0.96682435, 0.6254643 , 1.4843993 , 1.1031054 , 0.6320849 ,
#        0.01859415, 0.72086346, 1.1440296 , 0.29395923, 1.5440805 ,
#        0.380056  , 1.7602444 , 0.6369114 , 0.7867059 , 1.1418453 ,
#        1.8237758 , 0.2560327 , 2.6044023 , 1.5562654 , 0.737739  ,
#        0.40826577], dtype=float32)
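
For reference, applying a sigmoid by hand does squash these raw outputs into the range (0, 1). A minimal sketch, assuming logits holds the array shown above:

import tensorflow as tf

# logits: the raw predictions from above, e.g.
# logits = np.squeeze(model.predict(test_ds, steps=test_steps_per_epoch))
probs = tf.math.sigmoid(logits)  # maps any real number into (0, 1)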

QUESTION: 1

How does TensorFlow calculate accuracy from such values? Since these values are not 0 or 1, what threshold does it use to decide whether a sample belongs to class 1 or class 0?


In another tutorial, I have seen the use of a sigmoid or softmax activation function in the last layer.

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])

Similarly, I defined my model as follows:

model = tf.keras.Sequential([
  mobile_net,
  tf.keras.layers.GlobalAveragePooling2D(),
  tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.0001),
              loss='binary_crossentropy',
              metrics=['accuracy'])

and observed that the predicted values fall in the range [0, 1]:

np.squeeze(model.predict(test_ds, steps=test_steps_per_epoch))

# array([0.5962706 , 0.41386074, 0.7369955 , 0.4375754 , 0.4081418 ,
#        0.5233598 , 0.54559284, 0.58932847, 0.46750832, 0.73593813,
#        0.49894634, 0.49055347, 0.37505004, 0.6098627 , 0.5756561 ,
#        0.5219231 , 0.37050545, 0.5673407 , 0.5554987 , 0.531324  ,
#        0.28257015, 0.74096835, 0.57002604, 0.46783662, 0.7368346 ,
#        0.5332815 , 0.5606995 , 0.5541738 , 0.57862717, 0.40553188,
#        0.46588784, 0.30736524, 0.43870398, 0.74726176, 0.71659195,
#        0.27446586, 0.50352675, 0.43134567, 0.68349624, 0.38074452,
#        0.5150338 , 0.7177907 , 0.61012363, 0.63375396, 0.43830383,
#        0.5749217 , 0.4520418 , 0.42618847, 0.53284496, 0.55864084,
#        0.55283684, 0.56968784, 0.5476512 , 0.47232378, 0.43477964,
#        0.424371  , 0.5257551 , 0.4982109 , 0.6054718 , 0.45364827,
#        0.5447099 , 0.5589619 , 0.6879043 , 0.43605927, 0.49726096,
#        0.5986774 , 0.46806905, 0.45553213, 0.4558573 , 0.2709099 ,
#        0.29398417, 0.42126212, 0.4208623 , 0.25966096, 0.5174277 ,
#        0.5691663 , 0.6820154 , 0.66986185, 0.29530805, 0.5368336 ,
#        0.6704497 , 0.4770817 , 0.58965963, 0.66673934, 0.44505033,
#        0.3894297 , 0.53820807, 0.47612685, 0.3273378 , 0.6933465 ,
#        0.54334545, 0.49939007, 0.5978731 , 0.49409997, 0.4585469 ,
#        0.43943945], dtype=float32)

QUESTION: 2

How is accuracy calculated by TensorFlow in this case?


QUESTION: 3

What is the difference between using a sigmoid activation in the last layer and not using it? When I used the sigmoid activation function, the accuracy of the model somehow decreased by 10% compared to when I didn't use it. Is this a coincidence, or does it have anything to do with the use of the activation function?

1 Answer

The functions used to calculate the accuracy can be found here. There are different definitions depending on your problem, such as binary_accuracy or categorical_accuracy. The proper one is chosen automatically, based on the output shape and your loss (see the handle_metrics function here). Based on those:

1.

It depends on your model. In your first example it will use

def binary_accuracy(y_true, y_pred):
    '''Calculates the mean accuracy rate across all predictions for binary
    classification problems.
    '''
    return K.mean(K.equal(y_true, K.round(y_pred)))

As you can see, it simply rounds the model's predictions.
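
A minimal sketch of that rounding, reusing a few values from your first output array (the labels are hypothetical):

import tensorflow as tf

y_true = tf.constant([1., 0., 1., 1.])                               # hypothetical labels
y_pred = tf.constant([0.8656062, 0.27673364, 1.1738479, 2.429086])   # raw logits
rounded = tf.round(y_pred)                                           # [1. 0. 1. 2.]
acc = tf.reduce_mean(tf.cast(tf.equal(y_true, rounded), tf.float32))
print(acc.numpy())  # 0.75 -- note that 2.429086 rounds to 2,
                    # which can never equal a 0/1 label

So the implicit threshold is 0.5 on whatever the last layer outputs, and raw logits above 1.5 round past 1 entirely. In your second example it will use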

def sparse_categorical_accuracy(y_true, y_pred):
    '''Same as categorical_accuracy, but useful when the predictions are for
    sparse targets.
    '''
    return K.mean(K.equal(K.max(y_true, axis=-1),
                          K.cast(K.argmax(y_pred, axis=-1), K.floatx())))

Here no rounding occurs; instead, it checks whether the class with the highest predicted score is the same as the true class.
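
A minimal sketch of that comparison, with hypothetical values:

import tensorflow as tf

y_true = tf.constant([2., 0.])             # sparse integer labels (hypothetical)
y_pred = tf.constant([[0.1, 0.2, 3.5],     # argmax -> 2, matches the label
                      [1.2, 2.8, 0.3]])    # argmax -> 1, does not match
predicted = tf.cast(tf.argmax(y_pred, axis=-1), tf.float32)
acc = tf.reduce_mean(tf.cast(tf.equal(y_true, predicted), tf.float32))
print(acc.numpy())  # 0.5

Note that no activation is required for this metric to work: the argmax of the raw logits is the same as the argmax of their softmax.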

2.

Again, binary_accuracy will be used; however, the predictions will now come from a sigmoid activation, so they already lie in [0, 1] before the rounding.
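
Rounding a sigmoid output at 0.5 is equivalent to checking whether the raw logit was positive, since sigmoid(0) == 0.5. A small sketch with hypothetical logits:

import tensorflow as tf

logits = tf.constant([-1.2, 0.3, 2.0, -0.1])    # hypothetical raw outputs
probs = tf.math.sigmoid(logits)                 # [0.23 0.57 0.88 0.48]
print(tf.round(probs).numpy())                  # [0. 1. 1. 0.]
print(tf.cast(logits > 0, tf.float32).numpy())  # [0. 1. 1. 0.] -- same result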

3.

The sigmoid activation changes your outputs: it guarantees that the predictions lie between 0 and 1. The accuracy changes because of that; e.g. a raw output of 1.7 rounds to 2 and can never match a 0/1 label, whereas after the sigmoid it becomes about 0.85 and rounds to 1. It will also affect training: it is common to use a sigmoid activation with cross-entropy, since that loss expects a probability.
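
If you prefer to keep the linear (logit) output, the usual fix is to tell the loss to apply the sigmoid internally rather than leaving it out entirely. A sketch of the two equivalent setups, with a stand-in base_model so it runs on its own (weights=None is only there to avoid a download; the tutorial uses a pretrained MobileNetV2):

import tensorflow as tf

base_model = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights=None)

# (a) sigmoid in the model: outputs are probabilities in (0, 1)
model_a = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model_a.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.0001),
                loss=tf.keras.losses.BinaryCrossentropy(),  # expects probabilities
                metrics=['accuracy'])

# (b) no activation: outputs are raw logits, the loss applies the sigmoid itself
model_b = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1),
])
model_b.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.0001),
                loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                metrics=['accuracy'])

Setup (b) is what your first model was missing: loss='binary_crossentropy' defaults to from_logits=False, i.e. it assumed it was being fed probabilities while the network was producing unbounded logits. That mismatch, rather than the sigmoid itself, is a likely explanation for the accuracy gap you observed.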
