
Keras Beginner: What Is Supposed To Be The Output Shape Of The Last Layer?

I'm having a hard time wrapping my head around the math behind CNNs and how exactly I should modify the output shape between layers of my neural network. I am trying to do the…

Solution 1:

The documentation on Dense is not the clearest, but the behavior follows from the section describing input and output shapes.

Note: if the input to the layer has a rank greater than 2, then it is flattened prior to the initial dot product with kernel.

...

Input shape

nD tensor with shape: (batch_size, ..., input_dim). The most common situation would be a 2D input with shape (batch_size, input_dim).

Output shape

nD tensor with shape: (batch_size, ..., units). For instance, for a 2D input with shape (batch_size, input_dim), the output would have shape (batch_size, units).

This is confusing: the note about higher-rank tensors being flattened suggests that the overall output of Dense(1) would be a single scalar per example in a batch. But as your printout from summary() demonstrates, the layer actually preserves the intermediate dimensions of the tensor.

So if you give it an input of shape (None, 640, 959, 8), Dense treats the final dimension as the feature dimension for the fully connected computation, and treats each of the 640x959 locations indexed by the inner dimensions as a separate position with its own output neuron...

So if your network is this:

nn = Sequential()
nn.add(Conv2D(8, (3,3), input_shape = (640, 959, 3), activation='relu', padding='same'))
nn.add(Conv2D(8, (3,3), activation='relu', padding='same'))
nn.add(Dense(1, activation='softmax'))

then the final output shape will be

(None, 640, 959, 1)

That is, each output "pixel" (i, j) in the 640x959 grid is calculated as a dense combination of the 8 different convolution channels at point (i, j) from the previous layer.
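This shape can be checked with a short, self-contained sketch (assuming `tensorflow.keras`; `sigmoid` is substituted for the `softmax` in your code, since a softmax over a single unit always outputs 1):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dense

nn = Sequential()
nn.add(Conv2D(8, (3, 3), input_shape=(640, 959, 3), activation='relu', padding='same'))
nn.add(Conv2D(8, (3, 3), activation='relu', padding='same'))
# Dense acts only on the last axis, so each (i, j) location keeps its own output
nn.add(Dense(1, activation='sigmoid'))

print(nn.output_shape)  # (None, 640, 959, 1)
```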

There are various ways to achieve the same thing. For example, a 1x1 convolution that downsamples the channel dimension from 8 to 1 would produce the same output shape, with a layer like:

Conv2D(1, (1,1), activation='relu', padding='same')
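As a sketch (again assuming `tensorflow.keras`), swapping the final Dense layer for that 1x1 convolution gives the same output shape:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D

nn = Sequential()
nn.add(Conv2D(8, (3, 3), input_shape=(640, 959, 3), activation='relu', padding='same'))
nn.add(Conv2D(8, (3, 3), activation='relu', padding='same'))
# A 1x1 convolution is a per-pixel dense combination of the 8 input channels
nn.add(Conv2D(1, (1, 1), activation='relu', padding='same'))

print(nn.output_shape)  # (None, 640, 959, 1)
```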

or you could reference the "naive Keras" example for the particular Kaggle competition you're working on, which uses this:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D

model = Sequential()
model.add(Conv2D(16, 3, activation='relu', padding='same', input_shape=(320, 480, 12)))
model.add(Conv2D(32, 3, activation='relu', padding='same'))
model.add(Conv2D(1, 5, activation='sigmoid', padding='same'))

Separately from all of this, there are two inconsistencies in the data dimensions in the code you've shown us.

One is that you state the image height is 440, but the Keras output says 640.

The other is that your final Dense layer has 6 channels in the output, but the corresponding code you provided could only produce 1 channel.

So there is likely still a mismatch between the code you're running and the code you've pasted here, which prevents us from seeing the full dimension problem.

For example, the loss for this network ought to compare the ground-truth bitmasks of car-location pixels with the 640x959 output of your final Dense layer (once you fix the odd issue where the summary shows 6 channels).

But the error message you reported is

ValueError: Error when checking target: expected dense_1 to have 4 dimensions, but got array with shape (159, 640, 959)

and this suggests the batch of target data needs to be reshaped into a tensor of shape (159, 640, 959, 1), simply to conform to the shape that comes out of your Dense layer.
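A minimal sketch of that reshape with NumPy (the shapes come from the error message; the zero-filled array is a stand-in for your actual masks):

```python
import numpy as np

# Dummy targets standing in for the batch of 159 ground-truth masks
targets = np.zeros((159, 640, 959), dtype=np.uint8)

# Add a trailing channel axis so targets match the (None, 640, 959, 1) model output
targets = np.expand_dims(targets, axis=-1)  # equivalently: targets.reshape(159, 640, 959, 1)

print(targets.shape)  # (159, 640, 959, 1)
```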
