Students who try the “Test with Your Own Image” section of the C1W4 Application Assignment often find that even the 4 layer model we trained doesn’t do very well on their own uploaded images, even though it gets 80% accuracy on the test set here. It turns out that the datasets we have here are quite small compared to the sizes required to get good “generalizable” performance on an image recognition task like this. As a comparison, the Kaggle “Cats and Dogs” dataset has 25k images.

The limitations of the online environment here presumably required the course authors to come up with pretty small datasets, so it occurred to me to flip the question around: how did they get such good performance with such a small dataset? Is there something special about the dataset they are using here that allows such relatively good performance with so few input samples?

The first step is to do a little error analysis on the results. For all the experiments here, I increased the number of iterations to 3000, but used the same 4 layer network and the learning rate of 0.0075 that they used for the “official” results. Here’s the code for that run on the original dataset, with a few extra lines to count the false positives and false negatives on the test set:

```
layers_dims = [12288, 20, 7, 5, 1] # 4-layer model
parameters, costs = L_layer_model(train_x, train_y, layers_dims, learning_rate = 0.0075, num_iterations = 3000, print_cost = True)
pred_train = predict(train_x, train_y, parameters)
pred_test = predict(test_x, test_y, parameters)
print(f"pred_test error count = {np.sum(test_y != pred_test)}")
print(f"pred_test false negatives = {np.sum(pred_test[test_y == 1] == 0)}")
print(f"pred_test false positives = {np.sum(pred_test[test_y == 0] == 1)}")
print_mislabeled_images(classes, test_x, test_y, pred_test)
```

Running that gives this result (the first accuracy value is for the training set and the second is for the test set):

```
Accuracy: 0.9904306220095691
Accuracy: 0.8200000000000001
pred_test error count = 9
pred_test false negatives = 2
pred_test false positives = 7
```

So you can see that most of the errors on the test set (7 of the 9) are false positives, meaning that the model seems to be a bit “yes happy”.
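Raw error counts can be turned into rates to make the “yes happy” bias easier to see. Here is a minimal sketch of that calculation. The `confusion_counts` helper and the `demo_y` / `demo_pred` arrays are my own illustration and not part of the assignment code. The arrays are synthetic and only reproduce the counts reported in this post (50 test samples, 33 of them positive per the label sums shown further down, 2 false negatives and 7 false positives), not the actual images.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary label arrays of shape (1, m)."""
    y_true = np.squeeze(y_true).astype(int)
    y_pred = np.squeeze(y_pred).astype(int)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return tp, fp, tn, fn

# Synthetic labels: 33 positives followed by 17 negatives.
demo_y = np.array([[1] * 33 + [0] * 17])
# Synthetic predictions: 31 hits, 2 misses, then 7 false alarms, 10 correct rejections.
demo_pred = np.array([[1] * 31 + [0] * 2 + [1] * 7 + [0] * 10])

tp, fp, tn, fn = confusion_counts(demo_y, demo_pred)
fp_rate = fp / (fp + tn)  # fraction of true non-cats predicted as cats
fn_rate = fn / (fn + tp)  # fraction of true cats predicted as non-cats
print(f"false positive rate = {fp_rate:.2f}")
print(f"false negative rate = {fn_rate:.2f}")
```

With those counts, 7 of the 17 non-cat test images are called cats, while only 2 of the 33 cat images are missed, so the bias toward “yes” is stronger than the raw counts alone suggest.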

The next thing to look at is the balance of “cat” (yes) samples versus “non-cat” (no) samples in the two datasets. We already know that the training set has 209 samples and the test set has 50 samples. Let’s see how many of each are “true” samples:

```
print(f"sum(train_y) = {np.sum(train_y)}")
print(f"sum(test_y) = {np.sum(test_y)}")
print(f"train positive sample ratio {np.sum(train_y)/train_y.shape[1]}")
print(f"test positive sample ratio {np.sum(test_y)/test_y.shape[1]}")

# Output:
sum(train_y) = 72
sum(test_y) = 33
train positive sample ratio 0.3444976076555024
test positive sample ratio 0.66
```

Interesting! The training set has only 34% cats, but the test set is 66% cats, which makes things seem a bit “unbalanced”. But maybe that’s a good strategy if they know that the learned model is “yes happy”: a test set that is mostly cats gives a model that leans toward “yes” fewer chances to be wrong. So the next question is whether that imbalance is important or not.

One way to experiment with that would be to trade positive samples from the test set with negative samples from the training set to make the two look a bit more similar. Unfortunately, because the test set is so small, we don’t have enough positive samples to get the training set to 50/50 without completely depleting the positive examples in the test set: the training set would need about 33 more positives (half of 209 is roughly 105, versus the 72 it has), and the test set has only 33 in total.

The reason for trading entries rather than just moving them is to try to control the number of variables that we are changing in this scientific experiment. If we increase the size of the training set, then we can’t be sure whether it’s the balance change or the size change that made the difference.
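To make the arithmetic behind that limit concrete, here is a small sketch that computes the positive ratios after trading `k` samples, using only the counts printed above. The `ratios_after_trade` helper and the particular values of `k` are my own illustration, not part of the assignment.

```python
import math

# Counts taken from the output above.
m_train, pos_train = 209, 72
m_test, pos_test = 50, 33

def ratios_after_trade(k):
    """Positive ratios (train, test) after trading k test positives for k train negatives."""
    return (pos_train + k) / m_train, (pos_test - k) / m_test

for k in (0, 4, 8, 16):
    train_ratio, test_ratio = ratios_after_trade(k)
    print(f"k = {k:2d}: train = {train_ratio:.3f}, test = {test_ratio:.3f}")

# Number of trades needed for the training set to reach at least 50% positives.
k_needed = math.ceil(m_train / 2 - pos_train)
print(f"trades needed for 50/50 training set = {k_needed}, test positives available = {pos_test}")
```

Because the set sizes stay fixed, each trade moves the training ratio by only 1/209 but the test ratio by 1/50, so the test set balance changes about four times faster than the training set balance.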

Here’s a block of code to trade the same number (*numTrade*) of positive images from the test set with the same number of negative images from the training set:

```
print(f"Starting positive samples: train = {np.sum(train_y)}, test = {np.sum(test_y)}")
numTrade = 4
test_x_pos = test_x[:,np.squeeze(test_y == 1)]
test_x_neg = test_x[:,np.squeeze(test_y == 0)]
# Permute the positive samples randomly before we pick the ones to trade
perm = np.squeeze(np.random.permutation(test_x_pos.shape[1]))
test_x_pos_perm = test_x_pos[:,perm]
test_x_pos_trade = test_x_pos_perm[:,0:numTrade]
test_x_pos_keep = test_x_pos_perm[:,numTrade:]
test_y_pos_trade = np.ones((1,test_x_pos_trade.shape[1]), dtype = 'int64')
test_y_pos_keep = np.ones((1,test_x_pos_keep.shape[1]), dtype = 'int64')
test_y_neg = np.zeros((1,test_x_neg.shape[1]), dtype = 'int64')
print_these_images(classes, test_x_pos_trade, test_y_pos_trade)
train_x_pos = train_x[:,np.squeeze(train_y == 1)]
train_x_neg = train_x[:,np.squeeze(train_y == 0)]
# Permute the negative samples randomly before we pick the ones to trade
perm = np.squeeze(np.random.permutation(train_x_neg.shape[1]))
train_x_neg_perm = train_x_neg[:,perm]
train_x_neg_trade = train_x_neg_perm[:,0:numTrade]
train_x_neg_keep = train_x_neg_perm[:,numTrade:]
train_y_neg_trade = np.zeros((1,train_x_neg_trade.shape[1]), dtype = 'int64')
train_y_neg_keep = np.zeros((1,train_x_neg_keep.shape[1]), dtype = 'int64')
train_y_pos = np.ones((1,train_x_pos.shape[1]), dtype = 'int64')
print_these_images(classes, train_x_neg_trade, train_y_neg_trade)
bal_train_x = np.concatenate((train_x_pos, test_x_pos_trade, train_x_neg_keep), axis=1)
bal_train_y = np.concatenate((train_y_pos, test_y_pos_trade, train_y_neg_keep), axis=1)
bal_test_x = np.concatenate((test_x_pos_keep, test_x_neg, train_x_neg_trade), axis=1)
bal_test_y = np.concatenate((test_y_pos_keep, test_y_neg, train_y_neg_trade), axis=1)
print(bal_train_x.shape)
print(bal_train_y.shape)
print(bal_test_x.shape)
print(bal_test_y.shape)
print(f"After rebalance positive samples: train = {np.sum(bal_train_y)}, test = {np.sum(bal_test_y)}")
```
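For anyone who wants to check the trading logic without loading the real dataset, here is a more compact index-based sketch of the same idea, run on synthetic arrays. It is an alternative formulation and not the code used for the experiments. The `trade_samples` function, the `seed` parameter and the 12-feature random data are assumptions for illustration; only the label counts (72 of 209 and 33 of 50) come from the post.

```python
import numpy as np

def trade_samples(train_x, train_y, test_x, test_y, num_trade, seed=0):
    """Swap num_trade random test positives with num_trade random train negatives."""
    rng = np.random.default_rng(seed)
    test_pos_idx = np.flatnonzero(test_y[0] == 1)
    train_neg_idx = np.flatnonzero(train_y[0] == 0)
    give = rng.choice(test_pos_idx, size=num_trade, replace=False)   # test -> train
    take = rng.choice(train_neg_idx, size=num_trade, replace=False)  # train -> test
    keep_train = np.setdiff1d(np.arange(train_y.shape[1]), take)
    keep_test = np.setdiff1d(np.arange(test_y.shape[1]), give)
    new_train_x = np.concatenate((train_x[:, keep_train], test_x[:, give]), axis=1)
    new_train_y = np.concatenate((train_y[:, keep_train], test_y[:, give]), axis=1)
    new_test_x = np.concatenate((test_x[:, keep_test], train_x[:, take]), axis=1)
    new_test_y = np.concatenate((test_y[:, keep_test], train_y[:, take]), axis=1)
    return new_train_x, new_train_y, new_test_x, new_test_y

# Synthetic stand-ins with the same label counts as the real datasets.
data_rng = np.random.default_rng(1)
demo_train_x = data_rng.random((12, 209))
demo_test_x = data_rng.random((12, 50))
demo_train_y = np.array([[1] * 72 + [0] * 137])
demo_test_y = np.array([[1] * 33 + [0] * 17])

b_train_x, b_train_y, b_test_x, b_test_y = trade_samples(
    demo_train_x, demo_train_y, demo_test_x, demo_test_y, num_trade=4)

# The set sizes must be unchanged; only the positive counts should move.
print(b_train_x.shape, b_train_y.shape, b_test_x.shape, b_test_y.shape)
print(f"positives: train = {np.sum(b_train_y)}, test = {np.sum(b_test_y)}")
```

Working with column indices keeps each image paired with its original label, which avoids rebuilding the label arrays by hand with `np.ones` and `np.zeros`.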

That block of code references a “print images” function that I created by hacking on the *print_mislabeled_images* function that they provided:

```
def print_these_images(classes, X, y):
    """
    Plots images.
    X -- dataset
    y -- true labels
    """
    plt.rcParams['figure.figsize'] = (40.0, 40.0) # set default size of plots
    num_images = X.shape[1]
    for ii in range(num_images):
        plt.subplot(2, num_images, ii + 1)
        plt.imshow(X[:,ii].reshape(64,64,3), interpolation='nearest')
        plt.axis('off')
        plt.title("Class: " + classes[y[0,ii]].decode("utf-8"))
    plt.show()
```

I’ll show some experiments using the above code in another post in a few minutes.