Week 2 Classic networks AlexNet 8:40

I am probably missing a detail here: a 13x13x384 volume is convolved with a 3x3 filter, resulting in a 13x13x384 volume.

Then a volume of the same size, 13x13x384, is again convolved with a 3x3 filter, yet we get a 13x13x256 volume.

Is there a detail on stride or padding choice possibly missing in the video?

They are evidently using “same” padding and a stride of 1, which is why the 13x13 spatial size is preserved. Note that the number of “channels” (the last dimension) is purely a design choice: it equals the number of filters specified in the given layer.
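
To check the arithmetic, here is a minimal sketch using the standard output-size formula floor((n + 2p - f) / s) + 1. The helper name `conv_output_shape` and its parameters are made up for illustration and are not from the lecture. It assumes square filters and that each filter spans the full input depth (so a "3x3" filter here is really 3x3x384), which is why the output depth is just the filter count.

```python
def conv_output_shape(height, width, kernel_size, num_filters, stride=1, padding=0):
    """Return (height, width, channels) of a conv layer's output.

    Hypothetical helper: spatial size follows floor((n + 2p - f) / s) + 1,
    and the channel count is simply the number of filters in the layer.
    """
    out_h = (height + 2 * padding - kernel_size) // stride + 1
    out_w = (width + 2 * padding - kernel_size) // stride + 1
    return (out_h, out_w, num_filters)


# "Same" padding for a 3x3 filter with stride 1 is p = (3 - 1) // 2 = 1.
same_pad = (3 - 1) // 2

# First layer in the question: 13x13x384 in, 384 filters of size 3x3.
shape_a = conv_output_shape(13, 13, kernel_size=3, num_filters=384,
                            stride=1, padding=same_pad)

# Second layer: same 13x13x384 input size, but only 256 filters.
shape_b = conv_output_shape(13, 13, kernel_size=3, num_filters=256,
                            stride=1, padding=same_pad)

print(shape_a)  # (13, 13, 384)
print(shape_b)  # (13, 13, 256)
```

With no padding, the same 3x3 filter would shrink 13x13 to 11x11, so the unchanged spatial size is what points to “same” padding with stride 1.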