I am probably missing a detail here, where
a 13x13x384 volume is convolved with a 3x3 filter, resulting in a 13x13x384 volume;
then again a volume of the same size, 13x13x384, is convolved with a 3x3 filter, yet we get a 13x13x256 volume.
Is there a detail on the stride or padding choice possibly missing in the video?
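For reference, here is a small sketch of the usual conv output-shape arithmetic. It assumes stride 1 and padding 1 (a "same" convolution), and it assumes the two layers use 384 and 256 filters respectively, as in AlexNet's 4th and 5th conv layers; the video may use other values. The helper name `conv_output_shape` is hypothetical. The spatial size follows floor((n + 2p - f) / s) + 1, while the output depth is set by the number of filters, not by the input depth.

```python
def conv_output_shape(h, w, num_filters, f=3, stride=1, pad=1):
    """Return (height, width, channels) of a conv layer's output volume.

    Spatial size: floor((n + 2*pad - f) / stride) + 1.
    Depth: one channel per filter. Each filter spans the full input depth
    (here 3x3x384), so the input depth does not appear in the output shape.
    """
    out_h = (h + 2 * pad - f) // stride + 1
    out_w = (w + 2 * pad - f) // stride + 1
    return out_h, out_w, num_filters


# Assumed setup: 13x13x384 input, 3x3 filters, stride 1, padding 1.
layer_a = conv_output_shape(13, 13, num_filters=384)  # assumed 384 filters
layer_b = conv_output_shape(13, 13, num_filters=256)  # assumed 256 filters
print(layer_a, layer_b)  # (13, 13, 384) (13, 13, 256)
```

With padding 0 instead, the same 3x3 convolution would shrink the volume to 11x11, so a 13x13 output is consistent with padding 1 and stride 1.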