I think I answered my own question. In the original MobileNet paper (Howard et al., 2017), the first layer is a traditional convolution with $32 \times (3,3,3)$ kernels. Each of those kernels spans all three channels of the RGB image and is convolved independently of the others, producing 32 output feature maps, so the network can learn 32 distinct spatial features.
The first depthwise convolution then uses $32 \times (3,3)$ filters, one per channel, so the channels are filtered independently and never mixed. Since the feature space of the network is first expanded to 32 channels by a traditional convolution, it makes me think that the purpose of the $(1,1)$ projection convolution is to create many different "strengths" of each filter, i.e. learned weighted combinations of the depthwise outputs, much like the $1 \times 1$ convolutions in an Inception network.
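To make the shapes concrete, here is a minimal PyTorch sketch of those first three layers as I understand them from the paper's architecture table (batch norm and ReLU omitted for brevity; this is my own illustration, not the reference implementation):

```python
import torch
import torch.nn as nn

# Standard convolution: 32 kernels of shape (3, 3, 3), one output map each.
standard = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3,
                     stride=2, padding=1, bias=False)

# Depthwise convolution: one (3, 3) filter per input channel (groups=32),
# so each channel is filtered independently and channels are never mixed.
depthwise = nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3,
                      stride=1, padding=1, groups=32, bias=False)

# Pointwise (1, 1) projection: each of the 64 output channels is a learned
# linear combination (a "strength") of the 32 depthwise outputs.
pointwise = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=1,
                      bias=False)

x = torch.randn(1, 3, 224, 224)        # dummy RGB image
y = pointwise(depthwise(standard(x)))  # -> torch.Size([1, 64, 112, 112])
print(y.shape)
```

The key detail is `groups=32` in the depthwise layer: it is what prevents any cross-channel mixing there, leaving all of that work to the $(1,1)$ projection.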