Hi,
I’m wondering what types of models are usually used in autonomous driving, specifically for obstacle detection.
In DLS Course 4 we had an example using a CNN architecture (YOLO). But shouldn’t we also consider sequence models, since things may also be time-dependent?
But I understand that RNNs can quickly become computationally expensive if we go for deeper models, which are more or less necessary for high-level feature extraction.
So I was wondering what the state of the art is here. Are there any good papers you would recommend reading?
@Dyxuki
For a sequential task such as self-driving, you do need to consider a range of data over time, which gives rise to the need for architectures such as GRUs and LSTMs. But as you point out, they can quickly get expensive, and you definitely want something really fast for a task like self-driving.
As you’ll discover in Course 5, one of the most recent and most significant developments in this area is the Transformer, which is built on attention. An attention model looks at a given range of data all at once rather than step by step, so the computation can be parallelized, which results in much faster computation. You can also use a multi-head version to attend to different parts of the data concurrently.
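To make the "all at once" point concrete, here is a minimal sketch of scaled dot-product attention in plain Python. It assumes toy 2-D feature vectors and uses the inputs directly as queries, keys and values (self-attention); a real Transformer would apply learned projections and use a tensor library. The point to notice is that each output position is computed from the whole sequence, and no position depends on another position’s output, so they could all be computed in parallel, unlike the step-by-step loop of an RNN.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Each query attends to *all* keys at once. The output for one position
    does not depend on the output for any other position, so the loop
    below is independent per query and could run in parallel.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Toy sequence of 3 timesteps with 2-D features (self-attention: Q = K = V).
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = attention(seq, seq, seq)
```

A multi-head version would run several of these in parallel, each on its own learned projection of the inputs, and concatenate the results.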
I am not a self-driving expert, but I found this paper, which holds the SOTA on the CARLA dataset, if you are interested. Full disclosure: I have not read the paper myself (not yet, at least).
Hello @suki
Thanks for the reply, I’m going to give that paper a look!
Another question on my mind: for live tasks, how does one define a sequence, since the sequence never ends? Do you use something like a “sliding window over time”?
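One way to picture the “sliding window over time” idea is a fixed-size buffer that keeps only the last N frames and runs the model on that window each time a new frame arrives. Below is a minimal sketch using Python’s `collections.deque` with `maxlen`, which drops the oldest item automatically. The window size, the scalar “frames” and the `model` function (a simple average) are hypothetical placeholders, not a real detector.

```python
from collections import deque

WINDOW_SIZE = 4  # hypothetical: number of most recent frames the model sees

def model(window):
    # Placeholder for a real sequence model: here we just average the
    # per-frame feature values in the current window.
    return sum(window) / len(window)

def stream_predictions(frames, window_size=WINDOW_SIZE):
    window = deque(maxlen=window_size)  # oldest frame is dropped automatically
    predictions = []
    for frame in frames:
        window.append(frame)
        if len(window) == window_size:  # only predict once the window is full
            predictions.append(model(window))
    return predictions

# Toy "stream" where each frame is reduced to a single scalar feature.
preds = stream_predictions([1, 2, 3, 4, 5, 6])
```

The window size is the knob here: a larger window gives the model more temporal context but means more data to process for every new frame.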
Also, while learning about the Transformer attention model, another architecture came to mind that is kind of a “fusion of CNN & RNN”. Basically, I was imagining using a CNN as a feature extractor, similar to a word embedding: in practice I would remove the final softmax layer of a CNN, then use the output of the last FC layer as the input to an RNN at each timestep. I don’t know if that is a viable solution in terms of processing time for live tasks.
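Here is a minimal sketch of that CNN-then-RNN idea in plain Python, to show how it would run on a live stream. `extract_features` is a hypothetical stand-in for a CNN with its softmax removed (it just returns the mean and max of a toy “frame” instead of real FC-layer activations), and `SimpleRNNCell` is a basic Elman RNN cell with fixed, untrained toy weights. The key property for live use is that the hidden state is carried from frame to frame, so each new frame costs one feature extraction plus one RNN step, and earlier frames are never reprocessed.

```python
import math

def extract_features(frame):
    # Hypothetical stand-in for a CNN with its softmax layer removed:
    # a real model would return the activations of its last FC layer.
    # Here we just summarize the "pixels" as [mean, max].
    return [sum(frame) / len(frame), max(frame)]

class SimpleRNNCell:
    """Elman RNN cell: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""

    def __init__(self, w_x, w_h, b):
        self.w_x, self.w_h, self.b = w_x, w_h, b  # toy fixed weights, not trained
        self.h = [0.0] * len(b)                   # hidden state carried across frames

    def step(self, x):
        new_h = []
        for i in range(len(self.b)):
            z = (sum(w * xi for w, xi in zip(self.w_x[i], x))
                 + sum(w * hj for w, hj in zip(self.w_h[i], self.h))
                 + self.b[i])
            new_h.append(math.tanh(z))
        self.h = new_h
        return self.h

# 2-D features in, 2-D hidden state; weights are arbitrary toy values.
cell = SimpleRNNCell(w_x=[[0.5, 0.0], [0.0, 0.5]],
                     w_h=[[0.1, 0.0], [0.0, 0.1]],
                     b=[0.0, 0.0])

# Each new frame costs one feature extraction plus one RNN step;
# past frames survive only through cell.h, so they are never recomputed.
for frame in [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]:
    state = cell.step(extract_features(frame))
```

With this setup, the per-frame latency is roughly one CNN forward pass plus one small recurrent step, so in practice it is likely dominated by the size of the CNN rather than by the RNN.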