What model to use for continuous tasks?

Hi,
I’m wondering what types of models are usually used for autonomous driving, specifically for the obstacle detection part.

In DLS Course 4 we saw an example using CNN architectures (YOLO). But shouldn’t we also consider sequential models, since things may also be time-related/dependent?
That said, I understand that RNNs can quickly become computationally expensive if we go for deeper models, which are kind of necessary for high-level feature extraction.
So I was wondering what the state of the art is on this. Are there any good paper recommendations to read?

Thanks!

@Dyxuki
For a sequential task such as self-driving, you do need to consider a range of data over time, which gives rise to the need for architectures such as GRUs and LSTMs. But as you point out, they can quickly become expensive, and you definitely want something really fast for self-driving.
As you will discover in Course 5, one of the most recent and most significant developments in this area is the Transformer with attention. An attention model looks at a given range of data all at once rather than sequentially, which results in much faster computation. You can also use a multi-head version to attend to different parts of the data concurrently.
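Here is a minimal Keras sketch just to illustrate that point (the window length, feature size, and head count are arbitrary numbers I picked, not anything from a real self-driving pipeline): a whole window of per-frame features goes through multi-head self-attention in one shot instead of one timestep at a time.

```python
import tensorflow as tf

seq_len, feat_dim = 8, 256                         # e.g. 8 camera frames, 256-d features each
frames = tf.random.normal((1, seq_len, feat_dim))  # (batch, time, features)

# Multi-head self-attention: every timestep attends to the whole window in parallel
mha = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=64)
attended = mha(query=frames, value=frames, key=frames)

print(attended.shape)  # (1, 8, 256) -- same shape as the input window
```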
I am not a self-driving expert, but I found this paper, which holds the SOTA on the CARLA benchmark, if you are interested. Full disclosure: I have not read the paper myself (not yet, at least) :slight_smile:


Hello @suki,
Thanks for the reply, I’m gonna give that paper a look! :slight_smile:

Another question I have on my mind: for live tasks, how does one define a sequence, since the stream never ends? Do you use something like a “sliding window over time”?
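To make it concrete, this is the kind of thing I’m imagining (just a sketch; the window length and the model call are hypothetical placeholders):

```python
from collections import deque

WINDOW = 16                     # assumed window length, e.g. ~1 s of frames at 16 fps
buffer = deque(maxlen=WINDOW)   # oldest frame is dropped automatically when full

def on_new_frame(frame_features, model):
    """Called once per incoming frame of the live stream."""
    buffer.append(frame_features)
    if len(buffer) < WINDOW:                  # not enough history yet
        return None
    window = list(buffer)                     # fixed-length sequence: (WINDOW, feat_dim)
    return model.predict_on_window(window)    # hypothetical model call, for illustration
```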

Also while learning Transformer Attention model, another architecture came accross my mind which also kinda is a “fusion or CNN & RNN”, basically I was imagining to use CNN structures to act as feature extractor, similar to word embedding, so in practice I would get rid of the last softmax layer of a CNN, the use that last FC layer as input for a RNN, for each timestep. I don’t know if that is a viable solution in terms of processing time for live tasks.