I have a basic question regarding the Beam search complexity vs the Greedy search. Though the former was presented as a more efficient alternative to greedy search, it seems at first to me that its complexity is higher: aren’t we performing at each step a greedy search over all the words’ dictionary, repeating the process B-depth times. If the greedy search complexity is O(10000^sentence_length), then the beam search complexity is O(Beam depthx10000^sentence_length), right?
greedy is searching every word against every word in vocab which means 10k * 10k for each word in sentence. Greedy search does not fix any word and thus has to search against all the 10k words.
beam search is taking a choice of beam size (3): 3 * 10k for each word in the sentence. Beam search fixes one option as the highest probability word and then looks for the second word and so on. This process of choosing the highest probability word reduces the complexity.
I agree with your argument but then Andrew says Greedy search is Beam Search with B=1, doesn’t this mean the time complexity of greedy would be lower?
Right. the time complexity of the greedy search is lower than the Beam search. The advantage of the Beam search is, with having multiple tokens (beam_size), it can significantly improve the quality of the output sequence. But, as you pointed out, the drawback is the time complexity, the speed.
Is this greedy search different from the graph greedy search because greedy search time complexity is O(b^m) while beam search is O(B*m). Doesn’t it mean that greedy search should be slower?
And on another note Andrew says that greedy search only looks at the last word but beam search solves this problem by look at all the previous words. While if greedy search is beam search with B=1, then it still can look at all the previous words but instead of the 3 (B=3) most probable one it would only chose 1 at each step. My point is that if greedy search is Beam search with B=1 it still can look at all the previous words and not just one last word. The difference between these searches is that at each step we are moving forward with more than one option, right?
I think what you mentioned as “Greedy search” is “Exhaustive Search”.
To be consistent, let’s set common notations. (this may not be same as ones by Andrew, but, I do not have time to watch his all videos now… )
m : number of candidates in one single time step t
T : total time step (t_0, t_1, ...)
Exhaustive Search : O(m^T)
Greedy Search : O(m*T)
Beam Search (Beam size B) : O(B*m*T)
And on another note Andrew says that greedy search only looks at the last word but beam search solves this problem by look at all the previous words.
I do not think this is correct. Beam search only keeps “B” candidate words, not all.
To be clear, in the case of Beam search, it keeps B candidates, in the case of Greedy search, it keeps 1 candidates, and in the case of Exhaustive Search, it keeps all candidates.