Hi, while going through the lab I noticed that the surrogateKey function concatenates the text_values list twice. Is this intentional?
First:
data = ''.join(text_values)
and then:
for value in text_values:
    data = data + str(value)
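To make the effect concrete, here is a minimal sketch of what the two snippets do when run one after the other. The sample values are made up for illustration and are not taken from the lab.

```python
# Hypothetical sample values, not from the lab notebook.
text_values = ["2021", "store_7", "SKU-42"]

# First concatenation: join all values into one string.
data = "".join(text_values)

# Second concatenation: append every value again, one by one.
for value in text_values:
    data = data + str(value)

# The result is the joined string repeated twice.
joined_once = "".join(text_values)
print(data == joined_once * 2)  # True
```

So the string that gets hashed is the intended one repeated twice, rather than the intended one.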
Hello @MPS,
Thanks for reporting this. I don't recall seeing this issue before. Could you tell me whether it's a graded or a practice lab so I can take a look? Thanks.
Oh yeah, sorry, it's Graded Programming Assignment 3: Data Transformations with Apache Spark, in the C4_W3_Assignment.ipynb notebook.
@MPS Yes, good catch. This for loop looks redundant and could be removed from the surrogateKey function. I will forward this and see whether it gets removed. Thanks.
You could also keep the for loop and call .update(value) on every iteration. Anyway, it's a tricky one to catch with tests, since the function still does what it should: a hash of a twice-repeated string works equally well as a key, I guess.
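For anyone curious, here is a rough sketch comparing the variants. It assumes the key is built with hashlib and SHA-256, which is only a guess for illustration (the lab's actual hash function may differ), and the function names and sample values are hypothetical.

```python
import hashlib

def key_joined(text_values):
    # Join once, then hash the whole string in one go.
    data = "".join(str(v) for v in text_values)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()

def key_incremental(text_values):
    # Keep the for loop, but feed each value to the hash object.
    h = hashlib.sha256()
    for value in text_values:
        h.update(str(value).encode("utf-8"))
    return h.hexdigest()

def key_doubled(text_values):
    # Mirrors the reported behaviour: join, then append every value again.
    data = "".join(text_values)
    for value in text_values:
        data = data + str(value)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()

# Hypothetical sample values, not from the lab notebook.
sample = ["2021", "store_7", "SKU-42"]

# Incremental updates hash the same bytes as the joined string.
print(key_joined(sample) == key_incremental(sample))  # True
# The doubled variant hashes different bytes, so the digest differs,
# but it is still deterministic for a given input.
print(key_doubled(sample) == key_joined(sample))      # False
print(key_doubled(sample) == key_doubled(sample))     # True
```

The doubled version still maps equal inputs to equal keys, which is why output-based tests would not flag it. The digests themselves do change if the loop is removed, so any stored keys would need to be regenerated.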
@MPS I tried the lab without the for loop, and I still get the correct outputs from the surrogateKey function. As it stands, I wouldn't change it too much, so that it stays easy for learners to understand. Thanks for the suggestion.