Clarification Needed on Techniques for Handling Structured vs. Unstructured Data in AI

Hi everyone!

I’m currently taking the course “AI for Everyone” (Week 1: What is Data). The lecture is at https://www.coursera.org/learn/ai-for-everyone/lecture/dLSWR/what-is-data.

I came across a few statements that I’m hoping to get some help understanding:

“The techniques for dealing with unstructured data are different than the techniques for dealing with structured data. Germs of AI today are used primarily to generate unstructured data, rather than structured data. Supervised learning can work very well for both structured and unstructured data.”

I’m curious about the techniques used for unstructured data. Are these techniques meant to transform unstructured data into a structured format before processing? Is supervised learning the best approach for both structured and unstructured data?

Also, I don’t understand the sentence “Germs of AI today are used primarily to generate unstructured data, rather than structured data”. Could someone explain what that means?

Thanks in advance for any insights you can provide!

To answer part of your question:

“Germs of AI today…” makes no sense to me.

I believe there was a mistake in editing the video, and part of the word was clipped off by accident.

I do not know what the intended word might have been.

If anyone else wants to try to decode the intended dialog, it’s at time mark 10:16 in the linked video.

Thanks @TMosh

Using the housing price example from the video, the position of a particular value in the data has meaning. You can think of the data as having columns, which convey the meaning, and rows, which are the values (conceptually true regardless of the exact mechanism for storing the data). The data is said to have a structure, because there is a regular pattern. This is why the spreadsheet analogy is useful. The columns tell you the structure and the rows all have the same pattern. Another word used in this context is schema.
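To make the column/row idea concrete, here is a minimal Python sketch. The column names and the numbers are made up for illustration; they are not taken from the video, and real data would normally live in a spreadsheet or database rather than in tuples.

```python
# Hypothetical housing table: the schema (column names) gives each position its meaning.
schema = ("size_sqft", "bedrooms", "price_usd")

# Every row follows the same pattern as the schema.
rows = [
    (523, 1, 115_000),
    (645, 1, 150_000),
    (1034, 3, 210_000),
]


def column(schema, rows, name):
    """Return all values of one column, found by its position in the schema."""
    idx = schema.index(name)
    return [row[idx] for row in rows]


# Because position has meaning, the price is always the third value of any row.
prices = column(schema, rows, "price_usd")
print(prices)
```

The point of the sketch is only that one lookup rule works for every row, which is what "having a structure" buys you.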

Unstructured data might be the pixels in an image, the words in a document, or the frequencies in an audio file. Unlike in structured data, there is no required or explicit regularity, so the position of a value carries no predefined meaning. Any pixel value, word, or sound can occur in any order.

Generally there is no reason, ability, or purpose to attempt to impose such regularity, so my answer to your question is ‘No, you are not generally attempting to transform unstructured data into a structured format before processing.’
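As a rough illustration of the unstructured case, here is a small Python sketch. The pixel values and the sentence are invented, and the helper name `describe` is hypothetical. It only shows that there is no schema to look things up by; all we know up front is that we have a sequence of values.

```python
# Hypothetical unstructured samples: plain sequences of values, no column names.
pixels = [0, 255, 34, 18, 200, 7]            # grayscale intensities of a tiny image
words = "the cat sat on the mat".split()     # words of a short document


def describe(values):
    """Summarize a sequence without assuming any meaning for a given position."""
    return {"length": len(values), "distinct": len(set(values))}


# Unlike a table row, pixels[2] or words[2] has no fixed meaning that holds
# for every other image or document.
print(describe(pixels))
print(describe(words))
```

Contrast this with a table, where the third value of every row would mean the same thing.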

If you’re just starting your machine learning and AI journey (my assumption), it may be premature in this thread to go into detail about how the techniques of supervised and unsupervised learning differ; there are plenty of opportunities and resources elsewhere to go deep on this topic. But I would say the decision of whether to use supervised or unsupervised learning depends more on what kind of meaning or story you want to extract from the data, and less on whether the data itself is structured or unstructured. Both supervised and unsupervised learning have broad applicability to both structured and unstructured data. Hope this helps.

Regarding the ‘Germ…’ sentence, I have no idea what is going on there. I find it odd that Prof Ng talks about generating data, because the remainder of the video focusses on the type of data that AI consumes. Modern AI is certainly capable of generating content, but that just isn’t what this video is about. Sorry, can’t help here.