What is the use of NER-named entity recognition, specifically?

I have searched that question on the internet, and all I got were vague answers. I still don’t understand why this task is necessary. We classify each word in the text, then what?

I am happy to hear your answer.

Thank you all.

1 Like

Hi @20020069_Le_Thai_S_n

What do mean by saying “specifically”?

Named Entity Recognition is important linguistic category

“Named Entity” is often used for anything that can be referred to with a name: a person, a location, an organization. Parts of speech (POS) historically were used to study language and are generally assigned to individual words, a proper name is often an entire phrase, like the name “New York”.

POS and named entities are useful clues to sentence structure and meaning. For example, knowing whether a word is a noun or a verb tells us about likely neighboring words. Knowing if a named entity like Washington is a name of a person, a place, or a car is important to many natural language processing tasks like question answering, or information extraction.

One concrete use that I had in my work is detecting car brands and models in a string.
For example, one very specific example:
“BMW X3 sDrive2.8 …” - which part is brand name, which part is model, which part is engine etc. This string can appear in many forms like “x3 Bayerische Motoren Werke”, “BMW Series X3 F25; 2010” and many more. Knowing which part is what is your use case for further downstream tasks (like insurance premium calculation).


Thank you for your previous answer, my deer arvyzukai.
NER extracts information; yes, everyone says that. And I wonder, what is that information used for? What are the downstream tasks? QA, grammar checks, and logging into databases are all I can think about. Do chatbots benefit from it too? Can you provide some more examples?

More concretely, why do you need to know the model part and engine part of "BMW X3 sDrive2.8?

Thank you.

Hi, @20020069_Le_Thai_S_n

In my case, the model is a feature (a risk factor, among with others).

For example, you can imagine that you are the insurer and two people come to you for insurance. One has Ford Mustang (sportier version) valued at $30 000 and the other has Ford Galaxy (mini van) also valued at $30 000. You would probably find the sportier car more risky to insure and you would ask a bigger insurance fee because of that. Of course, age and other factors matter but if you have a string that includes the model, you would probably account for it.

Your calculation could be:

  • price = value_coef * value + model_coef * value

    • fee_1 = 0.01 * 30000 + 0.02 * 30000 = $900
    • fee_2 = 0.01 * 30000 + 0.005 * 30000 = $450

That’s an overly simple downstream task for illustration.

Same would thing for engine - bigger engine tends to mean riskier client.