Does AI have confirmation bias?

I just started the deeplearning.AI Coursera course on TensorFlow, and was interested in knowing whether AI is ever trained on a dataset that contains false data, so as to show what something is not supposed to be. For example, with the 2x-1 example, would it be possible to train the model on a false dataset that shows what it is not supposed to do (e.g. a dataset generated from different equations)? If this is a silly question, sorry; I’m just curious and very new to this!

Well, here’s my take. If you train a model on beautiful faces, and ALL of the training set is White faces, what prediction do you think it will make about Black faces? I don’t see how AI can perform any differently from a child who grew up in a biased/prejudiced environment. The training set is the people the child and parent came across together, and the label is whatever label the parent gave each person.

It’s an interesting and important question. With supervised machine learning, the algorithm has no intrinsic notion of what “correctness” is. Its definition of “correct” is the labels that you provide with the training data: it only learns the patterns in the input data that map to the answers (“labels”) that you give it. So if you train it on wrong answers, then it will learn to reproduce those wrong answers, assuming that you choose an appropriate model and get it to train successfully. For example, if you have a training set consisting of pictures of cats and dogs, and you label the cat pictures as “dog” and the dog pictures as “cat”, then that is exactly what the algorithm will reproduce. It only “knows” what you tell it.

Of course, there’s also the case in which your data is basically garbage, meaning the labelling is not consistent (e.g. you label some of the pictures that contain cats as dogs and others correctly); then the model’s behavior will be inconsistent as well.

The high-level point is that the quality of the training data is critical to the success of the predictions your model will make: without meaningful data, it’s hopeless.
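You can see this directly with the 2x-1 example from the course. Here’s a minimal sketch (plain NumPy gradient descent instead of TensorFlow, to keep it self-contained; the “false” equation 3x+5 is just a made-up alternative): the model happily learns whatever equation generated the labels, because it has no way to know which labels are “true”.

```python
import numpy as np

def fit_linear(xs, ys, lr=0.01, steps=2000):
    """Fit y = w*x + b by plain gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        pred = w * xs + b
        err = pred - ys
        w -= lr * (2 / n) * np.dot(err, xs)
        b -= lr * (2 / n) * err.sum()
    return w, b

xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0])

# Labels from the "correct" rule y = 2x - 1, as in the course example.
w, b = fit_linear(xs, 2 * xs - 1)
print(round(w, 2), round(b, 2))   # ~2.0, ~-1.0

# "False" labels drawn from a different equation, y = 3x + 5.
# The model has no notion that these are wrong -- it just learns them.
w2, b2 = fit_linear(xs, 3 * xs + 5)
print(round(w2, 2), round(b2, 2))  # ~3.0, ~5.0
```

So rather than teaching the model what it is “not supposed to do”, training on a false dataset simply teaches it to do that false thing.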

Exactly. Parenting is nothing but a series of training examples (situations) with labels provided by the parents (or the TV, if that’s what serves as guidance in the home). Labelling can of course take the form of explicit statements by the parent that a behavior is acceptable or unacceptable, or be implied by the parents’ own acceptance or rejection of the situation. The model cannot be better than the labelling of the data.

I was a co-author on a study of cheating in college, and one finding was that the student’s environment is the main driver of cheating behaviors. It’s the same with AI models trained on biased data: bias can come in the subtle form of over- or under-sampling of one category of persons in the training data.

Exactly. The way a child learns to recognize a cat is that one of their parents points and says “look at the kitty!” Enough times …

BTW I should have said that I was adding to your answer, not critiquing it. Just providing another perhaps more general statement of how all this works …


For sure, Paul. The way you explained it reminds me of the constant confusion between British and US English speakers: hood/bonnet, trunk/boot, color/colour, etc. lol.