Clown emotions

In lesson 2, NexusRaven is able to learn to draw a happy or sad clown face simply by adding “Controls the emotions of the clown.” to the docstring of the mouth_theta parameter. The only other information passed in is the default value for this parameter, (200, 340), the fact that the two values are in degrees, and an example in which a mouth theta of (0, 180) is described as “smiling”.
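
For reference, what the model sees boils down to something like this (my paraphrase, not the exact wording or signature from the course’s utils.py; the real function has more parameters):

    def draw_clown_face(mouth_theta=(200, 340)):
        """
        Draws a simplified clown face.

        Parameters:
        - mouth_theta (tuple): Controls the emotions of the clown. Values are in degrees.
        """

    # Example given alongside the docstring:
    draw_clown_face(mouth_theta=(0, 180))
    # This will draw a simplified clown face with a smiling mouth.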

Somehow, the model is able to correctly deduce that by reversing the values to (180, 0) it can generate a sad face. I’m not sure I would be able to do the same, given the same prompt and without looking at some examples.

Is this based on knowing something about the shape of a smile or a frown, or does it just know that a sad face is the opposite of a smiling face, and so try to pick angles on the circle as far as possible from the ones that generate the smile?

How does it know how to make the clown frown from only an example of a clown smiling?

I didn’t look at the course code yet, but in orthodontics (dentistry) there are features which determine the smile curve.

So either the metadata provides this information (since the parameter you mention seems related to the smile arc), or, alternatively, multiple images of different emotions were captured with a camera and a model was trained to detect the same in clown or other facial-emotion images.

But the ideal way, I personally feel, would be to follow orthodontic measurements.

You can actually check how they have designed this by clicking File ==> Open in the lesson where you are having this doubt; it contains a utils.py file (the metadata) which should provide this information.

Regards
DP

Thank you. I have looked at all of the code, and even gotten the whole lesson running locally using llama.cpp with the GGUF at TheBloke/NexusRaven-V2-13B-GGUF · Hugging Face, so I know I listed all the relevant facts being provided to the model above. I don’t get the feeling it’s doing anything as complex as what you describe. I’ve also tried asking for other emotions like surprise or shock, which I would expect it to render as a full circle, but it appears to keep responding with a frown, so maybe it tries to depict any emotion other than “smiling” with the opposite inputs.
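
For anyone who wants to reproduce it locally, my setup was roughly the following; the file name, prompt template, and sampling settings here are from memory and the NexusRaven-V2 model card, so they may not match the course exactly:

    # pip install llama-cpp-python
    from llama_cpp import Llama

    # Quantized model downloaded from TheBloke/NexusRaven-V2-13B-GGUF
    # (exact file name depends on which quantization you pick)
    llm = Llama(model_path="nexusraven-v2-13b.Q4_K_M.gguf", n_ctx=4096)

    # NexusRaven-style prompt: the function definition (with docstring), then the user query.
    prompt = (
        "Function:\n"
        "def draw_clown_face(mouth_theta=(200, 340)):\n"
        "    '''\n"
        "    Draws a simplified clown face.\n"
        "    mouth_theta (tuple): Controls the emotions of the clown.\n"
        "    '''\n"
        "\n"
        "User Query: draw me a sad one<human_end>"
    )

    out = llm(prompt, max_tokens=128, temperature=0.001, stop=["Thought:"])
    print(out["choices"][0]["text"])
    # e.g. "Call: draw_clown_face(mouth_theta=(180, 0))" for the sad request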

If the user really wanted to depict a few specific emotions, they could provide several examples, but the authors provided only the one example, and that was enough for the LLM to figure out what they were trying to get it to do with ‘sad’, which I find impressive.

I did also try adding another example:

    draw_clown_face(mouth_theta=(0, 360))

    # This will draw a simplified clown face with a surprised mouth.

and when I asked it to “draw me a surprised one” I got the (0, 360) mouth theta. I then tried to think of the opposite of surprised and asked for an indifferent one, but in that case the model did not provide a value for mouth_theta, so it just drew the default. Maybe it doesn’t consider indifference an emotion?

Aaron

They seem to have used different LLM models via GGUF files on Hugging Face.

Looking at it in detail, it is based more on data from the internet and comparison, not anthropometrics, which would actually be the ideal raw (real) data to create such a clown-face model.

As for surprise producing a frowning clown: that is probably again down to the LLM it was created from, and it probably needs a different approach.

Your observation that mouth theta couldn’t give the surprised clown expression at the angles you assigned suggests a different modelling approach, perhaps the smile-arc one I mentioned, fed with all the angle details between teeth, lips, and lip corners, plus the distance from the upper lip to the tip of the nose. That would require lots of dental radiographs and images to train on, but it would be fun to create one for sure.

Do you have any information on how mouth_theta was programmed in the course?

All of the outputs from the local GGUF are identical to the outputs in the course, so it seems to be essentially the same model. They don’t say exactly which version of NexusRaven they are using; there are at least three versions with slight differences, but v2 seems to match what they are doing. The class is just about using an LLM to generate function calls, so I don’t think they are doing anything involving anthropometrics, although, as you mention, it wouldn’t be that hard to do and could be interesting, though probably more useful for image generation than for demonstrating function calling.
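
As for how mouth_theta turns into a drawing: the two values act as the start and end angles of the mouth arc. I’m not claiming this is exactly how utils.py does it, but a minimal sketch with matplotlib’s Arc patch shows why swapping the two angles flips the curve: arcs are drawn counterclockwise from the first angle to the second, so (0, 180) and (180, 0) select complementary halves of the same circle (which half reads as a smile or a frown then depends on where the arc sits on the face and on the axis orientation).

    import matplotlib.pyplot as plt
    from matplotlib.patches import Arc

    # Two mouths as arcs of the same circle; only the angle pair differs.
    # Arc draws counterclockwise from theta1 to theta2, so swapping the angles
    # selects the complementary half of the circle.
    fig, axes = plt.subplots(1, 2, figsize=(6, 3))
    for ax, (t1, t2) in zip(axes, [(0, 180), (180, 0)]):
        ax.add_patch(Arc((0, 0), 2, 2, theta1=t1, theta2=t2))
        ax.set_xlim(-1.5, 1.5)
        ax.set_ylim(-1.5, 1.5)
        ax.set_aspect("equal")
        ax.set_title(f"mouth_theta=({t1}, {t2})")
    plt.show()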

Anyway, I’m assuming it has something to do with the fact that “smiling” and “sad” would be close to opposites in vector space. It just surprised me.
