Talking Heads

This post will take a closer look at a couple of methods that I stumbled across while working on making a face generator.


The generator renders faces as 3D models and animates them, rotating the heads, moving eyes, mouth and eyebrows, and lets them speak.




Shaping a Head

While the goal was of course expressiveness above anything to do with realism, the shape of a head is still quite important. It has been said that we recognise faces mostly by looking at the shape of cheek-bones and chin, so of course, it will be necessary for our heads to have shapely cheek-bones and chins.

I render the face simply as an array of vertices with different Z-heights - from top to bottom of head - and in different X/Y-slices - rotating from nose, to back of head, to nose again. The general shape is that of a sphere, so the X- and Y-coordinates of a given vertex are sin(Z) away from the middle of the plane.
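To make the layout concrete, here is a minimal Python sketch of such a vertex grid (the original is written in GML). The function name, the ring and slice counts, and the choice of positive X as "forward" are my own assumptions for illustration. The polar limits are parameters, so a narrower arc such as 10° to 170° can be passed in.

```python
import math

def head_vertices(rings=9, slices=12, radius=1.0,
                  polar_start_deg=0.0, polar_end_deg=180.0):
    """Return a grid [ring][slice] of (x, y, z) points on a sphere-like head."""
    grid = []
    for i in range(rings):
        # The polar angle runs from the top of the head to the bottom.
        polar = math.radians(
            polar_start_deg + (polar_end_deg - polar_start_deg) * i / (rings - 1))
        ring_radius = radius * math.sin(polar)  # distance from the vertical axis
        z = radius * math.cos(polar)            # height of this ring
        ring = []
        for j in range(slices):
            # Slice angle 0 faces the nose; it sweeps round the back and returns.
            around = 2.0 * math.pi * j / slices
            ring.append((ring_radius * math.cos(around),
                         ring_radius * math.sin(around),
                         z))
        grid.append(ring)
    return grid
```

Every feature described later can then be treated as a small push outwards or inwards on some of these vertices.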

First of all, the head itself is slightly more barrel than sphere-shaped. The full "arc" of the face is only 160°, so the top of the head is at 10°, and the bottom is at 170°. This shape can be seen at the flat backs of the heads:




The front, however, needs a face! This is shaped by applying four geometric perturbations to the general sphere-shape.


N O S E

For the face-slice that faces directly forward, a nose is added, by taking the height of the vertex and adding a value to it according to this graph:

eyes        <=======>       mouth 



C H I N

While the nose is only active on one slice, the chin is spread over all the slices that have positive X-coordinates, that is, those facing forwards. The larger the X-coordinate, the stronger the chin - so the chin is mostly at the very front of the face, like the nose, but also extends a bit to each side.

The graph of the chin looks quite weird, but its shape is necessary since its peak coincides with the point at which the "ball" of the face recedes. The peak is somewhere around where the mouth is located.
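A rough Python sketch of how such a chin term could be computed per vertex. The vertical profile here is only a stand-in: I assume a smooth cosine bump, and the mouth position (125°), the width and the strength are hypothetical values, not the ones from the graph.

```python
import math

def bump(polar_deg, peak_deg, width_deg):
    """Smooth bump in [0, 1]: 1 at peak_deg, 0 outside peak_deg +/- width_deg."""
    t = (polar_deg - peak_deg) / width_deg
    if abs(t) >= 1.0:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * t))

def chin_offset(polar_deg, around_deg, strength=0.2,
                mouth_deg=125.0, width_deg=40.0):
    """Outward offset for the chin at one vertex (all defaults are assumed)."""
    # Forward-facing weight: 1 at the nose slice, 0 at the sides and the back.
    forward = max(0.0, math.cos(math.radians(around_deg)))
    return strength * forward * bump(polar_deg, mouth_deg, width_deg)
```

With these numbers, `chin_offset(125, 0)` gives the full strength at the front of the face, and any slice facing backwards gets zero.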






C H E E K

Next, the cheekbones are added with the same method, but instead of using the X-coordinate, they are tied to the absolute value of another sine function, one that is active on both sides of the nose, peaking at approximately 50° distance, while being zero at the nose itself.
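One way to write such an angular weight in Python, as a sketch. The name `cheek_weight` is hypothetical, and cutting the sine off after its first lobe (so nothing happens on the back of the head) is my assumption rather than something taken from the original code.

```python
import math

def cheek_weight(around_deg, peak_deg=50.0):
    """Angular weight in [0, 1] for the cheekbones, symmetric about the nose."""
    # Fold the slice angle into [-180, 180) so both sides of the nose match.
    folded = (around_deg + 180.0) % 360.0 - 180.0
    if abs(folded) >= 2.0 * peak_deg:
        return 0.0  # assumption: no cheek beyond the first lobe of the sine
    # Scale the angle so the sine reaches its maximum at peak_deg.
    return abs(math.sin(math.radians(folded) * 90.0 / peak_deg))
```

The weight is 0 at the nose, 1 at 50° on either side, and would be multiplied with the vertical profile and the person's cheek strength.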

This perturbation is added inwards instead of outwards. Compared to the chin, which has its peak at the height of the mouth, the cheek peaks midway between nose and mouth:




E A R S

Finally, the ears are added the same way that the nose is, with just one slice. Instead of being placed on the front of the face, they are placed 115° from the nose. Their shape is as follows:


Important to note, the rightmost elevated point is also controlled by a random factor, which determines if the ear should be round or more elongated. What is illustrated above is the average.


Each of these functions is given an arbitrary strength, which varies randomly from person to person. This is where the wide variability in head shapes comes from.
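As a small illustration in Python, the per-person strengths could be drawn like this. The class name, the uniform distribution and the 0.5 to 1.5 range are assumptions of mine; the original may well draw them differently.

```python
import random
from dataclasses import dataclass

@dataclass
class FaceStrengths:
    nose: float
    chin: float
    cheek: float
    ear: float

def random_strengths(rng, low=0.5, high=1.5):
    """Draw one independent strength per feature; the range is an assumption."""
    return FaceStrengths(*(rng.uniform(low, high) for _ in range(4)))

# Two differently seeded generators stand in for two different people.
person_a = random_strengths(random.Random(1))
person_b = random_strengths(random.Random(2))
```

Each strength then scales its own perturbation before it is added to the sphere.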





Drawing Eyes

Again, this approach only really works because the goal is expressive, not realistic, eyes. Each eye is made simply from a circle and a triangle put together:
The lines where the circle of the eye and the triangle overlap are then removed to create one combined shape.

Apart from expressiveness, the goal was also to achieve variability. This approach allows the shape of the eye to vary in the distance of the triangle's point from the centre of the eye, as well as in its angle. Both are quite easy to implement and create big differences in the perceived personality of the face.
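Here is a Python sketch of one way to get such a combined outline, with the eye centred at the origin. It assumes the triangle's two other corners sit exactly where the tangents from the point touch the circle, which may not be how the original places them. The function and parameter names are hypothetical.

```python
import math

def eye_outline(radius, point_distance, point_angle_deg, arc_steps=24):
    """Outline of a circle merged with a triangle whose apex lies outside it.

    Assumption: the triangle meets the circle at the two tangent points seen
    from the apex, so the merged outline is apex -> long arc -> back to apex.
    """
    if point_distance <= radius:
        raise ValueError("the apex must lie outside the circle")
    a = math.radians(point_angle_deg)
    # Angle between the apex direction and each tangent point, seen from the centre.
    half = math.acos(radius / point_distance)
    apex = (point_distance * math.cos(a), point_distance * math.sin(a))
    outline = [apex]
    # Walk the long way round the circle, from one tangent point to the other.
    start, end = a + half, a + 2.0 * math.pi - half
    for i in range(arc_steps + 1):
        t = start + (end - start) * i / arc_steps
        outline.append((radius * math.cos(t), radius * math.sin(t)))
    return outline  # closed polygon: the last point connects back to the apex
```

Changing `point_distance` stretches the eye and changing `point_angle_deg` tilts it, which are the two knobs described above.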

The rest of the facial features are more or less static shapes that only vary in size (width, height) and positioning (distance from symmetry-line, height). The nose is a bit more complicated, but going into detail on that will have to wait for another time.





Animating Faces

One can go about this problem in two ways - one is to think for a long time about what motions faces make and model them in intricate detail, and the other is to simply add random motions and let our perception guess at what caused each motion. For this project, I am going for the latter.

Despite the simplicity of the approach, there are a few things I should point out.


I use three rotational axes, and for each of them, there is a current value, a speed, and a goal. When the current value and the goal are different, the speed is increased in the correct direction. To avoid overshooting, the speed takes itself into consideration.
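A minimal Python sketch of one such axis. I am assuming a simple pull-plus-damping update here: the speed is pushed towards the goal and at the same time reduced in proportion to itself. The constants are hypothetical, picked so that the motion settles without swinging past the goal.

```python
from dataclasses import dataclass

@dataclass
class Axis:
    value: float = 0.0   # current rotation in degrees
    speed: float = 0.0   # degrees per step
    goal: float = 0.0    # target rotation in degrees

    def step(self, pull=0.04, damping=0.4):
        # Accelerate towards the goal, and subtract a share of the current
        # speed so the axis eases in. Both constants are assumed values.
        self.speed += (self.goal - self.value) * pull - self.speed * damping
        self.value += self.speed
```

Setting `goal` and calling `step()` once per frame is all an animation has to do; the easing in and out comes from the update itself.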

All the animations set the goals of one or several of the rotational axes to random values. Quite often, the three goals are offset by a small value, creating the lifelike motions of the faces. Less often, a large motion is started by setting one of the axes to a random value within a 90° range, where the centre is looking straight ahead.

However, the Z-rotational axis has all applied motions halved. This is because the Z axis moves the head in a way that is rarely used in Western cultures, where we instead nod or shake our heads.
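The goal-setting could be sketched in Python roughly like this. The size of the small offset (3°) and the chance of a large motion (10%) are made-up numbers, and the function names are hypothetical. Only the 90° range and the halving on Z come from the description above.

```python
import random

Z_SCALE = 0.5  # the Z axis receives half of every applied motion

def nudge_goals(goals, rng, small=3.0):
    """Offset all three goals by a small random amount (the size is assumed)."""
    for name in ("x", "y", "z"):
        delta = rng.uniform(-small, small)
        goals[name] += delta * (Z_SCALE if name == "z" else 1.0)

def big_turn(goals, rng):
    """Point one randomly chosen axis somewhere in a 90 degree range around 0."""
    name = rng.choice(("x", "y", "z"))
    target = rng.uniform(-45.0, 45.0)
    goals[name] = target * (Z_SCALE if name == "z" else 1.0)
    return name

def animate_goals(goals, rng, big_chance=0.1):
    """Apply a small nudge most of the time and a large turn occasionally."""
    if rng.random() < big_chance:
        big_turn(goals, rng)
    else:
        nudge_goals(goals, rng)
```

Here `goals` is just a dictionary such as `{"x": 0.0, "y": 0.0, "z": 0.0}`, with 0 meaning "looking straight ahead".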

Another thing! The motion of the pupils seems quite lifelike, but really, the pupils just use the speed of the X and Y rotational axes. This means that the small, barely visible motions of the head are accompanied by fitting eye tracking, just like a human would do - we lead with our eyes into any motion to stabilize the visual field. Luckily, it is extremely simple to implement.
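In Python, the idea fits in a few lines. The gain and the clamp are assumptions of mine, and I leave open which rotational axis drives which pupil direction.

```python
def pupil_offset(x_speed, y_speed, gain=0.5, max_offset=1.0):
    """Pupil offset from the eye centre, driven only by head rotation speed."""
    def clamp(v):
        # Keep the pupil inside the eye (the limit is an assumed value).
        return max(-max_offset, min(max_offset, v))
    # One pupil coordinate per rotational axis; the mapping is an assumption.
    return clamp(x_speed * gain), clamp(y_speed * gain)
```

Because the speed is non-zero as soon as a motion starts and falls back to zero when the head settles, the pupils move into each motion and then return to the centre on their own.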

Apart from these motions, animations also exist to have the eyes blink and the eyebrows move up and down, as well as have the mouth change shape from smile to frown. The eyebrows are semi-independent - half the time, one moves without the other - creating a multitude of expressions.



Making Faces Talk

The last thing I had to do was to make the faces talk. When speaking, the mouth is replaced with an ellipse that shrinks and grows. I also added in a corpus of text I had lying around, so that they would actually say something.

The mix of random, life-like motions, and random, life-like stories, has an effect that adds up to be a thing on its own.





The music that plays is Herd's Chant, a piece I composed in Sonic Pi with this very installation in mind. It is cacophonous and uncomfortable to listen to on purpose, emulating the feeling of a hundred people talking at once. Or maybe I just suck at music.



PS

After it was requested, I have made the full code available for use. It is written in GML, so it is meant more as inspiration than anything else. If you can use it, it's yours! https://pastebin.com/fDnCjDch