Introduction:

Figure 1: The workflow of our program.

In the final project, we implemented a face morphing program. Our program first applies a face detection algorithm to locate the face in the image and then uses the active appearance model (AAM) [1] to capture the facial shape and expression. The AAM outputs feature points that are common to human faces and can be used as sparse correspondences between two images. After the correspondences have been established, we interpolate a dense warping field either with a triangular mesh or by solving a Laplacian linear system. The warping field works very well in the face region but does not handle the parts outside of it. Thus, we further apply GraphCut to conceal the ghost effect that alpha-blending produces there. For ease of use, we have also designed a simple GUI for our program.

Algorithm implementation & details:

We implemented our program with Visual Studio .NET C++ 2005. Our implementation includes three components: 1) pre-processing and face feature detection, 2) image warping and blending, and 3) an easy-to-use GUI. The details of each part follow.

Pre-processing and face detection:

We use an open source AAM library [2] to work with the AAM. This library is quite complete and provides model training and optimization functions. In brief, the AAM captures the shape (position) and intensity variations of the feature points on human faces. The distribution of these variations is modeled by PCA in the training phase and stored as vectors. Given a new face, the algorithm performs gradient descent from the mean vector along the computed directions. The minimum at which the descent stops is then used to represent the face.

We train our AAM using the dataset from [2]. The feature positions are represented as percentages of the image width and height, which limits the aspect ratio of the input images we can use. Furthermore, the AAM is not scale-invariant because of the intensity data. To apply the model to general images with different aspect ratios and sizes, we use the fast face detection function available in OpenCV to locate the face in an image. We then extract a region of interest of appropriate size and aspect ratio around the face location and use it as the input for the AAM library. This pre-processing handles the problem effectively without complex manipulation. When face detection fails, we assume the image consists mainly of a face and proceed with the above method. The individual steps are summarized in Figure 2.
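The region-of-interest step can be sketched as follows. This is only an illustration under stated assumptions, not the code used in our program: the function name face_roi, the margin factor, and the target aspect ratio of 0.75 are hypothetical, and the face rectangle is assumed to come from a detector such as the one in OpenCV.

```python
def face_roi(face, image_size, target_aspect=0.75, margin=1.6):
    """Return an (x, y, w, h) crop around a detected face.

    face          -- (x, y, w, h) from a face detector, or None on failure
    image_size    -- (width, height) of the full image
    target_aspect -- assumed width / height of the AAM training images
    margin        -- hypothetical enlargement factor around the face box
    """
    img_w, img_h = image_size
    if face is None:
        # Detection failed: assume the image is mainly a face.
        face, margin = (0, 0, img_w, img_h), 1.0
    fx, fy, fw, fh = face
    cx, cy = fx + fw / 2.0, fy + fh / 2.0

    # Smallest box of the target aspect ratio that covers the face,
    # enlarged by the margin.
    h = margin * max(fh, fw / target_aspect)
    w = h * target_aspect

    # Shrink uniformly if the box does not fit inside the image.
    shrink = min(1.0, img_w / w, img_h / h)
    w, h = w * shrink, h * shrink

    # Center on the face, then slide the box back inside the image.
    x = min(max(cx - w / 2.0, 0.0), img_w - w)
    y = min(max(cy - h / 2.0, 0.0), img_h - h)
    return int(round(x)), int(round(y)), int(round(w)), int(round(h))
```

Because the crop always has the same aspect ratio, feature positions stored as percentages of width and height remain meaningful regardless of the input image shape.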

Figure 2: The steps of extracting facial expression from an image.

We have found that the pre-processing method works very well and seldom fails. As the initial position is near the global minimum, we set a small search size (7) during optimization. The optimization typically takes 1 to 2 seconds to complete.

We also list the AAM failure modes here for reference:

1. Light pictures: Light pictures have relatively low gradients, which makes it difficult for the optimization to locate features.
2. Glasses: Glasses produce specular highlights and distort gradients. This could be partially handled by including faces with glasses in the training data.
3. Distorted faces: Faces with exaggerated expressions, such as an open-mouth laugh, also cause trouble for the optimization.
4. Sex: Females usually have long hair, which makes their chins distinct and easy to locate. In contrast, males' chins are less distinguishable from the background.

Figure 3: AAM results on some images. Left to Right: Source image, AAM initial state, AAM optimized.

Image warping and blending:

The features from the AAM provide good locations for sparse face landmarks. However, we still need a dense warping field to produce the face morphing effect. For this purpose, we have implemented two different algorithms. In the first one, we triangulate the pictures according to the feature points and use the resulting mesh to warp the image. We chose the Delaunay algorithm for the triangulation. It maximizes the minimum interior angle among all the triangles, which reduces possible tearing effects near the triangle boundaries. The result of mesh warping can be seen in Figure 4. We warp the two input images according to the morphing ratio and apply alpha-blending to produce the composite.
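The per-triangle mapping behind mesh warping can be sketched as below. This is a minimal illustration rather than our actual implementation: the helper names are hypothetical, the triangles are assumed to come from a Delaunay triangulation computed elsewhere, and the convention that a ratio of 0 reproduces the first image is an assumption.

```python
def lerp_points(points0, points1, t):
    """Feature positions of the intermediate shape for morphing ratio t."""
    return [((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
            for (x0, y0), (x1, y1) in zip(points0, points1)]


def barycentric(p, tri):
    """Barycentric coordinates of point p with respect to triangle tri."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    a = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    b = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    return a, b, 1.0 - a - b


def map_point(p, tri_mid, tri_src):
    """Map p from the intermediate triangle to the same place in tri_src."""
    a, b, c = barycentric(p, tri_mid)
    return (a * tri_src[0][0] + b * tri_src[1][0] + c * tri_src[2][0],
            a * tri_src[0][1] + b * tri_src[1][1] + c * tri_src[2][1])


def alpha_blend(color0, color1, t):
    """Blend two warped pixel values with the same morphing ratio."""
    return (1 - t) * color0 + t * color1
```

For every output pixel inside an intermediate triangle, map_point gives the position to sample in each source image, and alpha_blend combines the two samples.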

Figure 4: Left Two: Source images. Right: Mesh warping result.

In the second method, we compute an optimal warping field based on a smoothness assumption. The idea is that, for a smooth warping field, neighboring pixels in one image should also be warped to neighboring points in the other. Given feature points with fixed targets, we can use these points as seeds to interpolate the full warping field. For each pixel in the image that is not a feature point, we add a discretized Laplacian constraint on the target position vectors of the pixel and its 4 neighbors. These constraints together form a large sparse linear system, which can be solved efficiently by sparse linear solvers, e.g. [3]. We show the comparison between mesh warping and Laplacian warping in Figure 5. In our experiments, we have found that both algorithms yield good results, while the mesh method is significantly faster. Thus, we adopt the first method as our default warping algorithm.
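The Laplacian interpolation can be illustrated on a small grid. The sketch below is not our implementation: it swaps the sparse direct solver for plain Gauss-Seidel iterations, solves for displacements rather than target positions, and averages over whichever of the 4 neighbors exist at the image border. All of these are simplifying assumptions, and the function name is hypothetical.

```python
def laplacian_interpolate(width, height, fixed, iterations=2000):
    """Interpolate a dense displacement field from sparse seeds.

    fixed -- dict mapping (x, y) feature pixels to (dx, dy) displacements
    Returns field[y][x] = (dx, dy) for every pixel.
    """
    field = [[(0.0, 0.0)] * width for _ in range(height)]
    for (x, y), (dx, dy) in fixed.items():
        field[y][x] = (float(dx), float(dy))

    for _ in range(iterations):
        for y in range(height):
            for x in range(width):
                if (x, y) in fixed:
                    continue  # feature points keep their fixed targets
                nbrs = [field[ny][nx]
                        for nx, ny in ((x - 1, y), (x + 1, y),
                                       (x, y - 1), (x, y + 1))
                        if 0 <= nx < width and 0 <= ny < height]
                # Zero discrete Laplacian: the value equals the
                # average of the available neighbors.
                field[y][x] = (sum(n[0] for n in nbrs) / len(nbrs),
                               sum(n[1] for n in nbrs) / len(nbrs))
    return field
```

On a one-pixel-high strip with the two ends fixed, the result is a linear ramp between the seeds. Iterating like this is far too slow for full-size images, which is why a sparse solver is the practical choice.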

Figure 5: Left: Mesh warping. Right: Laplacian.

Our warping and blending algorithm works very well in the face region but produces noticeable artifacts in the background area. This is because, while human faces share a common composition, the backgrounds can vary greatly from one image to another. To deal with this problem, we apply the recently popular MRF optimization algorithms [4] to select appropriate parts from different images and form a visually plausible composite. The idea of applying MRF methods to stitch different images for an improved visual experience was first proposed in [5]. Here, we face a different scenario in that no user input is available.

For each pixel in the output image, the MRF optimization tries to give it a label which indicates the image its value has to be taken from. Obviously, if one pixel is in the face region, we will want it to take the value from the blended image (B). To ensure this, we compute a convex hull of the AAM features. All the pixels in the convex hull are then allowed to take the label B only. For pixels outside of the region, we bias the choice between the two warped images (W0, W1) according to the morphing ratio and give a halfway preference to the blended image B. The above description constitutes our data term of the MRF formulation.
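The data term described above can be sketched as follows. The convex hull and the hard restriction to label B inside it follow the description; the numeric costs outside the hull (the morphing ratio for W0, its complement for W1, and 0.5 for B) are purely illustrative assumptions and are not the values of our formulation. All function names are hypothetical.

```python
INF = float("inf")


def cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull by Andrew's monotone chain, in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_hull(p, hull):
    """True if p lies inside or on the boundary of the convex hull."""
    return all(cross(hull[i], hull[(i + 1) % len(hull)], p) >= 0
               for i in range(len(hull)))


def data_cost(p, label, hull, ratio):
    """Illustrative data cost for labels 'W0', 'W1' and 'B'."""
    if inside_hull(p, hull):
        return 0.0 if label == "B" else INF  # face region: B only
    # Assumed bias: ratio 0 favors W0, ratio 1 favors W1, B sits halfway.
    return {"W0": ratio, "W1": 1.0 - ratio, "B": 0.5}[label]
```

With this layout, a pixel inside the hull can never be assigned a warped image, while a pixel in the background leans toward whichever source the morphing ratio favors.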

The smoothness term reflects our definition of a visually pleasing image. We would like our output to be as smooth as possible, so we penalize label changes between neighboring pixels by the image gradients they produce. In practice, we also find that slightly penalizing the image gradients between pixels with the same label gives better results. We summarize our MRF formulation in Figure 6. The parameters we used in the experiments are beta = 5.0, alpha_b = 0.17 and alpha_w = 0.14. They were determined by hand-tuning on several test cases.
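One possible reading of the smoothness term is sketched below. It is an assumption-laden illustration, not the formulation of Figure 6: the gradient is taken as the absolute intensity difference across a neighboring pixel pair in the output, the way beta enters is guessed, and same_label_weight is a hypothetical stand-in for the slight same-label penalty.

```python
def smooth_cost(images, p, q, label_p, label_q,
                beta=5.0, same_label_weight=0.1):
    """Cost of giving neighboring pixels p and q the two labels.

    images -- dict mapping a label to a grayscale image (list of rows)
    p, q   -- (x, y) coordinates of two 4-connected neighbors
    """
    (px, py), (qx, qy) = p, q
    # Gradient the output would show across the pair (p, q).
    gradient = abs(images[label_p][py][px] - images[label_q][qy][qx])
    if label_p != label_q:
        return beta * gradient           # seam between two sources
    return same_label_weight * gradient  # slight penalty inside one source


def smoothness_energy(images, labels):
    """Sum of smooth_cost over all 4-connected pairs of a label map."""
    height, width = len(labels), len(labels[0])
    total = 0.0
    for y in range(height):
        for x in range(width):
            if x + 1 < width:
                total += smooth_cost(images, (x, y), (x + 1, y),
                                     labels[y][x], labels[y][x + 1])
            if y + 1 < height:
                total += smooth_cost(images, (x, y), (x, y + 1),
                                     labels[y][x], labels[y + 1][x])
    return total
```

A seam placed where the two sources agree costs nothing, so the optimization is encouraged to cut through regions where the switch is invisible.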

Figure 6: Our MRF formulation.
Figure 7: The output MRF label map on the sample image pair.

We optimize the above energy with the alpha-expansion algorithm from a free library [6]. The output labeling on the sample image pair is illustrated in Figure 7. We compare the background montage result with those of direct alpha-blending and mesh warping in Figure 8. It can be seen that our algorithm does a good job of reducing the ghost effects and outputs a satisfactory composite. We have found that our algorithm works well as long as the AAM converges successfully.

Figure 8: Left: Direct alpha-blending. Middle: Mesh warping. Right: Background montage.

User interface:

We designed our GUI with Visual Studio .NET C++ 2005. The UI lets the user easily load images, select the morphing ratio, perform face morphing and save the result. It also supports advanced features such as adding and modifying feature points. We illustrate the functions with a workflow on the sample image pair.

Load images and add AAM feature point files (*.asf):

View images:
Double-click on the loaded images to view them in their original size.

Morphing:
Drag the trackbar to select the morphing ratio. Drag it leftward if you want the output to look more like the left image, and rightward for the right image. Check the "Clear Background" checkbox to turn on the MRF post-processing. Click the "Morph" button to generate the result. The figure below shows the background-cleared result with morphing ratio = 45%.

Add/Modify the feature points:
After the feature point file is loaded, the image display window shows the feature points on the image. The user can move a point by left-clicking on it and dragging it. Right-click on the image to add a new point. The figure below shows a screenshot of the user adding feature points.

Save the result:

Results & Discussion:

Here we show some face morphing results generated automatically by our program. The morphing ratio is set to 50%. In each dataset, the top two pictures are the source images, and the bottom-left and bottom-right pictures are the results without and with background montage, respectively. The background montage clears the most noticeable artifacts but may still leave some seams in the forehead region. This could be improved either by using gradient-domain image blending techniques as in [5] or by designing a better MRF energy.

result1

Result2

References:

[1] T. F. Cootes, G. J. Edwards and C. J. Taylor, "Active Appearance Models", ECCV 1998.
[2] AAM-API, http://www2.imm.dtu.dk/~aam/.
[3] TAUCS, http://www.tau.ac.il/~stoledo/taucs/.
[4] R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen and C. Rother, "A Comparative Study of Energy Minimization Methods for Markov Random Fields", ECCV 2006.
[5] A. Agarwala, M. Dontcheva, M. Agrawala, S. Drucker, A. Colburn, B. Curless, D. Salesin and M. Cohen, "Interactive Digital Photomontage", SIGGRAPH 2004.
[6] Middlebury MRF Library, http://vision.middlebury.edu/MRF/.