Handbook of Face Recognition

Stan Z. Li and Anil K. Jain. 2011. Handbook of Face Recognition (2nd. ed.). Springer Publishing Company, Incorporated.

Chapter 1: Introduction

This book is aimed at students and practitioners of face recognition, offering advanced tutorials, surveys of methods, and a guide to current (2011) technology. Chapter 1 focuses on the components of facial analysis, such as detection and tracking, and on the major technical challenges in building a system. Face recognition research is largely motivated by practical applications, particularly in biometrics, security, authentication, and multimedia management. The authors argue that face recognition has an advantage over other biometric technologies because it is "natural, nonintrusive, and easy to use."

Face recognition operates in two modes: (1) face verification (authentication) and (2) face identification (recognition). Verification is a one-to-one match that compares a query face against the enrolled face of the identity the person claims. Identification is a one-to-many match that compares the query face against all faces in a database to determine whose face it is. Recognition may also be one-to-few, such as comparing a face against a short, predefined watch list of candidates (e.g., suspects). Facial analysis systems perform best under constrained conditions; they are much less accurate in unconstrained conditions with varied lighting, viewing angles, expressions, occlusion, etc.
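The two modes differ in how many gallery comparisons are made and in how the decision is taken. The sketch below assumes faces are already represented as feature vectors and uses cosine similarity as the score; both are illustrative assumptions, not choices prescribed by the book, and the gallery, names, and threshold are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Similarity score between two feature vectors (an assumed choice of score)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms


def verify(probe, claimed_template, threshold):
    """One-to-one: accept the identity claim if the score exceeds the threshold."""
    return cosine_similarity(probe, claimed_template) > threshold


def identify(probe, gallery, threshold=None):
    """One-to-many (or one-to-few, when the gallery is a small watch list).

    Returns the best-matching identity, or None if a threshold is given and
    even the best score does not exceed it."""
    best_id = max(gallery, key=lambda name: cosine_similarity(probe, gallery[name]))
    if threshold is not None and cosine_similarity(probe, gallery[best_id]) <= threshold:
        return None
    return best_id


# Hypothetical 3-D feature vectors, for illustration only.
gallery = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0], "carol": [0.0, 0.0, 1.0]}
probe = [0.9, 0.1, 0.0]
print(verify(probe, gallery["alice"], threshold=0.8))  # True
print(identify(probe, gallery))                        # alice
```

A one-to-few search is the same `identify` call with a smaller gallery dictionary holding only the watch-list entries.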

How it Works

A common face recognition pipeline is: input image/video > face detection/tracking > face alignment > feature extraction on the aligned face > feature matching against a database of faces > face ID (or no match).

Face alignment: "is aimed at achieving more accurate localization and at normalizing faces ... Facial components, such as eyes, nose, and mouth and facial outline, are located; based on the location points, the input face image is normalized with respect to geometrical properties, such as size and pose, using geometrical transforms or morphing. The face is usually further normalized with respect to photometrical properties such as illumination and gray scale."
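The geometric part of this normalization can be illustrated with a common two-point approach (an assumption here; the book also mentions morphing and other transforms): compute the similarity transform (rotation, uniform scale, translation) that maps the detected eye centers onto fixed canonical positions. The canonical eye coordinates below are hypothetical defaults, and points are treated as complex numbers so the transform is simply z -> a*z + b.

```python
def similarity_from_eyes(left_eye, right_eye,
                         canon_left=(30.0, 45.0), canon_right=(70.0, 45.0)):
    """Return (a, b) so that z -> a*z + b maps the detected eye centers onto
    canonical eye positions.

    Points are (x, y) pixel coordinates treated as complex numbers x + iy.
    The complex factor a encodes rotation and uniform scale; b is the
    translation. The canonical positions are illustrative defaults."""
    p1, p2 = complex(*left_eye), complex(*right_eye)
    q1, q2 = complex(*canon_left), complex(*canon_right)
    a = (q2 - q1) / (p2 - p1)  # rotation + scale
    b = q1 - a * p1            # translation
    return a, b


def apply_transform(a, b, point):
    """Map one (x, y) point through the similarity transform."""
    z = a * complex(*point) + b
    return z.real, z.imag


# Detected eye centers in a tilted face (hypothetical coordinates).
a, b = similarity_from_eyes((120, 80), (160, 120))
print(apply_transform(a, b, (160, 120)))  # approximately (70.0, 45.0)
print(abs(a))                             # scale factor, about 0.707
```

Warping the actual pixels would apply the inverse map (z - b) / a to each output pixel and sample the input image there; that resampling step is omitted from this sketch.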

Feature extraction: "is performed to provide effective information that is useful for distinguishing between faces of different persons and stable with respect to the geometrical and photometrical variations."

Recognition results depend heavily on the features extracted from the face image and on the classification method used to distinguish faces. Face localization and normalization are the basis for feature extraction.

Face Subspaces

Subspace analysis techniques are based on the notion that class patterns that a researcher is interested in locating (e.g., a face) reside in a subspace of the overall image space. Even a small image has a large number of pixels that can express many pattern classes (e.g., trees, houses, faces). Among these possible configurations, only a few will correspond to a face pattern. "Therefore, the original image representation is highly redundant, and the dimensionality of this representation could be greatly reduced when only the face pattern are of interest."

Eigenfaces/PCA: A small number (e.g., 40 or fewer) of eigenfaces are derived from face training data using PCA, also known as the Karhunen-Loève transform. Each face image is then represented as a low-dimensional feature vector of weights on these eigenfaces. "The features in such subspace provide more salient and richer information for recognition than the raw image. The use of subspace modeling techniques has significantly advanced face recognition technology." The face "manifold" describes how face images are distributed within the image space, while the nonface manifold covers all other patterns.
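A minimal eigenface sketch using NumPy: subtract the mean face, take the top right singular vectors of the centered data (these are the eigenvectors of the pixel covariance), and represent each face by its projection weights. The synthetic data and the choice k = 3 are assumptions for illustration; real eigenfaces are computed from aligned, flattened grayscale face images.

```python
import numpy as np


def fit_eigenfaces(faces, k):
    """faces: (n_samples, n_pixels) array of flattened, aligned face images.
    Returns the mean face and the top-k eigenfaces as unit-norm rows."""
    mean_face = faces.mean(axis=0)
    centered = faces - mean_face
    # Right singular vectors of the centered data are the eigenvectors of the
    # pixel covariance matrix, ordered by decreasing eigenvalue.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_face, vt[:k]


def project(face, mean_face, eigenfaces):
    """Low-dimensional weight vector representing one face."""
    return eigenfaces @ (face - mean_face)


def reconstruct(weights, mean_face, eigenfaces):
    """Approximate the face from its weights."""
    return mean_face + weights @ eigenfaces


rng = np.random.default_rng(0)
# Synthetic stand-in for training data: 20 "faces" of 64 pixels lying near a
# 3-dimensional subspace, plus a little noise.
basis = rng.normal(size=(3, 64))
faces = rng.normal(size=(20, 3)) @ basis + 0.01 * rng.normal(size=(20, 64))
mean_face, eigenfaces = fit_eigenfaces(faces, k=3)
w = project(faces[0], mean_face, eigenfaces)
print(w.shape)  # (3,)
```

Because the synthetic faces lie near a 3-dimensional subspace, three weights are enough to reconstruct each one closely; this mirrors the redundancy argument made for real face images.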

Technical Challenges

Large variability in facial appearance: This includes angle, illumination, and expression, as well as imaging parameters such as aperture, exposure time, and lens aberrations.

Highly Complex Nonlinear Manifolds: "In a linear subspace, Euclidean distance and more generally Mahalanobis distance, which are normally used for template matching, do not perform well for classifying between face and nonface manifolds and between manifolds of individuals ... This crucial fact limits the power of the linear methods to achieve highly accurate face detection and recognition."

High Dimensionality and Small Sample Size: "the number of examples per person (typically fewer than 10, even just one) available for learning the manifold is usually much smaller than the dimensionality of the image space; a system trained on so few examples may not generalize well to unseen instances of the face."
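The small-sample problem interacts directly with the Mahalanobis distance mentioned above: with n examples, the sample covariance of the pixels has rank at most n - 1, so for image-sized vectors it is singular and cannot be inverted. The NumPy sketch below demonstrates this on random data (an assumption standing in for real face images). The ridge term added to the diagonal is one common workaround, named here as an assumption rather than the book's method.

```python
import numpy as np

rng = np.random.default_rng(1)
n_examples, n_pixels = 8, 16 * 16  # few examples of one person, many pixels
samples = rng.normal(size=(n_examples, n_pixels))

# Sample covariance of the pixel values: a 256 x 256 matrix estimated from
# only 8 examples. Its rank is at most n_examples - 1, so it is singular.
cov = np.cov(samples, rowvar=False)
print(np.linalg.matrix_rank(cov), "of", n_pixels)


def mahalanobis(x, mean, cov, ridge=1e-3):
    """Mahalanobis distance with a small ridge term added to the diagonal.

    The ridge (an assumed workaround, not from the source) makes the
    singular covariance invertible; without it, solving against cov would
    raise an error or return numerically meaningless values."""
    regularized = cov + ridge * np.eye(cov.shape[0])
    diff = x - mean
    return float(np.sqrt(diff @ np.linalg.solve(regularized, diff)))


mean = samples.mean(axis=0)
print(mahalanobis(samples[0], mean, cov))
```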

Technical Solutions

Two approaches address the challenges above:
(1) feature extraction to construct a "good" feature space in which face manifolds become simpler, via two processing steps (normalizing faces geometrically and photometrically, then extracting features "in the normalized images which are stable with respect to such variations, such as based on Gabor wavelets");
(2) "construct classification engines able to solve difficult nonlinear classification and regression problems in the feature space and to generalize better."
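Gabor-based features, as named in approach (1), can be sketched as a small filter bank whose response magnitudes are read at a point such as a facial landmark. The kernel formula below (Gaussian envelope times complex sinusoid) is the standard Gabor form; the filter size, wavelength, sigma, aspect ratio gamma, and the four orientations are illustrative assumptions, and a synthetic striped image stands in for a face.

```python
import numpy as np


def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5):
    """Complex 2-D Gabor kernel: a Gaussian envelope times a complex sinusoid
    oriented at angle theta (radians). size should be odd."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]  # rows, columns
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_rot ** 2 + (gamma * y_rot) ** 2) / (2 * sigma ** 2))
    carrier = np.exp(1j * 2 * np.pi * x_rot / wavelength)
    return envelope * carrier


def gabor_jet(image, row, col, kernels):
    """Magnitudes of the filter responses centered on one pixel (e.g. a facial
    landmark); magnitudes change slowly under small shifts of the point."""
    half = kernels[0].shape[0] // 2
    patch = image[row - half:row + half + 1, col - half:col + half + 1]
    return np.array([abs(np.sum(patch * k)) for k in kernels])


# A small filter bank: one scale, four orientations (illustrative parameters).
thetas = np.linspace(0, np.pi, 4, endpoint=False)
kernels = [gabor_kernel(15, wavelength=6.0, theta=t, sigma=3.0) for t in thetas]

# Synthetic test image with vertical stripes (intensity varies along columns).
cols = np.arange(41)
image = np.tile(np.cos(2 * np.pi * cols / 6.0), (41, 1))
jet = gabor_jet(image, 20, 20, kernels)
print(jet.argmax())  # index of the orientation with the strongest response
```

The filter tuned to the stripes' orientation and period responds most strongly; a real feature vector would concatenate such jets over several scales and landmarks.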

Chapter 2: Face Detection

Face detection is the first step in face recognition. Ideally, a face detector finds every face in a photo regardless of size, angle, position, expression, or lighting. Detection relies on cues: "skin color (for faces in color images and videos), motion (for faces in videos), facial/head shape, facial appearance, or a combination of these parameters." The basic steps are: "An input image is scanned at all possible locations and scales by a subwindow. Face detection is posed as classifying the pattern in the subwindow as either face or nonface. The face/nonface classifier is learned from face and nonface training examples using statistical learning methods." AdaBoost-based methods are the most commonly used because of their good performance. Other approaches use nonlinear classifiers such as neural networks or kernel-based methods.
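The scanning step can be sketched as a generator of square subwindows over all positions and over a geometric series of scales. The 24-pixel base size, the 1.25 scale step, and the stride of 10% of the window size are assumed values, and `classify` is a hypothetical placeholder for a trained face/nonface classifier.

```python
def scan_windows(height, width, base_size=24, scale_step=1.25, stride_frac=0.1):
    """Yield (row, col, size) for square subwindows at all positions and at
    a geometric series of scales that fit inside a height x width image."""
    size = base_size
    while size <= min(height, width):
        stride = max(1, int(size * stride_frac))
        for row in range(0, height - size + 1, stride):
            for col in range(0, width - size + 1, stride):
                yield row, col, size
        size = int(round(size * scale_step))


def detect_faces(height, width, classify):
    """Keep every subwindow that the face/nonface classifier labels as face.

    classify is a placeholder for a trained classifier (e.g. a boosted
    cascade); in this sketch it receives only the window coordinates."""
    return [win for win in scan_windows(height, width) if classify(win)]


windows = list(scan_windows(48, 48))
print(len(windows), sorted({size for _, _, size in windows}))
```

Even a 48 x 48 image yields a few hundred subwindows, which is why fast classifiers such as boosted cascades matter for this design.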

Appearance-Based Methods

This approach classifies each subwindow into one of two classes: face or nonface. Ideally, the face/nonface classifier is trained to recognize what a face looks like under a variety of lighting conditions.


Skin color filtering is used for skin-color-based segmentation, since the color distribution of human skin differs from that of most other objects. "A simple color-based face detection algorithm consists of two steps: (1) segmentation of likely face regions and (2) region merging." However, skin-color-based detection on its own tends to be inaccurate: "Although a color-based face detection system may be computationally attractive, the color constraint alone is insufficient for achieving high accuracy face detection. This is due to large facial color variation as a result of different lighting, shadow, and ethnic groups. Indeed, it is the appearance, albeit colored or gray level, rather than the color that is most essential for face detection. Skin color is often combined with the motion cue to improve the reliability for face detection and tracking on video [49, 50]. However, the most successful face detection systems do not rely on color or motion information, yet achieve good performance."
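Step (1), segmentation of likely face regions, is often done by thresholding chrominance. The sketch below converts RGB to the Cb/Cr components of the full-range (JPEG) YCbCr transform and applies a fixed box; the ranges Cb in [77, 127] and Cr in [133, 173] are a widely used heuristic and an assumption here, not values from the book. The region-merging step (2) is omitted.

```python
def rgb_to_cbcr(r, g, b):
    """Chrominance components of the full-range (JPEG) YCbCr transform."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def is_skin(pixel, cb_range=(77, 127), cr_range=(133, 173)):
    """True if an (R, G, B) pixel falls inside the assumed skin chrominance box."""
    cb, cr = rgb_to_cbcr(*pixel)
    return cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]


def skin_mask(image):
    """image: list of rows of (R, G, B) tuples -> list of rows of booleans."""
    return [[is_skin(p) for p in row] for row in image]


image = [[(224, 172, 140), (0, 0, 255)],
         [(128, 128, 128), (0, 255, 0)]]
print(skin_mask(image))
```

Using chrominance alone discards brightness, which helps somewhat with shadows, but as the quoted passage notes, a fixed box still fails under strong lighting changes and across the full range of skin tones.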

Image normalization is necessary because appearance-based methods operate on windows of a fixed size. "After normalization, the distribution of subwindow images becomes more compact and standardized, which helps reduce the complexity of the subsequent face/nonface classification."
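One common form of subwindow normalization (assumed here; the book does not fix a specific recipe in this passage) rescales each window to the classifier's fixed input size and then standardizes it to zero mean and unit variance, which removes global brightness and contrast differences. The 24 x 24 output size and nearest-neighbor resizing are illustrative choices; the sketch uses NumPy.

```python
import numpy as np


def resize_nearest(window, out_size=24):
    """Nearest-neighbor resize of a 2-D array to out_size x out_size."""
    rows = (np.arange(out_size) * window.shape[0] / out_size).astype(int)
    cols = (np.arange(out_size) * window.shape[1] / out_size).astype(int)
    return window[np.ix_(rows, cols)]


def normalize_window(window, out_size=24, eps=1e-8):
    """Rescale to a fixed size, then to zero mean and unit variance so that
    global brightness and contrast changes are removed."""
    w = resize_nearest(window.astype(float), out_size)
    return (w - w.mean()) / (w.std() + eps)


rng = np.random.default_rng(2)
win = rng.uniform(0, 255, size=(48, 48))
brighter = 2.0 * win + 30.0  # same pattern, higher contrast and brightness
print(np.allclose(normalize_window(win), normalize_window(brighter), atol=1e-6))
```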


Merging multiple detections: "A single face in an image may be detected several times at close locations or on multiple scales. False alarms may also occur but usually with less consistency than multiple face detections. The number of multiple detections in a neighborhood of a location can be used as an effective indication for the existence of a face at that location. This assumption leads to a heuristic for resolving the ambiguity caused by multiple detections and eliminating many false detections. A detection is confirmed if the number of multiple detections is greater than a given value; and given the confirmation, multiple detections are merged into a consistent one."
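The heuristic can be sketched as: group detections that overlap, confirm a group if its size is greater than a given value, and merge each confirmed group into its average box. The intersection-over-union overlap test, the 0.3 overlap threshold, the greedy grouping against each group's first box, and `min_neighbors=2` are all assumptions for illustration.

```python
def iou(a, b):
    """Intersection-over-union of two square boxes given as (row, col, size)."""
    r1, c1, s1 = a
    r2, c2, s2 = b
    ih = max(0, min(r1 + s1, r2 + s2) - max(r1, r2))
    iw = max(0, min(c1 + s1, c2 + s2) - max(c1, c2))
    inter = ih * iw
    return inter / (s1 * s1 + s2 * s2 - inter)


def merge_detections(detections, min_neighbors=2, iou_threshold=0.3):
    """Group overlapping detections, confirm groups with more than
    min_neighbors members, and merge each confirmed group into its mean box."""
    groups = []
    for det in detections:
        for group in groups:
            if iou(det, group[0]) > iou_threshold:  # compare with group's first box
                group.append(det)
                break
        else:
            groups.append([det])
    merged = []
    for group in groups:
        if len(group) > min_neighbors:  # confirmed face
            n = len(group)
            merged.append(tuple(sum(box[i] for box in group) / n for i in range(3)))
    return merged


detections = [(10, 10, 24), (11, 10, 24), (10, 12, 26), (100, 100, 24)]
print(merge_detections(detections))  # one merged face; the isolated box is dropped
```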

Performance Evals

Two types of data are recommended: (1) face icons of a fixed size, to evaluate the performance of the face/no-face classifier without being affected by merging; and (2) normal images, to test the overall system including the merging. "The face detection performance is primarily measured by two rates: the correct detection rate (which is 1 minus the miss detection rate) and the false alarm rate."
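On the fixed-size face icon test set, both rates reduce to simple counts. In the sketch below, the false alarm rate is taken as the fraction of nonface icons classified as face; that denominator is an assumption, since some evaluations instead report the raw number of false alarms.

```python
def icon_rates(labels, predictions):
    """labels, predictions: True = face, False = nonface, one entry per icon.

    Correct detection rate = faces classified as face / all face icons
                           = 1 - miss detection rate.
    False alarm rate = nonfaces classified as face / all nonface icons."""
    on_faces = [p for lab, p in zip(labels, predictions) if lab]
    on_nonfaces = [p for lab, p in zip(labels, predictions) if not lab]
    detection_rate = sum(on_faces) / len(on_faces)
    false_alarm_rate = sum(on_nonfaces) / len(on_nonfaces)
    return detection_rate, false_alarm_rate


labels      = [True, True, True, True, False, False, False, False, False]
predictions = [True, True, True, False, False, False, True, False, False]
print(icon_rates(labels, predictions))  # (0.75, 0.2)
```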

Chapter 14: Evaluation Methods in Face Recognition

"Performance is reported on three standard tasks: verification (is this person who they claim to be?) and open-set (do we know this face? is the image a person in the gallery?) and closed-set identification (whose face is this?)." Each task has its own performance metrics.

Open-Set Performance

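Performance statistics used: the detection and identification rate and the false alarm rate, both computed at an operating threshold T. The detection and identification rate is the fraction of probes of people in the gallery that are correctly detected and identified (the top match is the correct identity and its score is at least T). The false alarm rate is the fraction of probes of people not in the gallery that raise an alarm (their top match score is at least T).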

"The ideal system would have a detection and identification rate of 1.0 and a false alarm rate of 0; all people in the gallery are detected and identified, and there are no false alarms. However, in real-world systems there is a trade-off between the detection and identification, and false alarm rates. By changing the operating threshold, the performance rates change. Increasing an operating threshold lowers both the false alarm rate and the detection and identification rate. Both these performance rates cannot be maximized simultaneously; there is a trade-off between them."
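Both open-set rates can be computed from similarity scores as sketched below; running it at two thresholds shows the trade-off described in the quote. The score dictionaries, identities, and thresholds are hypothetical, and a score counts as passing when it is at least the threshold.

```python
def open_set_rates(scores_pg, true_ids, scores_pn, threshold):
    """scores_pg: one dict {gallery_id: score} per probe of a person in the gallery.
    true_ids: correct gallery id for each of those probes.
    scores_pn: one dict per probe of a person NOT in the gallery.
    Returns (detection and identification rate, false alarm rate)."""
    detected = 0
    for scores, true_id in zip(scores_pg, true_ids):
        best_id = max(scores, key=scores.get)
        # Correct only if the top match is right AND it clears the threshold.
        if best_id == true_id and scores[best_id] >= threshold:
            detected += 1
    # An unknown person raises a false alarm if any gallery score clears the threshold.
    false_alarms = sum(1 for scores in scores_pn if max(scores.values()) >= threshold)
    return detected / len(scores_pg), false_alarms / len(scores_pn)


scores_pg = [{"a": 0.9, "b": 0.2}, {"a": 0.3, "b": 0.6}, {"a": 0.7, "b": 0.75}]
true_ids = ["a", "b", "a"]
scores_pn = [{"a": 0.65, "b": 0.1}, {"a": 0.2, "b": 0.3}]
for t in (0.5, 0.8):
    print(t, open_set_rates(scores_pg, true_ids, scores_pn, t))
```

Raising the threshold from 0.5 to 0.8 lowers the false alarm rate from 0.5 to 0 but also lowers the detection and identification rate from 2/3 to 1/3.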

Verification Performance

The system compares the probe face image with the gallery image(s) of the claimed identity. "The comparison produces a similarity score. The system accepts the identity claim if the similarity score is greater than the system's operating threshold. Otherwise, the system rejects the claim. The operational threshold is determined by the application, and different applications have different operational thresholds."

Two standard protocols are used to compute verification performance: (1) round robin and (2) true imposter.

Round Robin: "All scores between gallery and probe set samples are computed. All match scores between a gallery and a probe set are used to compute the verification rate, and all nonmatch scores are used to compute the false accept rate ... One complaint with the round robin protocol is that probes are used to generate both verification and false accept rates. There is a concern that this does not adequately model the situation where false identity claims are generated by people not in the gallery."

True Imposter: "In the true imposter protocol, performance is computed from two probe sets, PG and PN. The verification rate is computed from the match scores between a gallery and PG. The number of match scores is the size of PG. The false alarm rate is computed from all nonmatch scores between the gallery and PN. These nonmatch scores are called true imposters because people in PN are not in the gallery."
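The two protocols compute the same rates and differ only in where the nonmatch scores come from. The sketch below stores templates as dicts keyed by identity with one probe per identity, and uses a toy similarity on scalar "templates"; both are simplifying assumptions. A claim is accepted when the score is greater than the threshold, as in the verification description above.

```python
def rates(match_scores, nonmatch_scores, threshold):
    """Verification rate: fraction of match scores above the threshold.
    False accept rate: fraction of nonmatch scores above the threshold."""
    vr = sum(s > threshold for s in match_scores) / len(match_scores)
    far = sum(s > threshold for s in nonmatch_scores) / len(nonmatch_scores)
    return vr, far


def round_robin(sim, gallery, probes):
    """All gallery/probe pairs are scored; same-identity pairs are matches and
    the rest are nonmatches (every probe identity must be in the gallery)."""
    match = [sim(gallery[pid], t) for pid, t in probes.items()]
    nonmatch = [sim(g, t) for gid, g in gallery.items()
                for pid, t in probes.items() if gid != pid]
    return match, nonmatch


def true_imposter(sim, gallery, probes_pg, probes_pn):
    """Match scores come from PG; nonmatch scores come only from PN, whose
    people are not in the gallery."""
    match = [sim(gallery[pid], t) for pid, t in probes_pg.items()]  # |PG| scores
    nonmatch = [sim(g, t) for g in gallery.values() for t in probes_pn.values()]
    return match, nonmatch


sim = lambda a, b: -abs(a - b)  # toy similarity on scalar "templates"
gallery = {"a": 0.0, "b": 1.0, "c": 2.0}
probes_pg = {"a": 0.1, "b": 1.2}
probes_pn = {"x": 5.0, "y": 0.6}
print(rates(*round_robin(sim, gallery, probes_pg), threshold=-0.5))
print(rates(*true_imposter(sim, gallery, probes_pg, probes_pn), threshold=-0.5))
```

With this toy data, round robin reports no false accepts, while the true imposter probe "y" is accepted against gallery entry "b", giving a false accept rate of 1/6.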

Closed-Set Performance

"Performance on the closed-set identification task is the classic performance statistic in face recognition. With closed-set identification, the question is not always 'Is the top match correct?' but rather 'Is the correct answer in the top n matches?'"
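The "top n" question is usually answered with a cumulative match characteristic: for each rank n, the fraction of probes whose correct identity appears among the n highest-scoring gallery entries. A minimal sketch with hypothetical scores (closed set, so every probe's identity is in the gallery):

```python
def cumulative_match(scores_list, true_ids, max_rank):
    """scores_list: one dict {gallery_id: similarity} per probe.
    Returns [fraction of probes with correct rank <= n for n = 1..max_rank]."""
    counts = [0] * max_rank
    for scores, true_id in zip(scores_list, true_ids):
        ranked = sorted(scores, key=scores.get, reverse=True)
        rank = ranked.index(true_id) + 1  # 1 = top match
        for n in range(rank, max_rank + 1):
            counts[n - 1] += 1
    return [c / len(scores_list) for c in counts]


scores_list = [
    {"a": 0.9, "b": 0.5, "c": 0.1},  # correct answer "a" at rank 1
    {"a": 0.8, "b": 0.6, "c": 0.7},  # correct answer "c" at rank 2
    {"a": 0.2, "b": 0.3, "c": 0.4},  # correct answer "a" at rank 3
]
true_ids = ["a", "c", "a"]
print(cumulative_match(scores_list, true_ids, max_rank=3))
```

The rank-1 value is the answer to "Is the top match correct?"; higher ranks answer the "top n" version of the question.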


Performance also varies depending on the gallery used and on the class of probes (e.g., race).