Towards High-Fidelity Face Swapping:
A Comprehensive Survey and New Benchmark

A unified taxonomy, a high-quality face swapping benchmark, and extensive evaluation under normal, cross-ethnicity, and cross-attribute protocols.

Qi Li, Weining Wang, Shuangjun Du, Bo Peng, Jing Dong, Kun Wang, Zhenan Sun, and Ming-Hsuan Yang
Institute of Automation, Chinese Academy of Sciences · Nanyang Technological University · University of California, Merced

Overview

Face swapping aims to transfer the identity of a source face onto a target while preserving target-specific attributes such as pose, expression, illumination, and background. Existing evaluations are often fragmented across different datasets, protocols, and implementation details, making it difficult to fairly compare different methods.

This project page accompanies our paper and provides additional materials beyond the manuscript, including dataset examples, benchmark protocols, quantitative summaries, and more qualitative comparisons across representative face swapping methods.

1,291 subjects
2,582 videos
2.83M frames
4K raw video resolution

Survey Taxonomy

We organize face swapping methods into five major paradigms according to their design principles and representation choices.

Evolutionary path of high-fidelity face swapping methods.

CASIA FaceSwapping Dataset

CASIA FaceSwapping is designed specifically for controlled and fine-grained face swapping evaluation. It contains balanced demographic distributions and explicit attribute variations, enabling systematic analysis of identity preservation, attribute consistency, visual fidelity, and robustness.

Subjects: 1,291 identities
Videos: 2,582 videos
Raw resolution: 2160 × 3840
Demographics: Asian, African, and Caucasian
Variations: Normal, pose, expression, and illumination
Aligned images: Uniformly sampled and aligned face images for benchmark evaluation

Dataset examples showing ethnicity, pose, illumination, and expression variations.

Evaluation Protocols

We establish three standardized protocols to isolate different factors and provide interpretable evaluation. These protocols are designed to evaluate standard performance, demographic generalization, and robustness to dynamic attribute variations.

Protocol 1: Normal

Same ethnicity · normal recordings

Measures baseline face swapping performance under relatively controlled conditions.

4,500 pairs · baseline · same ethnicity

Protocol 2: Cross-ethnicity

Different ethnicities · normal recordings

Tests whether a method can generalize across demographic groups without identity leakage or appearance bias.

1,200 pairs · fairness · generalization

Protocol 3: Cross-attribute

Same ethnicity · pose / expression / illumination shifts

Evaluates robustness when the target contains challenging attribute variations.

4,300 pairs · robustness · attribute shifts

Protocol Summary

Protocol | Ethnicity | Attribute Setting | Pair Count | Evaluation Focus
--- | --- | --- | --- | ---
Normal | Same | Normal | 4,500 | Identity transfer and target attribute preservation under standard conditions
Cross-ethnicity | Different | Normal | 1,200 | Demographic generalization and potential ethnicity-related bias
Cross-attribute | Same | Different attributes | 4,300 | Robustness to pose, expression, and illumination variations

Protocol visualization showing the three evaluation settings.
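The protocol rules in the summary table can be read as a small decision procedure. The sketch below is illustrative only: the `Clip` record, its field names, and the choice to look only at the target's variation are assumptions made for this example, not part of the released benchmark tooling.

```python
from dataclasses import dataclass
from typing import Optional

# Pair counts as listed in the protocol summary table.
PAIR_COUNTS = {"Normal": 4500, "Cross-ethnicity": 1200, "Cross-attribute": 4300}


@dataclass(frozen=True)
class Clip:
    """Hypothetical record for one recording; field names are assumptions."""
    ethnicity: str  # e.g. "Asian", "African", "Caucasian"
    variation: str  # "normal", "pose", "expression", or "illumination"


def assign_protocol(source: Clip, target: Clip) -> Optional[str]:
    """Map a (source, target) pair to a protocol name, or None if unlisted.

    Assumption: the attribute setting is determined by the target recording.
    """
    same_ethnicity = source.ethnicity == target.ethnicity
    if target.variation == "normal":
        return "Normal" if same_ethnicity else "Cross-ethnicity"
    if same_ethnicity:
        return "Cross-attribute"
    # Different ethnicity combined with an attribute shift is not one of
    # the three listed protocols.
    return None
```

Under these assumptions, a same-ethnicity pair with a pose-shifted target falls into Cross-attribute, and the three listed pair counts sum to 10,000.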

Evaluation Metrics

We evaluate face swapping methods from complementary perspectives, including identity preservation, target attribute consistency, image realism, and temporal stability.

Identity

ID Retrieval and ID Similarity measure whether the generated face preserves the source identity.
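As a rough illustration of how these two identity metrics are commonly computed, the sketch below uses cosine similarity between face embeddings and top-1 nearest-neighbor retrieval. The embedding model, the gallery composition, and the function names are assumptions for this example; the page does not specify the exact definitions used in the benchmark.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosines."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def id_similarity(swapped: np.ndarray, sources: np.ndarray) -> float:
    """Mean cosine similarity between swapped[i] and its own source sources[i]."""
    s = l2_normalize(np.asarray(swapped, dtype=float))
    g = l2_normalize(np.asarray(sources, dtype=float))
    return float(np.mean(np.sum(s * g, axis=1)))


def id_retrieval(swapped: np.ndarray, gallery: np.ndarray, source_index) -> float:
    """Fraction of swapped faces whose nearest gallery identity is the true source.

    swapped: (N, D) embeddings of swapped faces
    gallery: (M, D) embeddings of candidate source identities
    source_index: length-N indices into gallery giving each true source
    """
    sims = l2_normalize(np.asarray(swapped, dtype=float)) @ l2_normalize(
        np.asarray(gallery, dtype=float)
    ).T
    nearest = np.argmax(sims, axis=1)
    return float(np.mean(nearest == np.asarray(source_index)))
```

In this formulation retrieval is a hard top-1 decision while similarity is a soft score, which is one reason the two can rank methods differently.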

Attributes

Pose Error and Expression Error evaluate whether the generated result preserves the target pose and expression.
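One common way to compute such errors is the mean Euclidean distance between per-image feature vectors of the target and the swapped result, for example head-pose angles or expression coefficients. The sketch below assumes that formulation and assumes the features have already been extracted by some estimator; neither the estimator nor the exact distance is specified on this page.

```python
import numpy as np


def mean_l2_error(target_feats, swapped_feats) -> float:
    """Mean Euclidean distance between matching rows of two feature arrays.

    Each row is one image: e.g. (yaw, pitch, roll) for pose, or a vector of
    expression coefficients. Both inputs must have the same shape (N, D).
    """
    t = np.asarray(target_feats, dtype=float)
    s = np.asarray(swapped_feats, dtype=float)
    if t.shape != s.shape:
        raise ValueError("target and swapped features must have the same shape")
    return float(np.mean(np.linalg.norm(t - s, axis=1)))
```

Lower values mean the swapped face stays closer to the target's pose or expression, which matches the ↓ direction used in the results table.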

Realism & Stability

FID measures visual realism, while temporal consistency metrics evaluate video-level stability.
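For reference, the standard definition of FID is the Fréchet distance between two Gaussians fitted to deep features of real and generated images. Here \mu_r, \Sigma_r and \mu_g, \Sigma_g denote the feature mean and covariance of the real and generated sets; the feature extractor and reference set used in this benchmark are not stated on this page.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values indicate that the generated feature distribution is closer to the real one.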

Radar chart summarizing identity preservation, pose/expression consistency, and FID under the three protocols.

Benchmark Results

Quantitative evaluation of 14 face swapping methods across three protocols. Identity preservation is measured by ID retrieval and ID similarity, while pose error, expression error, and FID reflect attribute preservation and generation quality.

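Method | Protocol | ID Retrieval ↑ | ID Similarity ↑ | Pose Error ↓ | Expr. Error ↓ | FID ↓
--- | --- | --- | --- | --- | --- | ---
HifiFace | Normal | 93.37% | 0.62 | 3.59 | 3.12 | 20.40
HifiFace | Cross-ethnicity | 93.23% | 0.60 | 3.66 | 3.29 | 21.73
HifiFace | Cross-attribute | 83.57% | 0.57 | 4.12 | 3.14 | 9.93
FSGAN | Normal | 65.08% | 0.50 | 3.34 | 2.35 | 56.23
FSGAN | Cross-ethnicity | 57.82% | 0.44 | 3.44 | 2.50 | 58.06
FSGAN | Cross-attribute | 43.74% | 0.40 | 4.08 | 2.37 | 40.83
Faceshifter | Normal | 66.41% | 0.44 | 5.26 | 3.64 | 169.11
Faceshifter | Cross-ethnicity | 66.36% | 0.43 | 5.32 | 3.78 | 172.31
Faceshifter | Cross-attribute | 56.09% | 0.40 | 6.38 | 3.67 | 151.61
BlendFace | Normal | 73.35% | 0.48 | 3.28 | 3.08 | 93.20
BlendFace | Cross-ethnicity | 70.60% | 0.45 | 3.37 | 3.19 | 94.51
BlendFace | Cross-attribute | 64.83% | 0.44 | 3.93 | 3.07 | 78.67
FaceDancer | Normal | 72.81% | 0.49 | 3.42 | 3.15 | 19.14
FaceDancer | Cross-ethnicity | 78.74% | 0.50 | 3.72 | 3.56 | 22.33
FaceDancer | Cross-attribute | 62.49% | 0.46 | 3.95 | 3.14 | 6.32
SimSwap | Normal | 90.00% | 0.61 | 2.14 | 2.43 | 21.75
SimSwap | Cross-ethnicity | 90.74% | 0.58 | 2.21 | 2.63 | 24.01
SimSwap | Cross-attribute | 81.50% | 0.55 | 2.43 | 2.42 | 7.86
CSCS | Normal | 88.75% | 0.63 | 3.81 | 3.41 | 33.28
CSCS | Cross-ethnicity | 96.92% | 0.65 | 4.11 | 3.72 | 36.17
CSCS | Cross-attribute | 87.54% | 0.60 | 4.47 | 3.44 | 21.23
InsightFace | Normal | 96.92% | 0.73 | 2.84 | 2.64 | 30.50
InsightFace | Cross-ethnicity | 97.19% | 0.71 | 2.97 | 2.87 | 32.32
InsightFace | Cross-attribute | 95.14% | 0.67 | 3.22 | 2.62 | 15.86
MegaFS | Normal | 73.70% | 0.50 | 5.09 | 2.96 | 23.69
MegaFS | Cross-ethnicity | 72.72% | 0.49 | 5.09 | 3.15 | 25.93
MegaFS | Cross-attribute | 55.82% | 0.44 | 5.98 | 3.02 | 18.24
FSLSD | Normal | 15.52% | 0.25 | 5.63 | 3.44 | 28.64
FSLSD | Cross-ethnicity | 13.95% | 0.23 | 5.62 | 3.58 | 30.47
FSLSD | Cross-attribute | 11.95% | 0.23 | 7.24 | 3.54 | 23.24
RAFSwap | Normal | 87.77% | 0.54 | 3.69 | 3.28 | 45.61
RAFSwap | Cross-ethnicity | 86.00% | 0.51 | 3.74 | 3.46 | 47.47
RAFSwap | Cross-attribute | 72.40% | 0.48 | 4.80 | 3.31 | 31.37
RGISwap | Normal | 80.84% | 0.53 | 4.00 | 3.41 | 18.77
RGISwap | Cross-ethnicity | 80.92% | 0.52 | 4.03 | 3.58 | 21.28
RGISwap | Cross-attribute | 62.96% | 0.46 | 4.90 | 3.54 | 13.94
DiffSwap | Normal | 15.64% | 0.32 | 3.67 | 2.88 | 96.55
DiffSwap | Cross-ethnicity | 13.70% | 0.27 | 3.74 | 2.97 | 97.52
DiffSwap | Cross-attribute | 14.38% | 0.30 | 4.14 | 2.89 | 87.11
FaceAdapter | Normal | 95.49% | 0.66 | 4.38 | 2.95 | 23.83
FaceAdapter | Cross-ethnicity | 94.74% | 0.66 | 4.83 | 3.22 | 26.51
FaceAdapter | Cross-attribute | 88.81% | 0.61 | 5.05 | 2.96 | 14.71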
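One simple way to read the table for robustness is to compare each method's ID Retrieval under the Normal and Cross-attribute protocols. The sketch below does this for four methods, with values copied from the table above. The helper name and the choice of absolute percentage-point drop as the summary statistic are assumptions for illustration, not a metric defined by the benchmark.

```python
# ID Retrieval (%) copied from the results table: (Normal, Cross-attribute).
ID_RETRIEVAL = {
    "HifiFace": (93.37, 83.57),
    "SimSwap": (90.00, 81.50),
    "InsightFace": (96.92, 95.14),
    "FaceAdapter": (95.49, 88.81),
}


def retrieval_drop(scores):
    """Absolute drop in percentage points from Normal to Cross-attribute."""
    return {
        method: round(normal - cross_attr, 2)
        for method, (normal, cross_attr) in scores.items()
    }


drops = retrieval_drop(ID_RETRIEVAL)
# Methods ordered from smallest to largest drop.
ranked = sorted(drops, key=drops.get)
```

Among these four methods, InsightFace shows the smallest drop (about 1.8 points) and HifiFace the largest (about 9.8 points).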

More Qualitative Results

We provide additional qualitative comparisons across the three protocols. Each row contains the source, target, and swapped outputs from representative methods, making it easier to inspect identity preservation, expression consistency, illumination adaptation, boundary artifacts, and failure modes.

Citation

BibTeX entry will be updated upon publication.

@article{li2026highfidelityfaceswapping,
  title   = {Towards High-Fidelity Face Swapping: A Comprehensive Survey and New Benchmark},
  author  = {Li, Qi and Wang, Weining and Du, Shuangjun and Peng, Bo and Dong, Jing and Wang, Kun and Sun, Zhenan and Yang, Ming-Hsuan},
  journal = {Pending},
  year    = {2026}
}