# Attribute-specific Control Units in StyleGAN for Fine-grained Image Manipulation

Rui Wang1   Jian Chen2   Gang Yu2   Li Sun3   Changqian Yu1   Changxin Gao1*   Nong Sang1
1Key Laboratory of Image Processing and Intelligent Control, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China   2Tencent, Shanghai, China   3East China Normal University, Shanghai, China

## Abstract

Image manipulation with StyleGAN has attracted increasing attention in recent years. Recent works have achieved tremendous success in analyzing several semantic latent spaces to edit the attributes of the generated images. However, due to the limited semantic and spatial manipulation precision in these latent spaces, the existing methods fall short of fine-grained StyleGAN image manipulation, i.e., local attribute translation. To address this issue, we discover attribute-specific control units, which consist of multiple channels of feature maps and modulation styles. Specifically, we collaboratively manipulate the modulation style channels and feature maps in control units, rather than individual ones, to obtain semantically and spatially disentangled controls. Furthermore, we propose a simple yet effective method to detect the attribute-specific control units. We move the modulation style along a specific sparse direction vector and replace the filter-wise styles used to compute the feature maps in order to manipulate these control units. We evaluate our proposed method on various face attribute manipulation tasks. Extensive qualitative and quantitative results demonstrate that our proposed method performs favorably against the state-of-the-art methods. The manipulation results on real images further show the effectiveness of our method.

## Attribute-Specific Control Units

Feature maps in the StyleGAN2 generator activate consistently in the same semantic regions across various generated images.

We divide the channels of the intermediate feature maps into region-specific groups, based on the spatial location of each channel's most activated region, using a simple yet effective gradient-based strategy.
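The grouping step can be sketched as follows. This is a minimal illustration, not the paper's exact gradient-based procedure: it assumes the semantic region masks are already available (e.g., from a face parser) and assigns each channel by the location of its peak activation; the function name and shapes are hypothetical.

```python
import numpy as np

def group_channels_by_region(features, region_masks):
    """Assign each feature-map channel to the semantic region that
    contains its most strongly activated spatial location.

    features:     (C, H, W) intermediate feature maps of one layer
    region_masks: (R, H, W) boolean masks, one per semantic region
    returns:      dict mapping region index -> list of channel indices
    """
    C, H, W = features.shape
    groups = {r: [] for r in range(region_masks.shape[0])}
    for c in range(C):
        # spatial location of the peak activation for this channel
        y, x = np.unravel_index(np.argmax(features[c]), (H, W))
        for r, mask in enumerate(region_masks):
            if mask[y, x]:
                groups[r].append(c)
                break
    return groups
```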

We can then modify the modulation styles of a convolutional layer by moving them along a sparse direction vector. More specifically, we use a portion of the difference between the latent codes of positive and negative samples as the editing direction vector.
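A sketch of this editing direction: the dense direction is the difference between the mean style codes of positive and negative samples, and keeping only the largest-magnitude entries makes it sparse. Note that top-k sparsification is our illustrative assumption for "a portion of the difference"; the function names are hypothetical.

```python
import numpy as np

def sparse_direction(pos_styles, neg_styles, k=8):
    """Estimate a sparse editing direction in modulation-style space.

    pos_styles / neg_styles: (N, D) style codes of samples with /
    without the target attribute. Keep only the k largest-magnitude
    entries of the mean difference, so the edit touches few channels.
    """
    direction = pos_styles.mean(axis=0) - neg_styles.mean(axis=0)
    keep = np.argsort(np.abs(direction))[-k:]   # top-k style channels
    sparse = np.zeros_like(direction)
    sparse[keep] = direction[keep]
    return sparse

def edit_styles(styles, direction, alpha=1.0):
    """Move the modulation styles along the (sparse) direction."""
    return styles + alpha * direction
```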

However, the results manipulated with these sparse direction vectors alone still suffer from insufficient change or attribute entanglement.

The editing results are strongly correlated with the spatial distribution of the feature maps. We should therefore manipulate the modulation styles and feature maps collaboratively, rather than individually, to obtain fine-grained control.

The attribute of a specific semantic region is controlled by a few channels of the intermediate feature maps together with their corresponding modulation styles; we refer to these channel-style pairs as control units.
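One way to represent and detect such a unit is sketched below. The scoring heuristic, ranking channels by how much a style edit's effect concentrates inside the target region, is our assumption, not the paper's exact detection method; all names and shapes are illustrative.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class ControlUnit:
    """A control unit: the indices of a few feature-map channels of
    one layer plus their matching modulation-style channels (the two
    share the same indexing)."""
    layer: int
    channels: np.ndarray

def detect_control_unit(delta_features, region_mask, layer, k=4):
    """Hypothetical scoring: rank channels by the fraction of their
    change (after a style edit) that falls inside the target region.

    delta_features: (C, H, W) per-channel change of the feature maps
    region_mask:    (H, W) boolean mask of the target semantic region
    """
    inside = np.abs(delta_features[:, region_mask]).sum(axis=1)
    total = np.abs(delta_features).reshape(len(delta_features), -1).sum(axis=1) + 1e-8
    score = inside / total
    top = np.argsort(score)[-k:]
    return ControlUnit(layer=layer, channels=np.sort(top))
```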

## Pipeline

Visualization of a typical attribute manipulation pipeline:

Our modification consists of an optimized style $\hat{S}^{l-1}$ and a direction vector $\Delta{S}^l$. A few channels of $F^l$ are replaced by $F^{l}_{U_a}$, computed with $\hat{S}^{l-1}$, while the other channels of $F^l$ remain untouched. The original modulation style $S^l$ and $\Delta{S}^l$ are summed to form the new modulation style.
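The per-layer manipulation described above can be sketched as a single function, assuming the unit's replacement feature maps $F^{l}_{U_a}$ have already been recomputed with the optimized styles; the function name and array shapes are illustrative, not the paper's implementation.

```python
import numpy as np

def manipulate_layer(F_l, S_l, F_unit, delta_S, unit_channels):
    """One-layer sketch of the manipulation pipeline.

    F_l:           (C, H, W) original feature maps of layer l
    S_l:           (C,)      original modulation styles of layer l
    F_unit:        (k, H, W) the unit's channels, recomputed with the
                   optimized styles S_hat^{l-1}
    delta_S:       (C,)      editing direction for the styles
    unit_channels: indices of the control unit's channels
    """
    F_new = F_l.copy()
    F_new[unit_channels] = F_unit   # replace only the unit's channels
    S_new = S_l + delta_S           # shift the modulation styles
    return F_new, S_new
```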

## Citation

```bibtex
@inproceedings{10.1145/3474085.3475274,
  author    = {Wang, Rui and Chen, Jian and Yu, Gang and Sun, Li and Yu, Changqian and Gao, Changxin and Sang, Nong},
  title     = {Attribute-Specific Control Units in StyleGAN for Fine-Grained Image Manipulation},
  year      = {2021},
  isbn      = {9781450386517},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  url       = {https://doi.org/10.1145/3474085.3475274},
  doi       = {10.1145/3474085.3475274},
  booktitle = {Proceedings of the 29th ACM International Conference on Multimedia},
  pages     = {926--934},
  numpages  = {9},
  keywords  = {generative adversarial networks (GANs), control unit, image manipulation},
  location  = {Virtual Event, China},
  series    = {MM '21}
}
```

## Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 61876210) and the Science and Technology Commission of Shanghai Municipality (No. 19511120800).
