MatAnyone

Stable Video Matting with Consistent Memory Propagation

S-Lab, Nanyang Technological University; SenseTime Research, Singapore

MatAnyone is a practical framework for target-assigned human video matting, delivering stable semantics in core regions together with fine-grained detail along object boundaries.

Abstract

Auxiliary-free human video matting methods, which rely solely on input frames, often struggle with complex or ambiguous backgrounds. To address this, we propose MatAnyone, a robust framework tailored for target-assigned video matting. Specifically, building on a memory-based paradigm, we introduce a consistent memory propagation module via region-adaptive memory fusion, which adaptively integrates memory from the previous frame. This ensures semantic stability in core regions while preserving fine-grained details along object boundaries. For robust training, we present a larger, high-quality, and diverse dataset for video matting. Additionally, we incorporate a novel training strategy that efficiently leverages large-scale segmentation data, boosting matting stability. With this new network design, dataset, and training strategy, MatAnyone delivers robust and accurate video matting results in diverse real-world scenarios, outperforming existing methods.

Method

MatAnyone is a memory-based framework for video matting. Given a target segmentation map in the first frame, our model achieves stable and high-quality matting through consistent memory propagation, using a region-adaptive memory fusion module to combine information from the previous and current frames. To overcome the scarcity of real video matting data, we adopt a new training strategy that effectively leverages matting data for fine-grained matting details and segmentation data for semantic stability, with losses designed separately for each.
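
To make the idea of region-adaptive memory fusion concrete, below is a minimal PyTorch-style sketch. It assumes the fusion weight is predicted per pixel from the current memory readout and the previous-frame memory by a small convolutional head; the module name, layer choices, and tensor shapes are illustrative assumptions, not the released implementation.

    import torch
    import torch.nn as nn

    class RegionAdaptiveMemoryFusion(nn.Module):
        """Illustrative sketch: blend the memory readout for the current frame
        with the memory propagated from the previous frame, using a per-pixel
        weight. Stable core regions lean on the previous-frame memory, while
        changing boundary regions lean on the fresh readout."""

        def __init__(self, channels: int):
            super().__init__()
            # Hypothetical weight predictor: 1x1 convs over concatenated features.
            self.weight_head = nn.Sequential(
                nn.Conv2d(2 * channels, channels, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, 1, kernel_size=1),
                nn.Sigmoid(),  # per-pixel fusion weight in [0, 1]
            )

        def forward(self, current_readout: torch.Tensor,
                    prev_memory: torch.Tensor) -> torch.Tensor:
            # current_readout, prev_memory: (B, C, H, W)
            w = self.weight_head(torch.cat([current_readout, prev_memory], dim=1))
            # w -> 1: trust the current readout (changing / boundary regions)
            # w -> 0: keep the previous-frame memory (stable core regions)
            return w * current_readout + (1.0 - w) * prev_memory

    if __name__ == "__main__":
        fusion = RegionAdaptiveMemoryFusion(channels=64)
        cur, prev = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
        print(fusion(cur, prev).shape)  # torch.Size([1, 64, 32, 32])
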

MatAnyone Demo

Instance/Interactive Matting Examples

Assigning the target object on the first frame gives MatAnyone the flexibility to perform instance and interactive video matting. Thanks to promptable segmentation methods, the target can be assigned with a few clicks (the segmentation mask is annotated in the figure). MatAnyone demonstrates superior performance in instance video matting, particularly in keeping object tracking stable and in preserving fine-grained alpha-matte details. A minimal sketch of this target-assignment workflow follows the examples below.


[Example carousel: each slide shows the Input video, the Composition, and the Alpha Matte.]
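
The sketch below illustrates the target-assignment workflow described above: a first-frame mask (e.g. obtained with a few clicks in a promptable segmenter) assigns the target, after which matting proceeds frame by frame on propagated memory. The `model.step` interface and its arguments are hypothetical stand-ins, not the actual MatAnyone API.

    import torch

    # Hypothetical interface: `model.step(frame, first_frame_mask=None)` returns
    # an alpha matte and updates the model's internal memory. The real MatAnyone
    # API may differ; this only illustrates the target-assignment workflow.

    def matte_video(model, frames, target_mask):
        """frames: list of (3, H, W) tensors; target_mask: (1, H, W) binary mask
        of the object selected on the first frame (e.g. clicked in a promptable
        segmentation model)."""
        alphas = []
        with torch.no_grad():
            for t, frame in enumerate(frames):
                if t == 0:
                    # The first-frame mask assigns the target and initializes memory.
                    alpha = model.step(frame, first_frame_mask=target_mask)
                else:
                    # Later frames rely purely on propagated memory.
                    alpha = model.step(frame)
                alphas.append(alpha)
        return alphas
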

Recurrent Refinement

Given the first-frame segmentation mask, the model predicts the first-frame alpha matte, which in turn affects the quality of all subsequent frames. The sequential prediction of the memory-based paradigm enables recurrent refinement at inference time, without any retraining: re-running the first frame several times (1) improves robustness to the given segmentation mask and (2) refines matting details toward image-matting-level quality.
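
A minimal sketch of this recurrent refinement is shown below, reusing the hypothetical `model.step` interface from the earlier sketch. The loop simply re-runs the first frame so that each pass reads the memory written by the previous pass; the function name and iteration count are illustrative assumptions.

    import torch

    def refine_first_frame(model, first_frame, seg_mask, num_iters=10):
        """Hypothetical recurrent-refinement loop: re-run the first frame several
        times before processing the rest of the video, so the initial alpha matte
        (and the memory built from it) is progressively refined from the coarse
        segmentation mask, with no retraining."""
        with torch.no_grad():
            alpha = model.step(first_frame, first_frame_mask=seg_mask)
            for _ in range(num_iters - 1):
                # Each pass reads the memory written by the previous pass,
                # sharpening boundary details of the first-frame alpha matte.
                alpha = model.step(first_frame)
        return alpha
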


[Comparison slider: Input vs. iteratively refined alpha matte (starting from the segmentation mask).]

BibTeX

@InProceedings{yang2025matanyone,
  title     = {{MatAnyone}: Stable Video Matting with Consistent Memory Propagation},
  author    = {Yang, Peiqing and Zhou, Shangchen and Zhao, Jixin and Tao, Qingyi and Loy, Chen Change},
  booktitle = {arXiv preprint arXiv:2501.14677},
  year      = {2025}
}