Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks
[Accepted by TASLP] We introduce a novel self-supervised learning model for audio processing, named ATST-Frame. We conduct thorough experiments on clip-level and frame-level downstream tasks and obtain state-of-the-art performance on most tasks.
Authors
Xian Li; Nian Shao; Xiaofei Li