VGGSound: A large-scale audio-visual dataset H Chen, W Xie, A Vedaldi, A Zisserman ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020 | 567 | 2020 |
Localizing Visual Sounds the Hard Way H Chen, W Xie, T Afouras, A Nagrani, A Vedaldi, A Zisserman Conference on Computer Vision and Pattern Recognition (CVPR), 2021 | 208 | 2021 |
Auto-AVSR: Audio-visual speech recognition with automatic labels P Ma, A Haliassos, A Fernandez-Lopez, H Chen, S Petridis, M Pantic ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 101 | 2023 |
Audio-Visual Synchronisation in the Wild H Chen, W Xie, T Afouras, A Nagrani, A Vedaldi, A Zisserman British Machine Vision Conference (BMVC), 2021 | 46 | 2021 |
SynthVSR: Scaling up visual speech recognition with synthetic supervision X Liu, E Lakomkin, K Vougioukas, P Ma, H Chen, R Xie, M Doulaty, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 20 | 2023 |
AutoCorrect: Deep Inductive Alignment of Noisy Geometric Annotations H Chen, W Xie, A Vedaldi, A Zisserman British Machine Vision Conference (BMVC), 2019 | 18 | 2019 |
SparseVSR: Lightweight and noise robust visual speech recognition A Fernandez-Lopez, H Chen, P Ma, A Haliassos, S Petridis, M Pantic arXiv preprint arXiv:2307.04552, 2023 | 5 | 2023 |
The CHiME-8 MMCSG Challenge: Multi-modal conversations in smart glasses K Zmolikova, S Merello, K Kalgaonkar, J Lin, N Moritz, P Ma, M Sun, ... | 3 | 2024 |
Localizing visual sounds the hard way A Vedaldi, H Chen, W Xie, T Afouras, A Nagrani, A Zisserman Institute of Electrical and Electronics Engineers, 2021 | 2 | 2021 |
Large Language Models Are Strong Audio-Visual Speech Recognition Learners U Cappellazzo, M Kim, H Chen, P Ma, S Petridis, D Falavigna, A Brutti, ... arXiv preprint arXiv:2409.12319, 2024 | 1 | 2024 |
RT-LA-VocE: Real-Time Low-SNR Audio-Visual Speech Enhancement H Chen, R Mira, S Petridis, M Pantic arXiv preprint arXiv:2407.07825, 2024 | 1 | 2024 |
MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization A Fernandez-Lopez, H Chen, P Ma, L Yin, Q Xiao, S Petridis, S Liu, ... arXiv preprint arXiv:2406.17614, 2024 | 1 | 2024 |
The CHiME-8 MMCSG Challenge: Multi-modal conversations in smart glasses K Žmolíková, S Merello, K Kalgaonkar, J Lin, N Moritz, P Ma, M Sun, ... Proc. CHiME 2024, 7-12, 2024 | 1 | 2024 |
Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs A Haliassos, R Mira, H Chen, Z Landgraf, S Petridis, M Pantic arXiv preprint arXiv:2411.02256, 2024 | | 2024 |
Learning with multimodal self-supervision H Chen University of Oxford, 2021 | | 2021 |
Supplementary Material: SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision X Liu, E Lakomkin, K Vougioukas, P Ma, H Chen, R Xie, M Doulaty, ... | | |
SparseVSR: Lightweight and Noise Robust Visual Speech Recognition–Extended Abstract A Fernandez-Lopez, H Chen, P Ma, A Haliassos, S Petridis, M Pantic | | |