Publications
$^\star$ equal contribution, $^\dagger$ corresponding author(s)
Conference: 26, Journal: 4, Workshop: 6, Preprint: 5
2026
- [C28] Erasing Your Voice Before It’s Heard: Training-Free Speaker Unlearning For Zero-Shot Text-To-Speech. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2026
2025
- [J4] Language-guided Recursive Spatiotemporal Graph Modeling for Video Summarization. International Journal of Computer Vision (IJCV), 2025
- [P3] Descriptive Image-Text Matching with Graded Contextual Similarity. arXiv preprint arXiv:2505.09997, 2025
2024
- [J3] Prototype-Guided Attention Distillation for Discriminative Person Search. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
- [J2] Discriminative action tubelet detector for weakly-supervised action detection. Pattern Recognition, 2024
2023
- [C22] Robust camera pose refinement for multi-resolution hash encoding. In International Conference on Machine Learning (ICML), 2023
- [C17] Dual-path adaptation from image to video transformers. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
- [C16] Language-free training for zero-shot video grounding. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023
2022
- [C15] Pointfix: Learning to fix domain bias for robust online stereo adaptation. In European Conference on Computer Vision (ECCV), 2022
- [C12] Multi-domain unsupervised image-to-image translation with appearance adaptive convolution. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
- [C10] Probabilistic representations for video contrastive learning. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
2021
- [C9] Wide and Narrow: Video Prediction from Context and Motion. In British Machine Vision Conference (BMVC), 2021
- [C8] Self-balanced learning for domain generalization. In IEEE International Conference on Image Processing (ICIP), 2021
- [C7] Bridge to answer: Structure-aware graph interaction network for video question answering. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
- [C6] Looking into your speech: Learning cross-modal affinity for audio-visual speech separation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
2020
- [J1] Multi-modal recurrent attention networks for facial expression recognition. IEEE Transactions on Image Processing, 2020
- [C5] Sumgraph: Video summarization via recursive graph modeling. In European Conference on Computer Vision (ECCV), 2020
2019
- [C4] Graph regularization network with semantic affinity for weakly-supervised temporal action localization. In IEEE International Conference on Image Processing (ICIP), 2019
2018
- [W2] Audio-visual attention networks for emotion recognition. In Proceedings of the 2018 Workshop on Audio-Visual Scene Understanding for Immersive Multimedia, 2018
- [W1] Learning to detect, associate, and recognize human actions and surrounding scenes in untrimmed videos. In Proceedings of the 1st Workshop and Challenge on Comprehensive Video Understanding in the Wild, 2018
- [C2] Spatiotemporal attention based deep neural networks for emotion recognition. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018
2017
- [C1] Automatic 2D-to-3D conversion using multi-scale deep neural network. In IEEE International Conference on Image Processing (ICIP), 2017