publications
$^\star$ equal contribution
2025
2024
- Prototype-Guided Attention Distillation for Discriminative Person Search. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
- Discriminative action tubelet detector for weakly-supervised action detection. Pattern Recognition, 2024
2023
- Robust camera pose refinement for multi-resolution hash encoding. In International Conference on Machine Learning (ICML), 2023
- Dual-path adaptation from image to video transformers. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
- Language-free training for zero-shot video grounding. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023
2022
- PointFix: Learning to fix domain bias for robust online stereo adaptation. In European Conference on Computer Vision (ECCV), 2022
- Multi-domain unsupervised image-to-image translation with appearance adaptive convolution. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
- Probabilistic representations for video contrastive learning. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
2021
- Wide and Narrow: Video Prediction from Context and Motion. In British Machine Vision Conference (BMVC), 2021
- Self-balanced learning for domain generalization. In IEEE International Conference on Image Processing (ICIP), 2021
- Bridge to answer: Structure-aware graph interaction network for video question answering. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
- Looking into your speech: Learning cross-modal affinity for audio-visual speech separation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
2020
- Multi-modal recurrent attention networks for facial expression recognition. IEEE Transactions on Image Processing, 2020
- SumGraph: Video summarization via recursive graph modeling. In European Conference on Computer Vision (ECCV), 2020
2019
- Graph regularization network with semantic affinity for weakly-supervised temporal action localization. In IEEE International Conference on Image Processing (ICIP), 2019
2018
- Audio-visual attention networks for emotion recognition. In Proceedings of the 2018 Workshop on Audio-Visual Scene Understanding for Immersive Multimedia, 2018
- Learning to detect, associate, and recognize human actions and surrounding scenes in untrimmed videos. In Proceedings of the 1st Workshop and Challenge on Comprehensive Video Understanding in the Wild, 2018
- Spatiotemporal attention based deep neural networks for emotion recognition. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018
2017
- Automatic 2D-to-3D conversion using multi-scale deep neural network. In IEEE International Conference on Image Processing (ICIP), 2017