publications

$^\star$ equal contribution

2025

  1. Bootstrap Your Own Views: Masked Ego-Exo Modeling for Fine-grained View-invariant Video Representations
    Jungin Park, Jiyoung Lee*, and Kwanghoon Sohn*
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
  2. Read, watch and scream! sound generation from text and video
    Yujin Jeong, Yunji Kim, Sanghyuk Chun, and Jiyoung Lee
    In AAAI Conference on Artificial Intelligence (AAAI), 2025

2024

  1. Prototype-Guided Attention Distillation for Discriminative Person Search
    Hanjae Kim, Jiyoung Lee, and Kwanghoon Sohn
    IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
  2. Discriminative action tubelet detector for weakly-supervised action detection
    Jiyoung Lee, Seungryong Kim, Sunok Kim, and Kwanghoon Sohn
    Pattern Recognition, 2024
  3. Bridging Vision and Language Spaces with Assignment Prediction
    Jungin Park, Jiyoung Lee*, and Kwanghoon Sohn*
    In International Conference on Learning Representations (ICLR), 2024
  4. Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation
    Junyoung Seo, Wooseok Jang, Min-Seop Kwak, Hyeonsu Kim, Jaehoon Ko, Junho Kim, Jin-Hwa Kim*, Jiyoung Lee*, and Seungryong Kim*
    In International Conference on Learning Representations (ICLR), 2024

2023

  1. Robust camera pose refinement for multi-resolution hash encoding
    Hwan Heo, Taekyung Kim, Jiyoung Lee, Jaewon Lee, Soohyun Kim, Hyunwoo J Kim*, and Jin-Hwa Kim*
    In International Conference on Machine Learning (ICML), 2023
  2. Midms: Matching interleaved diffusion models for exemplar-based image translation
    Junyoung Seo, Gyuseong Lee, Seokju Cho, Jiyoung Lee, and Seungryong Kim
    In AAAI Conference on Artificial Intelligence (AAAI), 2023
  3. Imaginary voice: Face-styled diffusion model for text-to-speech
    Jiyoung Lee, Joon Son Chung, and Soo-Whan Chung
    In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
  4. Panoramic Image-to-Image Translation
    Soohyun Kim, Junho Kim, Taekyung Kim, Hwan Heo, Seungryong Kim*, Jiyoung Lee*, and Jin-Hwa Kim*
    arXiv preprint arXiv:2304.04960, 2023
  5. Semi-parametric video-grounded text generation
    Sungdong Kim, Jin-Hwa Kim, Jiyoung Lee, and Minjoon Seo
    arXiv preprint arXiv:2301.11507, 2023
  6. Dense text-to-image generation with attention modulation
    Yunji Kim, Jiyoung Lee, Jin-Hwa Kim, Jung-Woo Ha, and Jun-Yan Zhu
    In IEEE/CVF International Conference on Computer Vision (ICCV), 2023
  7. Hierarchical visual primitive experts for compositional zero-shot learning
    Hanjae Kim, Jiyoung Lee, Seongheon Park, and Kwanghoon Sohn
    In IEEE/CVF International Conference on Computer Vision (ICCV), 2023
  8. Three recipes for better 3D pseudo-GTs of 3D human mesh estimation in the wild
    Gyeongsik Moon, Hongsuk Choi, Sanghyuk Chun, Jiyoung Lee, and Sangdoo Yun
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2023
  9. Dual-path adaptation from image to video transformers
    Jungin Park*, Jiyoung Lee*, and Kwanghoon Sohn
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
  10. Language-free training for zero-shot video grounding
    Dahye Kim, Jungin Park, Jiyoung Lee, Seongheon Park, and Kwanghoon Sohn
    In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

2022

  1. Pointfix: Learning to fix domain bias for robust online stereo adaptation
    Kwonyoung Kim, Jungin Park, Jiyoung Lee, Dongbo Min, and Kwanghoon Sohn
    In European Conference on Computer Vision (ECCV), 2022
  2. Causalcity: Complex simulations with agency for causal discovery and reasoning
    Daniel McDuff, Yale Song, Jiyoung Lee, Vibhav Vineet, Sai Vemprala, Nicholas Alexander Gyde, Hadi Salman, Shuang Ma, Kwanghoon Sohn, and Ashish Kapoor
    In Conference on Causal Learning and Reasoning, 2022
  3. Mutual information divergence: A unified metric for multimodal generative models
    Jin-Hwa Kim, Yunji Kim, Jiyoung Lee, Kang Min Yoo, and Sang-Woo Lee
    In Advances in Neural Information Processing Systems (NeurIPS), 2022
  4. Multi-domain unsupervised image-to-image translation with appearance adaptive convolution
    Somi Jeong, Jiyoung Lee, and Kwanghoon Sohn
    In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
  5. Pin the memory: Learning to generalize semantic segmentation
    Jin Kim, Jiyoung Lee, Jungin Park, Dongbo Min, and Kwanghoon Sohn
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
  6. Probabilistic representations for video contrastive learning
    Jungin Park, Jiyoung Lee, Ig-Jae Kim, and Kwanghoon Sohn
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022

2021

  1. Wide and Narrow: Video Prediction from Context and Motion
    Jaehoon Cho, Jiyoung Lee, Changjae Oh, Wonil Song, and Kwanghoon Sohn
    In British Machine Vision Conference (BMVC), 2021
  2. Self-balanced learning for domain generalization
    Jin Kim, Jiyoung Lee, Jungin Park, Dongbo Min, and Kwanghoon Sohn
    In IEEE International Conference on Image Processing (ICIP), 2021
  3. Bridge to answer: Structure-aware graph interaction network for video question answering
    Jungin Park, Jiyoung Lee, and Kwanghoon Sohn
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
  4. Looking into your speech: Learning cross-modal affinity for audio-visual speech separation
    Jiyoung Lee*, Soo-Whan Chung*, Sunok Kim, Hong-Goo Kang*, and Kwanghoon Sohn*
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

2020

  1. Multi-modal recurrent attention networks for facial expression recognition
    Jiyoung Lee, Sunok Kim, Seungryong Kim, and Kwanghoon Sohn
    IEEE Transactions on Image Processing, 2020
  2. Sumgraph: Video summarization via recursive graph modeling
    Jungin Park*, Jiyoung Lee*, Ig-Jae Kim, and Kwanghoon Sohn
    In European Conference on Computer Vision (ECCV), 2020

2019

  1. Graph regularization network with semantic affinity for weakly-supervised temporal action localization
    Jungin Park, Jiyoung Lee, Sangryul Jeon, Seungryong Kim, and Kwanghoon Sohn
    In IEEE International Conference on Image Processing (ICIP), 2019
  2. Video summarization by learning relationships between action and scene
    Jungin Park, Jiyoung Lee, Sangryul Jeon, and Kwanghoon Sohn
    In IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2019
  3. Context-aware emotion recognition networks
    In IEEE/CVF International Conference on Computer Vision (ICCV), 2019

2018

  1. Audio-visual attention networks for emotion recognition
    Jiyoung Lee, Sunok Kim, Seungryong Kim, and Kwanghoon Sohn
    In Proceedings of the 2018 Workshop on Audio-Visual Scene Understanding for Immersive Multimedia, 2018
  2. Learning to detect, associate, and recognize human actions and surrounding scenes in untrimmed videos
    Jungin Park, Sangryul Jeon, Seungryong Kim, Jiyoung Lee, Sunok Kim, and Kwanghoon Sohn
    In Proceedings of the 1st Workshop and Challenge on Comprehensive Video Understanding in the Wild, 2018
  3. Spatiotemporal attention based deep neural networks for emotion recognition
    Jiyoung Lee, Sunok Kim, Seungryong Kim, and Kwanghoon Sohn
    In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018

2017

  1. Automatic 2D-to-3D conversion using multi-scale deep neural network
    Jiyoung Lee, Hyungjoo Jung, Youngjung Kim, and Kwanghoon Sohn
    In IEEE International Conference on Image Processing (ICIP), 2017