Showing 1–50 of 70 results for author: Oh, T

  1. arXiv:2407.13676

    cs.MM cs.CV cs.SD eess.AS

    Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment

    Authors: Arda Senocak, Hyeonggon Ryu, Junsik Kim, Tae-Hyun Oh, Hanspeter Pfister, Joon Son Chung

    Abstract: Recent studies on learning-based sound source localization have mainly focused on the localization performance perspective. However, prior work and existing benchmarks overlook a crucial aspect: cross-modal interaction, which is essential for interactive sound source localization. Cross-modal interaction is vital for understanding semantically matched or mismatched audio-visual events, such as sil…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Journal Extension of ICCV 2023 paper (arXiv:2309.10724). Code is available at https://github.com/kaistmm/SSLalignment

  2. arXiv:2407.13442

    cs.CV cs.CL

    BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models

    Authors: Moon Ye-Bin, Nam Hyeon-Woo, Wonseok Choi, Tae-Hyun Oh

    Abstract: Vision language models (VLMs) perceive the world through a combination of a visual encoder and a large language model (LLM). The visual encoder, pre-trained on large-scale vision-text datasets, provides zero-shot generalization to visual data, and the LLM endows its high reasoning ability to VLMs. It leads VLMs to achieve high performance on wide benchmarks without fine-tuning, exhibiting zero or…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted at ECCV 2024. [Project Pages] https://beafbench.github.io/

  3. arXiv:2407.01034

    cs.CV cs.GR

    Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert

    Authors: Han EunGi, Oh Hyun-Bin, Kim Sung-Bin, Corentin Nivelet Etcheberry, Suekyeong Nam, Janghoon Joo, Tae-Hyun Oh

    Abstract: Speech-driven 3D facial animation has recently garnered attention due to its cost-effective usability in multimedia production. However, most current advances overlook the intelligibility of lip movements, limiting the realism of facial expressions. In this paper, we introduce a method for speech-driven 3D facial animation to generate accurate lip movements, proposing an audio-visual multimodal pe…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: INTERSPEECH 2024

  4. arXiv:2406.14272

    cs.CV cs.GR

    MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset

    Authors: Kim Sung-Bin, Lee Chae-Yeon, Gihun Son, Oh Hyun-Bin, Janghoon Ju, Suekyeong Nam, Tae-Hyun Oh

    Abstract: Recent studies in speech-driven 3D talking head generation have achieved convincing results in verbal articulations. However, generating accurate lip-syncs degrades when applied to input speech in other languages, possibly due to the lack of datasets covering a broad spectrum of facial movements across languages. In this work, we introduce a novel task to generate 3D talking heads from speeches of…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024

  5. arXiv:2406.04867

    cs.LG cs.AI cs.CV

    Deep learning for precipitation nowcasting: A survey from the perspective of time series forecasting

    Authors: Sojung An, Tae-Jin Oh, Eunha Sohn, Donghyun Kim

    Abstract: Deep learning-based time series forecasting has dominated the short-term precipitation forecasting field with the help of its ability to estimate motion flow in high-resolution datasets. The growing interest in precipitation nowcasting offers substantial opportunities for the advancement of current forecasting technologies. Nevertheless, there has been a scarcity of in-depth surveys of time series…

    Submitted 13 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: 21 pages, 7 figures, 5 tables

  6. arXiv:2404.00285

    cs.CV cs.AI

    Long-Tailed Recognition on Binary Networks by Calibrating A Pre-trained Model

    Authors: Jihun Kim, Dahyun Kim, Hyungrok Jung, Taeil Oh, Jonghyun Choi

    Abstract: Deploying deep models in real-world scenarios entails a number of challenges, including computational efficiency and real-world (e.g., long-tailed) data distributions. We address the combined challenge of learning long-tailed distributions using highly resource-efficient binary neural networks as backbones. Specifically, we propose a calibrate-and-distill framework that uses off-the-shelf pretrain…

    Submitted 30 March, 2024; originally announced April 2024.

  7. arXiv:2403.14963

    cs.CR

    Enabling Physical Localization of Uncooperative Cellular Devices

    Authors: Taekkyung Oh, Sangwook Bae, Junho Ahn, Yonghwa Lee, Dinh-Tuan Hoang, Min Suk Kang, Nils Ole Tippenhauer, Yongdae Kim

    Abstract: In cellular networks, it can become necessary for authorities to physically locate user devices for tracking criminals or illegal devices. While cellular operators can provide authorities with cell information the device is camping on, fine-grained localization is still required. Therefore, the authorized agents trace the device by monitoring its uplink signals. However, tracking the uplink signal…

    Submitted 25 March, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

  8. arXiv:2403.14539

    cs.CV cs.AI cs.LG

    Object-Centric Domain Randomization for 3D Shape Reconstruction in the Wild

    Authors: Junhyeong Cho, Kim Youwang, Hunmin Yang, Tae-Hyun Oh

    Abstract: One of the biggest challenges in single-view 3D shape reconstruction in the wild is the scarcity of <3D shape, 2D image>-paired data from real-world environments. Inspired by remarkable achievements via domain randomization, we propose ObjectDR which synthesizes such paired data via a random simulation of visual variations in object appearances and backgrounds. Our data synthesis framework exploit…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Project Page: https://ObjectDR.github.io

  9. arXiv:2403.01898

    cs.CV eess.IV

    Revisiting Learning-based Video Motion Magnification for Real-time Processing

    Authors: Hyunwoo Ha, Oh Hyun-Bin, Kim Jun-Seong, Kwon Byung-Ki, Kim Sung-Bin, Linh-Tam Tran, Ji-Yun Kim, Sung-Ho Bae, Tae-Hyun Oh

    Abstract: Video motion magnification is a technique to capture and amplify subtle motion in a video that is invisible to the naked eye. The deep learning-based prior work successfully demonstrates the modelling of the motion magnification problem with outstanding quality compared to conventional signal processing-based ones. However, it still lags behind real-time performance, which prevents it from being e…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 19 pages

  10. arXiv:2402.04625

    cs.CV

    Noise Map Guidance: Inversion with Spatial Context for Real Image Editing

    Authors: Hansam Cho, Jonghyun Lee, Seoung Bum Kim, Tae-Hyun Oh, Yonghyun Jeong

    Abstract: Text-guided diffusion models have become a popular tool in image synthesis, known for producing high-quality and diverse images. However, their application to editing real images often encounters hurdles primarily due to the text condition deteriorating the reconstruction quality and subsequently affecting editing fidelity. Null-text Inversion (NTI) has made strides in this area, but it fails to c…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: ICLR 2024

  11. arXiv:2401.05516

    cs.CV cs.AI cs.GR

    FPRF: Feed-Forward Photorealistic Style Transfer of Large-Scale 3D Neural Radiance Fields

    Authors: GeonU Kim, Kim Youwang, Tae-Hyun Oh

    Abstract: We present FPRF, a feed-forward photorealistic style transfer method for large-scale 3D neural radiance fields. FPRF stylizes large-scale 3D scenes with arbitrary, multiple style reference images without additional optimization while preserving multi-view appearance consistency. Prior arts required tedious per-style/-scene optimization and were limited to small-scale 3D scenes. FPRF efficiently st…

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: Project page: https://kim-geonu.github.io/FPRF/

  12. arXiv:2312.11360

    cs.CV cs.AI cs.GR

    Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering

    Authors: Kim Youwang, Tae-Hyun Oh, Gerard Pons-Moll

    Abstract: We present Paint-it, a text-driven high-fidelity texture map synthesis method for 3D meshes via neural re-parameterized texture optimization. Paint-it synthesizes texture maps from a text description by synthesis-through-optimization, exploiting the Score-Distillation Sampling (SDS). We observe that directly applying SDS yields undesirable texture quality due to its noisy gradients. We reveal the…

    Submitted 7 May, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: CVPR 2024. Project page: https://kim-youwang.github.io/paint-it

  13. arXiv:2312.10975

    cs.LG cs.AI math.NA

    Inducing Point Operator Transformer: A Flexible and Scalable Architecture for Solving PDEs

    Authors: Seungjun Lee, Taeil Oh

    Abstract: Solving partial differential equations (PDEs) by learning the solution operators has emerged as an attractive alternative to traditional numerical methods. However, implementing such architectures presents two main challenges: flexibility in handling irregular and arbitrary input and output formats and scalability to large discretizations. Most existing architectures are limited by their desired s…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: AAAI 2024

  14. arXiv:2312.09818

    cs.CL cs.AI

    SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models

    Authors: Lee Hyun, Kim Sung-Bin, Seungju Han, Youngjae Yu, Tae-Hyun Oh

    Abstract: Despite the recent advances of the artificial intelligence, building social intelligence remains a challenge. Among social signals, laughter is one of the distinctive expressions that occurs during social interactions between humans. In this work, we tackle a new challenge for machines to understand the rationale behind laughter in video, Video Laugh Reasoning. We introduce this new task to explai…

    Submitted 24 May, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: 19 pages, 14 figures

  15. arXiv:2312.09551

    eess.IV cs.CV

    Learning-based Axial Video Motion Magnification

    Authors: Kwon Byung-Ki, Oh Hyun-Bin, Kim Jun-Seong, Hyunwoo Ha, Tae-Hyun Oh

    Abstract: Video motion magnification amplifies invisible small motions to be perceptible, which provides humans with a spatially dense and holistic understanding of small motions in the scene of interest. This is based on the premise that magnifying small motions enhances the legibility of motions. In the real world, however, vibrating objects often possess convoluted systems that have complex natural frequ…

    Submitted 26 March, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: main paper: 12 pages, supplementary: 10 pages, 20 figures, 1 table

  16. arXiv:2311.00994

    cs.CV cs.GR

    LaughTalk: Expressive 3D Talking Head Generation with Laughter

    Authors: Kim Sung-Bin, Lee Hyun, Da Hye Hong, Suekyeong Nam, Janghoon Ju, Tae-Hyun Oh

    Abstract: Laughter is a unique expression, essential to affirmative social interactions of humans. Although current 3D talking head generation methods produce convincing verbal articulations, they often fail to capture the vitality and subtleties of laughter and smiles despite their importance in social context. In this paper, we introduce a novel task to generate 3D talking heads capable of both articulate…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: Accepted to WACV2024

  17. arXiv:2310.03205

    cs.CV cs.AI

    A Large-Scale 3D Face Mesh Video Dataset via Neural Re-parameterized Optimization

    Authors: Kim Youwang, Lee Hyun, Kim Sung-Bin, Suekyeong Nam, Janghoon Ju, Tae-Hyun Oh

    Abstract: We propose NeuFace, a 3D face mesh pseudo annotation method on videos via neural re-parameterized optimization. Despite the huge progress in 3D face reconstruction methods, generating reliable 3D face labels for in-the-wild dynamic videos remains challenging. Using NeuFace optimization, we annotate the per-view/-frame accurate and consistent face meshes on large-scale face videos, called the NeuFa…

    Submitted 6 October, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: 9 pages, 7 figures, and 3 tables for the main paper. 8 pages, 6 figures and 3 tables for the appendix

  18. arXiv:2309.10724

    cs.CV cs.AI cs.MM cs.SD eess.AS

    Sound Source Localization is All about Cross-Modal Alignment

    Authors: Arda Senocak, Hyeonggon Ryu, Junsik Kim, Tae-Hyun Oh, Hanspeter Pfister, Joon Son Chung

    Abstract: Humans can easily perceive the direction of sound sources in a visual scene, termed sound source localization. Recent studies on learning-based sound source localization have mainly explored the problem from a localization perspective. However, prior arts and existing benchmarks do not account for a more important aspect of the problem, cross-modal semantic understanding, which is essential for ge…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: ICCV 2023

  19. An Iterative Method for Unsupervised Robust Anomaly Detection Under Data Contamination

    Authors: Minkyung Kim, Jongmin Yu, Junsik Kim, Tae-Hyun Oh, Jun Kyun Choi

    Abstract: Most deep anomaly detection models are based on learning normality from datasets due to the difficulty of defining abnormality by its diverse and inconsistent nature. Therefore, it has been a common practice to learn normality under the assumption that anomalous data are absent in a training dataset, which we call normality assumption. However, in practice, the normality assumption is often violat…

    Submitted 17 September, 2023; originally announced September 2023.

    Comments: IEEE Transactions on Neural Networks and Learning Systems, 2023

  20. arXiv:2308.07378

    cs.CV

    The Devil in the Details: Simple and Effective Optical Flow Synthetic Data Generation

    Authors: Kwon Byung-Ki, Kim Sung-Bin, Tae-Hyun Oh

    Abstract: Recent work on dense optical flow has shown significant progress, primarily in a supervised learning manner requiring a large amount of labeled data. Due to the expensiveness of obtaining large scale real-world data, computer graphics are typically leveraged for constructing datasets. However, there is a common belief that synthetic-to-real domain gaps limit generalization to real scenes. In this…

    Submitted 14 August, 2023; originally announced August 2023.

  21. arXiv:2308.00994

    cs.CV cs.LG

    SYNAuG: Exploiting Synthetic Data for Data Imbalance Problems

    Authors: Moon Ye-Bin, Nam Hyeon-Woo, Wonseok Choi, Nayeong Kim, Suha Kwak, Tae-Hyun Oh

    Abstract: Data imbalance in training data often leads to biased predictions from trained models, which in turn causes ethical and social issues. A straightforward solution is to carefully curate training data, but given the enormous scale of modern neural networks, this is prohibitively labor-intensive and thus impractical. Inspired by recent developments in generative models, this paper explores the potent…

    Submitted 25 April, 2024; v1 submitted 2 August, 2023; originally announced August 2023.

    Comments: The paper is under consideration at Pattern Recognition Letters

  22. arXiv:2307.14611

    cs.CV

    TextManiA: Enriching Visual Feature by Text-driven Manifold Augmentation

    Authors: Moon Ye-Bin, Jisoo Kim, Hongyeob Kim, Kilho Son, Tae-Hyun Oh

    Abstract: We propose TextManiA, a text-driven manifold augmentation method that semantically enriches visual feature spaces, regardless of class distribution. TextManiA augments visual data with intra-class semantic perturbation by exploiting easy-to-understand visually mimetic words, i.e., attributes. This work is built on an interesting hypothesis that general language models, e.g., BERT and GPT, encompas…

    Submitted 11 September, 2023; v1 submitted 26 July, 2023; originally announced July 2023.

    Comments: Accepted at ICCV 2023. [Project Pages] https://textmania.github.io/

  23. arXiv:2305.16699

    eess.AS cs.AI cs.LG

    Automatic Tuning of Loss Trade-offs without Hyper-parameter Search in End-to-End Zero-Shot Speech Synthesis

    Authors: Seongyeon Park, Bohyung Kim, Tae-hyun Oh

    Abstract: Recently, zero-shot TTS and VC methods have gained attention due to their practicality of being able to generate voices even unseen during training. Among these methods, zero-shot modifications of the VITS model have shown superior performance, while having useful properties inherited from VITS. However, the performance of VITS and VITS-based zero-shot models vary dramatically depending on how the…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: Interspeech 2023

  24. arXiv:2303.17490

    cs.CV cs.MM cs.SD eess.AS eess.IV

    Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment

    Authors: Kim Sung-Bin, Arda Senocak, Hyunwoo Ha, Andrew Owens, Tae-Hyun Oh

    Abstract: How does audio describe the world around us? In this paper, we propose a method for generating an image of a scene from sound. Our method addresses the challenges of dealing with the large gaps that often exist between sight and sound. We design a model that works by scheduling the learning procedure of each model component to associate audio-visual modalities despite their information gaps. The k…

    Submitted 30 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  25. arXiv:2303.17489

    eess.AS cs.MM cs.SD

    Prefix tuning for automated audio captioning

    Authors: Minkyu Kim, Kim Sung-Bin, Tae-Hyun Oh

    Abstract: Audio captioning aims to generate text descriptions from environmental sounds. One challenge of audio captioning is the difficulty of the generalization due to the lack of audio-text paired training data. In this work, we propose a simple yet effective method of dealing with small-scaled datasets by leveraging a pre-trained language model. We keep the language model frozen to maintain the expressi…

    Submitted 4 April, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

    Comments: ICASSP 2023

  26. arXiv:2303.15669

    eess.AS cs.AI cs.LG

    Unsupervised Pre-Training For Data-Efficient Text-to-Speech On Low Resource Languages

    Authors: Seongyeon Park, Myungseo Song, Bohyung Kim, Tae-Hyun Oh

    Abstract: Neural text-to-speech (TTS) models can synthesize natural human speech when trained on large amounts of transcribed speech. However, collecting such large-scale transcribed data is expensive. This paper proposes an unsupervised pre-training method for a sequence-to-sequence TTS model by leveraging large untranscribed speech data. With our pre-training, we can remarkably reduce the amount of paired…

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: ICASSP 2023

  27. arXiv:2302.09765

    cs.CV

    ENInst: Enhancing Weakly-supervised Low-shot Instance Segmentation

    Authors: Moon Ye-Bin, Dongmin Choi, Yongjin Kwon, Junsik Kim, Tae-Hyun Oh

    Abstract: We address a weakly-supervised low-shot instance segmentation, an annotation-efficient training method to deal with novel classes effectively. Since it is an under-explored problem, we first investigate the difficulty of the problem and identify the performance bottleneck by conducting systematic analyses of model components and individual sub-tasks with a simple baseline model. Based on the analy…

    Submitted 30 July, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

    Comments: Accepted at Pattern Recognition (PR)

  28. arXiv:2302.01078

    cond-mat.mtrl-sci cs.LG

    Computational Discovery of Microstructured Composites with Optimal Stiffness-Toughness Trade-Offs

    Authors: Beichen Li, Bolei Deng, Wan Shou, Tae-Hyun Oh, Yuanming Hu, Yiyue Luo, Liang Shi, Wojciech Matusik

    Abstract: The conflict between stiffness and toughness is a fundamental problem in engineering materials design. However, the systematic discovery of microstructured composites with optimal stiffness-toughness trade-offs has never been demonstrated, hindered by the discrepancies between simulation and reality and the lack of data-efficient exploration of the entire Pareto front. We introduce a generalizable…

    Submitted 3 January, 2024; v1 submitted 31 January, 2023; originally announced February 2023.

  29. arXiv:2301.11174

    cs.CV cs.AI cs.CL cs.LG

    Semi-Supervised Image Captioning by Adversarially Propagating Labeled Data

    Authors: Dong-Jin Kim, Tae-Hyun Oh, Jinsoo Choi, In So Kweon

    Abstract: We present a novel data-efficient semi-supervised framework to improve the generalization of image captioning models. Constructing a large-scale labeled image captioning dataset is an expensive task in terms of labor, time, and cost. In contrast to manually annotating all the training samples, separately collecting uni-modal datasets is immensely easier, e.g., a large-scale image dataset and a sen…

    Submitted 26 January, 2023; originally announced January 2023.

    Comments: Journal extension of our EMNLP 2019 paper (arXiv:1909.02201)

  30. arXiv:2210.08457

    cs.CV cs.AI cs.LG

    Scratching Visual Transformer's Back with Uniform Attention

    Authors: Nam Hyeon-Woo, Kim Yu-Ji, Byeongho Heo, Dongyoon Han, Seong Joon Oh, Tae-Hyun Oh

    Abstract: The favorable performance of Vision Transformers (ViTs) is often attributed to the multi-head self-attention (MSA). The MSA enables global interactions at each layer of a ViT model, which is a contrasting feature against Convolutional Neural Networks (CNNs) that gradually increase the range of interaction across multiple layers. We study the role of the density of the attention. Our preliminary an…

    Submitted 16 October, 2022; originally announced October 2022.

  31. arXiv:2208.06787

    cs.CV cs.AI cs.GR

    HDR-Plenoxels: Self-Calibrating High Dynamic Range Radiance Fields

    Authors: Kim Jun-Seong, Kim Yu-Ji, Moon Ye-Bin, Tae-Hyun Oh

    Abstract: We propose high dynamic range (HDR) radiance fields, HDR-Plenoxels, that learn a plenoptic function of 3D HDR radiance fields, geometry information, and varying camera settings inherent in 2D low dynamic range (LDR) images. Our voxel-based volume rendering pipeline reconstructs HDR radiance fields with only multi-view LDR images taken from varying camera settings in an end-to-end manner and has a…

    Submitted 18 November, 2022; v1 submitted 14 August, 2022; originally announced August 2022.

    Comments: Accepted at ECCV 2022. [Project page] https://hdr-plenoxels.github.io [Code] https://github.com/postech-ami/HDR-Plenoxels

  32. arXiv:2207.13820

    cs.CV cs.AI cs.LG

    Cross-Attention of Disentangled Modalities for 3D Human Mesh Recovery with Transformers

    Authors: Junhyeong Cho, Kim Youwang, Tae-Hyun Oh

    Abstract: Transformer encoder architectures have recently achieved state-of-the-art results on monocular 3D human mesh reconstruction, but they require a substantial number of parameters and expensive computations. Due to the large memory overhead and slow inference speed, it is difficult to deploy such models for practical use. In this paper, we propose a novel transformer encoder-decoder architecture for…

    Submitted 27 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022, Code: https://github.com/postech-ami/FastMETRO

  33. arXiv:2206.04382

    cs.CV cs.AI cs.GR

    CLIP-Actor: Text-Driven Recommendation and Stylization for Animating Human Meshes

    Authors: Kim Youwang, Kim Ji-Yeon, Tae-Hyun Oh

    Abstract: We propose CLIP-Actor, a text-driven motion recommendation and neural mesh stylization system for human mesh animation. CLIP-Actor animates a 3D human mesh to conform to a text prompt by recommending a motion sequence and optimizing mesh style attributes. We build a text-driven human motion recommendation system by leveraging a large-scale human motion dataset with language labels. Given a natural…

    Submitted 21 July, 2022; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: Accepted at ECCV 2022. [Project page] https://clip-actor.github.io [Code] https://github.com/postech-ami/CLIP-Actor

  34. arXiv:2202.05961

    cs.CV eess.IV

    Audio-Visual Fusion Layers for Event Type Aware Video Recognition

    Authors: Arda Senocak, Junsik Kim, Tae-Hyun Oh, Hyeonggon Ryu, Dingzeyu Li, In So Kweon

    Abstract: Human brain is continuously inundated with the multisensory information and their complex interactions coming from the outside world at any given moment. Such information is automatically analyzed by binding or segregating in our brain. While this task might seem effortless for human brains, it is extremely challenging to build a machine that can perform similar tasks since complex interactions ca…

    Submitted 11 February, 2022; originally announced February 2022.

  35. arXiv:2111.02450

    cs.CV cs.AI

    Unified 3D Mesh Recovery of Humans and Animals by Learning Animal Exercise

    Authors: Kim Youwang, Kim Ji-Yeon, Kyungdon Joo, Tae-Hyun Oh

    Abstract: We propose an end-to-end unified 3D mesh recovery of humans and quadruped animals trained in a weakly-supervised way. Unlike recent work focusing on a single target class only, we aim to recover 3D mesh of broader classes with a single multi-task model. However, there exists no dataset that can directly enable multi-task learning due to the absence of both human and animal annotations for a single…

    Submitted 3 November, 2021; originally announced November 2021.

    Comments: BMVC 2021, 10 pages excluding reference page

  36. arXiv:2110.00740

    cs.CV cs.AI

    FICGAN: Facial Identity Controllable GAN for De-identification

    Authors: Yonghyun Jeong, Jooyoung Choi, Sungwon Kim, Youngmin Ro, Tae-Hyun Oh, Doyeon Kim, Heonseok Ha, Sungroh Yoon

    Abstract: In this work, we present Facial Identity Controllable GAN (FICGAN) for not only generating high-quality de-identified face images with ensured privacy protection, but also detailed controllability on attribute preservation for enhanced data utility. We tackle the less-explored yet desired functionality in face de-identification based on the two factors. First, we focus on the challenging issue to…

    Submitted 2 October, 2021; originally announced October 2021.

  37. arXiv:2108.06098

    cs.LG cs.CV

    FedPara: Low-Rank Hadamard Product for Communication-Efficient Federated Learning

    Authors: Nam Hyeon-Woo, Moon Ye-Bin, Tae-Hyun Oh

    Abstract: In this work, we propose a communication-efficient parameterization, FedPara, for federated learning (FL) to overcome the burdens on frequent model uploads and downloads. Our method re-parameterizes weight parameters of layers using low-rank weights followed by the Hadamard product. Compared to the conventional low-rank parameterization, our FedPara method is not restricted to low-rank constraints…

    Submitted 19 January, 2023; v1 submitted 13 August, 2021; originally announced August 2021.

    Comments: Accepted at ICLR 2022

  38. arXiv:2105.09680

    cs.CL

    KLUE: Korean Language Understanding Evaluation

    Authors: Sungjoon Park, Jihyung Moon, Sungdong Kim, Won Ik Cho, Jiyoon Han, Jangwon Park, Chisung Song, Junseong Kim, Yongsook Song, Taehwan Oh, Joohong Lee, Juhyun Oh, Sungwon Lyu, Younghoon Jeong, Inkwon Lee, Sangwoo Seo, Dongjun Lee, Hyunwoo Kim, Myeonghwa Lee, Seongbo Jang, Seungwon Do, Sunkyoung Kim, Kyungtae Lim, Jongwon Lee, Kyumin Park, et al. (6 additional authors not shown)

    Abstract: We introduce Korean Language Understanding Evaluation (KLUE) benchmark. KLUE is a collection of 8 Korean natural language understanding (NLU) tasks, including Topic Classification, Semantic Textual Similarity, Natural Language Inference, Named Entity Recognition, Relation Extraction, Dependency Parsing, Machine Reading Comprehension, and Dialogue State Tracking. We build all of the tasks from scrat…

    Submitted 2 November, 2021; v1 submitted 20 May, 2021; originally announced May 2021.

    Comments: 76 pages, 10 figures, 36 tables

  39. arXiv:2010.03855

    cs.CV cs.AI cs.CL

    Dense Relational Image Captioning via Multi-task Triple-Stream Networks

    Authors: Dong-Jin Kim, Tae-Hyun Oh, Jinsoo Choi, In So Kweon

    Abstract: We introduce dense relational captioning, a novel image captioning task which aims to generate multiple captions with respect to relational information between objects in a visual scene. Relational captioning provides explicit descriptions for each relationship between object combinations. This framework is advantageous in both diversity and amount of information, leading to a comprehensive image…

    Submitted 11 October, 2021; v1 submitted 8 October, 2020; originally announced October 2020.

    Comments: IEEE TPAMI accepted. Journal extension of our CVPR 2019 paper (arXiv:1903.05942). Source code: https://github.com/Dong-JinKim/DenseRelationalCaptioning

  40. arXiv:2008.10542

    eess.IV cs.CV

    Automatic LiDAR Extrinsic Calibration System using Photodetector and Planar Board for Large-scale Applications

    Authors: Ji-Hwan You, Seon Taek Oh, Jae-Eun Park, Azim Eskandarian, Young-Keun Kim

    Abstract: This paper presents a novel automatic calibration system to estimate the extrinsic parameters of LiDAR mounted on a mobile platform for sensor misalignment inspection in the large-scale production of highly automated vehicles. To obtain subdegree and subcentimeter accuracy levels of extrinsic calibration, this study proposed a new concept of a target board with embedded photodetector arrays, named…

    Submitted 24 August, 2020; originally announced August 2020.

    Comments: preprint for IEEE journal

  41. arXiv:2008.10247

    cs.CV cs.GR cs.LG

    Monocular Reconstruction of Neural Face Reflectance Fields

    Authors: Mallikarjun B R., Ayush Tewari, Tae-Hyun Oh, Tim Weyrich, Bernd Bickel, Hans-Peter Seidel, Hanspeter Pfister, Wojciech Matusik, Mohamed Elgharib, Christian Theobalt

    Abstract: The reflectance field of a face describes the reflectance properties responsible for complex lighting effects including diffuse, specular, inter-reflection and self shadowing. Most existing methods for estimating the face reflectance from a monocular image assume faces to be diffuse with very few approaches adding a specular component. This still leaves out important perceptual aspects of reflecta…

    Submitted 24 August, 2020; originally announced August 2020.

    Comments: Project page - http://gvv.mpi-inf.mpg.de/projects/FaceReflectanceFields/

  42. arXiv:2005.12898

    cs.CL

    Analysis of the Penn Korean Universal Dependency Treebank (PKT-UD): Manual Revision to Build Robust Parsing Model in Korean

    Authors: Tae Hwan Oh, Ji Yoon Han, Hyonsu Choe, Seokwon Park, Han He, Jinho D. Choi, Na-Rae Han, Jena D. Hwang, Hansaem Kim

    Abstract: In this paper, we first open on important issues regarding the Penn Korean Universal Treebank (PKT-UD) and address these issues by revising the entire corpus manually with the aim of producing cleaner UD annotations that are more faithful to Korean grammar. For compatibility to the rest of UD corpora, we follow the UDv2 guidelines, and extensively revise the part-of-speech tags and the dependency…

    Submitted 26 May, 2020; originally announced May 2020.

    Comments: Accepted by The 16th International Conference on Parsing Technologies, IWPT 2020

  43. arXiv:2003.08264

    cs.CV

    Cross-domain Self-supervised Learning for Domain Adaptation with Few Source Labels

    Authors: Donghyun Kim, Kuniaki Saito, Tae-Hyun Oh, Bryan A. Plummer, Stan Sclaroff, Kate Saenko

    Abstract: Existing unsupervised domain adaptation methods aim to transfer knowledge from a label-rich source domain to an unlabeled target domain. However, obtaining labels for some source domains may be very expensive, making complete labeling as used in prior work impractical. In this work, we investigate a new domain adaptation scenario with sparsely labeled source data, where only a few examples in the…

    Submitted 18 March, 2020; originally announced March 2020.

  44. arXiv:1912.04487

    cs.CV cs.LG cs.SD eess.AS

    Listen to Look: Action Recognition by Previewing Audio

    Authors: Ruohan Gao, Tae-Hyun Oh, Kristen Grauman, Lorenzo Torresani

    Abstract: In the face of the video data deluge, today's expensive clip-level classifiers are increasingly impractical. We propose a framework for efficient action recognition in untrimmed video that uses audio as a preview mechanism to eliminate both short-term and long-term visual redundancies. First, we devise an ImgAud2Vid framework that hallucinates clip-level features by distilling from lighter modalit…

    Submitted 28 March, 2020; v1 submitted 9 December, 2019; originally announced December 2019.

    Comments: Appears in CVPR 2020; Project page: http://vision.cs.utexas.edu/projects/listen_to_look/

  45. arXiv:1911.09649

    cs.CV

    Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications

    Authors: Arda Senocak, Tae-Hyun Oh, Junsik Kim, Ming-Hsuan Yang, In So Kweon

    Abstract: Visual events are usually accompanied by sounds in our daily lives. However, can the machines learn to correlate the visual scene and sound, as well as localize the sound source only by observing them like humans? To investigate its empirical learnability, in this work we first present a novel unsupervised algorithm to address the problem of localizing sound sources in visual scenes. In order to a…

    Submitted 20 November, 2019; originally announced November 2019.

    Comments: To appear in TPAMI. arXiv admin note: substantial text overlap with arXiv:1803.03849

  46. arXiv:1911.03446

    quant-ph cond-mat.stat-mech cs.ET

    Scaling advantage in quantum simulation of geometrically frustrated magnets

    Authors: Andrew D. King, Jack Raymond, Trevor Lanting, Sergei V. Isakov, Masoud Mohseni, Gabriel Poulin-Lamarre, Sara Ejtemaee, William Bernoudy, Isil Ozfidan, Anatoly Yu. Smirnov, Mauricio Reis, Fabio Altomare, Michael Babcock, Catia Baron, Andrew J. Berkley, Kelly Boothby, Paul I. Bunyk, Holly Christiani, Colin Enderud, Bram Evert, Richard Harris, Emile Hoskinson, Shuiyuan Huang, Kais Jooya, Ali Khodabandelou, et al. (29 additional authors not shown)

    Abstract: The promise of quantum computing lies in harnessing programmable quantum devices for practical applications such as efficient simulation of quantum materials and condensed matter systems. One important task is the simulation of geometrically frustrated magnets in which topological phenomena can emerge from competition between quantum and thermal fluctuations. Here we report on experimental observa…

    Submitted 8 November, 2019; originally announced November 2019.

    Comments: 7 pages, 4 figures, 22 pages of supplemental material with 18 figures

  47. arXiv:1909.06979

    cs.CV

    Visuomotor Understanding for Representation Learning of Driving Scenes

    Authors: Seokju Lee, Junsik Kim, Tae-Hyun Oh, Yongseop Jeong, Donggeun Yoo, Stephen Lin, In So Kweon

    Abstract: Dashboard cameras capture a tremendous amount of driving scene video each day. These videos are purposefully coupled with vehicle sensing data, such as from the speedometer and inertial sensors, providing an additional sensing modality for free. In this work, we leverage the large-scale unlabeled yet naturally paired data for visual representation learning in the driving scenario. A representation…

    Submitted 16 September, 2019; originally announced September 2019.

    Comments: BMVC 2019. Supplementary material: https://bmvc2019.org/wp-content/uploads/papers/0002-supplementary.zip Dataset: http://github.com/SeokjuLee/driving-dataset-doc

  48. Image Captioning with Very Scarce Supervised Data: Adversarial Semi-Supervised Learning Approach

    Authors: Dong-Jin Kim, Jinsoo Choi, Tae-Hyun Oh, In So Kweon

    Abstract: Constructing an organized dataset comprised of a large number of images and several captions for each image is a laborious task, which requires vast human effort. On the other hand, collecting a large number of images and sentences separately may be immensely easier. In this paper, we develop a novel data-efficient semi-supervised framework for training an image captioning model. We leverage massi…

    Submitted 21 November, 2019; v1 submitted 5 September, 2019; originally announced September 2019.

    Comments: EMNLP 2019. Project page: https://sites.google.com/view/emnlp19scarcecaption

  49. arXiv:1905.09773

    cs.CV cs.MM

    Speech2Face: Learning the Face Behind a Voice

    Authors: Tae-Hyun Oh, Tali Dekel, Changil Kim, Inbar Mosseri, William T. Freeman, Michael Rubinstein, Wojciech Matusik

    Abstract: How much can we infer about a person's looks from the way they speak? In this paper, we study the task of reconstructing a facial image of a person from a short audio recording of that person speaking. We design and train a deep neural network to perform this task using millions of natural Internet/YouTube videos of people speaking. During training, our model learns voice-face correlations that al…

    Submitted 23 May, 2019; originally announced May 2019.

    Comments: To appear in CVPR2019. Project page: http://speech2face.github.io

  50. arXiv:1904.08482

    cs.CV

    Variational Prototyping-Encoder: One-Shot Learning with Prototypical Images

    Authors: Junsik Kim, Tae-Hyun Oh, Seokju Lee, Fei Pan, In So Kweon

    Abstract: In daily life, graphic symbols, such as traffic signs and brand logos, are ubiquitously utilized around us due to its intuitive expression beyond language boundary. We tackle an open-set graphic symbol recognition problem by one-shot classification with prototypical images as a single training example for each novel class. We take an approach to learn a generalizable embedding space for novel task…

    Submitted 17 April, 2019; originally announced April 2019.

    Comments: Accepted to CVPR 2019