Showing 1–28 of 28 results for author: Cao, A

  1. arXiv:2407.02599  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Meta 3D Gen

    Authors: Raphael Bensadoun, Tom Monnier, Yanir Kleiman, Filippos Kokkinos, Yawar Siddiqui, Mahendra Kariya, Omri Harosh, Roman Shapovalov, Benjamin Graham, Emilien Garreau, Animesh Karnewar, Ang Cao, Idan Azuri, Iurii Makarov, Eric-Tuan Le, Antoine Toisoul, David Novotny, Oran Gafni, Natalia Neverova, Andrea Vedaldi

    Abstract: We introduce Meta 3D Gen (3DGen), a new state-of-the-art, fast pipeline for text-to-3D asset generation. 3DGen offers 3D asset creation with high prompt fidelity and high-quality 3D shapes and textures in under a minute. It supports physically-based rendering (PBR), necessary for 3D asset relighting in real-world applications. Additionally, 3DGen supports generative retexturing of previously gener…

    Submitted 2 July, 2024; originally announced July 2024.

  2. arXiv:2405.16248  [pdf]

    eess.IV cs.CV cs.LG q-bio.QM

    Combining Radiomics and Machine Learning Approaches for Objective ASD Diagnosis: Verifying White Matter Associations with ASD

    Authors: Junlin Song, Yuzhuo Chen, Yuan Yao, Zetong Chen, Renhao Guo, Lida Yang, Xinyi Sui, Qihang Wang, Xijiao Li, Aihua Cao, Wei Li

    Abstract: Autism Spectrum Disorder is a condition characterized by atypical brain development leading to impairments in social skills, communication abilities, repetitive behaviors, and sensory processing. There have been many studies combining brain MRI images with machine learning algorithms to achieve objective diagnosis of autism, but the correlation between white matter and autism has not been fully u…

    Submitted 25 May, 2024; originally announced May 2024.

  3. arXiv:2405.09150  [pdf, other]

    cs.CV

    Curriculum Dataset Distillation

    Authors: Zhiheng Ma, Anjia Cao, Funing Yang, Xing Wei

    Abstract: Most dataset distillation methods struggle to accommodate large-scale datasets due to their substantial computational and memory requirements. In this paper, we present a curriculum-based dataset distillation framework designed to harmonize scalability with efficiency. This framework strategically distills synthetic images, adhering to a curriculum that transitions from simple to complex. By incor…

    Submitted 15 May, 2024; originally announced May 2024.

  4. arXiv:2404.19760  [pdf, other]

    cs.CV cs.GR

    Lightplane: Highly-Scalable Components for Neural 3D Fields

    Authors: Ang Cao, Justin Johnson, Andrea Vedaldi, David Novotny

    Abstract: Contemporary 3D research, particularly in reconstruction and generation, heavily relies on 2D images for inputs or supervision. However, current designs for these 2D-3D mappings are memory-intensive, posing a significant bottleneck for existing methods and hindering new applications. In response, we propose a pair of highly scalable components for 3D neural fields: Lightplane Render and Splatter, w…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Project Page: https://lightplane.github.io/ Code: https://github.com/facebookresearch/lightplane

  5. arXiv:2404.10279  [pdf, other]

    cs.CV

    EucliDreamer: Fast and High-Quality Texturing for 3D Models with Depth-Conditioned Stable Diffusion

    Authors: Cindy Le, Congrui Hetang, Chendi Lin, Ang Cao, Yihui He

    Abstract: We present EucliDreamer, a simple and effective method to generate textures for 3D models given text prompts and meshes. The texture is parametrized as an implicit function on the 3D surface, which is optimized with the Score Distillation Sampling (SDS) process and differentiable rendering. To generate high-quality textures, we leverage a depth-conditioned Stable Diffusion model guided by the dept…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Short version of arXiv:2311.15573

  6. arXiv:2404.02928  [pdf, other]

    cs.CR cs.AI

    Jailbreaking Prompt Attack: A Controllable Adversarial Attack against Diffusion Models

    Authors: Jiachen Ma, Anda Cao, Zhiqing Xiao, Jie Zhang, Chao Ye, Junbo Zhao

    Abstract: Text-to-Image (T2I) models have received widespread attention due to their remarkable generation capabilities. However, concerns have been raised about the ethical implications of the models in generating Not Safe for Work (NSFW) images because NSFW images may cause discomfort to people or be used for illegal purposes. To mitigate the generation of such images, T2I models deploy various types of s…

    Submitted 2 June, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  7. arXiv:2312.17142  [pdf, other]

    cs.CV cs.GR

    DreamGaussian4D: Generative 4D Gaussian Splatting

    Authors: Jiawei Ren, Liang Pan, Jiaxiang Tang, Chi Zhang, Ang Cao, Gang Zeng, Ziwei Liu

    Abstract: 4D content generation has achieved remarkable progress recently. However, existing methods suffer from long optimization times, a lack of motion controllability, and a low quality of details. In this paper, we introduce DreamGaussian4D (DG4D), an efficient 4D generation framework that builds on Gaussian Splatting (GS). Our key insight is that combining explicit modeling of spatial transformations…

    Submitted 10 June, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: Technical report. Project page is at https://jiawei-ren.github.io/projects/dreamgaussian4d Code is at https://github.com/jiawei-ren/dreamgaussian4d

  8. arXiv:2312.08267  [pdf, other]

    eess.IV cs.CV q-bio.QM

    TABSurfer: a Hybrid Deep Learning Architecture for Subcortical Segmentation

    Authors: Aaron Cao, Vishwanatha M. Rao, Kejia Liu, Xinru Liu, Andrew F. Laine, Jia Guo

    Abstract: Subcortical segmentation remains challenging despite its important applications in quantitative structural analysis of brain MRI scans. The most accurate method, manual segmentation, is highly labor intensive, so automated tools like FreeSurfer have been adopted to handle this task. However, these traditional pipelines are slow and inefficient for processing large datasets. In this study, we propo…

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: 5 pages, 3 figures, 2 tables

  9. arXiv:2312.05279  [pdf]

    eess.IV cs.CV

    Quantitative perfusion maps using a novelty spatiotemporal convolutional neural network

    Authors: Anbo Cao, Pin-Yu Le, Zhonghui Qie, Haseeb Hassan, Yingwei Guo, Asim Zaman, Jiaxi Lu, Xueqiang Zeng, Huihui Yang, Xiaoqiang Miao, Taiyu Han, Guangtao Huang, Yan Kang, Yu Luo, Jia Guo

    Abstract: Dynamic susceptibility contrast magnetic resonance imaging (DSC-MRI) is widely used to evaluate acute ischemic stroke to distinguish salvageable tissue and infarct core. For this purpose, traditional methods employ deconvolution techniques, like singular value decomposition, which are known to be vulnerable to noise, potentially distorting the derived perfusion parameters. However, deep learning t…

    Submitted 8 December, 2023; originally announced December 2023.

  10. arXiv:2312.02158  [pdf, other]

    cs.CV cs.AI

    PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness

    Authors: Anh-Quan Cao, Angela Dai, Raoul de Charette

    Abstract: We propose the task of Panoptic Scene Completion (PSC) which extends the recently popular Semantic Scene Completion (SSC) task with instance-level information to produce a richer understanding of the 3D scene. Our PSC proposal utilizes a hybrid mask-based technique on the non-empty voxels from sparse multi-scale completions. Whereas the SSC literature overlooks uncertainty which is critical for ro…

    Submitted 25 May, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: CVPR 2024 Oral - Best paper award candidate. Project page: https://astra-vision.github.io/PaSCo

  11. arXiv:2311.15573  [pdf, other]

    cs.CV cs.GR

    EucliDreamer: Fast and High-Quality Texturing for 3D Models with Stable Diffusion Depth

    Authors: Cindy Le, Congrui Hetang, Chendi Lin, Ang Cao, Yihui He

    Abstract: This paper presents a novel method to generate textures for 3D models given text prompts and 3D meshes. Additional depth information is taken into account to perform the Score Distillation Sampling (SDS) process with depth conditional Stable Diffusion. We ran our model over the open-source dataset Objaverse and conducted a user study to compare the results with those of various 3D texturing method…

    Submitted 13 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

  12. arXiv:2310.01037  [pdf, other]

    physics.geo-ph cs.LG

    SeisT: A foundational deep learning model for earthquake monitoring tasks

    Authors: Sen Li, Xu Yang, Anye Cao, Changbin Wang, Yaoqi Liu, Yapeng Liu, Qiang Niu

    Abstract: Seismograms, the fundamental seismic records, have revolutionized earthquake research and monitoring. Recent advancements in deep learning have further enhanced seismic signal processing, leading to even more precise and effective earthquake monitoring capabilities. This paper introduces a foundational deep learning model, the Seismogram Transformer (SeisT), designed for a variety of earthquake mo…

    Submitted 26 December, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

    Journal ref: IEEE Transactions on Geoscience and Remote Sensing, 2024

  13. arXiv:2305.01151  [pdf, ps, other]

    cs.LG

    Early Classifying Multimodal Sequences

    Authors: Alexander Cao, Jean Utke, Diego Klabjan

    Abstract: Often pieces of information are received sequentially over time. When did one collect enough such pieces to classify? Trading wait time for decision certainty leads to early classification problems that have recently gained attention as a means of adapting classification to more dynamic environments. However, so far results have been limited to unimodal sequences. In this pilot study, we expand in…

    Submitted 1 May, 2023; originally announced May 2023.

    Comments: 7 pages, 5 figures

  14. arXiv:2304.03463  [pdf, ps, other]

    cs.LG

    A Policy for Early Sequence Classification

    Authors: Alexander Cao, Jean Utke, Diego Klabjan

    Abstract: Sequences are often not received in their entirety at once, but instead, received incrementally over time, element by element. Early predictions yielding a higher benefit, one aims to classify a sequence as accurately as possible, as soon as possible, without having to wait for the last element. For this early sequence classification, we introduce our novel classifier-induced stopping. While previ…

    Submitted 6 April, 2023; originally announced April 2023.

    Comments: 12 pages, 6 figures

  15. arXiv:2303.11989  [pdf, other]

    cs.CV

    Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models

    Authors: Lukas Höllein, Ang Cao, Andrew Owens, Justin Johnson, Matthias Nießner

    Abstract: We present Text2Room, a method for generating room-scale textured 3D meshes from a given text prompt as input. To this end, we leverage pre-trained 2D text-to-image models to synthesize a sequence of images from different poses. In order to lift these outputs into a consistent 3D scene representation, we combine monocular depth estimation with a text-conditioned inpainting model. The core idea of…

    Submitted 10 September, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

    Comments: Accepted to ICCV 2023 (Oral). Video: https://youtu.be/fjRnFL91EZc Project page: https://lukashoel.github.io/text-to-room/ Code: https://github.com/lukasHoel/text2room

  16. arXiv:2301.09632  [pdf, other]

    cs.CV

    HexPlane: A Fast Representation for Dynamic Scenes

    Authors: Ang Cao, Justin Johnson

    Abstract: Modeling and re-rendering dynamic 3D scenes is a challenging task in 3D vision. Prior approaches build on NeRF and rely on implicit representations. This is slow since it requires many MLP evaluations, constraining real-world applications. We show that dynamic 3D scenes can be explicitly represented by six planes of learned features, leading to an elegant solution we call HexPlane. A HexPlane comp…

    Submitted 27 March, 2023; v1 submitted 23 January, 2023; originally announced January 2023.

    Comments: CVPR 2023, Camera Ready. Project page: https://caoang327.github.io/HexPlane

  17. arXiv:2212.02501  [pdf, other]

    cs.CV cs.AI cs.GR cs.RO

    SceneRF: Self-Supervised Monocular 3D Scene Reconstruction with Radiance Fields

    Authors: Anh-Quan Cao, Raoul de Charette

    Abstract: 3D reconstruction from a single 2D image was extensively covered in the literature but relies on depth supervision at training time, which limits its applicability. To relax the dependence on depth we propose SceneRF, a self-supervised monocular scene reconstruction method using only posed image sequences for training. Fueled by the recent progress in neural radiance fields (NeRF) we optimize a ra…

    Submitted 24 August, 2023; v1 submitted 5 December, 2022; originally announced December 2022.

    Comments: ICCV 2023. Project page: https://astra-vision.github.io/SceneRF

  18. arXiv:2210.01784  [pdf, other]

    cs.CV

    COARSE3D: Class-Prototypes for Contrastive Learning in Weakly-Supervised 3D Point Cloud Segmentation

    Authors: Rong Li, Anh-Quan Cao, Raoul de Charette

    Abstract: Annotation of large-scale 3D data is notoriously cumbersome and costly. As an alternative, weakly-supervised learning alleviates such a need by reducing the annotation by several orders of magnitude. We propose COARSE3D, a novel architecture-agnostic contrastive learning strategy for 3D segmentation. Since contrastive learning requires rich and diverse examples as keys and anchors, we leverage a p…

    Submitted 7 October, 2022; v1 submitted 4 October, 2022; originally announced October 2022.

  19. arXiv:2206.08355  [pdf, other]

    cs.CV

    FWD: Real-time Novel View Synthesis with Forward Warping and Depth

    Authors: Ang Cao, Chris Rockwell, Justin Johnson

    Abstract: Novel view synthesis (NVS) is a challenging task requiring systems to generate photorealistic images of scenes from new viewpoints, where both quality and speed are important for applications. Previous image-based rendering (IBR) methods are fast, but have poor quality when input views are sparse. Recent Neural Radiance Fields (NeRF) and generalizable variants give impressive results but are not r…

    Submitted 5 August, 2022; v1 submitted 16 June, 2022; originally announced June 2022.

    Comments: CVPR 2022. Project website https://caoang327.github.io/FWD/

  20. arXiv:2201.02923  [pdf, ps, other]

    cs.LG

    Open-Set Recognition of Breast Cancer Treatments

    Authors: Alexander Cao, Diego Klabjan, Yuan Luo

    Abstract: Open-set recognition generalizes a classification task by classifying test samples as one of the known classes from training or "unknown." As novel cancer drug cocktails with improved treatment are continually discovered, predicting cancer treatments can naturally be formulated in terms of an open-set recognition problem. Drawbacks, due to modeling unknown samples during training, arise from strai…

    Submitted 8 January, 2022; originally announced January 2022.

    Comments: 22 pages, 9 figures and 9 tables

  21. arXiv:2112.00726  [pdf, other]

    cs.CV cs.AI cs.RO

    MonoScene: Monocular 3D Semantic Scene Completion

    Authors: Anh-Quan Cao, Raoul de Charette

    Abstract: MonoScene proposes a 3D Semantic Scene Completion (SSC) framework, where the dense geometry and semantics of a scene are inferred from a single monocular RGB image. Different from the SSC literature, relying on 2.5 or 3D input, we solve the complex problem of 2D to 3D scene reconstruction while jointly inferring its semantics. Our framework relies on successive 2D and 3D UNets bridged by a novel 2…

    Submitted 29 March, 2022; v1 submitted 1 December, 2021; originally announced December 2021.

    Comments: Accepted at CVPR 2022. Project page: https://cv-rits.github.io/MonoScene/

  22. arXiv:2110.01269  [pdf, other]

    cs.CV

    PCAM: Product of Cross-Attention Matrices for Rigid Registration of Point Clouds

    Authors: Anh-Quan Cao, Gilles Puy, Alexandre Boulch, Renaud Marlet

    Abstract: Rigid registration of point clouds with partial overlaps is a longstanding problem usually solved in two steps: (a) finding correspondences between the point clouds; (b) filtering these correspondences to keep only the most reliable ones to estimate the transformation. Recently, several deep nets have been proposed to solve these steps jointly. We build upon these works and propose PCAM: a neural…

    Submitted 4 October, 2021; originally announced October 2021.

    Comments: ICCV21

  23. arXiv:2106.13933  [pdf, other]

    cs.CV

    Inverting and Understanding Object Detectors

    Authors: Ang Cao, Justin Johnson

    Abstract: As a core problem in computer vision, the performance of object detection has improved drastically in the past few years. Despite their impressive performance, object detectors suffer from a lack of interpretability. Visualization techniques have been developed and widely applied to introspect the decisions made by other kinds of deep learning models; however, visualizing object detectors has been…

    Submitted 25 June, 2021; originally announced June 2021.

    Comments: Preprints

  24. arXiv:2006.02003  [pdf, other]

    cs.LG cs.CV cs.NE stat.ML

    Open-Set Recognition with Gaussian Mixture Variational Autoencoders

    Authors: Alexander Cao, Yuan Luo, Diego Klabjan

    Abstract: In inference, open-set classification is to either classify a sample into a known class from training or reject it as an unknown class. Existing deep open-set classifiers train explicit closed-set classifiers, in some cases disjointly utilizing reconstruction, which we find dilutes the latent representation's ability to distinguish unknown classes. In contrast, we train our model to cooperatively…

    Submitted 2 June, 2020; originally announced June 2020.

    Comments: 12 pages including 8 figures and 4 tables, plus 6 pages of supplementary material

  25. arXiv:2005.05389  [pdf]

    cs.DL

    Citations versus expert opinions: Citation analysis of Featured Reviews of the American Mathematical Society

    Authors: Lawrence Smolinsky, Daniel S. Sage, Aaron J. Lercher, Aaron Cao

    Abstract: Peer review and citation metrics are two means of gauging the value of scientific research, but the lack of publicly available peer review data makes the comparison of these methods difficult. Mathematics can serve as a useful laboratory for considering these questions because as an exact science, there is a narrow range of reasons for citations. In mathematics, virtually all published articles ar…

    Submitted 16 December, 2020; v1 submitted 11 May, 2020; originally announced May 2020.

    Comments: 21 pages, 3 figures, 4 tables

  26. arXiv:1912.05590  [pdf, other]

    cs.NI cs.LG

    Peek Inside the Closed World: Evaluating Autoencoder-Based Detection of DDoS to Cloud

    Authors: Hang Guo, Xun Fan, Anh Cao, Geoff Outhred, John Heidemann

    Abstract: Machine-learning-based anomaly detection (ML-based AD) has been successful at detecting DDoS events in the lab. However, published evaluations of ML-based AD have used only limited data and provided minimal insight into why it works. To address limited evaluation against real-world data, we apply an autoencoder, an existing ML-AD model, to 57 DDoS attack events captured at 5 cloud IPs from a major clo…

    Submitted 20 June, 2020; v1 submitted 11 December, 2019; originally announced December 2019.

  27. arXiv:1908.03237  [pdf, other]

    cs.CV

    Image-based marker tracking and registration for intraoperative 3D image-guided interventions using augmented reality

    Authors: Andong Cao, Ali Dhanaliwala, Jianbo Shi, Terence Gade, Brian Park

    Abstract: Augmented reality has the potential to improve operating room workflow by allowing physicians to "see" inside a patient through the projection of imaging directly onto the surgical field. For this to be useful, the acquired imaging must be quickly and accurately registered with the patient and the registration must be maintained. Here we describe a method for projecting a CT scan with Microsoft Hololen…

    Submitted 8 August, 2019; originally announced August 2019.

  28. arXiv:1907.06143  [pdf, other]

    cs.LG cs.CV

    Neural Embedding for Physical Manipulations

    Authors: Lingzhi Zhang, Andong Cao, Rui Li, Jianbo Shi

    Abstract: In common real-world robotic operations, action and state spaces can be vast and sometimes unknown, and observations are often relatively sparse. How do we learn the full topology of action and state spaces when given only few and sparse observations? Inspired by the properties of grid cells in mammalian brains, we build a generative model that enforces a normalized pairwise distance constraint be…

    Submitted 13 July, 2019; originally announced July 2019.