Showing 1–12 of 12 results for author: Monnier, T

  1. arXiv:2407.02599

    cs.CV cs.AI cs.GR cs.LG

    Meta 3D Gen

    Authors: Raphael Bensadoun, Tom Monnier, Yanir Kleiman, Filippos Kokkinos, Yawar Siddiqui, Mahendra Kariya, Omri Harosh, Roman Shapovalov, Benjamin Graham, Emilien Garreau, Animesh Karnewar, Ang Cao, Idan Azuri, Iurii Makarov, Eric-Tuan Le, Antoine Toisoul, David Novotny, Oran Gafni, Natalia Neverova, Andrea Vedaldi

    Abstract: We introduce Meta 3D Gen (3DGen), a new state-of-the-art, fast pipeline for text-to-3D asset generation. 3DGen offers 3D asset creation with high prompt fidelity and high-quality 3D shapes and textures in under a minute. It supports physically-based rendering (PBR), necessary for 3D asset relighting in real-world applications. Additionally, 3DGen supports generative retexturing of previously gener…

    Submitted 2 July, 2024; originally announced July 2024.

  2. arXiv:2407.02445

    cs.CV cs.AI cs.GR

    Meta 3D AssetGen: Text-to-Mesh Generation with High-Quality Geometry, Texture, and PBR Materials

    Authors: Yawar Siddiqui, Tom Monnier, Filippos Kokkinos, Mahendra Kariya, Yanir Kleiman, Emilien Garreau, Oran Gafni, Natalia Neverova, Andrea Vedaldi, Roman Shapovalov, David Novotny

    Abstract: We present Meta 3D AssetGen (AssetGen), a significant advancement in text-to-3D generation which produces faithful, high-quality meshes with texture and material control. Compared to works that bake shading in the 3D object's appearance, AssetGen outputs physically-based rendering (PBR) materials, supporting realistic relighting. AssetGen generates first several views of the object with factored s…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Project Page: https://assetgen.github.io

  3. arXiv:2312.08744

    cs.CV cs.GR

    GOEmbed: Gradient Origin Embeddings for Representation Agnostic 3D Feature Learning

    Authors: Animesh Karnewar, Roman Shapovalov, Tom Monnier, Andrea Vedaldi, Niloy J. Mitra, David Novotny

    Abstract: Encoding information from 2D views of an object into a 3D representation is crucial for generalized 3D feature extraction. Such features can then enable 3D reconstruction, 3D generation, and other applications. We propose GOEmbed (Gradient Origin Embeddings) that encodes input 2D images into any 3D representation, without requiring a pre-trained image feature extractor; unlike typical prior approa…

    Submitted 15 July, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: ECCV 2024 conference; project page at: https://holodiffusion.github.io/goembed/

  4. arXiv:2307.05473

    cs.CV

    Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives

    Authors: Tom Monnier, Jake Austin, Angjoo Kanazawa, Alexei A. Efros, Mathieu Aubry

    Abstract: Given a set of calibrated images of a scene, we present an approach that produces a simple, compact, and actionable 3D world representation by means of 3D primitives. While many approaches focus on recovering high-fidelity 3D scenes, we focus on parsing a scene into mid-level 3D representations made of a small set of textured primitives. Such representations are interpretable, easy to manipulate a…

    Submitted 26 December, 2023; v1 submitted 11 July, 2023; originally announced July 2023.

    Comments: Project webpage with code and videos: https://www.tmonnier.com/DBW. V2 update includes comparisons based on NeuS, hyperparameter analysis and failure cases

  5. arXiv:2303.03315

    cs.CV cs.AI cs.RO

    MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision

    Authors: Antoine Guédon, Tom Monnier, Pascal Monasse, Vincent Lepetit

    Abstract: We introduce a method that simultaneously learns to explore new large environments and to reconstruct them in 3D from color images only. This is closely related to the Next Best View problem (NBV), where one has to identify where to move the camera next to improve the coverage of an unknown scene. However, most of the current NBV methods rely on depth sensors, need 3D supervision and/or do not sca…

    Submitted 13 June, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: To appear at CVPR 2023. Project Webpage: https://imagine.enpc.fr/~guedona/MACARONS/

  6. arXiv:2302.01660

    cs.CV

    The Learnable Typewriter: A Generative Approach to Text Analysis

    Authors: Ioannis Siglidis, Nicolas Gonthier, Julien Gaubil, Tom Monnier, Mathieu Aubry

    Abstract: We present a generative document-specific approach to character analysis and recognition in text lines. Our main idea is to build on unsupervised multi-object segmentation methods and in particular those that reconstruct images based on a limited amount of visual elements, called sprites. Taking as input a set of text lines with similar font or handwriting, our approach can learn a large number of…

    Submitted 14 April, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

    Comments: For the code and a quick-overview visit the project webpage at http://imagine.enpc.fr/~siglidii/learnable-typewriter

  7. arXiv:2212.10292

    cs.CV cs.AI

    Towards Unsupervised Visual Reasoning: Do Off-The-Shelf Features Know How to Reason?

    Authors: Monika Wysoczańska, Tom Monnier, Tomasz Trzciński, David Picard

    Abstract: Recent advances in visual representation learning allowed to build an abundance of powerful off-the-shelf features that are ready-to-use for numerous downstream tasks. This work aims to assess how well these features preserve information about the objects, such as their spatial location, their visual properties and their relative relationships. We propose to do so by evaluating them in the context…

    Submitted 20 December, 2022; originally announced December 2022.

  8. arXiv:2204.10310

    cs.CV cs.GR

    Share With Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency

    Authors: Tom Monnier, Matthew Fisher, Alexei A. Efros, Mathieu Aubry

    Abstract: Approaches for single-view reconstruction typically rely on viewpoint annotations, silhouettes, the absence of background, multiple views of the same instance, a template shape, or symmetry. We avoid all such supervision and assumptions by explicitly leveraging the consistency between images of different object instances. As a result, our method can learn from large collections of unlabelled image…

    Submitted 25 July, 2022; v1 submitted 21 April, 2022; originally announced April 2022.

    Comments: ECCV 2022. Project webpage with code and videos: http://imagine.enpc.fr/~monniert/UNICORN/

  9. arXiv:2109.01605

    cs.CV

    Representing Shape Collections with Alignment-Aware Linear Models

    Authors: Romain Loiseau, Tom Monnier, Mathieu Aubry, Loïc Landrieu

    Abstract: In this paper, we revisit the classical representation of 3D point clouds as linear shape models. Our key insight is to leverage deep learning to represent a collection of shapes as affine transformations of low-dimensional linear shape models. Each linear model is characterized by a shape prototype, a low-dimensional shape basis and two neural networks. The networks take as input a point cloud an…

    Submitted 17 December, 2021; v1 submitted 3 September, 2021; originally announced September 2021.

    Comments: Accepted to 3DV 2021. 17 pages, 10 figures. Code and data are available at: https://romainloiseau.github.io/deep-linear-shapes

  10. arXiv:2104.14575

    cs.CV

    Unsupervised Layered Image Decomposition into Object Prototypes

    Authors: Tom Monnier, Elliot Vincent, Jean Ponce, Mathieu Aubry

    Abstract: We present an unsupervised learning framework for decomposing images into layers of automatically discovered object models. Contrary to recent approaches that model image layers with autoencoder networks, we represent them as explicit transformations of a small set of prototypical images. Our model has three main components: (i) a set of object prototypes in the form of learnable images with a tra…

    Submitted 23 August, 2021; v1 submitted 29 April, 2021; originally announced April 2021.

    Comments: Accepted at ICCV 2021. Project webpage: https://imagine.enpc.fr/~monniert/DTI-Sprites

  11. docExtractor: An off-the-shelf historical document element extraction

    Authors: Tom Monnier, Mathieu Aubry

    Abstract: We present docExtractor, a generic approach for extracting visual elements such as text lines or illustrations from historical documents without requiring any real data annotation. We demonstrate it provides high-quality performances as an off-the-shelf system across a wide variety of datasets and leads to results on par with state-of-the-art when fine-tuned. We argue that the performance obtained…

    Submitted 15 December, 2020; originally announced December 2020.

    Comments: Accepted at 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR) (oral). Project webpage: http://imagine.enpc.fr/~monniert/docExtractor/

  12. arXiv:2006.11132

    cs.CV cs.LG stat.ML

    Deep Transformation-Invariant Clustering

    Authors: Tom Monnier, Thibault Groueix, Mathieu Aubry

    Abstract: Recent advances in image clustering typically focus on learning better deep representations. In contrast, we present an orthogonal approach that does not rely on abstract features but instead learns to predict image transformations and performs clustering directly in image space. This learning process naturally fits in the gradient-based training of K-means and Gaussian mixture model, without requ…

    Submitted 27 October, 2020; v1 submitted 19 June, 2020; originally announced June 2020.

    Comments: Accepted at NeurIPS 2020 (oral). Project webpage: http://imagine.enpc.fr/~monniert/DTIClustering/