Showing 1–34 of 34 results for author: Neverova, N

  1. arXiv:2407.02599  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Meta 3D Gen

    Authors: Raphael Bensadoun, Tom Monnier, Yanir Kleiman, Filippos Kokkinos, Yawar Siddiqui, Mahendra Kariya, Omri Harosh, Roman Shapovalov, Benjamin Graham, Emilien Garreau, Animesh Karnewar, Ang Cao, Idan Azuri, Iurii Makarov, Eric-Tuan Le, Antoine Toisoul, David Novotny, Oran Gafni, Natalia Neverova, Andrea Vedaldi

    Abstract: We introduce Meta 3D Gen (3DGen), a new state-of-the-art, fast pipeline for text-to-3D asset generation. 3DGen offers 3D asset creation with high prompt fidelity and high-quality 3D shapes and textures in under a minute. It supports physically-based rendering (PBR), necessary for 3D asset relighting in real-world applications. Additionally, 3DGen supports generative retexturing of previously gener…

    Submitted 2 July, 2024; originally announced July 2024.

  2. arXiv:2407.02445  [pdf, other]

    cs.CV cs.AI cs.GR

    Meta 3D AssetGen: Text-to-Mesh Generation with High-Quality Geometry, Texture, and PBR Materials

    Authors: Yawar Siddiqui, Tom Monnier, Filippos Kokkinos, Mahendra Kariya, Yanir Kleiman, Emilien Garreau, Oran Gafni, Natalia Neverova, Andrea Vedaldi, Roman Shapovalov, David Novotny

    Abstract: We present Meta 3D AssetGen (AssetGen), a significant advancement in text-to-3D generation which produces faithful, high-quality meshes with texture and material control. Compared to works that bake shading in the 3D object's appearance, AssetGen outputs physically-based rendering (PBR) materials, supporting realistic relighting. AssetGen generates first several views of the object with factored s…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Project Page: https://assetgen.github.io

  3. arXiv:2407.02430  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Meta 3D TextureGen: Fast and Consistent Texture Generation for 3D Objects

    Authors: Raphael Bensadoun, Yanir Kleiman, Idan Azuri, Omri Harosh, Andrea Vedaldi, Natalia Neverova, Oran Gafni

    Abstract: The recent availability and adaptability of text-to-image models has sparked a new era in many related domains that benefit from the learned text priors as well as high-quality and fast generation capabilities, one of which is texture generation for 3D objects. Although recent texture generation methods achieve impressive results by using text-to-image networks, the combination of global consisten…

    Submitted 2 July, 2024; originally announced July 2024.

  4. arXiv:2402.08682  [pdf, other]

    cs.CV cs.AI cs.LG

    IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation

    Authors: Luke Melas-Kyriazi, Iro Laina, Christian Rupprecht, Natalia Neverova, Andrea Vedaldi, Oran Gafni, Filippos Kokkinos

    Abstract: Most text-to-3D generators build upon off-the-shelf text-to-image models trained on billions of images. They use variants of Score Distillation Sampling (SDS), which is slow, somewhat unstable, and prone to artifacts. A mitigation is to fine-tune the 2D generator to be multi-view aware, which can help distillation or can be combined with reconstruction networks to output 3D objects directly. In th…

    Submitted 13 February, 2024; originally announced February 2024.

  5. arXiv:2312.09222  [pdf, other]

    cs.CV cs.GR

    Mosaic-SDF for 3D Generative Models

    Authors: Lior Yariv, Omri Puny, Natalia Neverova, Oran Gafni, Yaron Lipman

    Abstract: Current diffusion or flow-based generative models for 3D shapes divide into two: distilling pre-trained 2D image diffusion models, and training directly on 3D shapes. When training a diffusion or flow model on 3D shapes a crucial design choice is the shape representation. An effective shape representation needs to adhere to three design principles: it should allow an efficient conversion of large 3D d…

    Submitted 24 April, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: More results and details can be found at https://lioryariv.github.io/msdf

  6. arXiv:2307.12067  [pdf, other]

    cs.CV

    Replay: Multi-modal Multi-view Acted Videos for Casual Holography

    Authors: Roman Shapovalov, Yanir Kleiman, Ignacio Rocco, David Novotny, Andrea Vedaldi, Changan Chen, Filippos Kokkinos, Ben Graham, Natalia Neverova

    Abstract: We introduce Replay, a collection of multi-view, multi-modal videos of humans interacting socially. Each scene is filmed in high production quality, from different viewpoints with several static cameras, as well as wearable action cameras, and recorded with a large array of microphones at different positions in the room. Overall, the dataset contains over 4000 minutes of footage and over 7 million…

    Submitted 22 July, 2023; originally announced July 2023.

    Comments: Accepted for ICCV 2023. Roman, Yanir, and Ignacio contributed equally

  7. arXiv:2307.07635  [pdf, other]

    cs.CV

    CoTracker: It is Better to Track Together

    Authors: Nikita Karaev, Ignacio Rocco, Benjamin Graham, Natalia Neverova, Andrea Vedaldi, Christian Rupprecht

    Abstract: We introduce CoTracker, a transformer-based model that tracks dense points in a frame jointly across a video sequence. This differs from most existing state-of-the-art approaches that track points independently, ignoring their correlation. We show that joint tracking results in a significantly higher tracking accuracy and robustness. We also provide several technical innovations, including the con…

    Submitted 26 December, 2023; v1 submitted 14 July, 2023; originally announced July 2023.

    Comments: Code and model weights are available at: https://co-tracker.github.io/

  8. arXiv:2305.02296  [pdf, other]

    cs.CV cs.AI

    DynamicStereo: Consistent Dynamic Depth from Stereo Videos

    Authors: Nikita Karaev, Ignacio Rocco, Benjamin Graham, Natalia Neverova, Andrea Vedaldi, Christian Rupprecht

    Abstract: We consider the problem of reconstructing a dynamic scene observed from a stereo camera. Most existing methods for depth from stereo treat different stereo frames independently, leading to temporally inconsistent depth predictions. Temporal consistency is especially important for immersive AR or VR scenarios, where flickering greatly diminishes the user experience. We propose DynamicStereo, a nove…

    Submitted 3 May, 2023; originally announced May 2023.

    Comments: CVPR 2023; project page available at https://dynamic-stereo.github.io/

  9. arXiv:2303.11898  [pdf, other]

    cs.CV cs.GR

    Real-time volumetric rendering of dynamic humans

    Authors: Ignacio Rocco, Iurii Makarov, Filippos Kokkinos, David Novotny, Benjamin Graham, Natalia Neverova, Andrea Vedaldi

    Abstract: We present a method for fast 3D reconstruction and real-time rendering of dynamic humans from monocular videos with accompanying parametric body fits. Our method can reconstruct a dynamic human in less than 3h using a single GPU, compared to recent state-of-the-art alternatives that take up to 72h. These speedups are obtained by using a lightweight deformation model solely based on linear blend sk…

    Submitted 21 March, 2023; originally announced March 2023.

    Comments: Project page: https://real-time-humans.github.io/

  10. arXiv:2301.08730  [pdf, other]

    cs.CV cs.SD eess.AS

    Novel-View Acoustic Synthesis

    Authors: Changan Chen, Alexander Richard, Roman Shapovalov, Vamsi Krishna Ithapu, Natalia Neverova, Kristen Grauman, Andrea Vedaldi

    Abstract: We introduce the novel-view acoustic synthesis (NVAS) task: given the sight and sound observed at a source viewpoint, can we synthesize the sound of that scene from an unseen target viewpoint? We propose a neural rendering approach: Visually-Guided Acoustic Synthesis (ViGAS) network that learns to synthesize the sound of an arbitrary point in space by analyzing the input audio-visual cues. To benc…

    Submitted 24 October, 2023; v1 submitted 20 January, 2023; originally announced January 2023.

    Comments: Accepted at CVPR 2023. Project page: https://vision.cs.utexas.edu/projects/nvas

  11. arXiv:2212.03236  [pdf, other]

    cs.CV

    Self-Supervised Correspondence Estimation via Multiview Registration

    Authors: Mohamed El Banani, Ignacio Rocco, David Novotny, Andrea Vedaldi, Natalia Neverova, Justin Johnson, Benjamin Graham

    Abstract: Video provides us with the spatio-temporal consistency needed for visual learning. Recent approaches have utilized this signal to learn correspondence estimation from close-by frame pairs. However, by only relying on close-by frame pairs, those approaches miss out on the richer long-range consistency between distant overlapping frames. To address this, we propose a self-supervised approach for cor…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: Accepted to WACV 2023. Project page: https://mbanani.github.io/syncmatch/

  12. arXiv:2211.03889  [pdf, other]

    cs.CV

    Common Pets in 3D: Dynamic New-View Synthesis of Real-Life Deformable Categories

    Authors: Samarth Sinha, Roman Shapovalov, Jeremy Reizenstein, Ignacio Rocco, Natalia Neverova, Andrea Vedaldi, David Novotny

    Abstract: Obtaining photorealistic reconstructions of objects from sparse views is inherently ambiguous and can only be achieved by learning suitable reconstruction priors. Earlier works on sparse rigid object reconstruction successfully learned such priors from large datasets such as CO3D. In this paper, we extend this approach to dynamic objects. We use cats and dogs as a representative example and introd…

    Submitted 7 November, 2022; originally announced November 2022.

  13. arXiv:2202.00368  [pdf, other]

    cs.CV cs.LG

    Filtered-CoPhy: Unsupervised Learning of Counterfactual Physics in Pixel Space

    Authors: Steeven Janny, Fabien Baradel, Natalia Neverova, Madiha Nadri, Greg Mori, Christian Wolf

    Abstract: Learning causal relationships in high-dimensional data (images, videos) is a hard task, as they are often defined on low dimensional manifolds and must be extracted from complex signals dominated by appearance, lighting, textures and also spurious correlations in the data. We present a method for learning counterfactual reasoning of physical processes in pixel space, which requires the prediction…

    Submitted 1 February, 2022; originally announced February 2022.

    Journal ref: International Conference on Learning Representations (2022)

  14. arXiv:2112.12761  [pdf, other]

    cs.CV cs.GR

    BANMo: Building Animatable 3D Neural Models from Many Casual Videos

    Authors: Gengshan Yang, Minh Vo, Natalia Neverova, Deva Ramanan, Andrea Vedaldi, Hanbyul Joo

    Abstract: Prior work for articulated 3D shape reconstruction often relies on specialized sensors (e.g., synchronized multi-camera systems), or pre-built 3D deformable models (e.g., SMAL or SMPL). Such methods are not able to scale to diverse sets of objects in the wild. We present BANMo, a method that requires neither a specialized sensor nor a pre-defined template shape. BANMo builds high-fidelity, articul…

    Submitted 3 April, 2023; v1 submitted 23 December, 2021; originally announced December 2021.

    Comments: CVPR 2022 camera-ready version (last update: May 2022)

  15. arXiv:2106.09758  [pdf, other]

    cs.CV

    Discovering Relationships between Object Categories via Universal Canonical Maps

    Authors: Natalia Neverova, Artsiom Sanakoyeu, Patrick Labatut, David Novotny, Andrea Vedaldi

    Abstract: We tackle the problem of learning the geometry of multiple categories of deformable objects jointly. Recent work has shown that it is possible to learn a unified dense pose predictor for several categories of related objects. However, training such models requires to initialize inter-category correspondences by hand. This is suboptimal and the resulting models fail to maintain correct corresponden…

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: Accepted at CVPR 2021; Project page: https://gdude.de/discovering-3d-obj-rel

  16. arXiv:2106.09681  [pdf, other]

    cs.CV cs.LG

    XCiT: Cross-Covariance Image Transformers

    Authors: Alaaeldin El-Nouby, Hugo Touvron, Mathilde Caron, Piotr Bojanowski, Matthijs Douze, Armand Joulin, Ivan Laptev, Natalia Neverova, Gabriel Synnaeve, Jakob Verbeek, Hervé Jégou

    Abstract: Following their success in natural language processing, transformers have recently shown much promise for computer vision. The self-attention operation underlying transformers yields global interactions between all tokens, i.e. words or image patches, and enables flexible modelling of image data beyond the local interactions of convolutions. This flexibility, however, comes with a quadratic comple…

    Submitted 18 June, 2021; v1 submitted 17 June, 2021; originally announced June 2021.

  17. arXiv:2106.09431  [pdf, other]

    cs.CV

    NeuroMorph: Unsupervised Shape Interpolation and Correspondence in One Go

    Authors: Marvin Eisenberger, David Novotny, Gael Kerchenbaum, Patrick Labatut, Natalia Neverova, Daniel Cremers, Andrea Vedaldi

    Abstract: We present NeuroMorph, a new neural network architecture that takes as input two 3D shapes and produces in one go, i.e. in a single feed forward pass, a smooth interpolation and point-to-point correspondences between them. The interpolation, expressed as a deformation field, changes the pose of the source shape to resemble the target, but leaves the object identity unchanged. NeuroMorph uses an el…

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: Published at the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2021

  18. arXiv:2104.08108  [pdf, other]

    cs.CV cs.CL

    Cross-Modal Retrieval Augmentation for Multi-Modal Classification

    Authors: Shir Gur, Natalia Neverova, Chris Stauffer, Ser-Nam Lim, Douwe Kiela, Austin Reiter

    Abstract: Recent advances in using retrieval components over external knowledge sources have shown impressive results for a variety of downstream tasks in natural language processing. Here, we explore the use of unstructured external knowledge sources of images and their corresponding captions for improving visual question answering (VQA). First, we train a novel alignment model for embedding images and cap…

    Submitted 16 April, 2021; originally announced April 2021.

  19. arXiv:2102.05644  [pdf, other]

    cs.CV

    Training Vision Transformers for Image Retrieval

    Authors: Alaaeldin El-Nouby, Natalia Neverova, Ivan Laptev, Hervé Jégou

    Abstract: Transformers have shown outstanding results for natural language understanding and, more recently, for image classification. We here extend this work and propose a transformer-based approach for image retrieval: we adopt vision transformers for generating image descriptors and train the resulting model with a metric learning objective, which combines a contrastive loss with a differential entropy…

    Submitted 10 February, 2021; originally announced February 2021.

  20. arXiv:2011.12438  [pdf, other]

    cs.CV

    Continuous Surface Embeddings

    Authors: Natalia Neverova, David Novotny, Vasil Khalidov, Marc Szafraniec, Patrick Labatut, Andrea Vedaldi

    Abstract: In this work, we focus on the task of learning and representing dense correspondences in deformable object categories. While this problem has been considered before, solutions so far have been rather ad-hoc for specific object types (i.e., humans), often with significant manual work involved. However, scaling the geometry understanding to all objects in nature requires more automated approaches th…

    Submitted 24 November, 2020; originally announced November 2020.

    Comments: NeurIPS, 2020

  21. arXiv:2004.03686  [pdf, other]

    cs.CV

    Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D Human Pose Estimation

    Authors: Hanbyul Joo, Natalia Neverova, Andrea Vedaldi

    Abstract: Differently from 2D image datasets such as COCO, large-scale human datasets with 3D ground-truth annotations are very difficult to obtain in the wild. In this paper, we address this problem by augmenting existing 2D datasets with high-quality 3D pose fits. Remarkably, the resulting annotations are sufficient to train from scratch 3D pose regressor networks that outperform the current state-of-the-…

    Submitted 21 October, 2021; v1 submitted 7 April, 2020; originally announced April 2020.

  22. arXiv:2003.00080  [pdf, other]

    cs.CV

    Transferring Dense Pose to Proximal Animal Classes

    Authors: Artsiom Sanakoyeu, Vasil Khalidov, Maureen S. McCarthy, Andrea Vedaldi, Natalia Neverova

    Abstract: Recent contributions have demonstrated that it is possible to recognize the pose of humans densely and accurately given a large dataset of poses annotated in detail. In principle, the same approach could be extended to any animal class, but the effort required for collecting new annotations for each case makes this strategy impractical, despite important applications in natural conservation, scien…

    Submitted 28 February, 2020; originally announced March 2020.

    Comments: Accepted at CVPR 2020; Project page: https://asanakoy.github.io/densepose-evolution

  23. arXiv:1909.12000  [pdf, other]

    cs.CV

    CoPhy: Counterfactual Learning of Physical Dynamics

    Authors: Fabien Baradel, Natalia Neverova, Julien Mille, Greg Mori, Christian Wolf

    Abstract: Understanding causes and effects in mechanical systems is an essential component of reasoning in the physical world. This work poses a new problem of counterfactual learning of object mechanics from visual input. We develop the CoPhy benchmark to assess the capacity of the state-of-the-art models for causal physical reasoning in a synthetic 3D environment and propose a model for learning the physi…

    Submitted 7 April, 2020; v1 submitted 26 September, 2019; originally announced September 2019.

    Comments: ICLR 2020 - Spotlight presentation

  24. arXiv:1909.02533  [pdf, other]

    cs.CV cs.GR

    C3DPO: Canonical 3D Pose Networks for Non-Rigid Structure From Motion

    Authors: David Novotny, Nikhila Ravi, Benjamin Graham, Natalia Neverova, Andrea Vedaldi

    Abstract: We propose C3DPO, a method for extracting 3D models of deformable objects from 2D keypoint annotations in unconstrained images. We do so by learning a deep network that reconstructs a 3D object from a single view at a time, accounting for partial occlusions, and explicitly factoring the effects of viewpoint changes and object deformations. In order to achieve this factorization, we introduce a nov…

    Submitted 15 October, 2019; v1 submitted 5 September, 2019; originally announced September 2019.

    Comments: Added a link to the source code into the abstract

    Journal ref: IEEE/CVF International Conference on Computer Vision 2019

  25. arXiv:1906.05706  [pdf, other]

    cs.CV

    Slim DensePose: Thrifty Learning from Sparse Annotations and Motion Cues

    Authors: Natalia Neverova, James Thewlis, Rıza Alp Güler, Iasonas Kokkinos, Andrea Vedaldi

    Abstract: DensePose supersedes traditional landmark detectors by densely mapping image pixels to body surface coordinates. This power, however, comes at a greatly increased annotation time, as supervising the model requires to manually label hundreds of points per pose instance. In this work, we thus seek methods to significantly slim down the DensePose annotations, proposing more efficient data collection…

    Submitted 13 June, 2019; originally announced June 2019.

    Comments: CVPR 2019

  26. arXiv:1809.01995  [pdf, other]

    cs.CV

    Dense Pose Transfer

    Authors: Natalia Neverova, Rıza Alp Güler, Iasonas Kokkinos

    Abstract: In this work we integrate ideas from surface-based modeling with neural synthesis: we propose a combination of surface-based pose estimation and deep generative models that allows us to perform accurate pose transfer, i.e. synthesize a new image of a person based on a single image of that person and the image of a pose donor. We use a dense pose estimation system that maps pixels from both images…

    Submitted 6 September, 2018; originally announced September 2018.

    Comments: ECCV 2018

  27. arXiv:1806.06157  [pdf, other]

    cs.CV

    Object Level Visual Reasoning in Videos

    Authors: Fabien Baradel, Natalia Neverova, Christian Wolf, Julien Mille, Greg Mori

    Abstract: Human activity recognition is typically addressed by detecting key concepts like global and local motion, features related to object classes present in the scene, as well as features related to the global context. The next open challenges in activity recognition require a level of understanding that pushes beyond this and call for models with capabilities for fine distinction and detailed comprehe…

    Submitted 20 September, 2018; v1 submitted 15 June, 2018; originally announced June 2018.

    Comments: Accepted at ECCV 2018 - long version (16 pages + ref)

    Journal ref: ECCV 2018

  28. arXiv:1802.00434  [pdf, other]

    cs.CV

    DensePose: Dense Human Pose Estimation In The Wild

    Authors: Rıza Alp Güler, Natalia Neverova, Iasonas Kokkinos

    Abstract: In this work, we establish dense correspondences between RGB image and a surface-based representation of the human body, a task we refer to as dense human pose estimation. We first gather dense correspondences for 50K persons appearing in the COCO dataset by introducing an efficient annotation pipeline. We then use our dataset to train CNN-based systems that deliver dense correspondence 'in the wi…

    Submitted 1 February, 2018; originally announced February 2018.

  29. arXiv:1708.03816  [pdf, other]

    cs.CV

    Mass Displacement Networks

    Authors: Natalia Neverova, Iasonas Kokkinos

    Abstract: Despite the large improvements in performance attained by using deep learning in computer vision, one can often further improve results with some additional post-processing that exploits the geometric nature of the underlying task. This commonly involves displacing the posterior distribution of a CNN in a way that makes it more appropriate for the task at hand, e.g. better aligned with local image…

    Submitted 12 August, 2017; originally announced August 2017.

    Comments: 12 pages, 4 figures

  30. arXiv:1707.05373  [pdf, other]

    stat.ML cs.AI cs.CR cs.CV cs.LG

    Houdini: Fooling Deep Structured Prediction Models

    Authors: Moustapha Cisse, Yossi Adi, Natalia Neverova, Joseph Keshet

    Abstract: Generating adversarial examples is a critical step for evaluating and improving the robustness of learning machines. So far, most existing methods only work for classification and are not designed to alter the true performance measure of the problem at hand. We introduce a novel flexible approach named Houdini for generating adversarial examples specifically tailored for the final performance meas…

    Submitted 17 July, 2017; originally announced July 2017.

    Comments: 12 pages, 8 figures, under review

  31. arXiv:1703.07684  [pdf, other]

    cs.CV cs.LG

    Predicting Deeper into the Future of Semantic Segmentation

    Authors: Pauline Luc, Natalia Neverova, Camille Couprie, Jakob Verbeek, Yann LeCun

    Abstract: The ability to predict and therefore to anticipate the future is an important attribute of intelligence. It is also of utmost importance in real-time systems, e.g. in robotics or autonomous driving, which depend on visual scene understanding for decision making. While prediction of the raw RGB pixel values in future video frames has been studied in previous work, here we introduce the novel task o…

    Submitted 8 August, 2017; v1 submitted 22 March, 2017; originally announced March 2017.

    Comments: Accepted to ICCV 2017. Supplementary material available on the authors' webpages

  32. arXiv:1511.06728  [pdf, other]

    cs.CV cs.AI cs.LG

    Hand Pose Estimation through Semi-Supervised and Weakly-Supervised Learning

    Authors: Natalia Neverova, Christian Wolf, Florian Nebout, Graham Taylor

    Abstract: We propose a method for hand pose estimation based on a deep regressor trained on two different kinds of input. Raw depth data is fused with an intermediate representation in the form of a segmentation of the hand into parts. This intermediate representation contains important topological information and provides useful cues for reasoning about joint locations. The mapping from raw depth to segmen…

    Submitted 15 September, 2017; v1 submitted 20 November, 2015; originally announced November 2015.

    Comments: 13 pages, 10 figures, 4 tables

  33. arXiv:1511.03908  [pdf, other]

    cs.LG cs.CV cs.NE

    Learning Human Identity from Motion Patterns

    Authors: Natalia Neverova, Christian Wolf, Griffin Lacey, Lex Fridman, Deepak Chandra, Brandon Barbello, Graham Taylor

    Abstract: We present a large-scale study exploring the capability of temporal deep neural networks to interpret natural human kinematics and introduce the first method for active biometric authentication with mobile inertial sensors. At Google, we have created a first-of-its-kind dataset of human movements, passively collected by 1500 volunteers using their smartphones daily over several months. We (1) comp…

    Submitted 21 April, 2016; v1 submitted 12 November, 2015; originally announced November 2015.

    Comments: 10 pages, 6 figures, 2 tables

  34. arXiv:1501.00102  [pdf, other]

    cs.CV cs.HC cs.LG

    ModDrop: adaptive multi-modal gesture recognition

    Authors: Natalia Neverova, Christian Wolf, Graham W. Taylor, Florian Nebout

    Abstract: We present a method for gesture detection and localisation based on multi-scale and multi-modal deep learning. Each visual modality captures spatial information at a particular spatial scale (such as motion of the upper body or a hand), and the whole system operates at three temporal scales. Key to our technique is a training strategy which exploits: i) careful initialization of individual modalit…

    Submitted 6 June, 2015; v1 submitted 31 December, 2014; originally announced January 2015.

    Comments: 14 pages, 7 figures