Graphics
See recent articles
- [1] arXiv:2407.12786 (cross-list from cs.HC) [pdf, html, other]
-
Title: Cube2Pipes : Investigating Hybrid Gameplay Using AR and a Tangible 3D PuzzleSubjects: Human-Computer Interaction (cs.HC); Graphics (cs.GR)
We present our game, Cube2Pipes, as an attempt to investigate a unique gameplay design where we use a tangible 3D spatial puzzle, in the form of a 2X2 Rubik's Cube, as an interface to a tabletop mobile augmented reality (AR) game. The game interface adapts to user movement and interaction with both virtual and tangible elements via computer vision based tracking. This game can be seen as an instance of generic interactive hybrid systems as it involves interaction with both virtual and real, tangible elements. We present a thorough user evaluation about various aspects of the gameplay in order to answer the question as to whether hybrid gameplay involving both real and virtual interfaces and elements is more captivating and preferred by users, than standard (baseline) gameplay with only virtual elements. We use multiple industry standard user study questionnaires to try and answer this question. We also try to determine whether the game facilitates understanding of the spatial moves required to solve a Rubik's Cube, and the efficacy of a tangible puzzle interface to a tabletop AR game.
- [2] arXiv:2407.12800 (cross-list from cs.HC) [pdf, other]
-
Title: Digital Storytelling for Competence Development in GamesSubjects: Human-Computer Interaction (cs.HC); Graphics (cs.GR)
The acquisition of complex knowledge and competences raises difficult challenges for the supporting tools within the corporate environment, which digital storytelling presents a potential solution. Traditionally, a driving goal of digital storytelling is the generation of dramatic stories with human significance, but for learning purposes, the need for drama is complemented by the requirement of achieving particular learning outcomes. This paper presents a narrative engine that supports emergent storytelling to support the development of complex competences in the learning domains of project management and innovation. The approach is based on the adaptation on the Fabula model combined with cases representing situated contexts associated to particular competences. These cases are then triggered to influence the unfolding of the story such that a learner encounters dramatic points in the narrative where the associated competences need to be used. In addition to the description of the approach and corresponding narrative engine, an illustration is presented of how the competence 'conflict management' influences a story.
- [3] arXiv:2407.12884 (cross-list from cs.LG) [pdf, html, other]
-
Title: SurroFlow: A Flow-Based Surrogate Model for Parameter Space Exploration and Uncertainty QuantificationComments: To be published in Proc. IEEE VIS 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Human-Computer Interaction (cs.HC)
Existing deep learning-based surrogate models facilitate efficient data generation, but fall short in uncertainty quantification, efficient parameter space exploration, and reverse prediction. In our work, we introduce SurroFlow, a novel normalizing flow-based surrogate model, to learn the invertible transformation between simulation parameters and simulation outputs. The model not only allows accurate predictions of simulation outcomes for a given simulation parameter but also supports uncertainty quantification in the data generation process. Additionally, it enables efficient simulation parameter recommendation and exploration. We integrate SurroFlow and a genetic algorithm as the backend of a visual interface to support effective user-guided ensemble simulation exploration and visualization. Our framework significantly reduces the computational costs while enhancing the reliability and exploration capabilities of scientific surrogate models.
- [4] arXiv:2407.13677 (cross-list from cs.CV) [pdf, html, other]
-
Title: PASTA: Controllable Part-Aware Shape Generation with Autoregressive TransformersSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
The increased demand for tools that automate the 3D content creation process led to tremendous progress in deep generative models that can generate diverse 3D objects of high fidelity. In this paper, we present PASTA, an autoregressive transformer architecture for generating high quality 3D shapes. PASTA comprises two main components: An autoregressive transformer that generates objects as a sequence of cuboidal primitives and a blending network, implemented with a transformer decoder that composes the sequences of cuboids and synthesizes high quality meshes for each object. Our model is trained in two stages: First we train our autoregressive generative model using only annotated cuboidal parts as supervision and next, we train our blending network using explicit 3D supervision, in the form of watertight meshes. Evaluations on various ShapeNet objects showcase the ability of our model to perform shape generation from diverse inputs \eg from scratch, from a partial object, from text and images, as well size-guided generation, by explicitly conditioning on a bounding box that defines the object's boundaries. Moreover, as our model considers the underlying part-based structure of a 3D object, we are able to select a specific part and produce shapes with meaningful variations of this part. As evidenced by our experiments, our model generates 3D shapes that are both more realistic and diverse than existing part-based and non part-based methods, while at the same time is simpler to implement and train.
- [5] arXiv:2407.13759 (cross-list from cs.CV) [pdf, html, other]
-
Title: Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video DiffusionComments: *Equal Contributions, Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
We present a method for generating Streetscapes-long sequences of views through an on-the-fly synthesized city-scale scene. Our generation is conditioned by language input (e.g., city name, weather), as well as an underlying map/layout hosting the desired trajectory. Compared to recent models for video generation or 3D view synthesis, our method can scale to much longer-range camera trajectories, spanning several city blocks, while maintaining visual quality and consistency. To achieve this goal, we build on recent work on video diffusion, used within an autoregressive framework that can easily scale to long sequences. In particular, we introduce a new temporal imputation method that prevents our autoregressive approach from drifting from the distribution of realistic city imagery. We train our Streetscapes system on a compelling source of data-posed imagery from Google Street View, along with contextual map data-which allows users to generate city views conditioned on any desired city layout, with controllable camera poses. Please see more results at our project page at this https URL.
Cross submissions for Friday, 19 July 2024 (showing 5 of 5 entries )
- [6] arXiv:2405.05967 (replaced) [pdf, html, other]
-
Title: Distilling Diffusion Models into Conditional GANsMinguk Kang, Richard Zhang, Connelly Barnes, Sylvain Paris, Suha Kwak, Jaesik Park, Eli Shechtman, Jun-Yan Zhu, Taesung ParkComments: Project page: this https URL (ECCV2024)Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
We propose a method to distill a complex multistep diffusion model into a single-step conditional GAN student model, dramatically accelerating inference, while preserving image quality. Our approach interprets diffusion distillation as a paired image-to-image translation task, using noise-to-image pairs of the diffusion model's ODE trajectory. For efficient regression loss computation, we propose E-LatentLPIPS, a perceptual loss operating directly in diffusion model's latent space, utilizing an ensemble of augmentations. Furthermore, we adapt a diffusion model to construct a multi-scale discriminator with a text alignment loss to build an effective conditional GAN-based formulation. E-LatentLPIPS converges more efficiently than many existing distillation methods, even accounting for dataset construction costs. We demonstrate that our one-step generator outperforms cutting-edge one-step diffusion distillation models -- DMD, SDXL-Turbo, and SDXL-Lightning -- on the zero-shot COCO benchmark.
- [7] arXiv:2406.09413 (replaced) [pdf, html, other]
-
Title: Interpreting the Weight Space of Customized Diffusion ModelsAmil Dravid, Yossi Gandelsman, Kuan-Chieh Wang, Rameen Abdal, Gordon Wetzstein, Alexei A. Efros, Kfir AbermanComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
We investigate the space of weights spanned by a large collection of customized diffusion models. We populate this space by creating a dataset of over 60,000 models, each of which is a base model fine-tuned to insert a different person's visual identity. We model the underlying manifold of these weights as a subspace, which we term weights2weights. We demonstrate three immediate applications of this space -- sampling, editing, and inversion. First, as each point in the space corresponds to an identity, sampling a set of weights from it results in a model encoding a novel identity. Next, we find linear directions in this space corresponding to semantic edits of the identity (e.g., adding a beard). These edits persist in appearance across generated samples. Finally, we show that inverting a single image into this space reconstructs a realistic identity, even if the input image is out of distribution (e.g., a painting). Our results indicate that the weight space of fine-tuned diffusion models behaves as an interpretable latent space of identities.