subscribe to arXiv mailings

Magic Insert: Style-Aware Drag-and-Drop

Authors: Nataniel Ruiz, Yuanzhen Li, Neal Wadhwa, Yael Pritch, Michael Rubinstein, David E. Jacobs, Shlomi Fruchter

Abstract: We present Magic Insert, a method for dragging-and-dropping subjects from a user-provided image into a target image of a different style in a physically plausible manner while matching the style of the target image. This work formalizes the problem of style-aware drag-and-drop and presents a method for tackling it by addressing two sub-problems: style-aware personalization and realistic object ins… ▽ More We present Magic Insert, a method for dragging-and-dropping subjects from a user-provided image into a target image of a different style in a physically plausible manner while matching the style of the target image. This work formalizes the problem of style-aware drag-and-drop and presents a method for tackling it by addressing two sub-problems: style-aware personalization and realistic object insertion in stylized images. For style-aware personalization, our method first fine-tunes a pretrained text-to-image diffusion model using LoRA and learned text tokens on the subject image, and then infuses it with a CLIP representation of the target style. For object insertion, we use Bootstrapped Domain Adaption to adapt a domain-specific photorealistic object insertion model to the domain of diverse artistic styles. Overall, the method significantly outperforms traditional approaches such as inpainting. Finally, we present a dataset, SubjectPlop, to facilitate evaluation and future progress in this area. Project page: https://magicinsert.github.io/ △ Less

Submitted 2 July, 2024; originally announced July 2024.

Comments: Project page: https://magicinsert.github.io/

arXiv:2405.17401 [pdf, other]

RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control

Authors: Litu Rout, Yujia Chen, Nataniel Ruiz, Abhishek Kumar, Constantine Caramanis, Sanjay Shakkottai, Wen-Sheng Chu

Abstract: We propose Reference-Based Modulation (RB-Modulation), a new plug-and-play solution for training-free personalization of diffusion models. Existing training-free approaches exhibit difficulties in (a) style extraction from reference images in the absence of additional style or content text descriptions, (b) unwanted content leakage from reference style images, and (c) effective composition of styl… ▽ More We propose Reference-Based Modulation (RB-Modulation), a new plug-and-play solution for training-free personalization of diffusion models. Existing training-free approaches exhibit difficulties in (a) style extraction from reference images in the absence of additional style or content text descriptions, (b) unwanted content leakage from reference style images, and (c) effective composition of style and content. RB-Modulation is built on a novel stochastic optimal controller where a style descriptor encodes the desired attributes through a terminal cost. The resulting drift not only overcomes the difficulties above, but also ensures high fidelity to the reference style and adheres to the given text prompt. We also introduce a cross-attention-based feature aggregation scheme that allows RB-Modulation to decouple content and style from the reference image. With theoretical justification and empirical evidence, our framework demonstrates precise extraction and control of content and style in a training-free manner. Further, our method allows a seamless composition of content and style, which marks a departure from the dependency on external adapters or ControlNets. △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: Preprint. Under review

arXiv:2402.16094 [pdf]

Bistochastically private release of data streams with zero delay

Authors: Nicolas Ruiz

Abstract: Although the bulk of the research in privacy and statistical disclosure control is designed for static data, more and more data are often collected as continuous streams, and extensions of popular privacy tools and models have been proposed for this scenario. However, most of these proposals require buffers, where incoming individuals are momentarily stored, anonymized, and then released following… ▽ More Although the bulk of the research in privacy and statistical disclosure control is designed for static data, more and more data are often collected as continuous streams, and extensions of popular privacy tools and models have been proposed for this scenario. However, most of these proposals require buffers, where incoming individuals are momentarily stored, anonymized, and then released following a delay, thus considering a data stream as a succession of batches while it is by nature continuous. Having a delay unavoidably alters data freshness but also, more critically, inordinately exerts constraints on what can be achieved in terms of protection and information preservation. By considering randomized response, and specifically its recent bistochastic extension, in the context of dynamic data, this paper proposes a protocol for the anonymization of data streams that achieves zero delay while exhibiting formal privacy guarantees. Using a new tool in the privacy literature that introduces the concept of elementary plausible deniability, we show that it is feasible to achieve an atomic processing of individuals entering a stream, in-stead of proceeding by batches. We illustrate the application of the proposed approach by an empirical example. △ Less

Submitted 25 February, 2024; originally announced February 2024.

arXiv:2311.13600 [pdf, other]

ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs

Authors: Viraj Shah, Nataniel Ruiz, Forrester Cole, Erika Lu, Svetlana Lazebnik, Yuanzhen Li, Varun Jampani

Abstract: Methods for finetuning generative models for concept-driven personalization generally achieve strong results for subject-driven or style-driven generation. Recently, low-rank adaptations (LoRA) have been proposed as a parameter-efficient way of achieving concept-driven personalization. While recent work explores the combination of separate LoRAs to achieve joint generation of learned styles and su… ▽ More Methods for finetuning generative models for concept-driven personalization generally achieve strong results for subject-driven or style-driven generation. Recently, low-rank adaptations (LoRA) have been proposed as a parameter-efficient way of achieving concept-driven personalization. While recent work explores the combination of separate LoRAs to achieve joint generation of learned styles and subjects, existing techniques do not reliably address the problem; they often compromise either subject fidelity or style fidelity. We propose ZipLoRA, a method to cheaply and effectively merge independently trained style and subject LoRAs in order to achieve generation of any user-provided subject in any user-provided style. Experiments on a wide range of subject and style combinations show that ZipLoRA can generate compelling results with meaningful improvements over baselines in subject and style fidelity while preserving the ability to recontextualize. Project page: https://ziplora.github.io △ Less

Submitted 22 November, 2023; originally announced November 2023.

Comments: Project page: https://ziplora.github.io

arXiv:2309.16668 [pdf, other]

doi 10.1145/3658237

RealFill: Reference-Driven Generation for Authentic Image Completion

Authors: Luming Tang, Nataniel Ruiz, Qinghao Chu, Yuanzhen Li, Aleksander Holynski, David E. Jacobs, Bharath Hariharan, Yael Pritch, Neal Wadhwa, Kfir Aberman, Michael Rubinstein

Abstract: Recent advances in generative imagery have brought forth outpainting and inpainting models that can produce high-quality, plausible image content in unknown regions. However, the content these models hallucinate is necessarily inauthentic, since they are unaware of the true scene. In this work, we propose RealFill, a novel generative approach for image completion that fills in missing regions of a… ▽ More Recent advances in generative imagery have brought forth outpainting and inpainting models that can produce high-quality, plausible image content in unknown regions. However, the content these models hallucinate is necessarily inauthentic, since they are unaware of the true scene. In this work, we propose RealFill, a novel generative approach for image completion that fills in missing regions of an image with the content that should have been there. RealFill is a generative inpainting model that is personalized using only a few reference images of a scene. These reference images do not have to be aligned with the target image, and can be taken with drastically varying viewpoints, lighting conditions, camera apertures, or image styles. Once personalized, RealFill is able to complete a target image with visually compelling contents that are faithful to the original scene. We evaluate RealFill on a new image completion benchmark that covers a set of diverse and challenging scenarios, and find that it outperforms existing approaches by a large margin. Project page: https://realfill.github.io △ Less

Submitted 14 May, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

Comments: SIGGRAPH 2024 (Journal Track). Project page: https://realfill.github.io

arXiv:2308.07317 [pdf, other]

Platypus: Quick, Cheap, and Powerful Refinement of LLMs

Authors: Ariel N. Lee, Cole J. Hunter, Nataniel Ruiz

Abstract: We present $\textbf{Platypus}$, a family of fine-tuned and merged Large Language Models (LLMs) that achieves the strongest performance and currently stands at first place in HuggingFace's Open LLM Leaderboard as of the release date of this work. In this work we describe (1) our curated dataset $\textbf{Open-Platypus}$, that is a subset of other open datasets and which… ▽ More We present $\textbf{Platypus}$, a family of fine-tuned and merged Large Language Models (LLMs) that achieves the strongest performance and currently stands at first place in HuggingFace's Open LLM Leaderboard as of the release date of this work. In this work we describe (1) our curated dataset $\textbf{Open-Platypus}$, that is a subset of other open datasets and which $\textit{we release to the public}$ (2) our process of fine-tuning and merging LoRA modules in order to conserve the strong prior of pretrained LLMs, while bringing specific domain knowledge to the surface (3) our efforts in checking for test data leaks and contamination in the training data, which can inform future research. Specifically, the Platypus family achieves strong performance in quantitative LLM metrics across model sizes, topping the global Open LLM leaderboard while using just a fraction of the fine-tuning data and overall compute that are required for other state-of-the-art fine-tuned LLMs. In particular, a 13B Platypus model can be trained on $\textit{a single}$ A100 GPU using 25k questions in 5 hours. This is a testament of the quality of our Open-Platypus dataset, and opens opportunities for more improvements in the field. Project page: https://platypus-llm.github.io △ Less

Submitted 14 March, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

Comments: Workshop on Instruction Tuning and Instruction Following at NeurIPS 2023

arXiv:2307.06949 [pdf, other]

HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models

Authors: Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Wei Wei, Tingbo Hou, Yael Pritch, Neal Wadhwa, Michael Rubinstein, Kfir Aberman

Abstract: Personalization has emerged as a prominent aspect within the field of generative AI, enabling the synthesis of individuals in diverse contexts and styles, while retaining high-fidelity to their identities. However, the process of personalization presents inherent challenges in terms of time and memory requirements. Fine-tuning each personalized model needs considerable GPU time investment, and sto… ▽ More Personalization has emerged as a prominent aspect within the field of generative AI, enabling the synthesis of individuals in diverse contexts and styles, while retaining high-fidelity to their identities. However, the process of personalization presents inherent challenges in terms of time and memory requirements. Fine-tuning each personalized model needs considerable GPU time investment, and storing a personalized model per subject can be demanding in terms of storage capacity. To overcome these challenges, we propose HyperDreamBooth-a hypernetwork capable of efficiently generating a small set of personalized weights from a single image of a person. By composing these weights into the diffusion model, coupled with fast finetuning, HyperDreamBooth can generate a person's face in various contexts and styles, with high subject details while also preserving the model's crucial knowledge of diverse styles and semantic modifications. Our method achieves personalization on faces in roughly 20 seconds, 25x faster than DreamBooth and 125x faster than Textual Inversion, using as few as one reference image, with the same quality and style diversity as DreamBooth. Also our method yields a model that is 10000x smaller than a normal DreamBooth model. Project page: https://hyperdreambooth.github.io △ Less

Submitted 13 July, 2023; originally announced July 2023.

Comments: project page: https://hyperdreambooth.github.io

arXiv:2306.17848 [pdf, other]

Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing

Authors: Ariel N. Lee, Sarah Adel Bargal, Janavi Kasera, Stan Sclaroff, Kate Saenko, Nataniel Ruiz

Abstract: Vision transformers (ViTs) have significantly changed the computer vision landscape and have periodically exhibited superior performance in vision tasks compared to convolutional neural networks (CNNs). Although the jury is still out on which model type is superior, each has unique inductive biases that shape their learning and generalization performance. For example, ViTs have interesting propert… ▽ More Vision transformers (ViTs) have significantly changed the computer vision landscape and have periodically exhibited superior performance in vision tasks compared to convolutional neural networks (CNNs). Although the jury is still out on which model type is superior, each has unique inductive biases that shape their learning and generalization performance. For example, ViTs have interesting properties with respect to early layer non-local feature dependence, as well as self-attention mechanisms which enhance learning flexibility, enabling them to ignore out-of-context image information more effectively. We hypothesize that this power to ignore out-of-context information (which we name $\textit{patch selectivity}$), while integrating in-context information in a non-local manner in early layers, allows ViTs to more easily handle occlusion. In this study, our aim is to see whether we can have CNNs $\textit{simulate}$ this ability of patch selectivity by effectively hardwiring this inductive bias using Patch Mixing data augmentation, which consists of inserting patches from another image onto a training image and interpolating labels between the two image classes. Specifically, we use Patch Mixing to train state-of-the-art ViTs and CNNs, assessing its impact on their ability to ignore out-of-context patches and handle natural occlusions. We find that ViTs do not improve nor degrade when trained using Patch Mixing, but CNNs acquire new capabilities to ignore out-of-context information and improve on occlusion benchmarks, leaving us to conclude that this training method is a way of simulating in CNNs the abilities that ViTs already possess. We will release our Patch Mixing implementation and proposed datasets for public use. Project page: https://arielnlee.github.io/PatchMixing/ △ Less

Submitted 30 June, 2023; originally announced June 2023.

arXiv:2306.00983 [pdf, other]

StyleDrop: Text-to-Image Generation in Any Style

Authors: Kihyuk Sohn, Nataniel Ruiz, Kimin Lee, Daniel Castro Chin, Irina Blok, Huiwen Chang, Jarred Barber, Lu Jiang, Glenn Entis, Yuanzhen Li, Yuan Hao, Irfan Essa, Michael Rubinstein, Dilip Krishnan

Abstract: Pre-trained large text-to-image models synthesize impressive images with an appropriate use of text prompts. However, ambiguities inherent in natural language and out-of-distribution effects make it hard to synthesize image styles, that leverage a specific design pattern, texture or material. In this paper, we introduce StyleDrop, a method that enables the synthesis of images that faithfully follo… ▽ More Pre-trained large text-to-image models synthesize impressive images with an appropriate use of text prompts. However, ambiguities inherent in natural language and out-of-distribution effects make it hard to synthesize image styles, that leverage a specific design pattern, texture or material. In this paper, we introduce StyleDrop, a method that enables the synthesis of images that faithfully follow a specific style using a text-to-image model. The proposed method is extremely versatile and captures nuances and details of a user-provided style, such as color schemes, shading, design patterns, and local and global effects. It efficiently learns a new style by fine-tuning very few trainable parameters (less than $1\%$ of total model parameters) and improving the quality via iterative training with either human or automated feedback. Better yet, StyleDrop is able to deliver impressive results even when the user supplies only a single image that specifies the desired style. An extensive study shows that, for the task of style tuning text-to-image models, StyleDrop implemented on Muse convincingly outperforms other methods, including DreamBooth and textual inversion on Imagen or Stable Diffusion. More results are available at our project website: https://styledrop.github.io △ Less

Submitted 1 June, 2023; originally announced June 2023.

Comments: Preprint. Project page at https://styledrop.github.io

arXiv:2304.00186 [pdf, other]

Subject-driven Text-to-Image Generation via Apprenticeship Learning

Authors: Wenhu Chen, Hexiang Hu, Yandong Li, Nataniel Ruiz, Xuhui Jia, Ming-Wei Chang, William W. Cohen

Abstract: Recent text-to-image generation models like DreamBooth have made remarkable progress in generating highly customized images of a target subject, by fine-tuning an ``expert model'' for a given subject from a few examples. However, this process is expensive, since a new expert model must be learned for each subject. In this paper, we present SuTI, a Subject-driven Text-to-Image generator that replac… ▽ More Recent text-to-image generation models like DreamBooth have made remarkable progress in generating highly customized images of a target subject, by fine-tuning an ``expert model'' for a given subject from a few examples. However, this process is expensive, since a new expert model must be learned for each subject. In this paper, we present SuTI, a Subject-driven Text-to-Image generator that replaces subject-specific fine tuning with in-context learning. Given a few demonstrations of a new subject, SuTI can instantly generate novel renditions of the subject in different scenes, without any subject-specific optimization. SuTI is powered by apprenticeship learning, where a single apprentice model is learned from data generated by a massive number of subject-specific expert models. Specifically, we mine millions of image clusters from the Internet, each centered around a specific visual subject. We adopt these clusters to train a massive number of expert models, each specializing in a different subject. The apprentice model SuTI then learns to imitate the behavior of these fine-tuned experts. SuTI can generate high-quality and customized subject-specific images 20x faster than optimization-based SoTA methods. On the challenging DreamBench and DreamBench-v2, our human evaluation shows that SuTI significantly outperforms existing models like InstructPix2Pix, Textual Inversion, Imagic, Prompt2Prompt, Re-Imagen and DreamBooth, especially on the subject and text alignment aspects. △ Less

Submitted 2 October, 2023; v1 submitted 31 March, 2023; originally announced April 2023.

Comments: Accepted at NeurIPS 2023. Model Service to be appear as Google Vertex AI - Instant Tuning (https://cloud.google.com/vertex-ai/docs/generative-ai/image/fine-tune-model). The link to demo video: https://www.youtube.com/watch?v=Q2xQ91D_dhM&t=2071s&ab_channel=GoogleCloud

arXiv:2303.13508 [pdf, other]

DreamBooth3D: Subject-Driven Text-to-3D Generation

Authors: Amit Raj, Srinivas Kaza, Ben Poole, Michael Niemeyer, Nataniel Ruiz, Ben Mildenhall, Shiran Zada, Kfir Aberman, Michael Rubinstein, Jonathan Barron, Yuanzhen Li, Varun Jampani

Abstract: We present DreamBooth3D, an approach to personalize text-to-3D generative models from as few as 3-6 casually captured images of a subject. Our approach combines recent advances in personalizing text-to-image models (DreamBooth) with text-to-3D generation (DreamFusion). We find that naively combining these methods fails to yield satisfactory subject-specific 3D assets due to personalized text-to-im… ▽ More We present DreamBooth3D, an approach to personalize text-to-3D generative models from as few as 3-6 casually captured images of a subject. Our approach combines recent advances in personalizing text-to-image models (DreamBooth) with text-to-3D generation (DreamFusion). We find that naively combining these methods fails to yield satisfactory subject-specific 3D assets due to personalized text-to-image models overfitting to the input viewpoints of the subject. We overcome this through a 3-stage optimization strategy where we jointly leverage the 3D consistency of neural radiance fields together with the personalization capability of text-to-image models. Our method can produce high-quality, subject-specific 3D assets with text-driven modifications such as novel poses, colors and attributes that are not seen in any of the input images of the subject. △ Less

Submitted 27 March, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

Comments: Project page at https://dreambooth3d.github.io/ Video Summary at https://youtu.be/kKVDrbfvOoA

arXiv:2303.09438 [pdf, other]

Trustera: A Live Conversation Redaction System

Authors: Evandro Gouvêa, Ali Dadgar, Shahab Jalalvand, Rathi Chengalvarayan, Badrinath Jayakumar, Ryan Price, Nicholas Ruiz, Jennifer McGovern, Srinivas Bangalore, Ben Stern

Abstract: Trustera, the first functional system that redacts personally identifiable information (PII) in real-time spoken conversations to remove agents' need to hear sensitive information while preserving the naturalness of live customer-agent conversations. As opposed to post-call redaction, audio masking starts as soon as the customer begins speaking to a PII entity. This significantly reduces the risk… ▽ More Trustera, the first functional system that redacts personally identifiable information (PII) in real-time spoken conversations to remove agents' need to hear sensitive information while preserving the naturalness of live customer-agent conversations. As opposed to post-call redaction, audio masking starts as soon as the customer begins speaking to a PII entity. This significantly reduces the risk of PII being intercepted or stored in insecure data storage. Trustera's architecture consists of a pipeline of automatic speech recognition, natural language understanding, and a live audio redactor module. The system's goal is three-fold: redact entities that are PII, mask the audio that goes to the agent, and at the same time capture the entity, so that the captured PII can be used for a payment transaction or caller identification. Trustera is currently being used by thousands of agents to secure customers' sensitive information. △ Less

Submitted 16 March, 2023; originally announced March 2023.

Comments: 5

arXiv:2211.16499 [pdf, other]

Finding Differences Between Transformers and ConvNets Using Counterfactual Simulation Testing

Authors: Nataniel Ruiz, Sarah Adel Bargal, Cihang Xie, Kate Saenko, Stan Sclaroff

Abstract: Modern deep neural networks tend to be evaluated on static test sets. One shortcoming of this is the fact that these deep neural networks cannot be easily evaluated for robustness issues with respect to specific scene variations. For example, it is hard to study the robustness of these networks to variations of object scale, object pose, scene lighting and 3D occlusions. The main reason is that co… ▽ More Modern deep neural networks tend to be evaluated on static test sets. One shortcoming of this is the fact that these deep neural networks cannot be easily evaluated for robustness issues with respect to specific scene variations. For example, it is hard to study the robustness of these networks to variations of object scale, object pose, scene lighting and 3D occlusions. The main reason is that collecting real datasets with fine-grained naturalistic variations of sufficient scale can be extremely time-consuming and expensive. In this work, we present Counterfactual Simulation Testing, a counterfactual framework that allows us to study the robustness of neural networks with respect to some of these naturalistic variations by building realistic synthetic scenes that allow us to ask counterfactual questions to the models, ultimately providing answers to questions such as "Would your classification still be correct if the object were viewed from the top?" or "Would your classification still be correct if the object were partially occluded by another object?". Our method allows for a fair comparison of the robustness of recently released, state-of-the-art Convolutional Neural Networks and Vision Transformers, with respect to these naturalistic variations. We find evidence that ConvNext is more robust to pose and scale variations than Swin, that ConvNext generalizes better to our simulated domain and that Swin handles partial occlusion better than ConvNext. We also find that robustness for all networks improves with network scale and with data scale and variety. We release the Naturalistic Variation Object Dataset (NVD), a large simulated dataset of 272k images of everyday objects with naturalistic variations such as object pose, scale, viewpoint, lighting and occlusions. Project page: https://counterfactualsimulation.github.io △ Less

Submitted 29 November, 2022; originally announced November 2022.

Comments: Published at the Conference on Neural Information Processing Systems (NeurIPS) 2022

arXiv:2210.05667 [pdf, other]

Human Body Measurement Estimation with Adversarial Augmentation

Authors: Nataniel Ruiz, Miriam Bellver, Timo Bolkart, Ambuj Arora, Ming C. Lin, Javier Romero, Raja Bala

Abstract: We present a Body Measurement network (BMnet) for estimating 3D anthropomorphic measurements of the human body shape from silhouette images. Training of BMnet is performed on data from real human subjects, and augmented with a novel adversarial body simulator (ABS) that finds and synthesizes challenging body shapes. ABS is based on the skinned multiperson linear (SMPL) body model, and aims to maxi… ▽ More We present a Body Measurement network (BMnet) for estimating 3D anthropomorphic measurements of the human body shape from silhouette images. Training of BMnet is performed on data from real human subjects, and augmented with a novel adversarial body simulator (ABS) that finds and synthesizes challenging body shapes. ABS is based on the skinned multiperson linear (SMPL) body model, and aims to maximize BMnet measurement prediction error with respect to latent SMPL shape parameters. ABS is fully differentiable with respect to these parameters, and trained end-to-end via backpropagation with BMnet in the loop. Experiments show that ABS effectively discovers adversarial examples, such as bodies with extreme body mass indices (BMI), consistent with the rarity of extreme-BMI bodies in BMnet's training set. Thus ABS is able to reveal gaps in training data and potential failures in predicting under-represented body shapes. Results show that training BMnet with ABS improves measurement prediction accuracy on real bodies by up to 10%, when compared to no augmentation or random body shape sampling. Furthermore, our method significantly outperforms SOTA measurement estimation methods by as much as 3x. Finally, we release BodyM, the first challenging, large-scale dataset of photo silhouettes and body measurements of real human subjects, to further promote research in this area. Project website: https://adversarialbodysim.github.io △ Less

Submitted 11 October, 2022; originally announced October 2022.

Comments: Published at the International Conference on 3D Vision (3DV) 2022

arXiv:2208.12242 [pdf, other]

DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation

Authors: Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, Kfir Aberman

Abstract: Large text-to-image models achieved a remarkable leap in the evolution of AI, enabling high-quality and diverse synthesis of images from a given text prompt. However, these models lack the ability to mimic the appearance of subjects in a given reference set and synthesize novel renditions of them in different contexts. In this work, we present a new approach for "personalization" of text-to-image… ▽ More Large text-to-image models achieved a remarkable leap in the evolution of AI, enabling high-quality and diverse synthesis of images from a given text prompt. However, these models lack the ability to mimic the appearance of subjects in a given reference set and synthesize novel renditions of them in different contexts. In this work, we present a new approach for "personalization" of text-to-image diffusion models. Given as input just a few images of a subject, we fine-tune a pretrained text-to-image model such that it learns to bind a unique identifier with that specific subject. Once the subject is embedded in the output domain of the model, the unique identifier can be used to synthesize novel photorealistic images of the subject contextualized in different scenes. By leveraging the semantic prior embedded in the model with a new autogenous class-specific prior preservation loss, our technique enables synthesizing the subject in diverse scenes, poses, views and lighting conditions that do not appear in the reference images. We apply our technique to several previously-unassailable tasks, including subject recontextualization, text-guided view synthesis, and artistic rendering, all while preserving the subject's key features. We also provide a new dataset and evaluation protocol for this new task of subject-driven generation. Project page: https://dreambooth.github.io/ △ Less

Submitted 15 March, 2023; v1 submitted 25 August, 2022; originally announced August 2022.

Comments: Published at CVPR 2023. Project page: https://dreambooth.github.io/

arXiv:2207.03940 [pdf]

Bistochastic privacy

Authors: Nicolas Ruiz, Josep Domingo-Ferrer

Abstract: We introduce a new privacy model relying on bistochastic matrices, that is, matrices whose components are nonnegative and sum to 1 both row-wise and column-wise. This class of matrices is used to both define privacy guarantees and a tool to apply protection on a data set. The bistochasticity assumption happens to connect several fields of the privacy literature, including the two most popular mode… ▽ More We introduce a new privacy model relying on bistochastic matrices, that is, matrices whose components are nonnegative and sum to 1 both row-wise and column-wise. This class of matrices is used to both define privacy guarantees and a tool to apply protection on a data set. The bistochasticity assumption happens to connect several fields of the privacy literature, including the two most popular models, k-anonymity and differential privacy. Moreover, it establishes a bridge with information theory, which simplifies the thorny issue of evaluating the utility of a protected data set. Bistochastic privacy also clarifies the trade-off between protection and utility by using bits, which can be viewed as a natural currency to comprehend and operationalize this trade-off, in the same way than bits are used in information theory to capture uncertainty. A discussion on the suitable parameterization of bistochastic matrices to achieve the privacy guarantees of this new model is also provided. △ Less

Submitted 8 July, 2022; originally announced July 2022.

Comments: To be published in Lecture Notes in Artificial Intelligence vol 13408, Modeling Decisions for Artificial Intelligence 19th International Conference MDAI 2022, Sant Cugat, Catalonia, August 30 - 2 September 2022

arXiv:2107.09126 [pdf, other]

Examining the Human Perceptibility of Black-Box Adversarial Attacks on Face Recognition

Authors: Benjamin Spetter-Goldstein, Nataniel Ruiz, Sarah Adel Bargal

Abstract: The modern open internet contains billions of public images of human faces across the web, especially on social media websites used by half the world's population. In this context, Face Recognition (FR) systems have the potential to match faces to specific names and identities, creating glaring privacy concerns. Adversarial attacks are a promising way to grant users privacy from FR systems by disr… ▽ More The modern open internet contains billions of public images of human faces across the web, especially on social media websites used by half the world's population. In this context, Face Recognition (FR) systems have the potential to match faces to specific names and identities, creating glaring privacy concerns. Adversarial attacks are a promising way to grant users privacy from FR systems by disrupting their capability to recognize faces. Yet, such attacks can be perceptible to human observers, especially under the more challenging black-box threat model. In the literature, the justification for the imperceptibility of such attacks hinges on bounding metrics such as $\ell_p$ norms. However, there is not much research on how these norms match up with human perception. Through examining and measuring both the effectiveness of recent black-box attacks in the face recognition setting and their corresponding human perceptibility through survey data, we demonstrate the trade-offs in perceptibility that occur as attacks become more aggressive. We also show how the $\ell_2$ norm and other metrics do not correlate with human perceptibility in a linear fashion, thus making these norms suboptimal at measuring adversarial attack perceptibility. △ Less

Submitted 19 July, 2021; originally announced July 2021.

Comments: 5 pages, 5 figures, submitted to AdvML @ ICML 2021

arXiv:2106.06811 [pdf, other]

Case Study on Detecting COVID-19 Health-Related Misinformation in Social Media

Authors: Mir Mehedi A. Pritom, Rosana Montanez Rodriguez, Asad Ali Khan, Sebastian A. Nugroho, Esra'a Alrashydah, Beatrice N. Ruiz, Anthony Rios

Abstract: COVID-19 pandemic has generated what public health officials called an infodemic of misinformation. As social distancing and stay-at-home orders came into effect, many turned to social media for socializing. This increase in social media usage has made it a prime vehicle for the spreading of misinformation. This paper presents a mechanism to detect COVID-19 health-related misinformation in social… ▽ More COVID-19 pandemic has generated what public health officials called an infodemic of misinformation. As social distancing and stay-at-home orders came into effect, many turned to social media for socializing. This increase in social media usage has made it a prime vehicle for the spreading of misinformation. This paper presents a mechanism to detect COVID-19 health-related misinformation in social media following an interdisciplinary approach. Leveraging social psychology as a foundation and existing misinformation frameworks, we defined misinformation themes and associated keywords incorporated into the misinformation detection mechanism using applied machine learning techniques. Next, using the Twitter dataset, we explored the performance of the proposed methodology using multiple state-of-the-art machine learning classifiers. Our method shows promising results with at most 78% accuracy in classifying health-related misinformation versus true information using uni-gram-based NLP feature generations from tweets and the Decision Tree classifier. We also provide suggestions on alternatives for countering misinformation and ethical consideration for the study. △ Less

Submitted 12 June, 2021; originally announced June 2021.

Comments: 10 pages

arXiv:2106.04569 [pdf, other]

Simulated Adversarial Testing of Face Recognition Models

Authors: Nataniel Ruiz, Adam Kortylewski, Weichao Qiu, Cihang Xie, Sarah Adel Bargal, Alan Yuille, Stan Sclaroff

Abstract: Most machine learning models are validated and tested on fixed datasets. This can give an incomplete picture of the capabilities and weaknesses of the model. Such weaknesses can be revealed at test time in the real world. The risks involved in such failures can be loss of profits, loss of time or even loss of life in certain critical applications. In order to alleviate this issue, simulators can b… ▽ More Most machine learning models are validated and tested on fixed datasets. This can give an incomplete picture of the capabilities and weaknesses of the model. Such weaknesses can be revealed at test time in the real world. The risks involved in such failures can be loss of profits, loss of time or even loss of life in certain critical applications. In order to alleviate this issue, simulators can be controlled in a fine-grained manner using interpretable parameters to explore the semantic image manifold. In this work, we propose a framework for learning how to test machine learning algorithms using simulators in an adversarial manner in order to find weaknesses in the model before deploying it in critical scenarios. We apply this method in a face recognition setup. We show that certain weaknesses of models trained on real data can be discovered using simulated samples. Using our proposed method, we can find adversarial synthetic faces that fool contemporary face recognition models. This demonstrates the fact that these models have weaknesses that are not measured by commonly used validation datasets. We hypothesize that this type of adversarial examples are not isolated, but usually lie in connected spaces in the latent space of the simulator. We present a method to find these adversarial regions as opposed to the typical adversarial points found in the adversarial example literature. △ Less

Submitted 31 May, 2022; v1 submitted 8 June, 2021; originally announced June 2021.

Comments: Published at IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2022

arXiv:2012.05225 [pdf, other]

MorphGAN: One-Shot Face Synthesis GAN for Detecting Recognition Bias

Authors: Nataniel Ruiz, Barry-John Theobald, Anurag Ranjan, Ahmed Hussein Abdelaziz, Nicholas Apostoloff

Abstract: To detect bias in face recognition networks, it can be useful to probe a network under test using samples in which only specific attributes vary in some controlled way. However, capturing a sufficiently large dataset with specific control over the attributes of interest is difficult. In this work, we describe a simulator that applies specific head pose and facial expression adjustments to images o… ▽ More To detect bias in face recognition networks, it can be useful to probe a network under test using samples in which only specific attributes vary in some controlled way. However, capturing a sufficiently large dataset with specific control over the attributes of interest is difficult. In this work, we describe a simulator that applies specific head pose and facial expression adjustments to images of previously unseen people. The simulator first fits a 3D morphable model to a provided image, applies the desired head pose and facial expression controls, then renders the model into an image. Next, a conditional Generative Adversarial Network (GAN) conditioned on the original image and the rendered morphable model is used to produce the image of the original person with the new facial expression and head pose. We call this conditional GAN -- MorphGAN. Images generated using MorphGAN conserve the identity of the person in the original image, and the provided control over head pose and facial expression allows test sets to be created to identify robustness issues of a facial recognition deep network with respect to pose and expression. Images generated by MorphGAN can also serve as data augmentation when training data are scarce. We show that by augmenting small datasets of faces with new poses and expressions improves the recognition performance by up to 9% depending on the augmentation and data scarcity. △ Less

Submitted 10 December, 2020; v1 submitted 9 December, 2020; originally announced December 2020.

arXiv:2006.06493 [pdf, other]

Protecting Against Image Translation Deepfakes by Leaking Universal Perturbations from Black-Box Neural Networks

Authors: Nataniel Ruiz, Sarah Adel Bargal, Stan Sclaroff

Abstract: In this work, we develop efficient disruptions of black-box image translation deepfake generation systems. We are the first to demonstrate black-box deepfake generation disruption by presenting image translation formulations of attacks initially proposed for classification models. Nevertheless, a naive adaptation of classification black-box attacks results in a prohibitive number of queries for im… ▽ More In this work, we develop efficient disruptions of black-box image translation deepfake generation systems. We are the first to demonstrate black-box deepfake generation disruption by presenting image translation formulations of attacks initially proposed for classification models. Nevertheless, a naive adaptation of classification black-box attacks results in a prohibitive number of queries for image translation systems in the real-world. We present a frustratingly simple yet highly effective algorithm Leaking Universal Perturbations (LUP), that significantly reduces the number of queries needed to attack an image. LUP consists of two phases: (1) a short leaking phase where we attack the network using traditional black-box attacks and gather information on successful attacks on a small dataset and (2) and an exploitation phase where we leverage said information to subsequently attack the network with improved efficiency. Our attack reduces the total number of queries necessary to attack GANimation and StarGAN by 30%. △ Less

Submitted 11 June, 2020; originally announced June 2020.

arXiv:2003.02501 [pdf, other]

Detecting Attended Visual Targets in Video

Authors: Eunji Chong, Yongxin Wang, Nataniel Ruiz, James M. Rehg

Abstract: We address the problem of detecting attention targets in video. Our goal is to identify where each person in each frame of a video is looking, and correctly handle the case where the gaze target is out-of-frame. Our novel architecture models the dynamic interaction between the scene and head features and infers time-varying attention targets. We introduce a new annotated dataset, VideoAttentionTar… ▽ More We address the problem of detecting attention targets in video. Our goal is to identify where each person in each frame of a video is looking, and correctly handle the case where the gaze target is out-of-frame. Our novel architecture models the dynamic interaction between the scene and head features and infers time-varying attention targets. We introduce a new annotated dataset, VideoAttentionTarget, containing complex and dynamic patterns of real-world gaze behavior. Our experiments show that our model can effectively infer dynamic attention in videos. In addition, we apply our predicted attention maps to two social gaze behavior recognition tasks, and show that the resulting classifiers significantly outperform existing methods. We achieve state-of-the-art performance on three datasets: GazeFollow (static images), VideoAttentionTarget (videos), and VideoCoAtt (videos), and obtain the first results for automatically classifying clinically-relevant gaze behavior without wearable cameras or eye trackers. △ Less

Submitted 30 March, 2020; v1 submitted 5 March, 2020; originally announced March 2020.

Comments: Accepted to CVPR 2020

arXiv:2003.01279 [pdf, other]

Disrupting Deepfakes: Adversarial Attacks Against Conditional Image Translation Networks and Facial Manipulation Systems

Authors: Nataniel Ruiz, Sarah Adel Bargal, Stan Sclaroff

Abstract: Face modification systems using deep learning have become increasingly powerful and accessible. Given images of a person's face, such systems can generate new images of that same person under different expressions and poses. Some systems can also modify targeted attributes such as hair color or age. This type of manipulated images and video have been coined Deepfakes. In order to prevent a malicio… ▽ More Face modification systems using deep learning have become increasingly powerful and accessible. Given images of a person's face, such systems can generate new images of that same person under different expressions and poses. Some systems can also modify targeted attributes such as hair color or age. This type of manipulated images and video have been coined Deepfakes. In order to prevent a malicious user from generating modified images of a person without their consent we tackle the new problem of generating adversarial attacks against such image translation systems, which disrupt the resulting output image. We call this problem disrupting deepfakes. Most image translation architectures are generative models conditioned on an attribute (e.g. put a smile on this person's face). We are first to propose and successfully apply (1) class transferable adversarial attacks that generalize to different classes, which means that the attacker does not need to have knowledge about the conditioning class, and (2) adversarial training for generative adversarial networks (GANs) as a first step towards robust image translation networks. Finally, in gray-box scenarios, blurring can mount a successful defense against disruption. We present a spread-spectrum adversarial attack, which evades blur defenses. Our open-source code can be found at https://github.com/natanielruiz/disrupting-deepfakes. △ Less

Submitted 27 April, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

Comments: Accepted at CVPR 2020 Workshop on Adversarial Machine Learning in Computer Vision

arXiv:2002.05242 [pdf, other]

Leveraging Affect Transfer Learning for Behavior Prediction in an Intelligent Tutoring System

Authors: Nataniel Ruiz, Hao Yu, Danielle A. Allessio, Mona Jalal, Ajjen Joshi, Thomas Murray, John J. Magee, Jacob R. Whitehill, Vitaly Ablavsky, Ivon Arroyo, Beverly P. Woolf, Stan Sclaroff, Margrit Betke

Abstract: In this work, we propose a video-based transfer learning approach for predicting problem outcomes of students working with an intelligent tutoring system (ITS). By analyzing a student's face and gestures, our method predicts the outcome of a student answering a problem in an ITS from a video feed. Our work is motivated by the reasoning that the ability to predict such outcomes enables tutoring sys… ▽ More In this work, we propose a video-based transfer learning approach for predicting problem outcomes of students working with an intelligent tutoring system (ITS). By analyzing a student's face and gestures, our method predicts the outcome of a student answering a problem in an ITS from a video feed. Our work is motivated by the reasoning that the ability to predict such outcomes enables tutoring systems to adjust interventions, such as hints and encouragement, and to ultimately yield improved student learning. We collected a large labeled dataset of student interactions with an intelligent online math tutor consisting of 68 sessions, where 54 individual students solved 2,749 problems. The dataset is public and available at https://www.cs.bu.edu/faculty/betke/research/learning/ . Working with this dataset, our transfer-learning challenge was to design a representation in the source domain of pictures obtained "in the wild" for the task of facial expression analysis, and transferring this learned representation to the task of human behavior prediction in the domain of webcam videos of students in a classroom environment. We developed a novel facial affect representation and a user-personalized training scheme that unlocks the potential of this representation. We designed several variants of a recurrent neural network that models the temporal structure of video sequences of students solving math problems. Our final model, named ATL-BP for Affect Transfer Learning for Behavior Prediction, achieves a relative increase in mean F-score of 50% over the state-of-the-art method on this new dataset. △ Less

Submitted 8 April, 2022; v1 submitted 12 February, 2020; originally announced February 2020.

Comments: Published at IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2021 - Best Poster Award (4% award rate)

arXiv:1904.11024 [pdf, other]

doi 10.1109/ASRU.2015.7404808

Phonetically-Oriented Word Error Alignment for Speech Recognition Error Analysis in Speech Translation

Authors: Nicholas Ruiz, Marcello Federico

Abstract: We propose a variation to the commonly used Word Error Rate (WER) metric for speech recognition evaluation which incorporates the alignment of phonemes, in the absence of time boundary information. After computing the Levenshtein alignment on words in the reference and hypothesis transcripts, spans of adjacent errors are converted into phonemes with word and syllable boundaries and a phonetic Leve… ▽ More We propose a variation to the commonly used Word Error Rate (WER) metric for speech recognition evaluation which incorporates the alignment of phonemes, in the absence of time boundary information. After computing the Levenshtein alignment on words in the reference and hypothesis transcripts, spans of adjacent errors are converted into phonemes with word and syllable boundaries and a phonetic Levenshtein alignment is performed. The aligned phonemes are recombined into aligned words that adjust the word alignment labels in each error region. We demonstrate that our Phonetically-Oriented Word Error Rate (POWER) yields similar scores to WER with the added advantages of better word alignments and the ability to capture one-to-many word alignments corresponding to homophonic errors in speech recognition hypotheses. These improved alignments allow us to better trace the impact of Levenshtein error types on downstream tasks such as speech translation. △ Less

Submitted 24 April, 2019; originally announced April 2019.

Comments: IEEE Workshop on Automatic Speech Recognition and Understanding, December 2015

arXiv:1904.10997 [pdf, ps, other]

doi 10.21437/Interspeech.2017-1690

Assessing the Tolerance of Neural Machine Translation Systems Against Speech Recognition Errors

Authors: Nicholas Ruiz, Mattia Antonino Di Gangi, Nicola Bertoldi, Marcello Federico

Abstract: Machine translation systems are conventionally trained on textual resources that do not model phenomena that occur in spoken language. While the evaluation of neural machine translation systems on textual inputs is actively researched in the literature , little has been discovered about the complexities of translating spoken language data with neural models. We introduce and motivate interesting p… ▽ More Machine translation systems are conventionally trained on textual resources that do not model phenomena that occur in spoken language. While the evaluation of neural machine translation systems on textual inputs is actively researched in the literature , little has been discovered about the complexities of translating spoken language data with neural models. We introduce and motivate interesting problems one faces when considering the translation of automatic speech recognition (ASR) outputs on neural machine translation (NMT) systems. We test the robustness of sentence encoding approaches for NMT encoder-decoder modeling, focusing on word-based over byte-pair encoding. We compare the translation of utterances containing ASR errors in state-of-the-art NMT encoder-decoder systems against a strong phrase-based machine translation baseline in order to better understand which phenomena present in ASR outputs are better represented under the NMT framework than approaches that represent translation as a linear model. △ Less

Submitted 24 April, 2019; originally announced April 2019.

Comments: Interspeech 2017

arXiv:1902.00607 [pdf, other]

Detecting Gaze Towards Eyes in Natural Social Interactions and its Use in Child Assessment

Authors: Eunji Chong, Katha Chanda, Zhefan Ye, Audrey Southerland, Nataniel Ruiz, Rebecca M. Jones, Agata Rozga, James M. Rehg

Abstract: Eye contact is a crucial element of non-verbal communication that signifies interest, attention, and participation in social interactions. As a result, measures of eye contact arise in a variety of applications such as the assessment of the social communication skills of children at risk for developmental disorders such as autism, or the analysis of turn-taking and social roles during group meetin… ▽ More Eye contact is a crucial element of non-verbal communication that signifies interest, attention, and participation in social interactions. As a result, measures of eye contact arise in a variety of applications such as the assessment of the social communication skills of children at risk for developmental disorders such as autism, or the analysis of turn-taking and social roles during group meetings. However, the automated measurement of visual attention during naturalistic social interactions is challenging due to the difficulty of estimating a subject's looking direction from video. This paper proposes a novel approach to eye contact detection during adult-child social interactions in which the adult wears a point-of-view camera which captures an egocentric view of the child's behavior. By analyzing the child's face regions and inferring their head pose we can accurately identify the onset and duration of the child's looks to their social partner's eyes. We introduce the Pose-Implicit CNN, a novel deep learning architecture that predicts eye contact while implicitly estimating the head pose. We present a fully automated system for eye contact detection that solves the sub-problems of end-to-end feature learning and pose estimation using deep neural networks. To train our models, we use a dataset comprising 22 hours of 156 play session videos from over 100 children, half of whom are diagnosed with Autism Spectrum Disorder. We report an overall precision of 0.76, recall of 0.80, and an area under the precision-recall curve of 0.79, all of which are significant improvements over existing methods. △ Less

Submitted 1 February, 2019; originally announced February 2019.

Comments: Published in Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT) Volume 1. Winner of IMWUT Distinguished Paper Award

arXiv:1810.02513 [pdf, other]

Learning To Simulate

Authors: Nataniel Ruiz, Samuel Schulter, Manmohan Chandraker

Abstract: Simulation is a useful tool in situations where training data for machine learning models is costly to annotate or even hard to acquire. In this work, we propose a reinforcement learning-based method for automatically adjusting the parameters of any (non-differentiable) simulator, thereby controlling the distribution of synthesized data in order to maximize the accuracy of a model trained on that… ▽ More Simulation is a useful tool in situations where training data for machine learning models is costly to annotate or even hard to acquire. In this work, we propose a reinforcement learning-based method for automatically adjusting the parameters of any (non-differentiable) simulator, thereby controlling the distribution of synthesized data in order to maximize the accuracy of a model trained on that data. In contrast to prior art that hand-crafts these simulation parameters or adjusts only parts of the available parameters, our approach fully controls the simulator with the actual underlying goal of maximizing accuracy, rather than mimicking the real data distribution or randomly generating a large volume of data. We find that our approach (i) quickly converges to the optimal simulation parameters in controlled experiments and (ii) can indeed discover good sets of parameters for an image rendering simulator in actual computer vision applications. △ Less

Submitted 13 May, 2019; v1 submitted 5 October, 2018; originally announced October 2018.

Comments: Published at International Conference on Learning Representations (ICLR) 2019

arXiv:1809.08381 [pdf, other]

Learning to Localize and Align Fine-Grained Actions to Sparse Instructions

Authors: Meera Hahn, Nataniel Ruiz, Jean-Baptiste Alayrac, Ivan Laptev, James M. Rehg

Abstract: Automatic generation of textual video descriptions that are time-aligned with video content is a long-standing goal in computer vision. The task is challenging due to the difficulty of bridging the semantic gap between the visual and natural language domains. This paper addresses the task of automatically generating an alignment between a set of instructions and a first person video demonstrating… ▽ More Automatic generation of textual video descriptions that are time-aligned with video content is a long-standing goal in computer vision. The task is challenging due to the difficulty of bridging the semantic gap between the visual and natural language domains. This paper addresses the task of automatically generating an alignment between a set of instructions and a first person video demonstrating an activity. The sparse descriptions and ambiguity of written instructions create significant alignment challenges. The key to our approach is the use of egocentric cues to generate a concise set of action proposals, which are then matched to recipe steps using object recognition and computational linguistic techniques. We obtain promising results on both the Extended GTEA Gaze+ dataset and the Bristol Egocentric Object Interactions Dataset. △ Less

Submitted 22 September, 2018; originally announced September 2018.

arXiv:1807.10437 [pdf, other]

Connecting Gaze, Scene, and Attention: Generalized Attention Estimation via Joint Modeling of Gaze and Scene Saliency

Authors: Eunji Chong, Nataniel Ruiz, Yongxin Wang, Yun Zhang, Agata Rozga, James Rehg

Abstract: This paper addresses the challenging problem of estimating the general visual attention of people in images. Our proposed method is designed to work across multiple naturalistic social scenarios and provides a full picture of the subject's attention and gaze. In contrast, earlier works on gaze and attention estimation have focused on constrained problems in more specific contexts. In particular, o… ▽ More This paper addresses the challenging problem of estimating the general visual attention of people in images. Our proposed method is designed to work across multiple naturalistic social scenarios and provides a full picture of the subject's attention and gaze. In contrast, earlier works on gaze and attention estimation have focused on constrained problems in more specific contexts. In particular, our model explicitly represents the gaze direction and handles out-of-frame gaze targets. We leverage three different datasets using a multi-task learning approach. We evaluate our method on widely used benchmarks for single-tasks such as gaze angle estimation and attention-within-an-image, as well as on the new challenging task of generalized visual attention prediction. In addition, we have created extended annotations for the MMDB and GazeFollow datasets which are used in our experiments, which we will publicly release. △ Less

Submitted 27 July, 2018; originally announced July 2018.

Comments: Appears in: European Conference on Computer Vision (ECCV) 2018

arXiv:1805.04453 [pdf, other]

Bootstrapping Multilingual Intent Models via Machine Translation for Dialog Automation

Authors: Nicholas Ruiz, Srinivas Bangalore, John Chen

Abstract: With the resurgence of chat-based dialog systems in consumer and enterprise applications, there has been much success in developing data-driven and rule-based natural language models to understand human intent. Since these models require large amounts of data and in-domain knowledge, expanding an equivalent service into new markets is disrupted by language barriers that inhibit dialog automation.… ▽ More With the resurgence of chat-based dialog systems in consumer and enterprise applications, there has been much success in developing data-driven and rule-based natural language models to understand human intent. Since these models require large amounts of data and in-domain knowledge, expanding an equivalent service into new markets is disrupted by language barriers that inhibit dialog automation. This paper presents a user study to evaluate the utility of out-of-the-box machine translation technology to (1) rapidly bootstrap multilingual spoken dialog systems and (2) enable existing human analysts to understand foreign language utterances. We additionally evaluate the utility of machine translation in human assisted environments, where a portion of the traffic is processed by analysts. In English->Spanish experiments, we observe a high potential for dialog automation, as well as the potential for human analysts to process foreign language utterances with high accuracy. △ Less

Submitted 11 May, 2018; originally announced May 2018.

Comments: 6 pages, 3 figures, accepted for publication at the 2018 European Association for Machine Translation Conference (EAMT 2018)

arXiv:1712.02557 [pdf]

A general cipher for individual data anonymization

Authors: Nicolas Ruiz

Abstract: Over the years, the literature on individual data anonymization has burgeoned in many directions. Borrowing from several areas of other sciences, the current diversity of concepts, models and tools available contributes to understanding and fostering individual data dissemination in a privacy-preserving way, as well as unleashing new sources of information for the benefits of society at large. How… ▽ More Over the years, the literature on individual data anonymization has burgeoned in many directions. Borrowing from several areas of other sciences, the current diversity of concepts, models and tools available contributes to understanding and fostering individual data dissemination in a privacy-preserving way, as well as unleashing new sources of information for the benefits of society at large. However, such diversity doesn't come without some difficulties. Currently, the task of selecting the optimal analytical environment to conduct anonymization is complicated by the multitude of available choices. Based on recent contributions from the literature and inspired by cryptography, this paper proposes the first cipher for data anonymization. The functioning of this cipher shows that, in fact, every anonymization method can be viewed as a general form of rank swapping with unconstrained permutation structures. Beyond all the currently existing methods that it can mimic, this cipher offers a new way to practice data anonymization, notably by performing anonymization in an ex ante way, instead of being engaged in several ex post evaluations and iterations to reach the protection and information properties sought after. Moreover, the properties of this cipher point to some previously unknown general insights into the task of data anonymization considered at a general level of functioning. Finally, and to make the cipher operational, this paper proposes the introduction of permutation menus in data anonymization, where recently developed universal measures of disclosure risk and information loss are used ex ante for the calibration of permutation keys. To justify the relevance of their uses, a theoretical characterization of these measures is also proposed. △ Less

Submitted 7 December, 2017; originally announced December 2017.

arXiv:1712.02549 [pdf]

A multiplicative masking method for preserving the skewness of the original micro-records

Authors: Nicolas Ruiz

Abstract: Masking methods for the safe dissemination of microdata consist of distorting the original data while preserving a pre-defined set of statistical properties in the microdata. For continuous variables, available methodologies rely essentially on matrix masking and in particular on adding noise to the original values, using more or less refined procedures depending on the extent of information that… ▽ More Masking methods for the safe dissemination of microdata consist of distorting the original data while preserving a pre-defined set of statistical properties in the microdata. For continuous variables, available methodologies rely essentially on matrix masking and in particular on adding noise to the original values, using more or less refined procedures depending on the extent of information that one seeks to preserve. Almost all of these methods make use of the critical assumption that the original datasets follow a normal distribution and/or that the noise has such a distribution. This assumption is, however, restrictive in the sense that few variables follow empirically a Gaussian pattern: the distribution of household income, for example, is positively skewed, and this skewness is essential information that has to be considered and preserved. This paper addresses these issues by presenting a simple multiplicative masking method that preserves skewness of the original data while offering a sufficient level of disclosure risk control. Numerical examples are provided, leading to the suggestion that this method could be well-suited for the dissemination of a broad range of microdata, including those based on administrative and business records. △ Less

Submitted 7 December, 2017; originally announced December 2017.

arXiv:1710.00925 [pdf, other]

Fine-Grained Head Pose Estimation Without Keypoints

Authors: Nataniel Ruiz, Eunji Chong, James M. Rehg

Abstract: Estimating the head pose of a person is a crucial problem that has a large amount of applications such as aiding in gaze estimation, modeling attention, fitting 3D models to video and performing face alignment. Traditionally head pose is computed by estimating some keypoints from the target face and solving the 2D to 3D correspondence problem with a mean human head model. We argue that this is a f… ▽ More Estimating the head pose of a person is a crucial problem that has a large amount of applications such as aiding in gaze estimation, modeling attention, fitting 3D models to video and performing face alignment. Traditionally head pose is computed by estimating some keypoints from the target face and solving the 2D to 3D correspondence problem with a mean human head model. We argue that this is a fragile method because it relies entirely on landmark detection performance, the extraneous head model and an ad-hoc fitting step. We present an elegant and robust way to determine pose by training a multi-loss convolutional neural network on 300W-LP, a large synthetically expanded dataset, to predict intrinsic Euler angles (yaw, pitch and roll) directly from image intensities through joint binned pose classification and regression. We present empirical tests on common in-the-wild pose benchmark datasets which show state-of-the-art results. Additionally we test our method on a dataset usually used for pose estimation using depth and start to close the gap with state-of-the-art depth pose methods. We open-source our training and testing code as well as release our pre-trained models. △ Less

Submitted 13 April, 2018; v1 submitted 2 October, 2017; originally announced October 2017.

Comments: Accepted to Computer Vision and Pattern Recognition Workshops (CVPRW), 2018 IEEE Conference on. IEEE, 2018

Journal ref: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2018, pp. 2074-2083

arXiv:1708.04370 [pdf, other]

Dockerface: an Easy to Install and Use Faster R-CNN Face Detector in a Docker Container

Authors: Nataniel Ruiz, James M. Rehg

Abstract: Face detection is a very important task and a necessary pre-processing step for many applications such as facial landmark detection, pose estimation, sentiment analysis and face recognition. Not only is face detection an important pre-processing step in computer vision applications but also in computational psychology, behavioral imaging and other fields where researchers might not be initiated in… ▽ More Face detection is a very important task and a necessary pre-processing step for many applications such as facial landmark detection, pose estimation, sentiment analysis and face recognition. Not only is face detection an important pre-processing step in computer vision applications but also in computational psychology, behavioral imaging and other fields where researchers might not be initiated in computer vision frameworks and state-of-the-art detection applications. A large part of existing research that includes face detection as a pre-processing step uses existing out-of-the-box detectors such as the HoG-based dlib and the OpenCV Haar face detector which are no longer state-of-the-art - they are primarily used because of their ease of use and accessibility. We introduce Dockerface, a very accurate Faster R-CNN face detector in a Docker container which requires no training and is easy to install and use. △ Less

Submitted 5 April, 2018; v1 submitted 14 August, 2017; originally announced August 2017.

arXiv:1701.08419 [pdf]

On some consequences of the permutation paradigm for data anonymization: centrality of permutation matrices, universal measures of disclosure risk and information loss, evaluation by dominance

Authors: Nicolas Ruiz

Abstract: Recently, the permutation paradigm has been proposed in data anonymization to describe any micro data masking method as permutation, paving the way for performing meaningful analytical comparisons of methods, something that is difficult currently in statistical disclosure control research. This paper explores some consequences of this paradigm by establishing some class of universal measures of di… ▽ More Recently, the permutation paradigm has been proposed in data anonymization to describe any micro data masking method as permutation, paving the way for performing meaningful analytical comparisons of methods, something that is difficult currently in statistical disclosure control research. This paper explores some consequences of this paradigm by establishing some class of universal measures of disclosure risk and information loss that can be used for the evaluation and comparison of any method, under any parametrization and independently of the characteristics of the data to be anonymized. These measures lead to the introduction in data anonymization of the concepts of dominance in disclosure risk and information loss, which formalise the fact that different parties involved in micro data transaction can all have different sensitivities to privacy and information. △ Less

Submitted 29 January, 2017; originally announced January 2017.

arXiv:0801.4150 [pdf]

e-Science perspectives in Venezuela

Authors: G. Diaz, J. Florez-Lopez, V. Hamar, H. Hoeger, C. Mendoza, Z. Mendez, L. A. Nunez, N. Ruiz, R. Torrens, M. Uzcategui

Abstract: We describe the e-Science strategy in Venezuela, in particular initiatives by the Centro Nacional de Calculo Cientifico Universidad de Los Andes (CECALCULA), Merida, the Universidad de Los Andes (ULA), Merida, and the Instituto Venezolano de Investigaciones Cientificas (IVIC), Caracas. We present the plans for the Venezuelan Academic Grid and the current status of Grid ULA supported by Internet2… ▽ More We describe the e-Science strategy in Venezuela, in particular initiatives by the Centro Nacional de Calculo Cientifico Universidad de Los Andes (CECALCULA), Merida, the Universidad de Los Andes (ULA), Merida, and the Instituto Venezolano de Investigaciones Cientificas (IVIC), Caracas. We present the plans for the Venezuelan Academic Grid and the current status of Grid ULA supported by Internet2. We show different web-based scientific applications that are being developed in quantum chemistry, atomic physics, structural damage analysis, biomedicine and bioclimate within the framework of the E-Infrastructure shared between Europe and Latin America (EELA) △ Less

Submitted 27 January, 2008; originally announced January 2008.

Comments: Presented at the Third Conference of the EELA Project in Catania, Italy, Dec 2007

Journal ref: Proceedings of the Third EELA Conference, R. Gavela, B. Marechal, R. Barbera, L.N. Ciuffo, R. Mayo. (Editors), CIEMAT, Madrid, Spain (2007), pp 131-139

arXiv:0707.3531 [pdf, other]

e-Science initiatives in Venezuela

Authors: J. L. Chaves, G. Diaz, V. Hamar, R. Isea, F. Rojas, N. Ruiz, R. Torrens, M. Uzcategui, J. Florez-Lopez, H. Hoeger, C. Mendoza, L. A. Nunez

Abstract: Within the context of the nascent e-Science infrastructure in Venezuela, we describe several web-based scientific applications developed at the Centro Nacional de Calculo Cientifico Universidad de Los Andes (CeCalCULA), Merida, and at the Instituto Venezolano de Investigaciones Cientificas (IVIC), Caracas. The different strategies that have been followed for implementing quantum chemistry and at… ▽ More Within the context of the nascent e-Science infrastructure in Venezuela, we describe several web-based scientific applications developed at the Centro Nacional de Calculo Cientifico Universidad de Los Andes (CeCalCULA), Merida, and at the Instituto Venezolano de Investigaciones Cientificas (IVIC), Caracas. The different strategies that have been followed for implementing quantum chemistry and atomic physics applications are presented. We also briefly discuss a damage portal based on dynamic, nonlinear, finite elements of lumped damage mechanics and a biomedical portal developed within the framework of the \textit{E-Infrastructure shared between Europe and Latin America} (EELA) initiative for searching common sequences and inferring their functions in parasitic diseases such as leishmaniasis, chagas and malaria. △ Less

Submitted 24 July, 2007; originally announced July 2007.

Comments: 9 pages, 4 figures

Journal ref: Procceedings Spanish Conference on e-Science Grid Computing, J. Casado, R. Mayo y R. Munoz (Editors) CIEMAT, Madrid Spain (2007), pp 45 - 52

Showing 1–38 of 38 results for author: Ruiz, N