subscribe to arXiv mailings

Nonlinear Craig Interpolant Generation over Unbounded Domains by Separating Semialgebraic Sets

Authors: Hao Wu, Jie Wang, Bican Xia, Xiakun Li, Naijun Zhan, Ting Gan

Abstract: Interpolation-based techniques become popular in recent years, as they can improve the scalability of existing verification techniques due to their inherent modularity and local reasoning capabilities. Synthesizing Craig interpolants is the cornerstone of these techniques. In this paper, we investigate nonlinear Craig interpolant synthesis for two polynomial formulas of the general form, essenti… ▽ More Interpolation-based techniques become popular in recent years, as they can improve the scalability of existing verification techniques due to their inherent modularity and local reasoning capabilities. Synthesizing Craig interpolants is the cornerstone of these techniques. In this paper, we investigate nonlinear Craig interpolant synthesis for two polynomial formulas of the general form, essentially corresponding to the underlying mathematical problem to separate two disjoint semialgebraic sets. By combining the homogenization approach with existing techniques, we prove the existence of a novel class of non-polynomial interpolants called semialgebraic interpolants. These semialgebraic interpolants subsume polynomial interpolants as a special case. To the best of our knowledge, this is the first existence result of this kind. Furthermore, we provide complete sum-of-squares characterizations for both polynomial and semialgebraic interpolants, which can be efficiently solved as semidefinite programs. Examples are provided to demonstrate the effectiveness and efficiency of our approach. △ Less

Submitted 30 June, 2024; originally announced July 2024.

Comments: 21 pages (with appendix); accepted by the 26th International Symposium on Formal Methods (FM2024)

arXiv:2404.14066 [pdf, other]

SHE-Net: Syntax-Hierarchy-Enhanced Text-Video Retrieval

Authors: Xuzheng Yu, Chen Jiang, Xingning Dong, Tian Gan, Ming Yang, Qingpei Guo

Abstract: The user base of short video apps has experienced unprecedented growth in recent years, resulting in a significant demand for video content analysis. In particular, text-video retrieval, which aims to find the top matching videos given text descriptions from a vast video corpus, is an essential function, the primary challenge of which is to bridge the modality gap. Nevertheless, most existing appr… ▽ More The user base of short video apps has experienced unprecedented growth in recent years, resulting in a significant demand for video content analysis. In particular, text-video retrieval, which aims to find the top matching videos given text descriptions from a vast video corpus, is an essential function, the primary challenge of which is to bridge the modality gap. Nevertheless, most existing approaches treat texts merely as discrete tokens and neglect their syntax structures. Moreover, the abundant spatial and temporal clues in videos are often underutilized due to the lack of interaction with text. To address these issues, we argue that using texts as guidance to focus on relevant temporal frames and spatial regions within videos is beneficial. In this paper, we propose a novel Syntax-Hierarchy-Enhanced text-video retrieval method (SHE-Net) that exploits the inherent semantic and syntax hierarchy of texts to bridge the modality gap from two perspectives. First, to facilitate a more fine-grained integration of visual content, we employ the text syntax hierarchy, which reveals the grammatical structure of text descriptions, to guide the visual representations. Second, to further enhance the multi-modal interaction and alignment, we also utilize the syntax hierarchy to guide the similarity calculation. We evaluated our method on four public text-video retrieval datasets of MSR-VTT, MSVD, DiDeMo, and ActivityNet. The experimental results and ablation studies confirm the advantages of our proposed method. △ Less

Submitted 6 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

arXiv:2403.11035 [pdf]

Multiplane Quantitative Phase Imaging Using a Wavelength-Multiplexed Diffractive Optical Processor

Authors: Che-Yung Shen, Jingxi Li, Tianyi Gan, Yuhang Li, Langxing Bai, Mona Jarrahi, Aydogan Ozcan

Abstract: Quantitative phase imaging (QPI) is a label-free technique that provides optical path length information for transparent specimens, finding utility in biology, materials science, and engineering. Here, we present quantitative phase imaging of a 3D stack of phase-only objects using a wavelength-multiplexed diffractive optical processor. Utilizing multiple spatially engineered diffractive layers tra… ▽ More Quantitative phase imaging (QPI) is a label-free technique that provides optical path length information for transparent specimens, finding utility in biology, materials science, and engineering. Here, we present quantitative phase imaging of a 3D stack of phase-only objects using a wavelength-multiplexed diffractive optical processor. Utilizing multiple spatially engineered diffractive layers trained through deep learning, this diffractive processor can transform the phase distributions of multiple 2D objects at various axial positions into intensity patterns, each encoded at a unique wavelength channel. These wavelength-multiplexed patterns are projected onto a single field-of-view (FOV) at the output plane of the diffractive processor, enabling the capture of quantitative phase distributions of input objects located at different axial planes using an intensity-only image sensor. Based on numerical simulations, we show that our diffractive processor could simultaneously achieve all-optical quantitative phase imaging across several distinct axial planes at the input by scanning the illumination wavelength. A proof-of-concept experiment with a 3D-fabricated diffractive processor further validated our approach, showcasing successful imaging of two distinct phase objects at different axial positions by scanning the illumination wavelength in the terahertz spectrum. Diffractive network-based multiplane QPI designs can open up new avenues for compact on-chip phase imaging and sensing devices. △ Less

Submitted 16 March, 2024; originally announced March 2024.

Comments: 27 Pages, 9 Figures

arXiv:2402.02397 [pdf]

doi 10.1002/lpor.202400238

Multiplexed all-optical permutation operations using a reconfigurable diffractive optical network

Authors: Guangdong Ma, Xilin Yang, Bijie Bai, Jingxi Li, Yuhang Li, Tianyi Gan, Che-Yung Shen, Yijie Zhang, Yuzhu Li, Mona Jarrahi, Aydogan Ozcan

Abstract: Large-scale and high-dimensional permutation operations are important for various applications in e.g., telecommunications and encryption. Here, we demonstrate the use of all-optical diffractive computing to execute a set of high-dimensional permutation operations between an input and output field-of-view through layer rotations in a diffractive optical network. In this reconfigurable multiplexed… ▽ More Large-scale and high-dimensional permutation operations are important for various applications in e.g., telecommunications and encryption. Here, we demonstrate the use of all-optical diffractive computing to execute a set of high-dimensional permutation operations between an input and output field-of-view through layer rotations in a diffractive optical network. In this reconfigurable multiplexed material designed by deep learning, every diffractive layer has four orientations: 0, 90, 180, and 270 degrees. Each unique combination of these rotatable layers represents a distinct rotation state of the diffractive design tailored for a specific permutation operation. Therefore, a K-layer rotatable diffractive material is capable of all-optically performing up to 4^K independent permutation operations. The original input information can be decrypted by applying the specific inverse permutation matrix to output patterns, while applying other inverse operations will lead to loss of information. We demonstrated the feasibility of this reconfigurable multiplexed diffractive design by approximating 256 randomly selected permutation matrices using K=4 rotatable diffractive layers. We also experimentally validated this reconfigurable diffractive network using terahertz radiation and 3D-printed diffractive layers, providing a decent match to our numerical results. The presented rotation-multiplexed diffractive processor design is particularly useful due to its mechanical reconfigurability, offering multifunctional representation through a single fabrication process. △ Less

Submitted 4 February, 2024; originally announced February 2024.

Comments: 37 Pages, 10 Figures

Journal ref: Laser & Photonics Reviews (2024)

arXiv:2401.17773 [pdf, other]

doi 10.1109/TCSVT.2023.3303945

SNP-S3: Shared Network Pre-training and Significant Semantic Strengthening for Various Video-Text Tasks

Authors: Xingning Dong, Qingpei Guo, Tian Gan, Qing Wang, Jianlong Wu, Xiangyuan Ren, Yuan Cheng, Wei Chu

Abstract: We present a framework for learning cross-modal video representations by directly pre-training on raw data to facilitate various downstream video-text tasks. Our main contributions lie in the pre-training framework and proxy tasks. First, based on the shortcomings of two mainstream pixel-level pre-training architectures (limited applications or less efficient), we propose Shared Network Pre-traini… ▽ More We present a framework for learning cross-modal video representations by directly pre-training on raw data to facilitate various downstream video-text tasks. Our main contributions lie in the pre-training framework and proxy tasks. First, based on the shortcomings of two mainstream pixel-level pre-training architectures (limited applications or less efficient), we propose Shared Network Pre-training (SNP). By employing one shared BERT-type network to refine textual and cross-modal features simultaneously, SNP is lightweight and could support various downstream applications. Second, based on the intuition that people always pay attention to several "significant words" when understanding a sentence, we propose the Significant Semantic Strengthening (S3) strategy, which includes a novel masking and matching proxy task to promote the pre-training performance. Experiments conducted on three downstream video-text tasks and six datasets demonstrate that, we establish a new state-of-the-art in pixel-level video-text pre-training; we also achieve a satisfactory balance between the pre-training efficiency and the fine-tuning performance. The codebase are available at https://github.com/alipay/Ant-Multi-Modal-Framework/tree/main/prj/snps3_vtp. △ Less

Submitted 31 January, 2024; originally announced January 2024.

Comments: Accepted by TCSVT (IEEE Transactions on Circuits and Systems for Video Technology)

arXiv:2401.16779 [pdf]

doi 10.1038/s41377-024-01482-6

All-optical complex field imaging using diffractive processors

Authors: Jingxi Li, Yuhang Li, Tianyi Gan, Che-Yung Shen, Mona Jarrahi, Aydogan Ozcan

Abstract: Complex field imaging, which captures both the amplitude and phase information of input optical fields or objects, can offer rich structural insights into samples, such as their absorption and refractive index distributions. However, conventional image sensors are intensity-based and inherently lack the capability to directly measure the phase distribution of a field. This limitation can be overco… ▽ More Complex field imaging, which captures both the amplitude and phase information of input optical fields or objects, can offer rich structural insights into samples, such as their absorption and refractive index distributions. However, conventional image sensors are intensity-based and inherently lack the capability to directly measure the phase distribution of a field. This limitation can be overcome using interferometric or holographic methods, often supplemented by iterative phase retrieval algorithms, leading to a considerable increase in hardware complexity and computational demand. Here, we present a complex field imager design that enables snapshot imaging of both the amplitude and quantitative phase information of input fields using an intensity-based sensor array without any digital processing. Our design utilizes successive deep learning-optimized diffractive surfaces that are structured to collectively modulate the input complex field, forming two independent imaging channels that perform amplitude-to-amplitude and phase-to-intensity transformations between the input and output planes within a compact optical design, axially spanning ~100 wavelengths. The intensity distributions of the output fields at these two channels on the sensor plane directly correspond to the amplitude and quantitative phase profiles of the input complex field, eliminating the need for any digital image reconstruction algorithms. We experimentally validated the efficacy of our complex field diffractive imager designs through 3D-printed prototypes operating at the terahertz spectrum, with the output amplitude and phase channel images closely aligning with our numerical simulations. We envision that this complex field imager will have various applications in security, biomedical imaging, sensing and material science, among others. △ Less

Submitted 30 January, 2024; originally announced January 2024.

Comments: 25 Pages, 6 Figures

Journal ref: Light: Science & Applications (2024)

arXiv:2401.08923 [pdf]

doi 10.1186/s43593-024-00067-5

Subwavelength Imaging using a Solid-Immersion Diffractive Optical Processor

Authors: Jingtian Hu, Kun Liao, Niyazi Ulas Dinc, Carlo Gigli, Bijie Bai, Tianyi Gan, Xurong Li, Hanlong Chen, Xilin Yang, Yuhang Li, Cagatay Isil, Md Sadman Sakib Rahman, Jingxi Li, Xiaoyong Hu, Mona Jarrahi, Demetri Psaltis, Aydogan Ozcan

Abstract: Phase imaging is widely used in biomedical imaging, sensing, and material characterization, among other fields. However, direct imaging of phase objects with subwavelength resolution remains a challenge. Here, we demonstrate subwavelength imaging of phase and amplitude objects based on all-optical diffractive encoding and decoding. To resolve subwavelength features of an object, the diffractive im… ▽ More Phase imaging is widely used in biomedical imaging, sensing, and material characterization, among other fields. However, direct imaging of phase objects with subwavelength resolution remains a challenge. Here, we demonstrate subwavelength imaging of phase and amplitude objects based on all-optical diffractive encoding and decoding. To resolve subwavelength features of an object, the diffractive imager uses a thin, high-index solid-immersion layer to transmit high-frequency information of the object to a spatially-optimized diffractive encoder, which converts/encodes high-frequency information of the input into low-frequency spatial modes for transmission through air. The subsequent diffractive decoder layers (in air) are jointly designed with the encoder using deep-learning-based optimization, and communicate with the encoder layer to create magnified images of input objects at its output, revealing subwavelength features that would otherwise be washed away due to diffraction limit. We demonstrate that this all-optical collaboration between a diffractive solid-immersion encoder and the following decoder layers in air can resolve subwavelength phase and amplitude features of input objects in a highly compact design. To experimentally demonstrate its proof-of-concept, we used terahertz radiation and developed a fabrication method for creating monolithic multi-layer diffractive processors. Through these monolithically fabricated diffractive encoder-decoder pairs, we demonstrated phase-to-intensity transformations and all-optically reconstructed subwavelength phase features of input objects by directly transforming them into magnified intensity features at the output. This solid-immersion-based diffractive imager, with its compact and cost-effective design, can find wide-ranging applications in bioimaging, endoscopy, sensing and materials characterization. △ Less

Submitted 16 January, 2024; originally announced January 2024.

Comments: 32 Pages, 9 Figures

Journal ref: eLight (2024)

arXiv:2401.07856 [pdf]

doi 10.1126/sciadv.adn9420

Information hiding cameras: optical concealment of object information into ordinary images

Authors: Bijie Bai, Ryan Lee, Yuhang Li, Tianyi Gan, Yuntian Wang, Mona Jarrahi, Aydogan Ozcan

Abstract: Data protection methods like cryptography, despite being effective, inadvertently signal the presence of secret communication, thereby drawing undue attention. Here, we introduce an optical information hiding camera integrated with an electronic decoder, optimized jointly through deep learning. This information hiding-decoding system employs a diffractive optical processor as its front-end, which… ▽ More Data protection methods like cryptography, despite being effective, inadvertently signal the presence of secret communication, thereby drawing undue attention. Here, we introduce an optical information hiding camera integrated with an electronic decoder, optimized jointly through deep learning. This information hiding-decoding system employs a diffractive optical processor as its front-end, which transforms and hides input images in the form of ordinary-looking patterns that deceive/mislead human observers. This information hiding transformation is valid for infinitely many combinations of secret messages, all of which are transformed into ordinary-looking output patterns, achieved all-optically through passive light-matter interactions within the optical processor. By processing these ordinary-looking output images, a jointly-trained electronic decoder neural network accurately reconstructs the original information hidden within the deceptive output pattern. We numerically demonstrated our approach by designing an information hiding diffractive camera along with a jointly-optimized convolutional decoder neural network. The efficacy of this system was demonstrated under various lighting conditions and noise levels, showing its robustness. We further extended this information hiding camera to multi-spectral operation, allowing the concealment and decoding of multiple images at different wavelengths, all performed simultaneously in a single feed-forward operation. The feasibility of our framework was also demonstrated experimentally using THz radiation. This optical encoder-electronic decoder-based co-design provides a novel information hiding camera interface that is both high-speed and energy-efficient, offering an intriguing solution for visual information security. △ Less

Submitted 15 January, 2024; originally announced January 2024.

Comments: 26 Pages, 8 Figures

Journal ref: Science Advances (2024)

arXiv:2401.04354 [pdf, other]

Knowledge-enhanced Multi-perspective Video Representation Learning for Scene Recognition

Authors: Xuzheng Yu, Chen Jiang, Wei Zhang, Tian Gan, Linlin Chao, Jianan Zhao, Yuan Cheng, Qingpei Guo, Wei Chu

Abstract: With the explosive growth of video data in real-world applications, a comprehensive representation of videos becomes increasingly important. In this paper, we address the problem of video scene recognition, whose goal is to learn a high-level video representation to classify scenes in videos. Due to the diversity and complexity of video contents in realistic scenarios, this task remains a challeng… ▽ More With the explosive growth of video data in real-world applications, a comprehensive representation of videos becomes increasingly important. In this paper, we address the problem of video scene recognition, whose goal is to learn a high-level video representation to classify scenes in videos. Due to the diversity and complexity of video contents in realistic scenarios, this task remains a challenge. Most existing works identify scenes for videos only from visual or textual information in a temporal perspective, ignoring the valuable information hidden in single frames, while several earlier studies only recognize scenes for separate images in a non-temporal perspective. We argue that these two perspectives are both meaningful for this task and complementary to each other, meanwhile, externally introduced knowledge can also promote the comprehension of videos. We propose a novel two-stream framework to model video representations from multiple perspectives, i.e. temporal and non-temporal perspectives, and integrate the two perspectives in an end-to-end manner by self-distillation. Besides, we design a knowledge-enhanced feature fusion and label prediction method that contributes to naturally introducing knowledge into the task of video scene recognition. Experiments conducted on a real-world dataset demonstrate the effectiveness of our proposed method. △ Less

Submitted 8 January, 2024; originally announced January 2024.

arXiv:2312.07942 [pdf, other]

Learning Diffusions under Uncertainty

Authors: Hao Huang, Qian Yan, Keqi Han, Ting Gan, Jiawei Jiang, Quanqing Xu, Chuanhui Yan

Abstract: To infer a diffusion network based on observations from historical diffusion processes, existing approaches assume that observation data contain exact occurrence time of each node infection, or at least the eventual infection statuses of nodes in each diffusion process. They determine potential influence relationships between nodes by identifying frequent sequences, or statistical correlations, am… ▽ More To infer a diffusion network based on observations from historical diffusion processes, existing approaches assume that observation data contain exact occurrence time of each node infection, or at least the eventual infection statuses of nodes in each diffusion process. They determine potential influence relationships between nodes by identifying frequent sequences, or statistical correlations, among node infections. In some real-world settings, such as the spread of epidemics, tracing exact infection times is often infeasible due to a high cost; even obtaining precise infection statuses of nodes is a challenging task, since observable symptoms such as headache only partially reveal a node's true status. In this work, we investigate how to effectively infer a diffusion network from observation data with uncertainty. Provided with only probabilistic information about node infection statuses, we formulate the problem of diffusion network inference as a constrained nonlinear regression w.r.t. the probabilistic data. An alternating maximization method is designed to solve this regression problem iteratively, and the improvement of solution quality in each iteration can be theoretically guaranteed. Empirical studies are conducted on both synthetic and real-world networks, and the results verify the effectiveness and efficiency of our approach. △ Less

Submitted 13 December, 2023; originally announced December 2023.

arXiv:2312.00347 [pdf, other]

doi 10.1145/3581783.3612152

RTQ: Rethinking Video-language Understanding Based on Image-text Model

Authors: Xiao Wang, Yaoyu Li, Tian Gan, Zheng Zhang, Jingjing Lv, Liqiang Nie

Abstract: Recent advancements in video-language understanding have been established on the foundation of image-text models, resulting in promising outcomes due to the shared knowledge between images and videos. However, video-language understanding presents unique challenges due to the inclusion of highly complex semantic details, which result in information redundancy, temporal dependency, and scene comple… ▽ More Recent advancements in video-language understanding have been established on the foundation of image-text models, resulting in promising outcomes due to the shared knowledge between images and videos. However, video-language understanding presents unique challenges due to the inclusion of highly complex semantic details, which result in information redundancy, temporal dependency, and scene complexity. Current techniques have only partially tackled these issues, and our quantitative analysis indicates that some of these methods are complementary. In light of this, we propose a novel framework called RTQ (Refine, Temporal model, and Query), which addresses these challenges simultaneously. The approach involves refining redundant information within frames, modeling temporal relations among frames, and querying task-specific information from the videos. Remarkably, our model demonstrates outstanding performance even in the absence of video-language pre-training, and the results are comparable with or superior to those achieved by state-of-the-art pre-training methods. Code is available at https://github.com/SCZwangxiao/RTQ-MM2023. △ Less

Submitted 17 December, 2023; v1 submitted 30 November, 2023; originally announced December 2023.

Comments: Accepted by ACM MM 2023 as Oral representation

Journal ref: In International Conference on Multimedia. ACM, 557--566 (2023)

arXiv:2311.04473 [pdf]

doi 10.1038/s41467-024-49304-y

All-Optical Phase Conjugation Using Diffractive Wavefront Processing

Authors: Che-Yung Shen, Jingxi Li, Tianyi Gan, Mona Jarrahi, Aydogan Ozcan

Abstract: Optical phase conjugation (OPC) is a nonlinear technique used for counteracting wavefront distortions, with various applications ranging from imaging to beam focusing. Here, we present the design of a diffractive wavefront processor to approximate all-optical phase conjugation operation for input fields with phase aberrations. Leveraging deep learning, a set of passive diffractive layers was optim… ▽ More Optical phase conjugation (OPC) is a nonlinear technique used for counteracting wavefront distortions, with various applications ranging from imaging to beam focusing. Here, we present the design of a diffractive wavefront processor to approximate all-optical phase conjugation operation for input fields with phase aberrations. Leveraging deep learning, a set of passive diffractive layers was optimized to all-optically process an arbitrary phase-aberrated coherent field from an input aperture, producing an output field with a phase distribution that is the conjugate of the input wave. We experimentally validated the efficacy of this wavefront processor by 3D fabricating diffractive layers trained using deep learning and performing OPC on phase distortions never seen by the diffractive processor during its training. Employing terahertz radiation, our physical diffractive processor successfully performed the OPC task through a shallow spatially-engineered volume that axially spans tens of wavelengths. In addition to this transmissive OPC configuration, we also created a diffractive phase-conjugate mirror by combining deep learning-optimized diffractive layers with a standard mirror. Given its compact, passive and scalable nature, our diffractive wavefront processor can be used for diverse OPC-related applications, e.g., turbidity suppression and aberration correction, and is also adaptable to different parts of the electromagnetic spectrum, especially those where cost-effective wavefront engineering solutions do not exist. △ Less

Submitted 8 November, 2023; originally announced November 2023.

Comments: 34 Pages, 9 Figures

Journal ref: Nature Communications (2024)

arXiv:2310.10024 [pdf, other]

Managing Persuasion Robustly: The Optimality of Quota Rules

Authors: Dirk Bergemann, Tan Gan, Yingkai Li

Abstract: We study a sender-receiver model where the receiver can commit to a decision rule before the sender determines the information policy. The decision rule can depend on the signal structure and the signal realization that the sender adopts. This framework captures applications where a decision-maker (the receiver) solicit advice from an interested party (sender). In these applications, the receiver… ▽ More We study a sender-receiver model where the receiver can commit to a decision rule before the sender determines the information policy. The decision rule can depend on the signal structure and the signal realization that the sender adopts. This framework captures applications where a decision-maker (the receiver) solicit advice from an interested party (sender). In these applications, the receiver faces uncertainty regarding the sender's preferences and the set of feasible signal structures. Consequently, we adopt a unified robust analysis framework that includes max-min utility, min-max regret, and min-max approximation ratio as special cases. We show that it is optimal for the receiver to sacrifice ex-post optimality to perfectly align the sender's incentive. The optimal decision rule is a quota rule, i.e., the decision rule maximizes the receiver's ex-ante payoff subject to the constraint that the marginal distribution over actions adheres to a consistent quota, regardless of the sender's chosen signal structure. △ Less

Submitted 15 October, 2023; originally announced October 2023.

arXiv:2309.09215 [pdf]

doi 10.1038/s41377-024-01385-6

All-optical image denoising using a diffractive visual processor

Authors: Cagatay Isıl, Tianyi Gan, F. Onuralp Ardic, Koray Mentesoglu, Jagrit Digani, Huseyin Karaca, Hanlong Chen, Jingxi Li, Deniz Mengu, Mona Jarrahi, Kaan Akşit, Aydogan Ozcan

Abstract: Image denoising, one of the essential inverse problems, targets to remove noise/artifacts from input images. In general, digital image denoising algorithms, executed on computers, present latency due to several iterations implemented in, e.g., graphics processing units (GPUs). While deep learning-enabled methods can operate non-iteratively, they also introduce latency and impose a significant comp… ▽ More Image denoising, one of the essential inverse problems, targets to remove noise/artifacts from input images. In general, digital image denoising algorithms, executed on computers, present latency due to several iterations implemented in, e.g., graphics processing units (GPUs). While deep learning-enabled methods can operate non-iteratively, they also introduce latency and impose a significant computational burden, leading to increased power consumption. Here, we introduce an analog diffractive image denoiser to all-optically and non-iteratively clean various forms of noise and artifacts from input images - implemented at the speed of light propagation within a thin diffractive visual processor. This all-optical image denoiser comprises passive transmissive layers optimized using deep learning to physically scatter the optical modes that represent various noise features, causing them to miss the output image Field-of-View (FoV) while retaining the object features of interest. Our results show that these diffractive denoisers can efficiently remove salt and pepper noise and image rendering-related spatial artifacts from input phase or intensity images while achieving an output power efficiency of ~30-40%. We experimentally demonstrated the effectiveness of this analog denoiser architecture using a 3D-printed diffractive visual processor operating at the terahertz spectrum. Owing to their speed, power-efficiency, and minimal computational overhead, all-optical diffractive denoisers can be transformative for various image display and projection systems, including, e.g., holographic displays. △ Less

Submitted 17 September, 2023; originally announced September 2023.

Comments: 21 Pages, 7 Figures

Journal ref: Light: Science & Applications (2024)

arXiv:2308.15019 [pdf]

Pyramid diffractive optical networks for unidirectional magnification and demagnification

Authors: Bijie Bai, Xilin Yang, Tianyi Gan, Jingxi Li, Deniz Mengu, Mona Jarrahi, Aydogan Ozcan

Abstract: Diffractive deep neural networks (D2NNs) are composed of successive transmissive layers optimized using supervised deep learning to all-optically implement various computational tasks between an input and output field-of-view (FOV). Here, we present a pyramid-structured diffractive optical network design (which we term P-D2NN), optimized specifically for unidirectional image magnification and dema… ▽ More Diffractive deep neural networks (D2NNs) are composed of successive transmissive layers optimized using supervised deep learning to all-optically implement various computational tasks between an input and output field-of-view (FOV). Here, we present a pyramid-structured diffractive optical network design (which we term P-D2NN), optimized specifically for unidirectional image magnification and demagnification. In this P-D2NN design, the diffractive layers are pyramidally scaled in alignment with the direction of the image magnification or demagnification. Our analyses revealed the efficacy of this P-D2NN design in unidirectional image magnification and demagnification tasks, producing high-fidelity magnified or demagnified images in only one direction, while inhibiting the image formation in the opposite direction - confirming the desired unidirectional imaging operation. Compared to the conventional D2NN designs with uniform-sized successive diffractive layers, P-D2NN design achieves similar performance in unidirectional magnification tasks using only half of the diffractive degrees of freedom within the optical processor volume. Furthermore, it maintains its unidirectional image magnification/demagnification functionality across a large band of illumination wavelengths despite being trained with a single illumination wavelength. With this pyramidal architecture, we also designed a wavelength-multiplexed diffractive network, where a unidirectional magnifier and a unidirectional demagnifier operate simultaneously in opposite directions, at two distinct illumination wavelengths. The efficacy of the P-D2NN architecture was also validated experimentally using monochromatic terahertz illumination, successfully matching our numerical simulations. P-D2NN offers a physics-inspired strategy for designing task-specific visual processors. △ Less

Submitted 29 August, 2023; originally announced August 2023.

Comments: 26 Pages, 7 Figures

arXiv:2308.10648 [pdf, other]

EVE: Efficient zero-shot text-based Video Editing with Depth Map Guidance and Temporal Consistency Constraints

Authors: Yutao Chen, Xingning Dong, Tian Gan, Chunluan Zhou, Ming Yang, Qingpei Guo

Abstract: Motivated by the superior performance of image diffusion models, more and more researchers strive to extend these models to the text-based video editing task. Nevertheless, current video editing tasks mainly suffer from the dilemma between the high fine-tuning cost and the limited generation capacity. Compared with images, we conjecture that videos necessitate more constraints to preserve the temp… ▽ More Motivated by the superior performance of image diffusion models, more and more researchers strive to extend these models to the text-based video editing task. Nevertheless, current video editing tasks mainly suffer from the dilemma between the high fine-tuning cost and the limited generation capacity. Compared with images, we conjecture that videos necessitate more constraints to preserve the temporal consistency during editing. Towards this end, we propose EVE, a robust and efficient zero-shot video editing method. Under the guidance of depth maps and temporal consistency constraints, EVE derives satisfactory video editing results with an affordable computational and time cost. Moreover, recognizing the absence of a publicly available video editing dataset for fair comparisons, we construct a new benchmark ZVE-50 dataset. Through comprehensive experimentation, we validate that EVE could achieve a satisfactory trade-off between performance and efficiency. We will release our dataset and codebase to facilitate future researchers. △ Less

Submitted 21 August, 2023; originally announced August 2023.

arXiv:2308.07102 [pdf, other]

doi 10.1145/3581783.3612120

Temporal Sentence Grounding in Streaming Videos

Authors: Tian Gan, Xiao Wang, Yan Sun, Jianlong Wu, Qingpei Guo, Liqiang Nie

Abstract: This paper aims to tackle a novel task - Temporal Sentence Grounding in Streaming Videos (TSGSV). The goal of TSGSV is to evaluate the relevance between a video stream and a given sentence query. Unlike regular videos, streaming videos are acquired continuously from a particular source, and are always desired to be processed on-the-fly in many applications such as surveillance and live-stream anal… ▽ More This paper aims to tackle a novel task - Temporal Sentence Grounding in Streaming Videos (TSGSV). The goal of TSGSV is to evaluate the relevance between a video stream and a given sentence query. Unlike regular videos, streaming videos are acquired continuously from a particular source, and are always desired to be processed on-the-fly in many applications such as surveillance and live-stream analysis. Thus, TSGSV is challenging since it requires the model to infer without future frames and process long historical frames effectively, which is untouched in the early methods. To specifically address the above challenges, we propose two novel methods: (1) a TwinNet structure that enables the model to learn about upcoming events; and (2) a language-guided feature compressor that eliminates redundant visual frames and reinforces the frames that are relevant to the query. We conduct extensive experiments using ActivityNet Captions, TACoS, and MAD datasets. The results demonstrate the superiority of our proposed methods. A systematic ablation study also confirms their effectiveness. △ Less

Submitted 14 August, 2023; originally announced August 2023.

Comments: Accepted by ACM MM 2023

arXiv:2306.05912 [pdf, other]

Single-Image-Based Deep Learning for Segmentation of Early Esophageal Cancer Lesions

Authors: Haipeng Li, Dingrui Liu, Yu Zeng, Shuaicheng Liu, Tao Gan, Nini Rao, Jinlin Yang, Bing Zeng

Abstract: Accurate segmentation of lesions is crucial for diagnosis and treatment of early esophageal cancer (EEC). However, neither traditional nor deep learning-based methods up to today can meet the clinical requirements, with the mean Dice score - the most important metric in medical image analysis - hardly exceeding 0.75. In this paper, we present a novel deep learning approach for segmenting EEC lesio… ▽ More Accurate segmentation of lesions is crucial for diagnosis and treatment of early esophageal cancer (EEC). However, neither traditional nor deep learning-based methods up to today can meet the clinical requirements, with the mean Dice score - the most important metric in medical image analysis - hardly exceeding 0.75. In this paper, we present a novel deep learning approach for segmenting EEC lesions. Our approach stands out for its uniqueness, as it relies solely on a single image coming from one patient, forming the so-called "You-Only-Have-One" (YOHO) framework. On one hand, this "one-image-one-network" learning ensures complete patient privacy as it does not use any images from other patients as the training data. On the other hand, it avoids nearly all generalization-related problems since each trained network is applied only to the input image itself. In particular, we can push the training to "over-fitting" as much as possible to increase the segmentation accuracy. Our technical details include an interaction with clinical physicians to utilize their expertise, a geometry-based rendering of a single lesion image to generate the training set (the \emph{biggest} novelty), and an edge-enhanced UNet. We have evaluated YOHO over an EEC data-set created by ourselves and achieved a mean Dice score of 0.888, which represents a significant advance toward clinical applications. △ Less

Submitted 9 June, 2023; originally announced June 2023.

arXiv:2304.10087 [pdf]

doi 10.1038/s41467-023-42556-0

Learning Diffractive Optical Communication Around Arbitrary Opaque Occlusions

Authors: Md Sadman Sakib Rahman, Tianyi Gan, Emir Arda Deger, Cagatay Isil, Mona Jarrahi, Aydogan Ozcan

Abstract: Free-space optical systems are emerging for high data rate communication and transfer of information in indoor and outdoor settings. However, free-space optical communication becomes challenging when an occlusion blocks the light path. Here, we demonstrate, for the first time, a direct communication scheme, passing optical information around a fully opaque, arbitrarily shaped obstacle that partial… ▽ More Free-space optical systems are emerging for high data rate communication and transfer of information in indoor and outdoor settings. However, free-space optical communication becomes challenging when an occlusion blocks the light path. Here, we demonstrate, for the first time, a direct communication scheme, passing optical information around a fully opaque, arbitrarily shaped obstacle that partially or entirely occludes the transmitter's field-of-view. In this scheme, an electronic neural network encoder and a diffractive optical network decoder are jointly trained using deep learning to transfer the optical information or message of interest around the opaque occlusion of an arbitrary shape. The diffractive decoder comprises successive spatially-engineered passive surfaces that process optical information through light-matter interactions. Following its training, the encoder-decoder pair can communicate any arbitrary optical information around opaque occlusions, where information decoding occurs at the speed of light propagation. For occlusions that change their size and/or shape as a function of time, the encoder neural network can be retrained to successfully communicate with the existing diffractive decoder, without changing the physical layer(s) already deployed. We also validate this framework experimentally in the terahertz spectrum using a 3D-printed diffractive decoder to communicate around a fully opaque occlusion. Scalable for operation in any wavelength regime, this scheme could be particularly useful in emerging high data-rate free-space communication systems. △ Less

Submitted 20 April, 2023; originally announced April 2023.

Comments: 23 Pages, 9 Figures

Journal ref: Nature Communications (2023)

arXiv:2304.05724 [pdf]

doi 10.1002/adma.202303395

Universal Polarization Transformations: Spatial programming of polarization scattering matrices using a deep learning-designed diffractive polarization transformer

Authors: Yuhang Li, Jingxi Li, Yifan Zhao, Tianyi Gan, Jingtian Hu, Mona Jarrahi, Aydogan Ozcan

Abstract: We demonstrate universal polarization transformers based on an engineered diffractive volume, which can synthesize a large set of arbitrarily-selected, complex-valued polarization scattering matrices between the polarization states at different positions within its input and output field-of-views (FOVs). This framework comprises 2D arrays of linear polarizers with diverse angles, which are positio… ▽ More We demonstrate universal polarization transformers based on an engineered diffractive volume, which can synthesize a large set of arbitrarily-selected, complex-valued polarization scattering matrices between the polarization states at different positions within its input and output field-of-views (FOVs). This framework comprises 2D arrays of linear polarizers with diverse angles, which are positioned between isotropic diffractive layers, each containing tens of thousands of diffractive features with optimizable transmission coefficients. We demonstrate that, after its deep learning-based training, this diffractive polarization transformer could successfully implement N_i x N_o = 10,000 different spatially-encoded polarization scattering matrices with negligible error within a single diffractive volume, where N_i and N_o represent the number of pixels in the input and output FOVs, respectively. We experimentally validated this universal polarization transformation framework in the terahertz part of the spectrum by fabricating wire-grid polarizers and integrating them with 3D-printed diffractive layers to form a physical polarization transformer operating at 0.75 mm wavelength. Through this set-up, we demonstrated an all-optical polarization permutation operation of spatially-varying polarization fields, and simultaneously implemented distinct spatially-encoded polarization scattering matrices between the input and output FOVs of a compact diffractive processor that axially spans 200 wavelengths. This framework opens up new avenues for developing novel optical devices for universal polarization control, and may find various applications in, e.g., remote sensing, medical imaging, security, material inspection and machine vision. △ Less

Submitted 12 April, 2023; originally announced April 2023.

Comments: 33 Pages, 7 Figures

Journal ref: Advanced Materials (2023)

arXiv:2303.08318 [pdf, other]

doi 10.1145/3503161.3548098

Micro-video Tagging via Jointly Modeling Social Influence and Tag Relation

Authors: Xiao Wang, Tian Gan, Yinwei Wei, Jianlong Wu, Dai Meng, Liqiang Nie

Abstract: The last decade has witnessed the proliferation of micro-videos on various user-generated content platforms. According to our statistics, around 85.7\% of micro-videos lack annotation. In this paper, we focus on annotating micro-videos with tags. Existing methods mostly focus on analyzing video content, neglecting users' social influence and tag relation. Meanwhile, existing tag relation construct… ▽ More The last decade has witnessed the proliferation of micro-videos on various user-generated content platforms. According to our statistics, around 85.7\% of micro-videos lack annotation. In this paper, we focus on annotating micro-videos with tags. Existing methods mostly focus on analyzing video content, neglecting users' social influence and tag relation. Meanwhile, existing tag relation construction methods suffer from either deficient performance or low tag coverage. To jointly model social influence and tag relation, we formulate micro-video tagging as a link prediction problem in a constructed heterogeneous network. Specifically, the tag relation (represented by tag ontology) is constructed in a semi-supervised manner. Then, we combine tag relation, video-tag annotation, and user-follow relation to build the network. Afterward, a better video and tag representation are derived through Behavior Spread modeling and visual and linguistic knowledge aggregation. Finally, the semantic similarity between each micro-video and all candidate tags is calculated in this video-tag network. Extensive experiments on industrial datasets of three verticals verify the superiority of our model compared with several state-of-the-art baselines. △ Less

Submitted 14 March, 2023; originally announced March 2023.

Comments: Accepted by Proceedings of the 30th ACM International Conference on Multimedia (2022)

arXiv:2212.08568 [pdf, other]

Biomedical image analysis competitions: The state of current participation practice

Authors: Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee, Tim J. Adler, Patrick Godau, Veronika Cheplygina, Michal Kozubek, Sharib Ali, Anubha Gupta, Jan Kybic, Alison Noble, Carlos Ortiz de Solórzano, Samiksha Pachade, Caroline Petitjean, Daniel Sage, Donglai Wei, Elizabeth Wilden, Deepak Alapatt, Vincent Andrearczyk, Ujjwal Baid, Spyridon Bakas, Niranjan Balu, Sophia Bano , et al. (331 additional authors not shown)

Abstract: The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis,… ▽ More The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps. △ Less

Submitted 12 September, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

arXiv:2212.02025 [pdf]

doi 10.1126/sciadv.adg1505

Unidirectional Imaging using Deep Learning-Designed Materials

Authors: Jingxi Li, Tianyi Gan, Yifan Zhao, Bijie Bai, Che-Yung Shen, Songyu Sun, Mona Jarrahi, Aydogan Ozcan

Abstract: A unidirectional imager would only permit image formation along one direction, from an input field-of-view (FOV) A to an output FOV B, and in the reverse path, the image formation would be blocked. Here, we report the first demonstration of unidirectional imagers, presenting polarization-insensitive and broadband unidirectional imaging based on successive diffractive layers that are linear and iso… ▽ More A unidirectional imager would only permit image formation along one direction, from an input field-of-view (FOV) A to an output FOV B, and in the reverse path, the image formation would be blocked. Here, we report the first demonstration of unidirectional imagers, presenting polarization-insensitive and broadband unidirectional imaging based on successive diffractive layers that are linear and isotropic. These diffractive layers are optimized using deep learning and consist of hundreds of thousands of diffractive phase features, which collectively modulate the incoming fields and project an intensity image of the input onto an output FOV, while blocking the image formation in the reverse direction. After their deep learning-based training, the resulting diffractive layers are fabricated to form a unidirectional imager. As a reciprocal device, the diffractive unidirectional imager has asymmetric mode processing capabilities in the forward and backward directions, where the optical modes from B to A are selectively guided/scattered to miss the output FOV, whereas for the forward direction such modal losses are minimized, yielding an ideal imaging system between the input and output FOVs. Although trained using monochromatic illumination, the diffractive unidirectional imager maintains its functionality over a large spectral band and works under broadband illumination. We experimentally validated this unidirectional imager using terahertz radiation, very well matching our numerical results. Using the same deep learning-based design strategy, we also created a wavelength-selective unidirectional imager, where two unidirectional imaging operations, in reverse directions, are multiplexed through different illumination wavelengths. Diffractive unidirectional imaging using structured materials will have numerous applications in e.g., security, defense, telecommunications and privacy protection. △ Less

Submitted 4 December, 2022; originally announced December 2022.

Comments: 27 Pages, 10 Figures

Journal ref: Science Advances (2023)

arXiv:2205.13122 [pdf]

doi 10.1186/s43593-022-00021-3

To image, or not to image: Class-specific diffractive cameras with all-optical erasure of undesired objects

Authors: Bijie Bai, Yi Luo, Tianyi Gan, Jingtian Hu, Yuhang Li, Yifan Zhao, Deniz Mengu, Mona Jarrahi, Aydogan Ozcan

Abstract: Privacy protection is a growing concern in the digital era, with machine vision techniques widely used throughout public and private settings. Existing methods address this growing problem by, e.g., encrypting camera images or obscuring/blurring the imaged information through digital algorithms. Here, we demonstrate a camera design that performs class-specific imaging of target objects with instan… ▽ More Privacy protection is a growing concern in the digital era, with machine vision techniques widely used throughout public and private settings. Existing methods address this growing problem by, e.g., encrypting camera images or obscuring/blurring the imaged information through digital algorithms. Here, we demonstrate a camera design that performs class-specific imaging of target objects with instantaneous all-optical erasure of other classes of objects. This diffractive camera consists of transmissive surfaces structured using deep learning to perform selective imaging of target classes of objects positioned at its input field-of-view. After their fabrication, the thin diffractive layers collectively perform optical mode filtering to accurately form images of the objects that belong to a target data class or group of classes, while instantaneously erasing objects of the other data classes at the output field-of-view. Using the same framework, we also demonstrate the design of class-specific permutation cameras, where the objects of a target data class are pixel-wise permuted for all-optical class-specific encryption, while the other objects are irreversibly erased from the output image. The success of class-specific diffractive cameras was experimentally demonstrated using terahertz (THz) waves and 3D-printed diffractive layers that selectively imaged only one class of the MNIST handwritten digit dataset, all-optically erasing the other handwritten digits. This diffractive camera design can be scaled to different parts of the electromagnetic spectrum, including, e.g., the visible and infrared wavelengths, to provide transformative opportunities for privacy-preserving digital cameras and task-specific data-efficient imaging. △ Less

Submitted 25 May, 2022; originally announced May 2022.

Comments: 31 Pages, 7 Figures

Journal ref: eLight (2022)

arXiv:2203.09811 [pdf, other]

Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation

Authors: Xingning Dong, Tian Gan, Xuemeng Song, Jianlong Wu, Yuan Cheng, Liqiang Nie

Abstract: Scene Graph Generation, which generally follows a regular encoder-decoder pipeline, aims to first encode the visual contents within the given image and then parse them into a compact summary graph. Existing SGG approaches generally not only neglect the insufficient modality fusion between vision and language, but also fail to provide informative predicates due to the biased relationship prediction… ▽ More Scene Graph Generation, which generally follows a regular encoder-decoder pipeline, aims to first encode the visual contents within the given image and then parse them into a compact summary graph. Existing SGG approaches generally not only neglect the insufficient modality fusion between vision and language, but also fail to provide informative predicates due to the biased relationship predictions, leading SGG far from practical. Towards this end, in this paper, we first present a novel Stacked Hybrid-Attention network, which facilitates the intra-modal refinement as well as the inter-modal interaction, to serve as the encoder. We then devise an innovative Group Collaborative Learning strategy to optimize the decoder. Particularly, based upon the observation that the recognition capability of one classifier is limited towards an extremely unbalanced dataset, we first deploy a group of classifiers that are expert in distinguishing different subsets of classes, and then cooperatively optimize them from two aspects to promote the unbiased SGG. Experiments conducted on VG and GQA datasets demonstrate that, we not only establish a new state-of-the-art in the unbiased metric, but also nearly double the performance compared with two baselines. △ Less

Submitted 2 April, 2022; v1 submitted 18 March, 2022; originally announced March 2022.

Comments: Accepted by CVPR 2022, the code is available at https://github.com/dongxingning/SHA-GCL-for-SGG

arXiv:2202.12031 [pdf, other]

Assessing generalisability of deep learning-based polyp detection and segmentation methods through a computer vision challenge

Authors: Sharib Ali, Noha Ghatwary, Debesh Jha, Ece Isik-Polat, Gorkem Polat, Chen Yang, Wuyang Li, Adrian Galdran, Miguel-Ángel González Ballester, Vajira Thambawita, Steven Hicks, Sahadev Poudel, Sang-Woong Lee, Ziyi Jin, Tianyuan Gan, ChengHui Yu, JiangPeng Yan, Doyeob Yeo, Hyunseok Lee, Nikhil Kumar Tomar, Mahmood Haithmi, Amr Ahmed, Michael A. Riegler, Christian Daul, Pål Halvorsen , et al. (7 additional authors not shown)

Abstract: Polyps are well-known cancer precursors identified by colonoscopy. However, variability in their size, location, and surface largely affect identification, localisation, and characterisation. Moreover, colonoscopic surveillance and removal of polyps (referred to as polypectomy ) are highly operator-dependent procedures. There exist a high missed detection rate and incomplete removal of colonic pol… ▽ More Polyps are well-known cancer precursors identified by colonoscopy. However, variability in their size, location, and surface largely affect identification, localisation, and characterisation. Moreover, colonoscopic surveillance and removal of polyps (referred to as polypectomy ) are highly operator-dependent procedures. There exist a high missed detection rate and incomplete removal of colonic polyps due to their variable nature, the difficulties to delineate the abnormality, the high recurrence rates, and the anatomical topography of the colon. There have been several developments in realising automated methods for both detection and segmentation of these polyps using machine learning. However, the major drawback in most of these methods is their ability to generalise to out-of-sample unseen datasets that come from different centres, modalities and acquisition systems. To test this hypothesis rigorously we curated a multi-centre and multi-population dataset acquired from multiple colonoscopy systems and challenged teams comprising machine learning experts to develop robust automated detection and segmentation methods as part of our crowd-sourcing Endoscopic computer vision challenge (EndoCV) 2021. In this paper, we analyse the detection results of the four top (among seven) teams and the segmentation results of the five top teams (among 16). Our analyses demonstrate that the top-ranking teams concentrated on accuracy (i.e., accuracy > 80% on overall Dice score on different validation sets) over real-time performance required for clinical applicability. We further dissect the methods and provide an experiment-based hypothesis that reveals the need for improved generalisability to tackle diversity present in multi-centre datasets. △ Less

Submitted 24 February, 2022; originally announced February 2022.

Comments: 26 pages

arXiv:2008.00395 [pdf, other]

Balancing Common Treatment and Epidemic Control in Medical Procurement during COVID-19: Transform-and-Divide Evolutionary Optimization

Authors: Yu-Jun Zheng, Xin Chen, Tie-Er Gan, Min-Xia Zhang, Wei-Guo Sheng, Ling Wang

Abstract: Balancing common disease treatment and epidemic control is a key objective of medical supplies procurement in hospitals during a pandemic such as COVID-19. This problem can be formulated as a bi-objective optimization problem for simultaneously optimizing the effects of common disease treatment and epidemic control. However, due to the large number of supplies, difficulties in evaluating the effec… ▽ More Balancing common disease treatment and epidemic control is a key objective of medical supplies procurement in hospitals during a pandemic such as COVID-19. This problem can be formulated as a bi-objective optimization problem for simultaneously optimizing the effects of common disease treatment and epidemic control. However, due to the large number of supplies, difficulties in evaluating the effects, and the strict budget constraint, it is difficult for existing evolutionary multiobjective algorithms to efficiently approximate the Pareto front of the problem. In this paper, we present an approach that first transforms the original high-dimensional, constrained multiobjective optimization problem to a low-dimensional, unconstrained multiobjective optimization problem, and then evaluates each solution to the transformed problem by solving a set of simple single-objective optimization subproblems, such that the problem can be efficiently solved by existing evolutionary multiobjective algorithms. We applied the transform-and-divide evolutionary optimization approach to six hospitals in Zhejiang Province, China, during the peak of COVID-19. Results showed that the proposed approach exhibits significantly better performance than that of directly solving the original problem. Our study has also shown that transform-and-divide evolutionary optimization based on problem-specific knowledge can be an efficient solution approach to many other complex problems and, therefore, enlarge the application field of evolutionary algorithms. △ Less

Submitted 2 August, 2020; originally announced August 2020.

arXiv:2007.13926 [pdf, other]

Intelligent Optimization of Diversified Community Prevention of COVID-19 using Traditional Chinese Medicine

Authors: Yu-Jun Zheng, Si-Lan Yu, Jun-Chao Yang, Tie-Er Gan, Qin Song, Jun Yang, Mumtaz Karatas

Abstract: Traditional Chinese medicine (TCM) has played an important role in the prevention and control of the novel coronavirus pneumonia (COVID-19), and community prevention has become the most essential part in reducing the spread risk and protecting populations. However, most communities use a uniform TCM prevention program for all residents, which violates the "treatment based on syndrome differentiati… ▽ More Traditional Chinese medicine (TCM) has played an important role in the prevention and control of the novel coronavirus pneumonia (COVID-19), and community prevention has become the most essential part in reducing the spread risk and protecting populations. However, most communities use a uniform TCM prevention program for all residents, which violates the "treatment based on syndrome differentiation" principle of TCM and limits the effectiveness of prevention. In this paper, we propose an intelligent optimization method to develop diversified TCM prevention programs for community residents. First, we use a fuzzy clustering method to divide the population based on both modern medicine and TCM health characteristics; we then use an interactive optimization method, in which TCM experts develop different TCM prevention programs for different clusters, and a heuristic algorithm is used to optimize the programs under the resource constraints. We demonstrate the computational efficiency of the proposed method and report its successful application to TCM-based prevention of COVID-19 in 12 communities in Zhejiang province, China, during the peak of the pandemic. △ Less

Submitted 27 July, 2020; originally announced July 2020.

arXiv:2004.03107 [pdf, other]

doi 10.1111/1756-2171.12407

The Economics of Social Data

Authors: Dirk Bergemann, Alessandro Bonatti, Tan Gan

Abstract: A data intermediary acquires signals from individual consumers regarding their preferences. The intermediary resells the information in a product market wherein firms and consumers tailor their choices to the demand data. The social dimension of the individual data -- whereby a consumer's data are predictive of others' behavior -- generates a data externality that can reduce the intermediary's cos… ▽ More A data intermediary acquires signals from individual consumers regarding their preferences. The intermediary resells the information in a product market wherein firms and consumers tailor their choices to the demand data. The social dimension of the individual data -- whereby a consumer's data are predictive of others' behavior -- generates a data externality that can reduce the intermediary's cost of acquiring the information. The intermediary optimally preserves the privacy of consumers' identities if and only if doing so increases social surplus. This policy enables the intermediary to capture the total value of the information as the number of consumers becomes large. △ Less

Submitted 28 September, 2022; v1 submitted 6 April, 2020; originally announced April 2020.

Journal ref: The RAND Journal of Economics, 53: 263-296 (2022)

arXiv:1903.01297 [pdf, other]

Nonlinear Craig Interpolant Generation

Authors: Ting Gan, Bican Xia, Bai Xue, Naijun Zhan, Liyun Dai

Abstract: Interpolation-based techniques have become popularized in recent years because of their inherently modular and local reasoning, which can scale up existing formal verification techniques like theorem proving, model-checking, abstraction interpretation, and so on, while the scalability is the bottleneck of these techniques. Craig interpolant generation plays a central role in interpolation-based te… ▽ More Interpolation-based techniques have become popularized in recent years because of their inherently modular and local reasoning, which can scale up existing formal verification techniques like theorem proving, model-checking, abstraction interpretation, and so on, while the scalability is the bottleneck of these techniques. Craig interpolant generation plays a central role in interpolation-based techniques, and therefore has drawn increasing attentions. In the literature, there are various works done on how to automatically synthesize interpolants for decidable fragments of first-order logic, linear arithmetic, array logic, equality logic with uninterpreted functions (EUF), etc., and their combinations. But Craig interpolant generation for non-linear theory and its combination with the aforementioned theories are still in infancy, although some attempts have been done. In this paper, we first prove that a polynomial interpolant of the form $h(\mathbf{x})>0$ exists for two mutually contradictory polynomial formulas $φ(\mathbf{x},\mathbf{y})$ and $ψ(\mathbf{x},\mathbf{z})$, with the form $f_1\ge0\wedge\cdots\wedge f_n\ge0$, where $f_i$ are polynomials in $\mathbf{x},\mathbf{y}$ or $\mathbf{x},\mathbf{z}$, and the quadratic module generated by $f_i$ is Archimedean. Then, we show that synthesizing such interpolant can be reduced to solving a semi-definite programming problem (${\rm SDP}$). In addition, we propose a verification approach to assure the validity of the synthesized interpolant and consequently avoid the unsoundness caused by numerical error in ${\rm SDP}$ solving. Finally, we discuss how to generalize our approach to general semi-algebraic formulas. △ Less

Submitted 10 May, 2020; v1 submitted 4 March, 2019; originally announced March 2019.

Comments: 32 pages, 4 figures

arXiv:1811.09386 [pdf, other]

Explicit Interaction Model towards Text Classification

Authors: Cunxiao Du, Zhaozheng Chin, Fuli Feng, Lei Zhu, Tian Gan, Liqiang Nie

Abstract: Text classification is one of the fundamental tasks in natural language processing. Recently, deep neural networks have achieved promising performance in the text classification task compared to shallow models. Despite of the significance of deep models, they ignore the fine-grained (matching signals between words and classes) classification clues since their classifications mainly rely on the tex… ▽ More Text classification is one of the fundamental tasks in natural language processing. Recently, deep neural networks have achieved promising performance in the text classification task compared to shallow models. Despite of the significance of deep models, they ignore the fine-grained (matching signals between words and classes) classification clues since their classifications mainly rely on the text-level representations. To address this problem, we introduce the interaction mechanism to incorporate word-level matching signals into the text classification task. In particular, we design a novel framework, EXplicit interAction Model (dubbed as EXAM), equipped with the interaction mechanism. We justified the proposed approach on several benchmark datasets including both multi-label and multi-class text classification tasks. Extensive experimental results demonstrate the superiority of the proposed method. As a byproduct, we have released the codes and parameter settings to facilitate other researches. △ Less

Submitted 23 November, 2018; originally announced November 2018.

Comments: 8 pages

Journal ref: AAAI 2019

arXiv:1601.04802 [pdf, other]

Interpolation synthesis for quadratic polynomial inequalities and combination with EUF

Authors: Ting Gan, Liyun Dai, Bican Xia, Naijun Zhan, Deepak Kapur, Mingshuai Chen

Abstract: An algorithm for generating interpolants for formulas which are conjunctions of quadratic polynomial inequalities (both strict and nonstrict) is proposed. The algorithm is based on a key observation that quadratic polynomial inequalities can be linearized if they are concave. A generalization of Motzkin's transposition theorem is proved, which is used to generate an interpolant between two mutuall… ▽ More An algorithm for generating interpolants for formulas which are conjunctions of quadratic polynomial inequalities (both strict and nonstrict) is proposed. The algorithm is based on a key observation that quadratic polynomial inequalities can be linearized if they are concave. A generalization of Motzkin's transposition theorem is proved, which is used to generate an interpolant between two mutually contradictory conjunctions of polynomial inequalities, using semi-definite programming in time complexity $\mathcal{O}(n^3+nm))$, where $n$ is the number of variables and $m$ is the number of inequalities. Using the framework proposed by \cite{SSLMCS2008} for combining interpolants for a combination of quantifier-free theories which have their own interpolation algorithms, a combination algorithm is given for the combined theory of concave quadratic polynomial inequalities and the equality theory over uninterpreted functions symbols (\textit{EUF}). The proposed approach is applicable to all existing abstract domains like \emph{octagon}, \emph{polyhedra}, \emph{ellipsoid} and so on, therefore it can be used to improve the scalability of existing verification techniques for programs and hybrid systems. In addition, we also discuss how to extend our approach to formulas beyond concave quadratic polynomials using Gröbner basis. △ Less

Submitted 10 November, 2016; v1 submitted 19 January, 2016; originally announced January 2016.

Comments: 40 pages, 1 figures

ACM Class: D.2.4

Showing 1–32 of 32 results for author: Gan, T