Enhancing Performance and User Engagement in Everyday Stress Monitoring: A Context-Aware Active Reinforcement Learning Approach

Seyed Amir Hossein Aqajari (saqajari@uci.edu), Ziyu Wang (ziyuw31@uci.edu), Ali Tazarv (atazarv@uci.edu), Sina Labbaf (slabbaf@uci.edu), Salar Jafarlou (jafarlos@uci.edu), Brenda Nguyen (brendn3@uci.edu), Nikil Dutt (dutt@ics.uci.edu), Marco Levorato (levorato@uci.edu), and Amir M. Rahmani (a.rahmani@uci.edu)
University of California, Irvine, Irvine, CA 92617, USA
Abstract.

In today’s fast-paced world, accurately monitoring stress levels is crucial. Sensor-based stress monitoring systems often need large datasets to train effective models. However, individual-specific models are necessary for personalized and interactive scenarios. Traditional methods such as Ecological Momentary Assessments (EMAs) assess stress but struggle to collect data efficiently without burdening users. The challenge is to send EMAs at the right time, especially during stressful moments, while balancing monitoring efficiency and user convenience. This paper introduces a novel context-aware active reinforcement learning (RL) algorithm for enhanced stress detection using Photoplethysmography (PPG) data from smartwatches and contextual data from smartphones. Our approach dynamically selects optimal times for deploying EMAs, utilizing the user’s immediate context to maximize label accuracy and minimize intrusiveness. Initially, the study was executed in an offline environment to refine the label collection process, aiming to increase accuracy while reducing user burden. Later, we integrated a real-time label collection mechanism, transitioning to an online methodology. This shift resulted in an 11% improvement in stress detection efficiency. Incorporating contextual data improved model accuracy by 4%. Personalization studies indicated a 10% enhancement in AUC-ROC scores, demonstrating better stress level differentiation. This research marks a significant move towards personalized, context-driven real-time stress monitoring methods.

Health Informatics, eHealth, Active Learning, Reinforcement Learning, Personalized Machine Learning
Journal: HEALTH
CCS Concepts: Applied computing → Health care information systems; Applied computing → Health informatics

1. Introduction

According to data from the American Institute of Stress (ame, 2021), approximately 55% of individuals in the United States encounter stress throughout their day. The American population is among the most stressed globally, with current stress levels surpassing the global average by 20 percentage points. The impact of stress extends to the physical body, cognitive processes, emotions, and behavior (may, 2023). Unaddressed stress can contribute to several health issues, including high blood pressure, heart ailments, obesity, and diabetes (may, 2023). Consequently, monitoring stress in daily life has become increasingly important, and advancing techniques for diagnosing human stress is a priority.

The presence of stress within the human body can be diagnosed by analyzing psychophysiological signals such as Photoplethysmography (PPG) (Charlton et al., 2018; Wang et al., 2024). PPG is a simple optical sensing technique for detecting blood volume alterations within peripheral circulation (Allen, 2007). With the rapid advancements in technology and the development of the Internet of Things (IoT) (Yao et al., 2020; Wang et al., 2020b; Kanduri et al., 2023; Alikhani et al., 2023), the acquisition of PPG signals has been greatly facilitated, primarily through the utilization of wearable devices like smart rings or smartwatches (Castaneda et al., 2018). Consequently, monitoring stress levels in everyday life becomes an attainable endeavor by analyzing the PPG signals garnered from the aforementioned wearable devices. Furthermore, the evolution of mobile apps (Cheng et al., 2024a; Alikhani et al., 2024; Cheng et al., 2024b) designed for context logging has provided a means to consistently observe and record a user’s contextual data, including elements like their location, activities, weather conditions, and other relevant variables, all in real-time (Sannino et al., 2014; Yang et al., 2022). Prior studies have already demonstrated the significance of this contextual information in understanding and identifying stressful experiences encountered by individuals (Stojchevska et al., 2022; Can et al., 2020; Han et al., 2020). In everyday situations, biosignals vary widely among individuals due to physiological and lifestyle differences, as well as the diverse activities one might partake in. Additionally, the perception of stress levels differs from person to person, leading to biases in the data collected. Therefore, there is a significant need to tailor predictive models to each individual across their various activities. These adjustments, which are essential at the time of deployment, present both conceptual and technical challenges.

The evolution of stress monitoring in daily settings has seen a significant transformation. Originally, stress monitoring was largely confined to controlled environments such as laboratories, which allowed researchers to closely observe and study physiological responses under stress (Fahrenberg et al., 2007; Steptoe et al., 2003). Such controlled studies laid the groundwork for understanding stress responses in a structured setting. Over time, advancements in technology enabled the transition to more naturalistic and dynamic environments. This shift paved the way for methods like Ecological Momentary Assessments (EMAs) (Burke et al., 2017). EMAs revolutionized stress monitoring by allowing real-time data collection about participants’ stress levels throughout their day-to-day activities. Using self-reported stress (EMA) as the labeling source, users are asked to respond to real-time queries that link the data gathered by sensors to stress labels in everyday situations. One of the main challenges is optimally collecting these labels (stress levels) from individuals in their daily lives (Larradet et al., 2020). Frequently triggering EMAs or sending them at inappropriate times, such as when a user is busy with work or sleeping, could burden the user. This could lead to a significantly lower number of reported labels. Moreover, selecting the optimal moments to send the EMAs, especially during instances when an individual is experiencing stress, poses a considerable challenge and holds the utmost importance.

To tackle these challenges, in this work, we introduce a contextual variant of active learning, based on Deep Q-Learning, which incorporates contextual information about an individual into the decision-making process. In the initial phase of our research, a context-aware active reinforcement learning algorithm was utilized in an offline setting (Tazarv et al., 2023). This approach was implemented to thoroughly evaluate the effectiveness of our proposed method. We demonstrated that using such an algorithm in a stress detection task can reduce the required EMAs by up to 88% compared to a random selection approach, and by up to 32% compared to traditional active learning methods. Furthermore, we observed that such an algorithm can increase the performance of stress detection tasks by up to 21% compared to a random selection method, and by up to 8% compared to traditional active learning approaches. However, the offline context-aware active reinforcement learning algorithm does not use active learning to initiate EMAs; it is applied only after data collection is complete. As a result, this approach may still burden users and trigger EMAs at inappropriate times.

In this article, an extension of our previous work (Tazarv et al., 2023), we improve our proposed algorithm for application in an online setting, leveraging active learning to initiate EMAs. In the online setting, our algorithm initiates EMAs at various points during the study, guided by the contextual information pertaining to the user. To obtain participant labels such as stress levels, the active learning algorithm analyzes contextual information in real time to determine the most appropriate moment to pose a question. This adaptive approach reduces participant burden while enhancing label accuracy. To assess the efficacy of our online algorithm against the prior offline one, we conducted two distinct analyses on the same dataset: one employing the offline context-aware active reinforcement learning algorithm and the other utilizing the online variant. Our findings demonstrate that the online algorithm substantially improves the performance of the stress detection task compared with its offline counterpart. Lastly, we employ a personalization technique to investigate how personalization affects the model’s performance.

In summary, the key contributions of this paper are as follows:

  • Propose a new form of active learning, utilizing Deep Q-learning, aimed at enhancing interaction with the monitored individual during data collection.

  • Develop a sensor-edge-cloud layered system architecture for the acquisition and labeling of the data aimed at real-time stress detection. We further demonstrate the effectiveness of our proposed system through the utilization of actual real-time data and comparing it with an identical offline variant of the algorithm.

  • Incorporate the contextual features into the stress detection models for the purpose of systematically monitoring the influence of contextual factors within the context of stress detection.

  • Examine how the performance of our algorithm is enhanced with the inclusion of subject-specific data during the training phase in order to explore the influence of personalization on stress detection.

  • Conduct a two-stage IRB-approved study on 54 individuals across undergraduate and graduate student populations over two periods: June 2020 to June 2021 (offline method) and March 2022 to May 2023 (online method), generating a total of 132,598 filtered samples. We commit to publicly releasing both datasets following our paper’s acceptance.

This paper is structured as follows: Section 2 offers a comprehensive review of stress assessment methods and associated research, highlighting the importance of personalizing models and the crucial role of user behavior and contextual data in real-time labeling. It also emphasizes the need to enhance user engagement in everyday settings. Section 3 details the platform developed for data collection and analysis. Section 4 outlines the dataset we gathered and our data processing methods. Section 5 introduces our proposed context-aware active learning approach, which incorporates user behavior and context into its query mechanism, along with temporal data correlations, to enhance query scheduling. Section 6 reports on the outcomes of various querying techniques in relation to personalization. Finally, Section 7 concludes the paper and includes a discussion.

2. Related Work

Stress-related research often delineates its origins from both exogenous factors—such as lifestyle, interpersonal relationships, and financial stability—and endogenous factors like individual psychological constitution and thought processes. These factors act as progenitors for negative affective states, including anxiety and fear, and instigate corresponding physiological responses. The physiological aspect of stress, denoted as a stress response, is a series of bodily reactions to environmental stimuli or stressors. Within the scientific discourse, the construct of stress is categorized into psychological, behavioral, and physiological dimensions. Historically, self-report measures such as the Perceived Stress Scale (PSS), formulated by Cohen et al. (Cohen et al., 1994), and the stress inventory by Holmes and Rahe (Holmes and Rahe, 1967), have been the standard for gauging stress levels retrospectively.

Nonetheless, the accuracy of survey-based assessments of stress is compromised by measurement biases, including response bias, which reflects the influence of the query’s framing on the participant’s responses. Additionally, while some behavioral expressions of stress—like facial expressions—are spontaneous, they may also be subject to volitional control, thus potentially skewing data accuracy. Consequently, recordings of such behaviors must be critically examined for systematic errors that may misrepresent the actual stress magnitude.

Given these constraints, and paralleling the evolution of high-precision sensor technology, there is an augmented demand for veritable detectors of physiological stress markers. Biosignal attributes of stress episodes are typically involuntary, and such data can be acquired through methodologies like electrocardiography (ECG), photoplethysmography (PPG), electromyography (EMG), skin conductance (SC) or electrodermal activity (EDA), respiratory rate (RSP), skin temperature (ST), pupil dilation (PD), and cerebral activity as captured by electroencephalography (EEG) (Giannakakis et al., 2019).

The current methodologies for monitoring stress in daily life utilize EMAs to inquire about participants’ stress levels throughout the day (Larradet et al., 2020). The task of effectively gathering accurate stress level indicators from individuals in the context of their everyday activities presents a significant challenge (Settles, 2009). The over-frequent activation of EMAs or their issuance at times that clash with a user’s schedule, such as during work hours or rest periods, can be an imposition. This may culminate in a reduced quantity of reported stress labels. Additionally, pinpointing the precise moments for sending EMAs, particularly in moments when stress levels are elevated, is of paramount significance and presents a notable challenge.

Existing methods in the literature for the deployment of EMAs in daily life stress monitoring studies can be categorized into three distinct categories: 1. Random, 2. Time-based, and 3. Statistical-based.

Within the random triggering methods, EMAs are dispatched at random intervals throughout the course of the study. Random sampling is often the preferred approach in situations where the research topic’s indicators cannot be reliably ascertained (Dogan et al., 2022). However, when the research topic possesses specific objectives and focus, alternative methods tend to yield more favorable outcomes.

Time-based triggering methods involve sending EMAs at fixed pre-defined intervals throughout the day. The majority of existing research endeavors in daily life stress monitoring in the literature employ this algorithm for their label querying system, as documented in previous studies (Yu et al., 2022, 2020; Mundnich et al., 2020; Wang et al., 2020a; Battalio et al., 2021). While this algorithm boasts simplicity of implementation and uniform coverage of the study period, it imposes a considerable burden on participants due to untimely EMA deliveries. Consequently, this may result in an increased prevalence of missing data and a reduction in the utility of collected labels.

In the context of statistical-based triggering methods, EMAs are dispatched based on the distribution of samples (Tazarv et al., 2021). Under this triggering algorithm, a label is requested for a specific sample based on the number of unlabeled samples within its vicinity. Although this policy can effectively mitigate the incidence of undesired EMA deliveries, it may still impose a burden on users and lead to missing data, as it fails to consider contextual information about users, which is pivotal in determining the optimal times for EMA deployment.

In this study, we propose a context-aware active reinforcement learning approach to effectively trigger EMAs throughout the day.

Initially, we conducted an offline study in which we employed a statistical-based triggering method to send EMAs throughout the day. In this phase, the probability of selecting each sample for labeling is proportional to the number of prior unlabeled samples in its proximity. This approach increases the likelihood of requesting a user label for a sample situated in a region with a substantial number of unlabeled samples. Upon accumulating a sufficient number of labeled samples for each region, the data collection process is terminated. We implemented three distinct algorithms offline for optimal label selection in model development: 1. Random, 2. Traditional Active Reinforcement Learning, and our novel approach 3. Context-Aware Active Reinforcement Learning. Our findings revealed that using a context-aware active reinforcement learning algorithm in stress detection significantly decreases the need for EMAs and enhances the effectiveness of stress detection compared with random or traditional active learning methodologies. As previously noted, the statistical-based triggering algorithm may still impose a user burden and result in missing data because it does not incorporate user contextual information into the label-querying decision-making process.

In the next phase, we propose an online context-aware active reinforcement learning algorithm that uses an RL agent for real-time decision making to further improve performance. This algorithm applies a context-aware active learning approach based on Deep Q-Learning in real time to determine whether an EMA should be triggered for a given sample. By considering real-time contextual information about each user when deciding whether to trigger an EMA for a specific sample, our approach is designed to reduce the user burden associated with untimely EMA deliveries and, consequently, to increase the acquisition of high-utility labels. Table 1 provides a summary of the comparison between our work (offline and online studies) and existing literature on this subject.

Table 1. Comparison of our study vs. existing works

| Study | Triggering Method | EMA Frequency | Real-time Analysis | Data-based queries | Context-based queries | Online triggering method |
| --- | --- | --- | --- | --- | --- | --- |
| Yu et al. (Yu et al., 2022) | Time-based | 1.5 hours apart, 10/day | | | | |
| Yu et al. (Yu et al., 2020) | Time-based | 1/day | | | | |
| Mundnich et al. (Mundnich et al., 2020) | Time-based | 1/day | | | | |
| Wang et al. (Wang et al., 2020a) | Time-based | Every three months | | | | |
| Battalio et al. (Battalio et al., 2021) | Random and Time-based | End of the day, varies | | | | |
| Our Offline Study | Statistical-based | A cap of seven EMAs per day | | | | |
| Our Online Study | Active Reinforcement Learning | A cap of seven EMAs per day | | | | |

3. Offline Study

Figure 1. System Architecture - Offline Study

In the initial stage of this work, we aim to evaluate the effectiveness of our proposed label triggering method (Tazarv et al., 2023). EMAs are dispatched to participants’ phones using a statistical-based triggering method. Once data collection was completed, we applied our proposed context-aware active reinforcement learning method to the labeling process. Our offline study was an Institutional Review Board (IRB)-approved study on human subjects, during which we gathered over 2,629 days of data in everyday environments from college students.

The collected dataset encompasses PPG and various motion metrics (such as acceleration, gyroscope, and gravity readings), and is partially annotated with information on stress levels, emotional states, and physical activities, obtained through EMAs issued at statistically determined intervals.

Our data collection initiative, spanning from June 2020 to June 2021, involved 20 volunteers selected from undergraduate and graduate student populations. The demographic breakdown of the participants included 13 male and 7 female students, ranging in age from 19 to 29 years. During the study, we gathered 109,586 samples over a period of 2,629 days. Participants contributed to the study for periods ranging from 11 to 287 days, with an average participation duration of 131 days. On average, each subject contributed 5,479 instances to the dataset. We commit to making this dataset publicly available following the acceptance of our paper.

3.1. Proposed System Architecture

Creating a reliable system that gathers real-time physiological and contextual data from participants while using active learning is challenging. Wearable devices like smartwatches can be affected by motion artifacts, requiring extensive processing for stress detection (Seok et al., 2021). Timing label requests is crucial to ensure participant engagement and label reliability. Figure 1 depicts the architecture of our proposed three-layer system for the offline study, called ZotCare (Sina et al., 2023): a sensor layer for data acquisition, an edge layer for data transmission and user interaction, and a cloud layer for data processing and decision-making.

3.2. Sensor layer

This study utilizes Samsung Galaxy Watch Active 2 smartwatches, equipped with PPG (20 Hz), accelerometer, and gyroscope sensors (Sarhaddi et al., 2022). We developed a Tizen-based smartwatch app to collect these signals (Vashisht et al., 2014). Data is sent to the cloud directly over Wi-Fi or, when Wi-Fi is unavailable, over Bluetooth via a smartphone. The raw signal acquisition program consists of two services and a user interface (UI). The first service records a 2-minute segment of sensor data every 15 minutes and sends it to the cloud.

3.3. Edge layer

We employ the AWARE framework (awa, 2023) to collect contextual data in everyday scenarios. AWARE is an open-source mobile tool designed for recording, sharing, and reusing context-related information on mobile devices. It utilizes the built-in sensors of smartphones to capture various aspects of daily life, including battery status, weather conditions, location, screen activity, and more. In situations where Wi-Fi connectivity is unavailable, we utilize an alternative smartphone application installed on the edge of our network. This application collects raw PPG signals and accelerometer data from the sensor layer through Bluetooth and subsequently transmits this data to cloud storage. To obtain stress level ratings from our study participants, we have developed an additional smartphone application. This application employs an EMA approach to request stress level assessments from the participants.

3.4. Cloud layer

This layer comprises two distinct modules:

  • Data Processing: This module focuses on processing data retrieved from the edge layer, with its primary objectives being encryption and the storage of data in the appropriate format on ZotCare servers.

  • Statistical-based EMA Triggering: Our label triggering method consists of two phases:

    • Initial Stage: To obtain an initial approximation of the sample distribution within the sample space, we start the procedure with observation. During the first N samples (100 samples in our configuration, equivalent to approximately 25 hours of wearing the watch), no EMAs are initiated. By the conclusion of this phase, an estimation of the sample distribution in the sample space is obtained.

    • Query Stage: Subsequent to the initial phase (from N+1 onward), EMAs are triggered (labels requested) for a subset of samples. The selection probability for labeling each sample is proportionate to the number of preceding samples (unlabeled) in its proximity. This approach ensures that samples in regions with a substantial number of unlabeled counterparts are more likely to be queried for labels. Once a sufficient number of labeled samples are acquired for a particular region, the label collection for that region ceases. Nonetheless, the minimum probability of triggering an EMA for a sample is set at P = 0.1. Consequently, even if a sample is situated in an area with few or no previous samples, the probability of a query remains nonzero. This design enables exploration of unseen regions as well as regions with higher densities while maintaining a balanced approach.
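The two phases above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ZotCare implementation: the neighbourhood radius, the number of labels that saturates a region, the scaling of the density into a probability, and all function names are hypothetical. Only N = 100, the floor P = 0.1, and the cap of seven EMAs per day come from the text. We also assume that the floor applies to sparse regions and that a saturated region is no longer queried.

```python
import math
import random

OBSERVATION_SAMPLES = 100  # N: samples observed before any EMA is sent (from the text)
MIN_QUERY_PROB = 0.1       # floor on the query probability (from the text)
DAILY_CAP = 7              # cap on EMAs per day (from Table 1)
RADIUS = 0.5               # assumed neighbourhood radius in feature space
ENOUGH_LABELS = 5          # assumed number of labels that saturates a region


def _neighbours(point, others, radius=RADIUS):
    """Count stored points within `radius` (Euclidean distance) of `point`."""
    return sum(1 for other in others if math.dist(point, other) <= radius)


def query_probability(sample, unlabeled, labeled, sent_today=0):
    """Probability of triggering an EMA for `sample`."""
    if len(unlabeled) + len(labeled) < OBSERVATION_SAMPLES:
        return 0.0  # initial stage: observe only
    if sent_today >= DAILY_CAP:
        return 0.0  # daily budget exhausted
    if _neighbours(sample, labeled) >= ENOUGH_LABELS:
        return 0.0  # region already has enough labels
    # Assumed scaling: fraction of all unlabeled samples that lie nearby.
    density = _neighbours(sample, unlabeled) / max(len(unlabeled), 1)
    return max(MIN_QUERY_PROB, min(1.0, density))


def should_trigger(sample, unlabeled, labeled, sent_today=0, rng=random):
    """Draw a Bernoulli decision with the probability computed above."""
    return rng.random() < query_probability(sample, unlabeled, labeled, sent_today)
```

The floor keeps a small chance of querying in regions that have not been visited before, which is what allows exploration of unseen parts of the sample space.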

3.5. Preprocessing

Following the conclusion of the study and data collection, the collected raw PPG signals are preprocessed to extract relevant features for model building.

3.5.1. Data Cleaning

The bio-signals from wearable devices, which are inherently noisy, are stored directly in the cloud. The goal of this step is to remove clearly erroneous data points. We use the motion data to remove noise and artifacts from the bio-signals. In our study, we primarily focus on refining PPG signals and heart-rate data. For PPG signals, we utilize a third-order bandpass Butterworth filter with cutoff frequencies of 0.7 Hz and 3.5 Hz and a sampling rate of 20 Hz, which matches our data collection parameters. Additionally, we apply a moving average over a 1-second window to smooth the PPG data, mitigating artifacts commonly induced by body gestures and movements in daily environments.

3.5.2. Data Normalization

Normalization is indispensable for minimizing participant-specific variance and countering the effects of poor-quality bio-signal samples. A widely used method in this context, particularly in statistical analyses and machine learning, is min-max normalization. This technique scales feature values consistently to a predetermined range, commonly [0, 1]. By doing so, it preserves the inherent structure of the data, which is crucial for algorithms that are sensitive to the magnitude of the features. For our PPG bio-signals, we apply min-max normalization so that each feature is scaled to a range dictated by the training set.
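As a minimal sketch of this step, the bounds are computed on the training set only and then reused for any later data, following x' = (x - min) / (max - min). The function names are hypothetical, and the handling of constant features is our own assumption.

```python
def fit_min_max(train_rows):
    """Return per-feature (min, max) bounds computed on the training rows only."""
    columns = list(zip(*train_rows))
    return [(min(column), max(column)) for column in columns]


def apply_min_max(rows, bounds):
    """Scale each feature with the training-set bounds: (x - min) / (max - min)."""
    scaled = []
    for row in rows:
        out = []
        for value, (lo, hi) in zip(row, bounds):
            span = hi - lo
            # Assumption: a constant feature carries no information, so map it to 0.
            out.append(0.0 if span == 0 else (value - lo) / span)
        scaled.append(out)
    return scaled
```

Because the bounds come from the training set, a later sample that exceeds them is mapped outside [0, 1]; whether to clip such values is a design choice the text does not specify.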

3.5.3. Feature Extraction

In our investigation, a feature extraction module was employed to analyze PPG data in 2-minute intervals. This facilitated the identification of PPG peaks and the derivation of key metrics such as heart rate. Utilizing the HeartPy library (Van Gent et al., 2019) for comprehensive PPG signal processing, we extracted 12 features from both electrical activations and pressure waveforms in the dataset. The extracted PPG features are presented in Table 3.

3.6. Context-aware Active Reinforcement Learning Algorithm

Our study utilizes the Context-aware Active Reinforcement Learning algorithm to label the collected data in our offline study. In this section, we provide a detailed explanation of this method and compare it with the random selection method and traditional active reinforcement learning methods for label querying.

In traditional supervised learning, the entire labeled dataset was utilized for training (Kotsiantis et al., 2007). However, within our personalized data collection approach, we accumulated labeled data from diverse sources over time. To leverage this, we iteratively queried our users for new annotations. Commencing with a subset of labeled data, we employed a query mechanism to identify the most informative unlabeled instances with the assistance of human experts.

Active learning played a pivotal role in this process, strategically selecting data samples for labeling to enhance model accuracy while minimizing data usage. Strategies encompassed uncertainty sampling (opting for ambiguous data) and diversity sampling (choosing unique and indicative data). Nevertheless, real-world queries incurred costs, necessitating a delicate balance between query expenses and model improvement.

In our scenario, we encountered distinctive challenges. User behavior influenced data quality and availability, contingent on factors such as activity, time, query frequency, and phone interaction. A lack of response resulted in the denial of labeling and delayed responses, diminishing data quality alignment. Consequently, our active learning approach needed to consider not only data quality but also future label accessibility. We proposed the use of Deep Q-Learning, where an agent modeled user behavior to ensure sustained user engagement.

3.6.1. Deep Q-learning

Deep Q-Learning (DQN) (Hester et al., 2018) is a model-free, online, off-policy reinforcement learning method. At its core, DQN seeks to estimate the action-value function, denoted as $Q(s,a)$, which predicts the expected return after taking an action $a$ in state $s$. The Bellman equation, which is fundamental to Q-learning, is given by:

$$Q(s,a) = r + \gamma \max_{a'} Q(s', a') \tag{1}$$

where $r$ is the immediate reward, $\gamma$ is the discount factor, and $s'$ is the subsequent state after taking action $a$ in state $s$. The primary distinction between traditional Q-learning and DQN is the utilization of deep neural networks to approximate the Q-values. This is paramount for tasks with large state spaces. The loss $\mathcal{L}$ during training is defined as:

$$\mathcal{L}(\theta) = \mathbb{E}_{(s,a,r,s') \sim U(D)}\left[\left(r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta)\right)^{2}\right] \tag{2}$$

where $D$ is the replay buffer, $U(D)$ is a uniform random sample from $D$, $\theta$ are the network parameters, and $\theta^{-}$ are the target network parameters.
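To make Equations (1) and (2) concrete, the sketch below computes the target and the mean squared temporal-difference error for a batch of transitions. It is an illustration only: the online and target networks are stood in for by plain callables that return one Q-value per action, and the function names are hypothetical. As in Equation (2), terminal states receive no special treatment.

```python
def td_target(reward, next_q_values, gamma):
    """r + gamma * max_a' Q(s', a'; theta^-), computed from target-network outputs."""
    return reward + gamma * max(next_q_values)


def dqn_loss(batch, q_online, q_target, gamma):
    """Mean squared TD error over a batch of (s, a, r, s') transitions.

    q_online(s) and q_target(s) each return a list with one Q-value per action.
    """
    total = 0.0
    for state, action, reward, next_state in batch:
        target = td_target(reward, q_target(next_state), gamma)
        total += (target - q_online(state)[action]) ** 2
    return total / len(batch)
```

In training, the batch would be drawn uniformly from the replay buffer, for example with `random.sample(buffer, batch_size)`, and only the online parameters would be updated by gradient descent on this loss.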

In the following subsections, we detail the components of our DQN.

3.6.2. State Representation

The state vector, denoted by $s$, encodes crucial information leading to decision-making. The intention is to refine and personalize the stress detector to optimize accuracy while minimizing user queries. The state comprises:

  • Uncertainty Factor: Originating from the raw output of a pre-trained classifier. This factor measures the distance from the decision boundary, effectively quantifying the confidence of the prediction.

  • Time-aware Response Rate: This accounts for the time of the day (in hourly intervals) and embodies the user’s responsiveness across different hours.

  • Time since Last Query: To enhance user experience and prevent excessive querying in short time frames.

  • Time of Day: Represents the current hour and is used to model potential variations in stress levels throughout the day.
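One way the four components could be assembled into a state vector is sketched below. Every formula here is an assumption made for illustration: the text does not specify how the uncertainty factor is scaled, how the response rate is estimated, or how the time values are normalized. The function name, the per-hour counters, the neutral prior of 0.5, and the 180-minute saturation gap are all hypothetical.

```python
def build_state(stress_probability, hour, responses_by_hour, prompts_by_hour,
                minutes_since_last_query, max_gap_minutes=180.0):
    """Return [uncertainty, response_rate, time_since_query, time_of_day] in [0, 1]."""
    # Assumed uncertainty: 1 at the decision boundary (p = 0.5), 0 at p = 0 or 1.
    uncertainty = 1.0 - 2.0 * abs(stress_probability - 0.5)

    # Assumed response rate: answered EMAs / sent EMAs for this hour of the day,
    # with a neutral prior of 0.5 when no EMA has been sent in that hour yet.
    prompts = prompts_by_hour[hour]
    response_rate = responses_by_hour[hour] / prompts if prompts else 0.5

    # Assumed normalization: the gap saturates at max_gap_minutes.
    time_since_query = min(minutes_since_last_query / max_gap_minutes, 1.0)

    # Current hour (0-23) scaled to [0, 1].
    time_of_day = hour / 23.0

    return [uncertainty, response_rate, time_since_query, time_of_day]
```

Keeping every component in [0, 1] is a convenience for a small neural network and is again our own choice rather than a detail given in the text.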

3.6.3. Reward Formulation

The reward function integrates components from the $n_{\text{state}}$ vector for holistic decision-making. It is formulated as:

(3) $r_{0}=\frac{1}{1+e^{-20\,(n_{\text{state}}[0]-0.5)}}$
(4) $r_{1}=\text{reward\_F}(n_{\text{state}}[1])$
(5) $r_{2}=\frac{1}{1+e^{-10\,(n_{\text{state}}[2]-0.5)}}$

The overall reward function, $R$, based on the action taken, is:

(6) $R(\text{action})=\begin{cases}\text{reward\_p}&\text{if action is True}\\ 3-\text{reward\_p}&\text{if action is False}\end{cases}$
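
The reward in Eqs. (3) to (6) can be sketched as follows. Two quantities are not defined in the text and are filled in here purely as assumptions: reward_F is passed in as a callable (defaulting to the identity), and reward_p is taken to be the sum $r_0+r_1+r_2$, which would explain the constant 3 in Eq. (6) if each component lies in [0, 1].

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def reward_components(n_state, reward_f):
    r0 = sigmoid(20.0 * (n_state[0] - 0.5))  # Eq. (3): steep sigmoid centered at 0.5
    r1 = reward_f(n_state[1])                # Eq. (4): reward_F is not defined in the text
    r2 = sigmoid(10.0 * (n_state[2] - 0.5))  # Eq. (5): gentler sigmoid centered at 0.5
    return r0, r1, r2


def total_reward(n_state, action, reward_f=lambda x: x):
    """Eq. (6). ASSUMPTION: reward_p = r0 + r1 + r2 (not stated in the text)."""
    reward_p = sum(reward_components(n_state, reward_f))
    return reward_p if action else 3.0 - reward_p


example_state = [0.9, 0.8, 0.7, 0.3]
reward_query = total_reward(example_state, action=True)
reward_skip = total_reward(example_state, action=False)
```

Under these assumptions the two actions receive complementary rewards that always sum to 3, so querying is favored exactly when reward_p exceeds 1.5.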

3.6.4. Q-Network Design

The Q-network constitutes the backbone of our framework. The structure and features are enumerated below:

  • The core is a densely connected neural network geared towards estimating Q-values.

  • Input: The network takes in 4 nodes, matching the count of state variables.

  • Hidden Layers: The architecture consists of variable hidden layers, as specified by the list $h$. In the provided example, four hidden layers are employed with 5, 9, 7, and 5 nodes, respectively. Each of these nodes uses the ReLU activation function and incorporates both $l_1$ and $l_2$ kernel regularizers, with the $l_2$ regularization strength set at 1e-2.

  • Output: The network furnishes 2 output nodes, indicative of the duo of feasible actions, with a linear activation function.
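
To make the layer layout concrete, the sketch below implements the forward pass of a network with this shape (4 inputs, hidden layers of 5, 9, 7, and 5 ReLU units, 2 linear outputs) in plain NumPy, together with the kernel penalty. It is a framework-free illustration rather than the Keras model used in the experiments; the weight initialization and the $l_1$ strength (set to 0 here) are assumptions, since the text reports only the $l_2$ strength.

```python
import numpy as np

LAYER_SIZES = [4, 5, 9, 7, 5, 2]  # input, four hidden layers, output


def init_params(layer_sizes, seed=0):
    """Random kernels and zero biases for each dense layer (assumed init)."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.5, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]


def q_values(params, states):
    """Forward pass: ReLU on hidden layers, linear output (one Q-value per action)."""
    x = np.atleast_2d(np.asarray(states, dtype=float))
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:  # no activation on the output layer
            x = np.maximum(x, 0.0)
    return x


def kernel_penalty(params, l1=0.0, l2=1e-2):
    """Sum of l1*|w| + l2*w^2 over hidden-layer kernels; l1 value is an assumption."""
    hidden_kernels = [w for w, _ in params[:-1]]
    return sum(l1 * np.abs(w).sum() + l2 * np.square(w).sum() for w in hidden_kernels)


params = init_params(LAYER_SIZES)
q = q_values(params, [0.8, 0.6, 0.3, 0.5])  # one state -> Q-values for both actions
```

With these sizes the network has only 201 trainable parameters, which is consistent with the small, four-dimensional state it operates on.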

Additionally, in our experiments, the agent employed an $\epsilon$-greedy policy accompanied by a linear annealing schedule for its exploration factor. This strategy ensures a gradual transition from exploration to exploitation during the learning process, thereby enhancing convergence and robustness in diverse environments. For our implementation, we leveraged the Keras-RL library (Plappert, 2016). To elucidate, the $\epsilon$-greedy policy in reinforcement learning can be characterized as follows:

(7) $\pi(a|s)=\begin{cases}1-\epsilon+\frac{\epsilon}{|A|}&\text{if }a=\operatorname{argmax}_{a^{\prime}\in A}Q(s,a^{\prime})\\ \frac{\epsilon}{|A|}&\text{otherwise}\end{cases}$

Where:

  • $\pi(a|s)$ is the probability of taking action $a$ in state $s$.

  • $\epsilon$ is the exploration probability.

  • $A$ is the set of possible actions.

  • $Q(s,a^{\prime})$ is the estimated value of taking action $a^{\prime}$ in state $s$.
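
The sketch below illustrates an $\epsilon$-greedy policy with linear annealing, taking $\epsilon$ to be the exploration probability as defined in the list above. Under that reading the greedy action is chosen with probability $1-\epsilon+\epsilon/|A|$ and every other action with probability $\epsilon/|A|$. The start value, end value, and annealing horizon are hypothetical, since the text does not report them.

```python
import random


def annealed_epsilon(step, eps_start=1.0, eps_end=0.1, anneal_steps=10_000):
    """Linearly decay epsilon from eps_start to eps_end (hypothetical values)."""
    frac = min(step / anneal_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)


def action_probabilities(q_values, epsilon):
    """Probability of each action under epsilon-greedy."""
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n       # uniform share from exploration
    probs[greedy] += 1.0 - epsilon  # extra mass on the greedy action
    return probs


def epsilon_greedy(q_values, epsilon, rng=random):
    """Sample an action: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


probs = action_probabilities([0.2, 0.7], epsilon=0.1)
eps_midway = annealed_epsilon(5_000)
```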

For our experiment, we employed a sequential memory architecture with a capacity limited to 50,000 instances and a window length set at one. The DQN agent was initialized with parameters set as follows: a discount factor $\gamma$ at 0.95, a warm-up phase consisting of 100 steps, and a learning rate of 1e-2 for the target model update. Optimization was carried out using the Adam optimizer, and the performance was gauged using the Mean Absolute Error (MAE) metric.

3.6.5. Policy Strategy

The decision strategy, denoted by $\pi_{\theta}(s,a)$, selects the action with the highest Q-value for a given state $s$. To ensure adequate coverage of the state space and to keep the models up to date:

  • On an estimated 5% of occasions, a query is initiated randomly. This procedure accounts for potentially unexplored regions of the state space.

  • After every $K=100$ occasions, which ideally corresponds to once in a 24-hour span, the user’s response frequency metrics are re-calibrated according to their interaction patterns. This adjustment acknowledges individual variability and refreshes the query selection strategy for each user.

  • The stress detection module undergoes periodic retraining. This process assimilates the most recent subjectively labeled information, amalgamated with the prior objective data procured from diverse subjects.
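
The first two items above can be sketched as follows. The class and function names are hypothetical, an "occasion" is interpreted as one decision step, and action index 1 is assumed to mean "submit query"; none of these details are specified in the text.

```python
import random
from collections import defaultdict


class ResponseRateTracker:
    """Per-hour response rates, refreshed every `recalibrate_every` decision steps."""

    def __init__(self, recalibrate_every=100, prior=0.5):
        self.recalibrate_every = recalibrate_every
        self.rates = [prior] * 24  # rates currently exposed to the agent
        self._sent = defaultdict(int)
        self._answered = defaultdict(int)
        self._steps = 0

    def step(self, hour, queried, answered=False):
        """Call once per decision occasion; log the outcome if a query was sent."""
        if queried:
            self._sent[hour] += 1
            self._answered[hour] += int(answered)
        self._steps += 1
        if self._steps % self.recalibrate_every == 0:
            self._recalibrate()

    def _recalibrate(self):
        # Hours without any query keep their previous rate.
        for hour, sent in self._sent.items():
            self.rates[hour] = self._answered[hour] / sent


def choose_action(q_values, rng=random, random_query_prob=0.05):
    """Return 1 (submit query) or 0 (do not query)."""
    if rng.random() < random_query_prob:
        return 1  # forced exploratory query on roughly 5% of occasions
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Deferring the rate update to every K steps means the agent sees a stable response-rate feature within a day, at the cost of reacting to behavior changes with up to one day of delay.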

3.7. Evaluation and Results

We initiated the pre-training of the stress detector, which was subsequently utilized to derive the state and reward in constructing the active learning framework. A random forest classifier with n = 500 estimators (number of trees) and max depth = 5 for each tree was employed. Participants were requested to assess their stress levels on a five-point scale: (1) not at all, (2) a little bit, (3) some, (4) a lot, and (5) extremely. We translated the stress labels into two categories: a lot and extremely as 1 (stressed), and the remaining three labels as 0 (not stressed).
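
The label translation described above amounts to a threshold on the five-point scale. A minimal sketch, with the hypothetical helper name `binarize_stress`:

```python
STRESS_SCALE = {1: "not at all", 2: "a little bit", 3: "some", 4: "a lot", 5: "extremely"}


def binarize_stress(level):
    """Map a five-point stress rating to 1 (stressed) or 0 (not stressed)."""
    if level not in STRESS_SCALE:
        raise ValueError(f"stress level must be 1-5, got {level!r}")
    # "a lot" (4) and "extremely" (5) form the stressed class.
    return 1 if level >= 4 else 0


binary_labels = [binarize_stress(level) for level in (1, 2, 3, 4, 5)]
```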

We trained the model using labeled data from 14 different subjects, reserving one subject for personalization. On the reserved subject, the trained model exhibited a recall of 0.238 for the minority (stressed) class. The Q-learning agent underwent pre-training with an offline sequential objective dataset. Upon completing a sequence (referred to as one episode), we restarted from the beginning until reaching the total number of steps. The model was trained for K = 200,000 steps. The total reward achieved during the episodes saturated before this point, indicating the model’s convergence.

The agent was trained in two distinct modes. Initially, we employed a “traditional” approach, where the agent’s attention was solely on the state and reward associated with the classifier’s raw output. The agent received high rewards for executing the ‘submit query’ action for instances within the classifier’s uncertainty region, and was rewarded for the ‘do not submit query’ action for other instances. In this mode, the agent operated without capturing contextual information and functioned akin to a conventional active learning selection policy. Subsequently, a modification was introduced by incorporating contextual information into the reward function, as previously described. High rewards were assigned for the ‘submit query’ action for instances that were not only within the uncertainty region but also fell in a time interval when the user demonstrated increased responsiveness and were not too close in time to the preceding query. For other instances, the agent received rewards for executing the ‘do not submit query’ action.

We conducted an analysis on the quantity of queries needed to achieve a particular level of personalization. The experiment was repeated N = 100 times to mitigate the influence of random selection. The average number of instances is presented in Figure 2. Notably, a substantial disparity exists between random selection and DQN agents, with the context-aware agent demonstrating the capability to attain high performance with a significantly lower number of queries. Specifically, the context-aware selection policy reduces the required queries by up to 88%, in contrast to the random selection method. While the number of necessary labels remains consistent for the two DQN agents throughout the analysis (as expected), the context-aware agent manages to reduce the required queries by up to 32%. These findings, derived from a subject with a higher number of labels, exhibit similar trends when extended to data from other subjects.

Figure 2. Number of queries needed to reach a certain performance level during personalization.

We examined the effectiveness of two agents and a random selection policy in achieving the primary objective of personalizing the classifier, and the findings are illustrated in Figure 3. The number of instances selected for querying remained consistent across each step for all selection methods. However, a subset of these selected instances stayed unlabeled, resulting in different quantities of instances available for personalization depending on the selection policies. Aside from this, the selected instances had varying impacts on personalization under different policies, comparing random selection to the other two methods. From a single subject, we obtained a total of 12,700 instances, with 922 labeled. We reserved 230 labeled instances (from the end of the sequence) as test data from one subject, leaving the remainder for training (25% - 75%). The process started without any personalization, gradually progressing through partially labeled subjective data. A subset of this data was chosen for querying, and a part of the selected data was labeled, with the labeled data being utilized for personalization. At each step, the context-aware agent selected fewer instances than the non-context-aware agent. To ensure a fair comparison, we randomly down-sampled the number of queries from the larger group. Additionally, for random selection, we randomly picked a number of partially labeled samples equivalent to the number of queries from the agents. To mitigate the impact of random selection, at each step, we selected instances and personalized the models N = 100 times. Figure 3 displays the mean and standard deviation of recall (True Positive Rate) for the stressed class on test data for the three selection methods.

Figure 3. Personalization Recall in Previous Work

Instances selected by the random agent do not improve the performance significantly, since they are drawn from the entire input space of the classifier, including samples whose class is ‘trivial’ to predict. Instances selected by a non-contextual active learning method (blue curve) increase the performance. However, with an equal number of queries, the best result is achieved when the agent is context-aware (green curve), since it selects a higher number of impactful instances that also have a higher chance of receiving a label.

4. Online Study

In our online study, we employ the Context-aware Active Learning Deep Q-Network (Context-aware AL DQN) algorithm, aligned with our offline investigation, to assess the efficacy of our proposed algorithm in a real-time setting where users are actively involved in training the Reinforcement Learning (RL) agent for decision-making. This real-time approach significantly reduces the user burden and has the potential to enhance stress detection performance compared to its offline counterpart by leveraging a real-time smart RL agent that queries labels.

We assessed data derived from a cohort of 34 individuals. This study spanned from March 2022 to May 2023. Participants, ranging in age from 19 to 29 years, provided a comprehensive dataset. After filtering out anomalous and noisy records, we aggregated 23,012 samples over a period of 420 days. On an average basis, each participant yielded 676 distinct samples. It is noteworthy to mention that the respective IRB granted approval for all aspects of this investigation. We pledge to release the dataset to the public once our paper is accepted.

Figure 4. System Architecture - Online Study

4.1. Proposed System Architecture

The proposed system architecture is illustrated in Figure 4. Consistent with our offline study, we maintain a three-layer system, denoted as ZotCare. A comparison with the architecture presented in Figure 1 reveals that, although the sensor layer and the edge layer remain unchanged, substantial modifications are implemented in the cloud layer. These alterations make the system suitable for real-time label querying using our proposed context-aware active reinforcement learning algorithm, described in Section 3.6.

The cloud layer comprises four distinct modules, which replace the previous simple statistics-based triggering method with our proposed triggering algorithm.

  • PPG Signal Preprocessing and Feature Extraction: To effectively identify moments conducive to experiencing stress, we continuously monitor participants’ stress levels in real time using PPG signals from their watches. However, to make these signals suitable for stress prediction, several preprocessing steps are required. This module is dedicated to preparing the PPG signals for stress detection. More details can be found in Section 4.2.

  • Stress Detection: We utilize the data from our previous study (Tazarv et al., 2023) to construct our stress detection module. The features extracted from the PPG signals are input into this module for stress detection. The level of certainty regarding stress is then forwarded to the context-aware active reinforcement learning module to aid in identifying stressful moments.

  • Context Recognition: Within this module, we extract contextual information pertaining to each user, which is subsequently provided to our active learning module for decision-making purposes. This includes factors such as the time elapsed since the last query, the time of day, and the time-aware response rate. The time-aware response rate considers the user’s responsiveness within the current hour based on their historical activity.

  • Context-Aware Active Reinforcement Learning: The primary objective of this module is to determine whether it is appropriate to trigger an EMA at any given moment. Stress certainty, time elapsed since the last query, time of day, and time-aware response rate are all input into this module to inform the decision-making process. If an EMA needs to be triggered, a notification is dispatched to the user’s mobile device on the edge layer, prompting them to rate their current stress level. In the following section, we will delve deeper into the training process of the active reinforcement learning agent.

4.2. Preprocessing

PPG signals, contextual AWARE data, and user-reported stress levels are collected from the cloud for stress model construction. However, raw cloud-stored PPG and AWARE data need preprocessing before building the model. This section explains our data preparation steps.

4.2.1. Data Cleaning and Normalization

This study employs the same modules for data cleaning and normalization as discussed in the offline study section (see Section 3.5).

4.2.2. Feature Extraction

  • PPG Features: This module has been previously discussed in Section 3.5. For details on the PPG features, refer to Table 3.

  • Contextual Features: The raw contextual information obtained from AWARE is not ready for building the stress detection models. We transform both categorical and numerical raw features into solely numerical features. We show the features extracted from raw AWARE data in Table 2.

Table 2. AWARE Features

| Feature | Definition |
| --- | --- |
| Call | Call duration, type, and count |
| Notification | App source and count |
| Screen & Touch | User screen interactions |
| Battery | Battery charge duration and level |
| Message | Message type and count |
| Time | Time of the day (24-hour format) |
| Location | Longitude, latitude, altitude |

Table 3. PPG Features

| Feature | Definition |
| --- | --- |
| BPM | Heart beats per minute |
| IBI | Inter-Beat Interval, the average time interval between two successive heartbeats (NN intervals) |
| SDNN | Standard deviation of NN intervals |
| SDSD | Standard deviation of successive differences between adjacent NN intervals |
| RMSSD | Root mean square of successive differences between adjacent NN intervals |
| PNN20 | Proportion of successive NN interval differences greater than 20 ms |
| PNN50 | Proportion of successive NN interval differences greater than 50 ms |
| HR_mad | Median absolute deviation of NN intervals |
| SD1 and SD2 | Standard deviations of the corresponding Poincaré plot |
| S | Area of the ellipse described by SD1 and SD2 |
| BR | Number of breaths per minute (breathing rate) |

4.2.3. Data Labeling

The EMA protocol is set to activate no more than seven times daily, prompting the participants to rate their stress levels on a five-point scale: (1) not at all, (2) a little bit, (3) some, (4) a lot, and (5) extremely. These self-reported stress levels, along with their associated timestamps, are archived in the cloud for future analysis. Each 15-minute interval of accumulated physiological and contextual data is labeled in accordance with the nearest subsequent EMA response. The distribution of these labels can be seen in Figure 5.
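
The labeling rule above (each 15-minute window takes the label of the nearest subsequent EMA response) can be sketched with a binary search over sorted EMA timestamps. Timestamps are assumed to be in minutes, "subsequent" is measured from the end of the window, and no maximum gap between window and EMA is enforced; these choices and the function name are illustrative assumptions.

```python
from bisect import bisect_left


def label_windows(window_ends, ema_times, ema_levels):
    """Assign each window the level of the first EMA at or after its end time.

    ema_times must be sorted in ascending order and aligned with ema_levels.
    Windows with no later EMA response receive None (unlabeled).
    """
    labels = []
    for t in window_ends:
        i = bisect_left(ema_times, t)  # index of the first EMA with time >= t
        labels.append(ema_levels[i] if i < len(ema_times) else None)
    return labels


# Five consecutive 15-minute windows and two EMA responses (minutes since start).
window_labels = label_windows(
    window_ends=[15, 30, 45, 60, 75],
    ema_times=[40, 60],
    ema_levels=[2, 5],
)
```

In practice one would likely also discard windows whose nearest subsequent EMA is too far away, but the text does not state such a threshold.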

Figure 5. Distribution of Stress Labels

4.3. Evaluation and Results

To enable a direct comparison between our online context-aware active learning method for stress detection and the previous offline variant, we deliberately employed identical classification algorithms for both studies.

Three distinct classification techniques have been used: Support Vector Machines (SVM) (Hearst et al., 1998), Random Forest (Breiman, 2001), and XGBoost (Chen et al., 2015). SVM finds a hyperplane in high-dimensional space to separate data classes. Random Forest uses multiple decision tree classifiers on dataset subsets, improving predictive accuracy while avoiding overfitting. Additionally, XGBoost is employed, providing an effective gradient-boosted trees implementation.

The utilization of these diverse classification techniques enables a comprehensive and robust evaluation of our proposed stress detection algorithm.

Our stress detection models are classified into two categories: single-modal and multi-modal algorithms. Within the single-modal algorithm, solely the PPG signal is employed for constructing the stress detection models. On the other hand, the multi-modal algorithm utilizes both the PPG signal and contextual information (AWARE data) in the development of the proposed models.

4.4. Classification Performance

4.4.1. Experiment Detail

In order to ensure a fair evaluation, we utilize the k-fold cross validation technique (Berrar et al., 2019) with k equal to 4.

K-fold cross-validation splits the data into k subsets (folds); each fold serves once as the test set while the remaining folds are used for training. It helps detect overfitting, utilizes all available data for evaluation, and reduces the sensitivity of the results to a particular split. Averaging results across folds provides a reliable estimate of model performance, making it valuable for model selection and hyperparameter tuning.
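
A minimal sketch of the 4-fold procedure is shown below. It uses contiguous, unshuffled folds; whether the folds in the experiments were shuffled or stratified is not stated, so this is an assumption, as are the function names.

```python
def k_fold_indices(n_samples, k=4):
    """Yield (train_idx, test_idx) for k contiguous folds of near-equal size."""
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_validate(n_samples, fit_and_score, k=4):
    """Average the score returned by fit_and_score(train_idx, test_idx) over folds."""
    scores = [fit_and_score(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


folds = list(k_fold_indices(10, k=4))
```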

4.4.2. Evaluation Metrics

To evaluate our stress monitoring system, we use three key metrics: precision, recall, and F1-score. Precision is the fraction of instances predicted as stressed that are truly stressed, and recall is the fraction of truly stressed instances that the model correctly identifies. The F1-score is the harmonic mean of precision and recall, which makes it a suitable summary metric for binary classification with imbalanced classes.
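
For reference, the three metrics can be computed from binary labels as in the sketch below (function names are illustrative). As a consistency check, the Random Forest offline row of Table 4 reports precision 0.27 and recall 0.17, whose harmonic mean rounds to the reported F1-score of 0.21.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(y_true, y_pred):
    """Metrics for the positive (stressed = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


precision, recall, f1 = precision_recall_f1(
    y_true=[1, 1, 1, 0, 0, 0, 0, 1],
    y_pred=[1, 0, 1, 1, 0, 0, 0, 0],
)
table4_rf_offline_f1 = round(f1_from(0.27, 0.17), 2)
```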

4.4.3. Classification Performance Results

Table 4 presents a comprehensive performance analysis of our novel stress detection algorithm, incorporating an online context-aware active learning approach, compared with the offline variant.

The results show that the online context-aware algorithm outperforms its offline counterpart on all evaluated metrics. Notably, for the Random Forest classifier, the F1-score improves by 11 percentage points (from 0.21 to 0.32). This improvement underscores the importance of employing intelligent real-time label triggering methods to identify optimal moments for sending Ecological Momentary Assessments (EMAs).

This outcome also underscores the considerable advantage of incorporating contextual awareness into our model, resulting in significant enhancements across various classification metrics and reaffirming the pivotal role of context in stress detection tasks.

Table 4. Classification Performance Results

| Active Learning Method | Data | Random Forest F1 | Random Forest Precision | Random Forest Recall | XGBoost F1 | XGBoost Precision | XGBoost Recall | SVM F1 | SVM Precision | SVM Recall |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Offline Context-Aware | PPG | 0.21 | 0.27 | 0.17 | 0.31 | 0.35 | 0.27 | 0.41 | 0.32 | 0.58 |
| Online Context-Aware | PPG | 0.32 | 0.43 | 0.25 | 0.39 | 0.43 | 0.35 | 0.50 | 0.41 | 0.64 |
| Online Context-Aware | PPG and Context | 0.36 | 0.49 | 0.28 | 0.40 | 0.45 | 0.36 | 0.52 | 0.45 | 0.61 |

4.5. Personalization Performance

Leveraging the unique physiological and behavioral variations in individuals can significantly enhance the efficacy of generic models. Inspired by the potential advantages of individualized prediction models, we hypothesize that personalizing reinforcement learning models might similarly elevate data quality. To validate this premise, we initially trained a generalized representation model using the aggregated training data from all users. Subsequently, we fine-tuned this model for each user individually, aiming to discern potential enhancements in prediction accuracy. Our evaluation centered on contrasting these generalized and personalized models to elucidate the tangible benefits of our individual-based personalization strategy in data collection.

4.5.1. Experiment Detail

To accurately evaluate the effect of personalization in our data collection mechanism, we implemented a dedicated train-test splitting strategy. Our dataset comprises data from multiple users. To ensure a robust evaluation, we adopted a leave-one-subject-out cross-validation scheme. In each round of this scheme, the held-out user’s data is divided temporally into two halves. The first half is used for model personalization, while the second half is reserved for testing.

Two distinct models were constructed for comparative assessment:

  • Plain Model: This model is trained using the entire dataset except for the data of the user currently under consideration. For testing and evaluation, the latter half of this user’s data is employed.

  • Personalized Model: This model, on the other hand, is trained using the complete dataset (excluding the data of the current user) combined with the initial half of the current user’s data. Again, the latter half of the user’s data is utilized for testing.

Comparing these two models helps us gauge the effectiveness of our personalization strategy. By contrasting their performance, we can see how incorporating user-specific data for training improves accuracy significantly compared to using a generic global dataset.
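
The splitting scheme described above can be sketched as follows. The function name and the dictionary-based data layout are hypothetical; each user's samples are assumed to be stored in chronological order.

```python
def personalization_splits(data_by_user):
    """Yield one leave-one-subject-out round per user.

    data_by_user maps a user id to that user's samples in chronological order.
    """
    for held_out, samples in data_by_user.items():
        half = len(samples) // 2
        # Training data shared by both models: everyone except the held-out user.
        others = [x for uid, xs in data_by_user.items() if uid != held_out for x in xs]
        yield {
            "user": held_out,
            "plain_train": others,
            "personalized_train": others + samples[:half],  # adds the user's first half
            "test": samples[half:],                         # user's second half
        }


toy_data = {"u1": [1, 2, 3, 4], "u2": [5, 6], "u3": [7, 8, 9]}
splits = {s["user"]: s for s in personalization_splits(toy_data)}
```

Because the test half always comes later in time than the personalization half, the personalized model is never evaluated on data that precedes what it was trained on.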

4.5.2. Personalization Performance Results

Table 5 compares the performance of our stress detection model in personalized and non-personalized configurations. We present outcomes from both the ROC curve (Figure 6) and additional performance metrics. The ROC curve assesses binary classification model efficacy by illustrating the relationship between True Positive Rate and False Positive Rate across various decision thresholds. A larger Area Under the Curve (AUC) signifies better model performance, which makes it a useful comparative measure. The findings, as depicted in the figure and table, show that adopting a personalized training approach improves the efficacy of our stress detection strategy, as evidenced by the AUC-ROC score. Notably, when employing the XGBoost classifier, we observed an improvement of approximately 10% in the AUC-ROC score.
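
As a reminder of what the AUC-ROC score measures, the sketch below computes it through its rank interpretation: the probability that a randomly chosen stressed instance receives a higher score than a randomly chosen non-stressed one, with ties counted as one half. This pairwise version is quadratic in the number of samples and is meant only as an illustration, not as the evaluation code used in the study.

```python
def roc_auc(y_true, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # A correctly ordered pair counts 1, a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


auc = roc_auc(y_true=[0, 0, 1, 1], scores=[0.1, 0.4, 0.35, 0.8])
```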

Table 5. Personalization Results

| Personalized Training Method | Random Forest F1 | Random Forest Precision | Random Forest Recall | XGBoost F1 | XGBoost Precision | XGBoost Recall | SVM F1 | SVM Precision | SVM Recall |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Not Personalized | 0.60 | 0.55 | 0.64 | 0.60 | 0.54 | 0.68 | 0.62 | 0.59 | 0.66 |
| Personalized | 0.64 | 0.61 | 0.66 | 0.66 | 0.61 | 0.71 | 0.65 | 0.66 | 0.65 |

Figure 6. Personalization ROC Curve

5. Conclusions

In conclusion, this work introduced a novel contextual variant of active learning, leveraging Deep Q-Learning to incorporate individual contextual information into the decision-making process (Tazarv et al., 2023, 2021). In the initial phase, the implementation of a context-aware active reinforcement learning algorithm in an offline setting showcased its efficacy, resulting in a significant reduction of up to 88% in required EMAs compared to random selection and up to 32% compared to traditional active learning methods. Additionally, stress detection performance exhibited notable improvements, with up to a 21% enhancement compared to random selection and up to 8% compared to traditional active learning.

Moving to the second phase, our online implementation of the algorithm utilized active learning for EMA initiation, leveraging real-time contextual information to optimize question timings and reduce participant burden. Comparative analyses of the offline and online variants on the same dataset demonstrated the superiority of the online algorithm, with an improvement of up to 11% in stress detection performance. Incorporating contextual features further improved results by 4%, underscoring the value of contextual information in enhancing model performance.

This study not only contributes a valuable advancement in stress detection methodologies but also underscores the pivotal role of context-awareness and online implementation in achieving superior results. The demonstrated reductions in participant burden and improvements in label accuracy signify the potential practical impact of this research in real-world applications. Future directions may explore additional personalization techniques and extend the application of context-aware active learning to diverse domains, fostering continued advancements in intelligent and user-centric systems.

References

  • ame (2021) 2021. The American Institute of Stress. https://www.stress.org
  • awa (2023) 2023. AWARE Framework. https://awareframework.com
  • may (2023) 2023. Mayo Clinic. https://www.mayoclinic.org
  • Alikhani et al. (2023) Hamidreza Alikhani, Anil Kanduri, Pasi Liljeberg, Amir M Rahmani, and Nikil Dutt. 2023. DynaFuse: Dynamic Fusion for Resource Efficient Multi-Modal Machine Learning Inference. IEEE Embedded Systems Letters (2023).
  • Alikhani et al. (2024) Hamidreza Alikhani, Ziyu Wang, Anil Kanduri, Pasi Liljeberg, Amir M Rahmani, and Nikil Dutt. 2024. SEAL: Sensing Efficient Active Learning on Wearables through Context-awareness. In 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 1–2.
  • Allen (2007) John Allen. 2007. Photoplethysmography and its application in clinical physiological measurement. Physiological measurement 28, 3 (2007), R1.
  • Battalio et al. (2021) Samuel L Battalio et al. 2021. Sense2Stop: a micro-randomized trial using wearable sensors to optimize a just-in-time-adaptive stress management intervention for smoking relapse prevention. Contemporary Clinical Trials 109 (2021), 106534.
  • Berrar et al. (2019) Daniel Berrar et al. 2019. Cross-Validation.
  • Breiman (2001) Leo Breiman. 2001. Random forests. Machine learning 45 (2001), 5–32.
  • Burke et al. (2017) Lora E Burke, Saul Shiffman, Edvin Music, Mindi A Styn, Andrea Kriska, Asim Smailagic, Daniel Siewiorek, Linda J Ewing, Eileen Chasens, Brian French, et al. 2017. Ecological momentary assessment in behavioral research: addressing technological and human participant challenges. Journal of medical Internet research 19, 3 (2017), e77.
  • Can et al. (2020) Yekta Said Can et al. 2020. Real-life stress level monitoring using smart bands in the light of contextual information. IEEE Sensors Journal 20, 15 (2020), 8721–8730.
  • Castaneda et al. (2018) Denisse Castaneda et al. 2018. A review on wearable photoplethysmography sensors and their potential future applications in health care. International journal of biosensors & bioelectronics 4, 4 (2018), 195.
  • Charlton et al. (2018) Peter H Charlton et al. 2018. Assessing mental stress from the photoplethysmogram: a numerical study. Physiological measurement 39, 5 (2018), 054001.
  • Chen et al. (2015) Tianqi Chen et al. 2015. Xgboost: extreme gradient boosting. R package version 0.4-2 1, 4 (2015), 1–4.
  • Cheng et al. (2024a) Ming Cheng, Bowen Zhang, Ziyu Wang, Ziyi Zhou, Weiqi Feng, Yi Lyu, and Xingjian Diao. 2024a. VeTraSS: Vehicle Trajectory Similarity Search Through Graph Modeling and Representation Learning. arXiv preprint arXiv:2404.08021 (2024).
  • Cheng et al. (2024b) Ming Cheng, Ziyi Zhou, Bowen Zhang, Ziyu Wang, Jiaqi Gan, Ziang Ren, Weiqi Feng, Yi Lyu, Hefan Zhang, and Xingjian Diao. 2024b. Efflex: Efficient and Flexible Pipeline for Spatio-Temporal Trajectory Graph Modeling and Representation Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2546–2555.
  • Cohen et al. (1994) Sheldon Cohen, Tom Kamarck, Robin Mermelstein, et al. 1994. Perceived stress scale. Measuring stress: A guide for health and social scientists 10, 2 (1994), 1–2.
  • Dogan et al. (2022) Gulin Dogan et al. 2022. Stress detection using experience sampling: A systematic mapping study. International Journal of Environmental Research and Public Health 19, 9 (2022), 5693.
  • Fahrenberg et al. (2007) Jochen Fahrenberg, Michael Myrtek, Kurt Pawlik, and Meinrad Perrez. 2007. Ambulatory assessment-monitoring behavior in daily life settings. European Journal of Psychological Assessment 23, 4 (2007), 206–213.
  • Giannakakis et al. (2019) Giorgos Giannakakis, Dimitris Grigoriadis, Katerina Giannakaki, Olympia Simantiraki, Alexandros Roniotis, and Manolis Tsiknakis. 2019. Review on psychological stress detection using biosignals. IEEE Transactions on Affective Computing 13, 1 (2019), 440–460.
  • Han et al. (2020) Hee Jeong Han et al. 2020. Objective stress monitoring based on wearable sensors in everyday settings. Journal of Medical Engineering & Technology 44, 4 (2020), 177–189.
  • Hearst et al. (1998) Marti A. Hearst et al. 1998. Support vector machines. IEEE Intelligent Systems and their applications 13, 4 (1998), 18–28.
  • Hester et al. (2018) Todd Hester et al. 2018. Deep q-learning from demonstrations. In Proceedings of the AAAI conference on artificial intelligence, Vol. 32.1.
  • Holmes and Rahe (1967) Thomas H Holmes and Richard H Rahe. 1967. The social readjustment rating scale. Journal of psychosomatic research (1967).
  • Kanduri et al. (2023) Anil Kanduri, Sina Shahhosseini, Emad Kasaeyan Naeini, Hamidreza Alikhani, Pasi Liljeberg, Nikil Dutt, and Amir M Rahmani. 2023. Edge-centric Optimization of Multi-modal ML-driven eHealth Applications. In Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing: Use Cases and Emerging Challenges. Springer, 95–125.
  • Kotsiantis et al. (2007) Sotiris B Kotsiantis et al. 2007. Supervised machine learning: A review of classification techniques. Emerging artificial intelligence applications in computer engineering 160, 1 (2007), 3–24.
  • Larradet et al. (2020) Fanny Larradet et al. 2020. Toward emotion recognition from physiological signals in the wild: approaching the methodological issues in real-life data collection. Frontiers in Psychology 11 (2020), 1111.
  • Mundnich et al. (2020) Karel Mundnich et al. 2020. TILES-2018, a longitudinal physiologic and behavioral data set of hospital workers. Scientific Data 7, 1 (2020), 354.
  • Plappert (2016) Matthias Plappert. 2016. keras-rl. https://github.com/keras-rl/keras-rl.
  • Sannino et al. (2014) Giovanna Sannino et al. 2014. A mobile system for real-time context-aware monitoring of patients’ health and fainting. International Journal of Data Mining and Bioinformatics 10, 4 (2014), 407–423.
  • Sarhaddi et al. (2022) Fatemeh Sarhaddi et al. 2022. A comprehensive accuracy assessment of Samsung smartwatch heart rate and heart rate variability. PLOS ONE 17, 12 (2022), e0268361.
  • Seok et al. (2021) Dongyeol Seok et al. 2021. Motion artifact removal techniques for wearable EEG and PPG sensor systems. Frontiers in Electronics 2 (2021), 685513.
  • Settles (2009) Burr Settles. 2009. Active learning literature survey. (2009).
  • Sina et al. (2023) Sina Labbaf et al. 2023. ZotCare: a flexible, personalizable, and affordable mhealth service provider. Frontiers in Digital Health (2023).
  • Steptoe et al. (2003) Andrew Steptoe, Sabine Kunz-Ebrecht, Natalie Owen, Pamela J Feldman, Gonneke Willemsen, Clemens Kirschbaum, and Michael Marmot. 2003. Socioeconomic status and stress-related biological responses over the working day. Psychosomatic Medicine 65, 3 (2003), 461–470.
  • Stojchevska et al. (2022) Marija Stojchevska et al. 2022. Assessing the added value of context during stress detection from wearable data. BMC Medical Informatics and Decision Making 22, 1 (2022), 268.
  • Tazarv et al. (2021) Ali Tazarv et al. 2021. Personalized stress monitoring using wearable sensors in everyday settings. In 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE, 7332–7335.
  • Tazarv et al. (2023) Ali Tazarv et al. 2023. Active Reinforcement Learning for Personalized Stress Monitoring in Everyday Settings. arXiv preprint arXiv:2305.00111 (2023).
  • Van Gent et al. (2019) Paul Van Gent et al. 2019. HeartPy: A novel heart rate algorithm for the analysis of noisy signals. Transportation Research Part F: Traffic Psychology and Behaviour 66 (2019), 368–378.
  • Vashisht et al. (2014) Geetika Vashisht et al. 2014. A study on the Tizen Operating System. International Journal of Computer Trends and Technology 12, 1 (2014), 14–15.
  • Wang et al. (2020a) Weichen Wang et al. 2020a. Social sensing: assessing social functioning of patients living with schizophrenia using mobile phone sensing. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 1–15.
  • Wang et al. (2020b) Ziyu Wang et al. 2020b. GuardHealth: Blockchain empowered secure data management and Graph Convolutional Network enabled anomaly detection in smart healthcare. J. Parallel and Distrib. Comput. 142 (2020), 1–12.
  • Wang et al. (2024) Ziyu Wang, Zhongqi Yang, Iman Azimi, and Amir M Rahmani. 2024. Differential private federated transfer learning for mental health monitoring in everyday settings: A case study on stress detection. arXiv preprint arXiv:2402.10862 (2024).
  • Yang et al. (2022) Xinyu Yang, Haoyuan Liu, Ziyu Wang, and Peng Gao. 2022. Zebra: Deeply integrating system-level provenance search and tracking for efficient attack investigation. arXiv preprint arXiv:2211.05403 (2022).
  • Yao et al. (2020) Yuanfan Yao et al. 2020. Privacy-preserving and energy efficient task offloading for collaborative mobile computing in IoT: An ADMM approach. Computers & Security 96 (2020), 101886.
  • Yu et al. (2020) H Yu et al. 2020. Passive sensor data based future mood health and stress prediction: User adaptation using deep learning. (2020).
  • Yu et al. (2022) Han Yu et al. 2022. Semi-supervised learning and data augmentation in wearable-based momentary stress detection in the wild. arXiv preprint arXiv:2202.12935 (2022).