Journal Description
Stats
is an international, peer-reviewed, open access journal on statistical science published quarterly online by MDPI. The journal focuses on methodological and theoretical papers in statistics, probability, stochastic processes and innovative applications of statistics in all scientific disciplines including biological and biomedical sciences, medicine, business, economics and social sciences, physics, data science and engineering.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within ESCI (Web of Science), Scopus, RePEc, and other databases.
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 19 days after submission; the time from acceptance to publication is 2.2 days (median values for papers published in this journal in the first half of 2024).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
Impact Factor: 0.9 (2023); 5-Year Impact Factor: 1.0 (2023)
Latest Articles
The Negative Binomial INAR(1) Process under Different Thinning Processes: Can We Separate between the Different Models?
Stats 2024, 7(3), 793-807; https://doi.org/10.3390/stats7030048 - 27 Jul 2024
Abstract
The literature on discrete-valued time series is expanding very fast, and we often see new models with properties very similar to those of existing ones. A natural question is whether this multitude of similar models serves a practical purpose or is mostly of theoretical interest. In the present paper, we consider four models that all have negative binomial marginal distributions and autoregressive behavior of order 1, but very different generating mechanisms. We then ask whether we can distinguish between them with real data. Extensive simulations show that, while the differences are small, we can still discriminate between the models with relatively moderate sample sizes. However, the mean forecasts are expected to be almost identical for all models.
Full article
(This article belongs to the Special Issue Modern Time Series Analysis II)
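As a concrete illustration of the generating mechanisms compared in this abstract, the sketch below simulates an INAR(1) process built from binomial thinning with Poisson innovations. This is a simplified, hypothetical example (the paper works with negative binomial marginals and several different thinning operators); the function name and parameter values are illustrative only.

```python
import numpy as np

def simulate_inar1(n, alpha, lam, seed=0):
    """Simulate X_t = alpha ∘ X_{t-1} + eps_t, where '∘' is binomial
    thinning and the innovations eps_t are Poisson(lam)."""
    rng = np.random.default_rng(seed)
    x = np.empty(n, dtype=int)
    x[0] = rng.poisson(lam / (1 - alpha))  # start near the stationary mean
    for t in range(1, n):
        survivors = rng.binomial(x[t - 1], alpha)  # binomial thinning of the previous count
        x[t] = survivors + rng.poisson(lam)        # add the innovation
    return x

series = simulate_inar1(500, alpha=0.5, lam=2.0)
print(series[:10], series.mean())  # stationary mean is lam / (1 - alpha) = 4
```

Swapping the binomial thinning line for another thinning operator changes the generating mechanism while leaving the first-order autoregressive structure intact, which is exactly the kind of near-indistinguishability the paper investigates.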
Open Access Article
Seismic Evaluation Based on Poisson Hidden Markov Models—The Case of Central and South America
by Evangelia Georgakopoulou, Theodoros M. Tsapanos, Andreas Makrides, Emmanuel Scordilis, Alex Karagrigoriou, Alexandra Papadopoulou and Vassilios Karastathis
Stats 2024, 7(3), 777-792; https://doi.org/10.3390/stats7030047 - 23 Jul 2024
Abstract
A study of earthquake seismicity is undertaken over the areas of Central and South America, the tectonics of which are of great interest. The whole territory is divided into 10 seismic zones based on seismotectonic characteristics, as in previously published studies. The earthquakes used in the present study are extracted from the catalogs of the International Seismological Centre, cover the period 1900–2021, and are restricted to shallow depths (≤60 km) and to magnitudes at or above a fixed threshold. Fore- and aftershocks are removed according to Reasenberg’s technique. The paper confines itself to the evaluation of earthquake occurrence probabilities in the seismic zones covering parts of Central and South America; we implement a hidden Markov model (HMM) and apply the EM algorithm.
Full article
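The likelihood computation underlying a Poisson hidden Markov model can be sketched with the scaled forward algorithm. The example below fixes the parameters rather than fitting them with EM as the paper does, and the two-regime setup (quiet vs. active seismicity rates) and all values are purely illustrative.

```python
import math
import numpy as np

def poisson_logpmf(k, lam):
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def hmm_loglik(counts, trans, lams, init):
    """Log-likelihood of counts under a Poisson HMM via the scaled forward algorithm."""
    loglik = 0.0
    alpha = init.copy()
    for t, k in enumerate(counts):
        emit = np.array([math.exp(poisson_logpmf(k, l)) for l in lams])
        alpha = (alpha if t == 0 else alpha @ trans) * emit  # propagate, then weight by emission
        c = alpha.sum()
        loglik += math.log(c)  # accumulate the scaling constants
        alpha = alpha / c
    return loglik

# Simulate a two-state chain: a quiet regime (rate 2) and an active regime (rate 10).
rng = np.random.default_rng(3)
trans = np.array([[0.95, 0.05], [0.10, 0.90]])
lams = np.array([2.0, 10.0])
states = [0]
for _ in range(299):
    states.append(rng.choice(2, p=trans[states[-1]]))
counts = rng.poisson(lams[np.array(states)])

ll_true = hmm_loglik(counts, trans, lams, np.array([0.5, 0.5]))
ll_flat = hmm_loglik(counts, trans, np.array([5.0, 5.0]), np.array([0.5, 0.5]))
print(ll_true, ll_flat)  # the regime-switching model should fit far better
```

An EM fit would alternate this forward pass (plus a backward pass) with re-estimation of `trans` and `lams`; here only the likelihood evaluation is shown.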
Open Access Article
Time-Varying Correlations between JSE.JO Stock Market and Its Partners Using Symmetric and Asymmetric Dynamic Conditional Correlation Models
by Anas Eisa Abdelkreem Mohammed, Henry Mwambi and Bernard Omolo
Stats 2024, 7(3), 761-776; https://doi.org/10.3390/stats7030046 - 22 Jul 2024
Abstract
The extent of correlation or co-movement among the returns of developed and emerging stock markets remains pivotal for efficiently diversifying global portfolios. This correlation is prone to variation over time as a consequence of escalating economic interdependence fostered by international trade and financial markets. In this study, the time-varying correlation and co-movement between the JSE.JO stock market of South Africa and its developed and developing stock market partners are analyzed. The dynamic conditional correlation–exponential generalized autoregressive conditional heteroscedasticity (DCC-EGARCH) methodology is employed with different multivariate distributions to explore the time-varying correlation and volatilities between the JSE.JO stock market and its partners. Based on the conditional correlation results, the JSE.JO stock market is integrated and co-moves with its partners, and the conditional correlation for all markets exhibits time-variant behavior. The conditional volatility results show that the JSE.JO stock market behaves differently from other markets, especially after 2015, indicating a positive sign for investors to diversify between the JSE.JO and its partners. The highest value of conditional volatility for markets was in 2020 during the COVID-19 pandemic, representing the riskiest period that investors should avoid due to the lack of diversification opportunities during crises.
Full article
(This article belongs to the Section Time Series Analysis)
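The dynamic-correlation part of the DCC methodology reduces to a simple recursion on a quasi-correlation matrix. The sketch below runs that recursion on synthetic residuals with fixed parameters; it is not the paper's estimated model. In practice the standardized residuals come from univariate (E)GARCH fits and the parameters a and b are estimated, so everything here is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(11)
T, k = 500, 2

# Hypothetical standardized return residuals for two markets.
eps = rng.standard_normal((T, k))

a, b = 0.05, 0.90                      # DCC parameters (a + b < 1)
S = np.corrcoef(eps, rowvar=False)     # unconditional correlation target
Q = S.copy()
corr_12 = np.empty(T)

for t in range(T):
    e = eps[t][:, None]
    Q = (1 - a - b) * S + a * (e @ e.T) + b * Q   # DCC recursion
    d = np.sqrt(np.diag(Q))
    R = Q / np.outer(d, d)                        # rescale Q to a correlation matrix
    corr_12[t] = R[0, 1]

print(corr_12[-5:])  # time-varying conditional correlation path
```

Because each term in the recursion is positive semi-definite, Q stays positive semi-definite and the rescaled off-diagonal entries remain valid correlations.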
Open Access Case Report
Parametric Estimation in Fractional Stochastic Differential Equation
by Paramahansa Pramanik, Edward L. Boone and Ryad A. Ghanam
Stats 2024, 7(3), 745-760; https://doi.org/10.3390/stats7030045 - 20 Jul 2024
Abstract
Fractional stochastic differential equations are becoming more popular in the literature, as they can model phenomena in financial data that typical stochastic differential equation models cannot. In the formulation considered here, the Hurst parameter, H, controls the fraction of differentiation and needs to be estimated from the data. Fortunately, the covariance structure among observations in time is easily expressed in terms of the Hurst parameter, which means that a likelihood is easily defined. This work derives the maximum likelihood estimator for H and shows that it is biased and not consistent. Simulation data are used to understand the bias of the estimator and to create an empirical bias-correction function, and a bias-corrected estimator is proposed and studied. Via simulation, the bias-corrected estimator is shown to be minimally biased, and its simulation-based standard error is used to create a 95% confidence interval for H. A simulation study shows that the 95% confidence intervals have decent coverage probabilities for large n. The method is then applied to S&P 500 and VIX data before and after the 2008 financial crisis.
Full article
(This article belongs to the Special Issue Novel Semiparametric Methods)
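The key fact the abstract relies on, that the covariance of the increments is an explicit function of H so a likelihood can be maximized over H, can be sketched as follows. This is a simplified grid-search illustration on simulated fractional Gaussian noise, not the paper's exact estimator or its bias correction; the function names and values are assumptions for the example.

```python
import numpy as np

def fgn_cov(n, H):
    """Covariance matrix of fractional Gaussian noise with Hurst index H:
    gamma(k) = 0.5 * (|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H})."""
    k = np.abs(np.subtract.outer(np.arange(n), np.arange(n))).astype(float)
    return 0.5 * ((k + 1.0) ** (2 * H) - 2.0 * k ** (2 * H) + np.abs(k - 1.0) ** (2 * H))

def h_loglik(x, H):
    """Profile Gaussian log-likelihood of increments x (sigma^2 profiled out)."""
    n = len(x)
    C = fgn_cov(n, H)
    _, logdet = np.linalg.slogdet(C)
    quad = x @ np.linalg.solve(C, x)
    return -0.5 * (logdet + n * np.log(quad / n))

# Simulate one fGn path via the Cholesky factor of the true covariance.
rng = np.random.default_rng(1)
H_true, n = 0.7, 200
L = np.linalg.cholesky(fgn_cov(n, H_true))
x = L @ rng.standard_normal(n)

# Maximize the profile likelihood over a grid of H values.
grid = np.linspace(0.05, 0.95, 91)
H_hat = grid[np.argmax([h_loglik(x, h) for h in grid])]
print(H_hat)
```

The paper's point is that the maximizer is biased for finite samples, which motivates the empirical bias-correction function; the grid search above only illustrates the uncorrected likelihood step.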
Open Access Case Report
Bayesian Model Averaging and Regularized Regression as Methods for Data-Driven Model Exploration, with Practical Considerations
by Hyemin Han
Stats 2024, 7(3), 732-744; https://doi.org/10.3390/stats7030044 - 18 Jul 2024
Abstract
Methodological experts suggest that psychological and educational researchers should employ appropriate methods for data-driven model exploration, such as Bayesian Model Averaging and regularized regression, instead of conventional hypothesis-driven testing, if they want to explore the best prediction model. I intend to discuss practical considerations regarding data-driven methods for end-user researchers without sufficient expertise in quantitative methods. I tested three data-driven methods, i.e., Bayesian Model Averaging, LASSO as a form of regularized regression, and stepwise regression, with datasets in psychology and education. I compared their performance in terms of cross-validity, which indicates robustness against overfitting, across different conditions. I employed functionalities widely available via R with default settings to provide information relevant to end users without advanced statistical knowledge. The results demonstrated that LASSO showed the best performance and that Bayesian Model Averaging outperformed stepwise regression when there were many candidate predictors to explore. Based on these findings, I discussed how to use data-driven model exploration methods appropriately across different situations from a nonexpert’s perspective.
Full article
(This article belongs to the Section Data Science)
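LASSO, the best performer in the comparison above, can be prototyped in a few lines via cyclic coordinate descent with soft-thresholding. The sketch below is a minimal from-scratch implementation on synthetic data, purely to show the mechanism; the paper itself relies on standard R packages with default settings, and all names and values here are illustrative.

```python
import numpy as np

def soft_threshold(z, g):
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """LASSO via cyclic coordinate descent, minimizing
    (1/2)||y - X beta||^2 + lam * n * ||beta||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]          # partial residual for feature j
            beta[j] = soft_threshold(X[:, j] @ r, lam * n) / col_sq[j]
    return beta

# Synthetic problem: 3 true predictors among 10 candidates.
rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:3] = [3.0, -2.0, 1.5]
y = X @ true_beta + 0.5 * rng.standard_normal(n)

beta_hat = lasso_cd(X, y, lam=0.1)
print(np.round(beta_hat, 2))  # irrelevant coefficients are shrunk to exactly zero
```

The exact zeroing of weak coefficients is what makes LASSO a model *exploration* tool rather than just a shrinkage method, which is the sense in which the paper compares it with Bayesian Model Averaging and stepwise regression.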
Open Access Case Report
Transitioning from the University to the Workplace: A Duration Model with Grouped Data
by Manuel Salas-Velasco
Stats 2024, 7(3), 719-731; https://doi.org/10.3390/stats7030043 - 16 Jul 2024
Abstract
Labor market surveys usually measure unemployment duration in time intervals. In these cases, traditional duration models such as Cox regression and parametric survival models are not suitable for studying the duration of unemployment spells. To deal with this issue, we use Han and Hausman’s ordered logit model for grouped durations, which has more flexibility than standard specifications. In particular, its flexibility arises from the fact that we do not need to specify any functional form for the baseline hazard function; it also circumvents problems associated with heterogeneity. The focus of interest is the first unemployment duration of higher education graduates. The analysis uses a large dataset from a survey of Spanish university graduates. The results show that the university-to-work transition of higher education graduates is significantly associated with the graduate’s age, participation in internship programs, field of study, type of university, and gender. Specifically, graduates who participated in internship programs, engineering graduates, and graduates from private universities experience a smoother transition.
Full article
(This article belongs to the Section Survival Analysis)
Open Access Article
Optimal Estimators of Cross-Partial Derivatives and Surrogates of Functions
by Matieyendou Lamboni
Stats 2024, 7(3), 697-718; https://doi.org/10.3390/stats7030042 - 14 Jul 2024
Abstract
Computing cross-partial derivatives using fewer model runs is relevant in many modeling contexts, such as stochastic approximation, derivative-based ANOVA, the exploration of complex models, and active subspaces. This paper introduces surrogates of all the cross-partial derivatives of functions by evaluating such functions at N randomized points and using a set of L constraints. Randomized points rely on independent, central, and symmetric variables. The associated estimators, based on model runs, reach the optimal rates of convergence, and the biases of our approximations do not suffer from the curse of dimensionality for a wide class of functions. Such results are used for (i) computing the main and upper bounds of sensitivity indices, and (ii) deriving emulators of simulators or surrogates of functions thanks to the derivative-based ANOVA. Simulations are presented to show the accuracy of our emulators and estimators of sensitivity indices. The plug-in estimates of indices using the U-statistics of one sample are numerically much more stable.
Full article
(This article belongs to the Section Statistical Methods)
Open Access Case Report
Neurodevelopmental Impairments Prediction in Premature Infants Based on Clinical Data and Machine Learning Techniques
by Arantxa Ortega-Leon, Arnaud Gucciardi, Antonio Segado-Arenas, Isabel Benavente-Fernández, Daniel Urda and Ignacio J. Turias
Stats 2024, 7(3), 685-696; https://doi.org/10.3390/stats7030041 - 12 Jul 2024
Abstract
Preterm infants are prone to neurodevelopmental impairment (NDI). Some previous works have identified clinical variables that can be potential predictors of NDI. However, machine learning (ML)-based models still present low predictive capabilities when addressing this problem. This work evaluates the application of ML techniques to predict NDI using clinical data from a cohort of very preterm infants recruited at birth and assessed at 2 years of age. Six different classification models were assessed, using all features, clinician-selected features, and mutual information feature selection. The best results for cognitive and motor impairment prediction were obtained by ML models trained using mutual information-selected features and employing oversampling, while for language impairment prediction the best setting was clinician-selected features. The performance indicators in this local cohort are consistent with similar previous works but still rather poor. This clearly indicates that, to obtain better performance rates, further analysis and methods should be considered, and other types of data should be taken into account together with the clinical variables.
Full article
Open Access Case Report
Estimator Comparison for the Prediction of Election Results
by Miltiadis S. Chalikias, Georgios X. Papageorgiou and Dimitrios P. Zarogiannis
Stats 2024, 7(3), 671-684; https://doi.org/10.3390/stats7030040 - 1 Jul 2024
Abstract
Cluster randomized experiments and estimator comparisons are well-documented topics. In this paper, using the datasets of the popular vote in the presidential elections of the United States of America (2012, 2016, 2020), we evaluate the properties (SE, MSE) of three cluster sampling estimators: Ratio estimator, Horvitz–Thompson estimator and the linear regression estimator. While both the Ratio and Horvitz–Thompson estimators are widely used in cluster analysis, we propose a linear regression estimator defined for unequal cluster sizes, which, in many scenarios, performs better than the other two. The main objective of this paper is twofold. Firstly, to indicate which estimator is most suited for predicting the outcome of the popular vote in the United States of America. We do so by applying the single-stage cluster sampling technique to our data. In the first partition, we use the 50 states plus the District of Columbia as primary sampling units, whereas in the second one, we use 3112 counties instead. Secondly, based on the results of the aforementioned procedure, we estimate the number of clusters in a sample for a set standard error while also considering the diminishing returns from increasing the number of clusters in the sample. The linear regression estimator is best in the majority of the examined cases. This type of comparison can also be used for the estimation of any other country’s elections if prior voting results are available.
Full article
(This article belongs to the Special Issue Statistical Learning for High-Dimensional Data)
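Two of the three estimators compared above are easy to state concretely for single-stage cluster sampling. The sketch below computes the ratio estimator of a vote share and the Horvitz–Thompson estimator of a vote total on a synthetic population of clusters; the population, cluster counts, and sample size are invented for illustration and do not come from the paper's election data.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population of 51 clusters (e.g., states): total votes per
# cluster and votes for candidate A.
N = 51
totals = rng.integers(100_000, 5_000_000, size=N)
share = rng.uniform(0.3, 0.7, size=N)
votes_a = (totals * share).astype(int)

# Single-stage cluster sample of n clusters, drawn without replacement,
# so each cluster's inclusion probability is pi = n / N.
n = 15
idx = rng.choice(N, size=n, replace=False)

# Ratio estimator of the overall vote share: sampled A-votes over sampled totals.
ratio_est = votes_a[idx].sum() / totals[idx].sum()

# Horvitz–Thompson estimator of the A-vote total: inverse-probability weighting.
ht_total = votes_a[idx].sum() * (N / n)

true_share = votes_a.sum() / totals.sum()
print(ratio_est, true_share, ht_total)
```

The ratio estimator automatically adjusts for unequal cluster sizes, while the Horvitz–Thompson estimator only reweights by inclusion probability, which is one reason their standard errors differ in settings like the county-level partition the paper considers.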
Open Access Article
Hierarchical Time Series Forecasting of Fire Spots in Brazil: A Comprehensive Approach
by Ana Caroline Pinheiro and Paulo Canas Rodrigues
Stats 2024, 7(3), 647-670; https://doi.org/10.3390/stats7030039 - 27 Jun 2024
Abstract
This study compares reconciliation techniques and base forecast methods to forecast a hierarchical time series of the number of fire spots in Brazil between 2011 and 2022. A three-level hierarchical time series was considered, comprising fire spots in Brazil, disaggregated by biome, and further disaggregated by municipality. The autoregressive integrated moving average (ARIMA), exponential smoothing (ETS), and Prophet models were tested for the base forecasts, and nine reconciliation approaches, including top-down, bottom-up, middle-out, and optimal combination methods, were considered to ensure coherence in the forecasts. Because a transformation is needed to ensure positive forecasts, two data transformations were considered: the logarithm of the number of fire spots plus one and the square root of the number of fire spots plus 0.5. To assess forecast accuracy, the data were split into training data for estimating model parameters and test data for evaluating forecast accuracy. The results show that the ARIMA model with the logarithmic transformation provides the best overall forecast accuracy. Among the reconciliation techniques, the bottom-up (BU), MinT(s), and WLS(v) approaches yielded the best results.
Full article
(This article belongs to the Special Issue Modern Time Series Analysis II)
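The coherence requirement discussed above is easiest to see with bottom-up reconciliation, the simplest of the nine approaches compared. The toy hierarchy below (one total, two "biomes", four "municipalities") is invented for illustration: a summing matrix S maps bottom-level forecasts to every level of the hierarchy, so the upper levels are coherent by construction.

```python
import numpy as np

# Bottom-level base forecasts (4 hypothetical municipalities).
bottom = np.array([120.0, 80.0, 50.0, 30.0])

# Summing matrix S for the hierarchy. Rows: total, biome_1 (= first two
# municipalities), biome_2 (= last two), then the 4 municipalities themselves.
S = np.array([
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
], dtype=float)

coherent = S @ bottom  # bottom-up (BU) reconciled forecasts for all 7 series
print(coherent)
```

Methods such as MinT and WLS use the same summing matrix but replace the implicit identity weighting with an estimated error covariance, trading the simplicity of BU for lower reconciled forecast variance.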
Open Access Article
Impact of Brexit on STOXX Europe 600 Constituents: A Complex Network Analysis
by Anna Maria D’Arcangelis, Arianna Pierdomenico and Giulia Rotundo
Stats 2024, 7(3), 627-646; https://doi.org/10.3390/stats7030038 - 27 Jun 2024
Abstract
Political events play a significant role in exerting their influence on financial markets globally. This paper investigates the long-term effect of Brexit on European stock markets using complex network methods as a starting point. The media has heavily emphasized the connection between this major political event and its economic and financial impact. To analyse this, we created two samples of companies based on the geographical allocation of their revenues to the UK. The first sample consists of companies that are either British or financially linked to the United Kingdom. The second sample serves as a control group and includes other European companies matched in terms of economic sector and firm size to those in the first sample. Each analysis is repeated over three non-overlapping periods: before the 2016 Referendum, between the Referendum and the 2019 General Elections, and after the 2019 General Elections. After an event study aimed at verifying the short-term response of idiosyncratic daily returns to the referendum result, we analysed the topological evolution of the networks through the minimum spanning trees (MSTs) of the various samples. Finally, after computing the centrality measures for each network, we examined the persistence of the levels of degree and eigenvector centrality over time. Our aim was to investigate whether the events that shaped the evolution of the MST had also brought about structural modifications to the centrality of the most connected companies within the network. The findings demonstrate the unexpected impact of the referendum outcome, which is more noticeable on European equities than on those of the UK, and the lack of influence from the elections that marked the beginning of the hard Brexit phase in 2019.
The modifications in the MST indicate a restructuring of the network of British companies, particularly evident in the third period with a repositioning of the UK nodes. The dynamics of the MSTs around the referendum date are associated with persistence in the relative rank of the centrality measures (relative to the median). Conversely, the arrival of hard Brexit does alter the relative ranking of the nodes according to degree centrality, while the ranking according to eigenvector centrality remains persistent. However, such movements are not statistically significant. An analysis of this kind offers relevant insights for investors, equipping them with a comprehensive view of political events, while also assisting policymakers in their endeavour to uphold stability by closely monitoring the ever-changing influence and interconnectedness of global stock markets during similar political events.
Full article
(This article belongs to the Section Financial Statistics)
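The MST construction at the heart of this analysis follows a standard recipe in financial network studies: turn a correlation matrix into a distance matrix and extract the minimum spanning tree. The sketch below does this with Prim's algorithm on purely synthetic returns; the six "stocks" and all data are illustrative assumptions, not the paper's STOXX sample.

```python
import numpy as np

def prim_mst(dist):
    """Prim's algorithm on a dense distance matrix; returns the MST edge list."""
    n = dist.shape[0]
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        best = None
        for i in in_tree:
            for j in range(n):
                if j not in in_tree and (best is None or dist[i, j] < dist[best]):
                    best = (i, j)
        edges.append(best)
        in_tree.add(best[1])
    return edges

rng = np.random.default_rng(7)
# Hypothetical daily returns for 6 stocks.
returns = rng.standard_normal((250, 6))
corr = np.corrcoef(returns, rowvar=False)

# Standard correlation-to-distance transform: d_ij = sqrt(2 * (1 - rho_ij)),
# so highly correlated stocks sit close together in the tree.
dist = np.sqrt(np.clip(2.0 * (1.0 - corr), 0.0, None))

edges = prim_mst(dist)
print(edges)  # a spanning tree on 6 nodes has 5 edges
```

Tracking how these edges and each node's degree change across the three sub-periods is precisely the kind of topological evolution the paper studies around the referendum and the 2019 elections.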
Open Access Case Report
Investigating Risk Factors for Racial Disparity in E-Cigarette Use with PATH Study
by Amy Liu, Kennedy Dorsey, Almetra Granger, Ty-Runet Bryant, Tung-Sung Tseng, Michael Celestin, Jr. and Qingzhao Yu
Stats 2024, 7(3), 613-626; https://doi.org/10.3390/stats7030037 - 21 Jun 2024
Abstract
Background: Previous research has identified differences in e-cigarette use and socioeconomic factors between different racial groups. However, there is little research examining the specific risk factors contributing to these racial differences. Objective: This study sought to identify racial disparities in e-cigarette use and to determine risk factors that help explain these differences. Methods: We used Wave 5 (2018–2019) of the Adult Population Assessment of Tobacco and Health (PATH) Study. First, we computed descriptive statistics of e-smoking across our risk factor variables. Next, we used multiple logistic regression to check the risk effects, adjusting for all covariates. Finally, we conducted a mediation analysis to determine whether the identified factors showed evidence of influencing the association between race and e-cigarette use. All analyses were performed in R or SAS; the R package mma was used for the mediation analysis. Results: Between Hispanic and non-Hispanic White populations, our potential risk factors collectively explain 17.5% of the racial difference; former cigarette smoking explains 7.6%, receiving e-cigarette advertising explains 2.6%, and perception of e-cigarette harm explains 27.8% of the difference. Between non-Hispanic Black and non-Hispanic White populations, former cigarette smoking, receiving e-cigarette advertising, and perception of e-cigarette harm explain 5.2%, 1.8%, and 6.8% of the racial difference, respectively. E-cigarette use is most prevalent in the non-Hispanic White population compared with non-Hispanic Black and Hispanic populations, which may be explained by former cigarette smoking, exposure to e-cigarette advertising, and e-cigarette harm perception. Conclusions: These findings suggest that racial differences in e-cigarette use may be reduced by increasing knowledge of the dangers associated with e-cigarette use and reducing exposure to e-cigarette advertisements. This comprehensive analysis of risk factors can be used to guide smoking cessation efforts and address potential health burden disparities arising from differences in e-cigarette usage.
Full article
Open Access Article
Estimation of Standard Error, Linking Error, and Total Error for Robust and Nonrobust Linking Methods in the Two-Parameter Logistic Model
by Alexander Robitzsch
Stats 2024, 7(3), 592-612; https://doi.org/10.3390/stats7030036 - 21 Jun 2024
Abstract
The two-parameter logistic (2PL) item response theory model is a statistical model for analyzing multivariate binary data. In this article, two groups are brought onto a common metric with the 2PL model using linking methods. The linking methods of mean–mean linking, mean–geometric–mean linking, and Haebara linking are investigated in nonrobust and robust specifications in the presence of differential item functioning (DIF). M-estimation theory is applied to derive linking errors for the studied linking methods. However, estimated linking errors are prone to sampling error in estimated item parameters, resulting in artificially increased linking error estimates in finite samples. For this reason, a bias-corrected linking error estimate is proposed. The usefulness of the modified linking error estimate is demonstrated in a simulation study. It is shown that a simultaneous assessment of the standard error and linking error in a total error must be conducted to obtain valid statistical inference. In the computation of the total error, using the bias-corrected linking error estimate instead of the usually employed linking error provides more accurate coverage rates.
Full article
(This article belongs to the Special Issue Robust Statistics in Action II)
Open Access Article
A Comparison of Limited Information Estimation Methods for the Two-Parameter Normal-Ogive Model with Locally Dependent Items
by Alexander Robitzsch
Stats 2024, 7(3), 576-591; https://doi.org/10.3390/stats7030035 - 21 Jun 2024
Abstract
The two-parameter normal-ogive (2PNO) model is one of the most popular item response theory (IRT) models for analyzing dichotomous items. Consistent parameter estimation of the 2PNO model using marginal maximum likelihood estimation relies on the local independence assumption. However, the assumption of local independence might be violated in practice. Likelihood-based estimation of the local dependence structure is often computationally demanding. Moreover, many IRT models that model local dependence do not have a marginal interpretation of item parameters. In this article, limited information estimation methods are reviewed that allow the convenient and straightforward handling of local dependence in estimating the 2PNO model. In detail, pairwise likelihood, weighted least squares, and normal-ogive harmonic analysis robust method (NOHARM) estimation are compared with marginal maximum likelihood estimation that ignores local dependence. A simulation study revealed that item parameters can be consistently estimated with limited information methods. At the same time, marginal maximum likelihood estimation resulted in biased item parameter estimates in the presence of local dependence. From a practical perspective, there were only minor differences regarding the statistical quality of item parameter estimates of the different estimation methods. Differences between the estimation methods are also compared for two empirical datasets.
Full article
(This article belongs to the Special Issue Statistics, Analytics, and Inferences for Discrete Data)
Open Access Article
Assessing Spillover Effects of Medications for Opioid Use Disorder on HIV Risk Behaviors among a Network of People Who Inject Drugs
by Joseph Puleo, Ashley Buchanan, Natallia Katenka, M. Elizabeth Halloran, Samuel R. Friedman and Georgios Nikolopoulos
Stats 2024, 7(2), 549-575; https://doi.org/10.3390/stats7020034 - 19 Jun 2024
Abstract
People who inject drugs (PWID) have an increased risk of HIV infection partly due to injection behaviors often related to opioid use. Medications for opioid use disorder (MOUD) have been shown to reduce HIV infection risk, possibly by reducing injection risk behaviors. MOUD may benefit individuals who do not receive it themselves but are connected through social, sexual, or drug use networks with individuals who are treated. This is known as spillover. Valid estimation of spillover in network studies requires considering the network’s community structure. Communities are groups of densely connected individuals with sparse connections to other groups. We analyzed a network of 277 PWID and their contacts from the Transmission Reduction Intervention Project. We assessed the effect of MOUD on reductions in injection risk behaviors and the possible benefit for network contacts of participants treated with MOUD. We identified communities using modularity-based methods and employed inverse probability weighting with community-level propensity scores to adjust for measured confounding. We found that MOUD may have beneficial spillover effects on reducing injection risk behaviors. The magnitudes of estimated effects were sensitive to the community detection method. Careful consideration should be paid to the significance of community structure in network studies evaluating spillover.
Full article
(This article belongs to the Section Statistical Methods)
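The core estimation step described above, inverse probability weighting with community-level propensity scores, can be sketched on synthetic data. Everything below (the number of communities, the exposure model, and the effect size of -0.3) is invented for illustration and is not taken from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Hypothetical setup: each person belongs to one of four communities,
# and community membership shifts the chance of treatment (here, MOUD).
community = rng.integers(0, 4, n)
p_treat = np.array([0.2, 0.4, 0.6, 0.8])[community]
# Spillover exposure: having at least one treated contact (simulated directly).
neighbor_treated = rng.random(n) < p_treat
# Outcome: an injection risk score, reduced when a contact is treated.
y = 1.0 - 0.3 * neighbor_treated + rng.normal(0, 0.5, n)

# Community-level propensity score for the spillover exposure,
# estimated as the empirical exposure rate within each community.
ps = np.array([neighbor_treated[community == c].mean() for c in range(4)])[community]

# Inverse probability weighting: contrast exposed vs. unexposed individuals.
w1 = neighbor_treated / ps
w0 = (~neighbor_treated) / (1 - ps)
spillover = np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)
print(round(spillover, 2))  # should land near the simulated effect of -0.3
```

In the study itself the communities are not given but detected from the network with modularity-based methods, which is exactly why the estimates were sensitive to the detection algorithm used.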
Open AccessPerspective
Redefining Significance: Robustness and Percent Fragility Indices in Biomedical Research
by
Thomas F. Heston
Stats 2024, 7(2), 537-548; https://doi.org/10.3390/stats7020033 - 17 Jun 2024
Abstract
The p-value has long been the standard for statistical significance in scientific research, but this binary approach often fails to consider the nuances of statistical power and the potential for large sample sizes to show statistical significance despite trivial treatment effects. Including a statistical fragility assessment can help overcome these limitations. One common fragility metric is the fragility index, which assesses statistical fragility by incrementally altering the outcome data in the intervention group until the statistical significance flips. The robustness index takes a different approach by maintaining the integrity of the underlying data distribution while examining changes in the p-value as the sample size changes. The percent fragility index is another useful alternative that is more precise than the fragility index and is more uniformly applied to both the intervention and control groups. Incorporating these fragility metrics into routine statistical procedures could address the reproducibility crisis and increase research efficacy. Using these fragility indices can be seen as a step toward a more mature phase of statistical reasoning, where significance is a multi-faceted and contextually informed judgment.
Full article
(This article belongs to the Section Biostatistics)
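The fragility index described above can be sketched in a few lines: starting from a significant 2x2 trial result, flip outcomes in one arm until significance is lost, and count the flips. This follows the common variant that converts non-events to events in the arm with fewer events; the trial counts below are hypothetical.

```python
from scipy.stats import fisher_exact

def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Number of outcome flips in group A needed to move a
    significant Fisher's exact test past the alpha threshold."""
    flips = 0
    e_a = events_a
    while e_a < n_a:
        table = [[e_a, n_a - e_a], [events_b, n_b - events_b]]
        _, p = fisher_exact(table)
        if p >= alpha:
            return flips  # significance already lost (or never present)
        e_a += 1          # convert one non-event to an event in group A
        flips += 1
    return flips

# Hypothetical trial: 5/100 events under treatment vs. 18/100 under control.
fi = fragility_index(5, 100, 18, 100)
print(fi)
```

A small index means a handful of patients' outcomes separate "significant" from "not significant", which is the fragility the article argues should be reported alongside the p-value.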
Open AccessArticle
An Optimal Design through a Compound Criterion for Integrating Extra Preference Information in a Choice Experiment: A Case Study on Moka Ground Coffee
by
Rossella Berni, Nedka Dechkova Nikiforova and Patrizia Pinelli
Stats 2024, 7(2), 521-536; https://doi.org/10.3390/stats7020032 - 8 Jun 2024
Abstract
In this manuscript, we propose an innovative approach to studying consumers’ preferences for coffee, which integrates a choice experiment with consumer sensory tests and chemical analyses (caffeine contents obtained through a High-Performance Liquid Chromatography (HPLC) method). The same choice experiment is administered on two consecutive occasions, i.e., before and after the guided tasting session, to analyze the role of tasting and awareness about coffee composition in the consumers’ preferences. To this end, a Bayesian optimal design, based on a compound design criterion, is applied to build the choice experiment; the compound criterion allows for addressing two main issues related to the efficient estimation of the attributes and the evaluation of the sensorial part, i.e., the HPLC effects and the scores obtained through the consumer sensory test. All these elements, i.e., the attributes involved in the choice experiment, the scores obtained for each coffee through the sensory tests, and the HPLC quantitative evaluation of caffeine, are analyzed through suitable Random Utility Models. The initial results are promising, confirming the validity of the proposed approach.
Full article
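The Random Utility Models mentioned above can be illustrated with the workhorse conditional (multinomial) logit: utility is a linear function of alternative attributes plus Gumbel noise, and the chosen alternative maximizes utility. The design below (300 choice sets, 3 alternatives, 2 attributes, the true coefficients) is entirely synthetic, not the paper's design.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
# Hypothetical choice experiment: 300 choice sets, 3 coffee alternatives,
# 2 attributes per alternative (say, price and a sensory score).
n_sets, n_alts, n_attr = 300, 3, 2
X = rng.normal(size=(n_sets, n_alts, n_attr))
beta_true = np.array([-1.0, 0.8])
# Random utility: U = X @ beta + Gumbel noise; choose the maximizer.
U = X @ beta_true + rng.gumbel(size=(n_sets, n_alts))
choice = U.argmax(axis=1)

def neg_loglik(beta):
    v = X @ beta                          # systematic utilities
    v = v - v.max(axis=1, keepdims=True)  # numerical stability
    logp = v - np.log(np.exp(v).sum(axis=1, keepdims=True))
    return -logp[np.arange(n_sets), choice].sum()

beta_hat = minimize(neg_loglik, np.zeros(n_attr)).x
print(np.round(beta_hat, 2))  # estimates should land near beta_true
```

The paper's Bayesian compound design criterion chooses which attribute combinations to show respondents so that coefficients like these are estimated efficiently while the sensory and HPLC terms remain evaluable.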
Open AccessArticle
A Spatial Gaussian-Process Boosting Analysis of Socioeconomic Disparities in Wait-Listing of End-Stage Kidney Disease Patients across the United States
by
Sounak Chakraborty, Tanujit Dey, Lingwei Xiang and Joel T. Adler
Stats 2024, 7(2), 508-520; https://doi.org/10.3390/stats7020031 - 7 Jun 2024
Abstract
In this study, we employed a novel approach of combining Gaussian processes (GPs) with boosting techniques to model the spatial variability inherent in End-Stage Kidney Disease (ESKD) data. Our use of the Gaussian processes boosting, or GPBoost, methodology underscores the efficacy of this hybrid method in capturing intricate spatial dynamics and enhancing predictive accuracy. Specifically, our analysis demonstrates a notable improvement in out-of-sample prediction accuracy regarding the percentage of the population remaining on the wait list within geographic regions. Furthermore, our investigation unveils race and gender-based factors that significantly influence patient wait-listing. By leveraging the GPBoost approach, we identify these pertinent factors, shedding light on the complex interplay between demographic variables and access to kidney transplantation services. Our findings underscore the imperative for a multifaceted strategy aimed at reducing spatial disparities in kidney transplant wait-listing. Key components of such an approach include mitigating gender disparities, bolstering access to healthcare services, fostering greater awareness of transplantation options, and dismantling structural barriers to care. By addressing these multifactorial challenges, we can strive towards a more equitable and inclusive landscape in kidney transplantation.
Full article
(This article belongs to the Special Issue Bayes and Empirical Bayes Inference)
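The idea of combining boosted trees for covariate effects with a Gaussian process for spatial structure can be mimicked in a crude two-stage sketch using scikit-learn components. This is not the GPBoost methodology itself, which estimates both parts jointly, and all data below are synthetic.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(2)
n = 400
coords = rng.uniform(0, 1, size=(n, 2))   # geographic locations
X = rng.normal(size=(n, 3))               # demographic covariates
spatial = np.sin(3 * coords[:, 0]) + np.cos(3 * coords[:, 1])  # smooth field
y = X[:, 0] - 0.5 * X[:, 1] + spatial + rng.normal(0, 0.1, n)

# Stage 1: boosted trees capture the covariate (fixed) effects.
boost = GradientBoostingRegressor(random_state=0).fit(X, y)
resid = y - boost.predict(X)
# Stage 2: a Gaussian process soaks up the residual spatial correlation.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3), alpha=0.05)
gp.fit(coords, resid)
pred = boost.predict(X) + gp.predict(coords)
rmse = np.sqrt(np.mean((y - pred) ** 2))
print(round(rmse, 2))
```

The payoff the abstract reports, better out-of-sample prediction of wait-listing rates across regions, comes precisely from letting the spatial component absorb variation that covariates alone cannot explain.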
Open AccessArticle
Residual Analysis for Poisson-Exponentiated Weibull Regression Models with Cure Fraction
by
Cleanderson R. Fidelis, Edwin M. M. Ortega and Gauss M. Cordeiro
Stats 2024, 7(2), 492-507; https://doi.org/10.3390/stats7020030 - 20 May 2024
Abstract
The use of cure-rate survival models has grown in recent years. Even so, proposals for assessing the goodness of fit of these models have been less frequent. Residual analysis, however, can be used to check the adequacy of a fitted regression model. In this context, we provide Cox–Snell residuals for Poisson-exponentiated Weibull regression with cure fraction. We develop several simulations under different scenarios to study the distributions of these residuals and apply them to a melanoma dataset for illustrative purposes.
Full article
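The Cox–Snell residual behind the analysis above is simply the fitted cumulative hazard, r_i = -log Ŝ(t_i); if the model is well specified, the residuals behave like a unit-exponential sample. The sketch below uses a plain exponential fit with no censoring as a stand-in for the paper's cure-fraction model.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Hypothetical survival times from an exponential model, no censoring.
t = rng.exponential(scale=2.0, size=500)
lam_hat = 1.0 / t.mean()   # maximum likelihood estimate of the rate
# Cox–Snell residuals: r_i = -log S_hat(t_i), i.e. lam_hat * t_i here.
r = lam_hat * t
print(round(r.mean(), 2))  # 1.0 by construction, since lam_hat = 1/mean(t)
# Under a well-specified model, r is approximately unit exponential;
# a goodness-of-fit test (or a QQ plot) checks this.
p = stats.kstest(r, "expon").pvalue
```

With a cure fraction, the survival function plateaus above zero, so the residuals for "cured" subjects concentrate below -log(cure probability); the paper's simulations characterize how the residual distribution behaves in that setting.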
Open AccessCase Report
Testing for Level–Degree Interaction Effects in Two-Factor Fixed-Effects ANOVA When the Levels of Only One Factor Are Ordered
by
J. C. W. Rayner and G. C. Livingston, Jr.
Stats 2024, 7(2), 481-491; https://doi.org/10.3390/stats7020029 - 15 May 2024
Abstract
In testing for main effects, the use of orthogonal contrasts for balanced designs with the factor levels not ordered is well known. Here, we consider two-factor fixed-effects ANOVA with the levels of one factor ordered and one not ordered. The objective is to extend the idea of decomposing the main effect to decomposing the interaction. This is achieved by defining level–degree coefficients and testing if they are zero using permutation testing. These tests give clear insights into what may be causing a significant interaction, even for the unbalanced model.
Full article
(This article belongs to the Section Statistical Methods)
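The level–degree idea above can be sketched with the simplest case: a degree-1 (linear) orthogonal contrast in the ordered factor, computed within each level of the unordered factor, with its spread across levels assessed by permutation. The layout and effect sizes are invented, and the unrestricted permutation below is cruder than the scheme the paper uses.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical layout: unordered factor A (3 levels) crossed with ordered
# factor B (4 levels), 10 replicates per cell; the linear trend in B
# changes with the level of A (a level-degree interaction of degree 1).
a_lev, b_lev, reps = 3, 4, 10
A = np.repeat(np.arange(a_lev), b_lev * reps)
B = np.tile(np.repeat(np.arange(b_lev), reps), a_lev)
lin = np.array([-3.0, -1.0, 1.0, 3.0])     # degree-1 orthogonal contrast in B
y = A * B + rng.normal(0.0, 1.0, A.size)   # slope in B grows with level of A

def level_degree_stat(y):
    # Linear contrast of the cell means within each level of A, then the
    # spread of those contrasts across the levels of A.
    cells = np.array([[y[(A == i) & (B == j)].mean() for j in range(b_lev)]
                      for i in range(a_lev)])
    return (cells @ lin).var()

obs = level_degree_stat(y)
null = np.array([level_degree_stat(rng.permutation(y)) for _ in range(999)])
p = (1 + np.sum(null >= obs)) / 1000
print(p < 0.05)  # the simulated interaction is strong, so the test rejects
```

Because the statistic isolates one polynomial degree at a time, a rejection points directly at what drives the interaction, here a level-dependent linear trend, which is the interpretability the article emphasizes even for unbalanced designs.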
Topics
Topic in
Entropy, Mathematics, Modelling, Stats
Interfacing Statistics, Machine Learning and Data Science from a Probabilistic Modelling Viewpoint
Topic Editors: Jürgen Pilz, Noelle I. Samia, Dirk Husmeier
Deadline: 31 December 2024
Special Issues
Special Issue in
Stats
Machine Learning and Natural Language Processing (ML & NLP)
Guest Editor: Stéphane Mussard
Deadline: 31 August 2024
Special Issue in
Stats
Feature Paper Special Issue: Reinforcement Learning
Guest Editors: Wei Zhu, Sourav Sen, Keli Xiao
Deadline: 30 September 2024
Special Issue in
Stats
Statistics, Analytics, and Inferences for Discrete Data
Guest Editor: Dungang Liu
Deadline: 30 November 2024
Special Issue in
Stats
Integrative Approaches in Statistical Modeling and Machine Learning for Data Analytics and Data Mining
Guest Editors: Victor Leiva, Cecília Castro
Deadline: 31 January 2025