6
$\begingroup$

I am an MBA student taking courses in Statistics.

I attended a seminar hosted by the Statistics faculty in my university in which students present the research they are doing.

The students had data (e.g. age, gender, race, number of children they have, which university they studied at, what they studied in university, neighborhood they live in, city they live in, province they live in, etc.) on different people from across the country - and they are building models to understand the effect of these different variables on "university graduation rate" (e.g. in the data they have, the student either "graduated" or "dropped out").

In my very naïve understanding of statistics, this kind of problem can be solved using a Logistic Regression Model. Provided that the model is fit properly and the results are statistically significant (e.g. based on p-values), a Logistic Regression Model should be able to tell you the effect of different combinations of independent variables on "university graduation rate". As I understand, this is closely related to the Odds Ratio - how much more likely are "students with children living in city A" to graduate university compared to "students without children in city B".

However, the students were instead presenting a type of model called a "Multilevel Regression Model" (I think this is also called a "Hierarchical Model").

When I asked the students why they chose a "Multilevel Regression Model" instead of a "Logistic Regression Model", they gave me the following answer:

  • It is a reasonable assumption to believe that students in the same faculty within the same university have more similar graduation rates compared to students in the same university but from different faculties. Likewise, it is reasonable to assume that students in the same neighborhood have more similar graduation rates to one another than to students in different neighborhoods. Apparently a Multilevel Regression Model can potentially address these "within group correlations" whereas a Logistic Regression Model cannot. (Some further comments were mentioned about "Fixed Effects", "Mixed Effects" and "Random Effects" - but I could not fully understand these concepts, nor their relevance when comparing the aptness of Logistic Regression vs. Multilevel Logistic Regression.)

While this sounds like a reasonable justification as to why one would prefer a Multilevel Regression Model compared to a Logistic Regression Model - due to my lack of knowledge in Statistics, I have no choice but to blindly accept this justification.

In my opinion, a Logistic Regression Model can provide estimates and help understand the effects of different independent variables on the dependent variable - for example, all students in the same neighborhood should be subject to the same effect of that neighborhood on the university graduation rate. Therefore, in this regard, it seems to me that both the Logistic Regression Model and the Multilevel Regression Model can take into consideration similarities within groups when estimating the effect of independent variables on the dependent variable.

Thus, why might the students from the seminar have chosen a Multilevel Regression Model over a Logistic Regression Model - given that both models can take into consideration the effects of similar groups when estimating the effect of independent variables on the dependent variable?

Some Final Notes:

  • Even though a Logistic Regression Model can estimate the effect of independent variables on the dependent variable, a Logistic Regression Model assumes that all observations are independent from one another (I.I.D.), whereas a Multilevel Regression model does not make this assumption (i.e. cluster/group correlation)

  • If this is really the case - why not just fit a whole bunch of individual models for each combination of factors within the independent variables? I understand that this might result in fitting a large number of models, but aren't modern computers strong enough to do this?

$\endgroup$

5 Answers

5
$\begingroup$

To address your final comment - "If this is really the case - why not just fit a whole bunch of individual models for each combination of factors within the independent variables? I understand that this might result in fitting a large number of models, but aren't modern computers strong enough to do this?"

It is possible to treat a clustered dataset like this, running an individual model separately for each cluster. However, this approach has several disadvantages:

  1. By fitting a series of individual models instead of using the whole data in one multilevel model, the sample size behind each fit, and thus its statistical power, is drastically reduced.

  2. Using a multilevel model, you get an overall coefficient that estimates (in a simple case of y ~ x) the average relationship between x and y while addressing the clustering of the data. If you run a bunch of individual models, you have a bunch of individual coefficients with no simple way to combine them to gain an understanding of the overall y ~ x relationship in your total dataset.

  3. Running individual models, you get no information about what kind of role the cluster plays in the y ~ x relationship (with a multilevel model you can see whether the cluster-specific slope variance is significant and compare different clusters' slopes).

  4. Using a multilevel random-slope model, you still get cluster-specific coefficients for the relationship between x and y.
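To make this concrete, here is a minimal base-R sketch on simulated data (cluster sizes, effect sizes, and variable names are all illustrative assumptions, not from the students' project): fitting one logistic regression per cluster leaves each fit with a tiny sample and unstable slopes, while a single model over the whole data estimates one slope from the full sample.

```r
# Simulated clustered binary data: 30 clusters of 15 observations each,
# with a common slope and cluster-specific intercepts (values illustrative).
set.seed(1)
n_clusters <- 30; n_per <- 15
d <- data.frame(
  cluster = rep(1:n_clusters, each = n_per),
  x = rnorm(n_clusters * n_per)
)
a <- rnorm(n_clusters, 0, 1)  # cluster-specific intercepts
d$y <- rbinom(nrow(d), 1, plogis(a[d$cluster] + 0.5 * d$x))

# One model per cluster: 30 slope estimates, each based on only 15 points.
per_cluster <- sapply(split(d, d$cluster), function(di)
  coef(glm(y ~ x, family = binomial, data = di))["x"])
range(per_cluster)  # typically wildly unstable

# One model over all the data: one slope, full sample size.
# (A multilevel model would additionally account for the clustering.)
coef(glm(y ~ x, family = binomial, data = d))["x"]
```

The 30 per-cluster slopes are also hard to combine into a single overall y ~ x statement, which is the second disadvantage described in this answer.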

$\endgroup$
5
$\begingroup$

As another comment notes - multilevel regression and logistic regression are not mutually exclusive, and in fact the best model to run might be a multilevel logit model! In this case, the key issue is whether or not you want to consider "school" as a variable or a unit of analysis.

But first, this question conflates two separate questions about regression modeling that don't have much to do with one another.

The first question is: what "flavor" of regression model should I use, based on the way my dependent variable is measured? Normal "OLS" regression assumes that the dependent variable is continuous and has a more or less normal-ish distribution. But if your dependent variable is binary - for example "did you graduate or not?" - then an OLS model isn't really appropriate. Instead you want to use a logit (or logistic) regression model. This type of model is specifically set up to analyze binary dependent variables - it tells you the extent to which a one unit increase in a particular independent variable changes the log-odds (and hence the probability) of getting a "1" on the binary dependent variable, holding all other variables constant.
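As a tiny self-contained illustration (simulated data; the particular numbers are arbitrary), here is a logit model in base R and the odds ratio implied by its slope:

```r
# Simulate a binary outcome whose log-odds depend linearly on x.
set.seed(5)
x <- rnorm(500)
y <- rbinom(500, 1, plogis(-1 + 0.7 * x))

# Logistic regression: coefficients are on the log-odds scale.
fit <- glm(y ~ x, family = binomial)
exp(coef(fit))["x"]  # odds ratio for a one-unit increase in x
```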

Totally separate from that entire discussion, we have the question of what to do when you have observations that are "nested" in other groups, and where some variables are at a different "level" than others. So in this case you have students who are nested in schools, and some variables (like gender, or graduation) refer to characteristics of students, while other variables (like school size, or public/private status) refer to characteristics of the school, so that every student who went to school #345 will have identical values for those variables. As you allude to, this violates the assumption that observations are independent, which is common to the "single level" versions of ALL types of regression analysis, including logit, OLS, tobit, etc.

Making whatever type of model you are running a "multilevel" model is one particular way to deal with this issue, although there are actually a bunch of different kinds of multilevel models. In this case what the students were probably proposing is a model that allows the intercept term to vary randomly (that is, according to a normal curve) at the school level - what's called a random intercept model. This sort of approach can work on an OLS or a logit model, although the implementation is a bit different in each case. However, there is one feature of this approach that may be problematic, and might be the source of your confusion. In a multilevel model you don't actually ANALYZE the effect of being in one "group" (school in this case) vs. another. The model assumes that, in general, school-level intercepts vary randomly around an overall intercept, and it just estimates the variance of that normal curve using the data available. So it won't actually tell you whether students at school A are more likely to graduate than students at school B, because it's not treating school as a variable but as another unit of analysis (like people). You can, however, use a method called empirical Bayes estimation to try to figure out which schools do better or worse, based on the model you just ran.

An alternative to a multilevel model would be a "fixed effects" model that just includes a dummy variable for each school. This model treats school like a variable and it will tell you how each school does in comparison to some reference school (although for various reasons the empirical bayes approach I discussed above might be better for this question). But a fixed effects model might end up with big standard errors, and it won't allow you to include school level variables (like school size) that might tell you WHY some schools do better than others.

So to answer the question "what type of model should I use" you need to answer two separate questions:

  1. How is the dependent variable measured? If it's binary, use a logit model. If it's continuous, use OLS. If it's something else... use another type of model (ologit, tobit, negative binomial, etc.)

  2. How are you going to deal with the fact that students are nested in schools? If you want to treat "school" like a variable, you can use a fixed effects model. If you want to include school-level variables in the model, then you might explore a random intercept model.
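A sketch of the two options in R, on simulated data (variable names like graduated, female, and size are purely illustrative; the random-intercept fit assumes the lme4 package is available):

```r
# Simulate students nested in schools (all names and numbers illustrative).
set.seed(2)
n_schools <- 25; n_per <- 40
d <- data.frame(
  school = factor(rep(1:n_schools, each = n_per)),
  female = rbinom(n_schools * n_per, 1, 0.5)
)
size <- rnorm(n_schools)          # a school-level covariate
u    <- rnorm(n_schools, 0, 0.8)  # random school intercepts
d$size <- size[as.integer(d$school)]
d$graduated <- rbinom(nrow(d), 1,
  plogis(u[as.integer(d$school)] + 0.3 * d$female + 0.4 * d$size))

# Fixed effects: one dummy per school; "school" is treated as a variable.
fe <- glm(graduated ~ female + school, family = binomial, data = d)

# Random intercept: school intercepts assumed normally distributed;
# school-level covariates like `size` can now enter the model.
if (requireNamespace("lme4", quietly = TRUE)) {
  re <- lme4::glmer(graduated ~ female + size + (1 | school),
                    family = binomial, data = d)
  head(lme4::ranef(re)$school)  # empirical Bayes school estimates
}
```

Note that `size` cannot be included in the fixed-effects model: being constant within a school, it is perfectly collinear with the school dummies, which is exactly the limitation of fixed effects described in this answer.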

$\endgroup$
4
$\begingroup$

Multilevel models provide us with a few benefits when compared to logistic regression. Here are a few I can think of without being too pedantic:

Shrinkage

Suppose you are estimating the mean/proportion of a response variable per group. Sometimes a group can have an estimate that is too high (or too low) simply because that group has few samples in it (just think of thumbs-up reviews on a product that is not very popular on an e-commerce website). You know you can't trust the review number in those instances, so instead of removing them from the analysis, you "shrink" the estimates towards the grand mean/proportion, in an "if I don't trust your estimate because of your sample size, I consider you close to the grand estimate" rationale.

This shrinking effect will happen for all of the parameters you estimate, i.e. per-group intercepts and slopes.
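The mechanics can be sketched in a few lines of base R, with a simple precision-weighted average standing in for the model-based shrinkage (the prior weight k here is an illustrative assumption, not something you would set by hand in a real multilevel fit):

```r
# Four groups with very unequal sample sizes, same true proportion.
set.seed(3)
n_i <- c(2, 5, 200, 500)
successes <- rbinom(4, n_i, 0.3)
raw <- successes / n_i              # raw per-group proportions
grand <- sum(successes) / sum(n_i)  # grand proportion

# Shrink each group toward the grand proportion; k is the prior weight.
k <- 20
shrunk <- (successes + k * grand) / (n_i + k)

# Small groups move a long way toward the grand proportion;
# large groups barely move.
cbind(n_i, raw, shrunk)
```

Each shrunk estimate is a weighted average of the group's raw proportion and the grand proportion, with the group's own sample size as its weight, which is the "I consider you close to the grand estimate" rationale in formula form.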

Variance components

Whenever we fit models to test a statistical hypothesis, the goal is to have as little variance as possible around the estimates of the parameters of interest, so we can test them, obtain p-values, etc. Adding covariates, for example, will usually help reduce the variability around the estimates of the parameters of interest if those covariates explain part of the variability of the response variable.

In multilevel models, we get "extra" variance parameters, one for each group of random intercepts/slopes, since those have distributions tied to them as well. This lets us attribute part of the explainable variance of the response variable to those new components, which formalizes the justification that "grouped observations are usually similar": each grouping gets its own variance component.

Usually less prediction error

Now, this is due to the shrinkage and variance components we have in those models. Since we pull the parameter estimates closer to the grand mean, we usually get less quadratic error overall. To read more about this type of relationship between shrinkage and prediction error, search for Stein's paradox.

Finding interesting between-group patterns

Now, again due to the shrinkage and the individual group parameters, we can have discussions like: "even though variable X's effect is overall tied to such and such changes in the response variable, we can see that in group A the effect is zero (or even opposite to the overall trend)".

Some further reading

I think you will be interested in reading a bit more about those modeling techniques. And from your question, I think you will be able to follow those texts with no trouble at all. Gelman et al.'s text has some really nice examples of the type of conclusions you can get from multilevel models and does a superb job of explaining the math around them. Singer et al.'s text has some nice intro chapters explaining the whys, and how to perform exploratory/descriptive analysis around multilevel models.

$\endgroup$
3
$\begingroup$

Standard Logistic Regression

All of the answers here are great. I just wanted to add a visual example, because it often illustrates why this can matter. Using R, in case you want to follow along yourself, we can load these libraries and data:

#### Load Libraries ####
library(datarium)
library(tidyverse)
library(lmerTest) # also attaches lme4, which provides glmer()
    
#### Load Data ####
hdp <- read.csv("https://stats.idre.ucla.edu/stat/data/hdp.csv")
hdp <- hdp %>% 
  mutate(DID = factor(DID)) %>% 
  filter(DID %in% 1:20) %>% 
  as_tibble() 
hdp

Printing the data shows information on many metrics predicting cancer remission:

# A tibble: 454 × 27
   tumorsize   co2  pain wound mobil…¹ ntumors nmorp…² remis…³ lungc…⁴
       <dbl> <dbl> <int> <int>   <int>   <int>   <int>   <int>   <dbl>
 1      68.0  1.53     4     4       2       0       0       0   0.801
 2      64.7  1.68     2     3       2       0       0       0   0.326
 3      51.6  1.53     6     3       2       0       0       0   0.565
 4      86.4  1.45     3     3       2       0       0       0   0.848
 5      53.4  1.57     3     4       2       0       0       0   0.886
 6      51.7  1.42     4     5       2       0       0       0   0.701
 7      78.9  1.71     3     4       2       0       0       0   0.891
 8      69.8  1.53     3     3       3       0       0       0   0.661
 9      62.9  1.54     4     4       3       2       0       0   0.909
10      71.8  1.59     5     4       3       0       0       0   0.959

If we fit a regular logistic regression for the overall effect of a patient's length of stay at a hospital and its predictiveness of cancer remission:

#### Logistic Regression ####
glm.fit <- glm(remission
               ~ LengthofStay,
               data = hdp,
               family = binomial)
summary(glm.fit)

We will see a general decrease in cancer remission as length of stay increases, though the effect is fairly weak:

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)   -0.3546     0.5584  -0.635    0.525
LengthofStay  -0.1376     0.1012  -1.359    0.174

This can be visualized with the following plot. A curve that looks like an "S" typically indicates a strong effect, whereas a flat or semi-flat line is less predictive of the outcome:

#### Plot Overall Trend ####
hdp %>% 
  ggplot(aes(x=LengthofStay,
             y=remission))+
  geom_point()+
  stat_smooth(method="glm",
              se=FALSE, 
              method.args = list(family=binomial),
              color = "steelblue",
              size = 2)+
  labs(x="Length of Stay",
       y="Cancer Remission",
       title = "Overall Trend of Cancer Remission by Length of Stay")

[Plot: overall trend of cancer remission by length of stay]

GLMM Logistic Regression

What if there is annoying noise related to which doctor was seeing each patient? Does it impact the outcome? We can fit a model with a doctor's identifier (DID) as a random intercept to see if there is some variance in outcomes based on this "random noise" from each doctor. We can also fit a random slope for length of stay per doctor, as patients may be with doctors for various lengths of time that differ between doctors, so we can tease this random variance out as well.

glmm.fit <- glmer(remission 
                  ~ LengthofStay
                  + (1 + LengthofStay| DID),
           data = hdp, 
           family = binomial,
           control = glmerControl(optimizer = "bobyqa"))
summary(glmm.fit)

From the summary, we can in fact see some differences: the intercept varies by doctor with a standard deviation of about 2.6 on the logit scale, and there is a negative intercept-slope correlation. We also see that the effect of the predictor is slightly stronger, though still not statistically significant:

Random effects:
 Groups Name         Variance Std.Dev. Corr 
 DID    (Intercept)  6.7623   2.6004        
        LengthofStay 0.0423   0.2057   -0.64
Number of obs: 454, groups:  DID, 20

Fixed effects:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)   -0.6443     1.0467  -0.616    0.538
LengthofStay  -0.1722     0.1630  -1.056    0.291

If we plot this, we can see some dramatic differences between doctors. Some have had no patients go into remission, some have very little impact on the outcome, some have a strong impact on the outcome (indicated by a strong S shape in the curves), and others had all their patients go into remission (gasp).

#### Plot by Doctor Trend ####
hdp %>% 
  ggplot(aes(x=LengthofStay,
             y=remission))+
  geom_point()+
  stat_smooth(method="glm",
              se=FALSE, 
              method.args = list(family=binomial),
              size = 2,
              color = "steelblue")+
  facet_wrap(~DID)+
  labs(x="Length of Stay",
       y="Cancer Remission",
       title = "By-Doctor Cancer Remission Based on Length of Stay")

[Plot: by-doctor cancer remission by length of stay, one panel per doctor]

By teasing apart the random variance here, we have accounted for this randomness and gotten a clearer picture of the fixed effect. Since you are likely learning R for mixed models, you can play around with the other variables in this data and see if it's useful for learning. I recommend using the full dataset for that purpose, as I have made some modifications for this example.

A Final Caveat

All of this is to say that GLMMs can provide interesting and better predictions, but that doesn't always mean you should use them or they are always better ways of modeling regressions. There is a great summary article on the potential pitfalls of GLMMs that can be read through here:

On the unnecessary ubiquity of hierarchical linear modeling

Psychol Methods. 2017 Mar;22(1):114-140. Epub 2016 May 5. doi:10.1037/met0000078. PMID: 27149401.

$\endgroup$
2
$\begingroup$

A couple quick points to add on to what has already been said:

  1. Multilevel models are not mutually exclusive from logistic regression. You can have a multilevel logistic regression model.

  2. If you have the ability to do some basic simulation, I would highly recommend it for understanding multilevel models (or statistical models in general). This can be done relatively simply in software such as R. The general idea is that you set up a model to simulate data similar to what you're seeing in the dataset of interest. It forces you to think explicitly about which parts are systematic (the "fixed effects" you mentioned) and which parts are random (the "random effects"). The utility of a multilevel model becomes apparent very quickly.
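For example, a minimal base-R simulation along these lines (all numbers illustrative) makes the fixed/random split tangible: the fixed part is an overall intercept and slope, the random part is a per-school shift, and the resulting within-school correlation shows up directly in the school means:

```r
# Simulate multilevel logistic data: fixed effects plus a random school shift.
set.seed(4)
n_schools <- 50; n_per <- 30
school <- rep(1:n_schools, each = n_per)
x <- rnorm(n_schools * n_per)       # a student-level predictor
u <- rnorm(n_schools, 0, 1)         # random effects: per-school shifts
eta <- -0.5 + 0.8 * x + u[school]   # fixed effects: intercept -0.5, slope 0.8
y <- rbinom(length(eta), 1, plogis(eta))

# School means vary far more than pure binomial noise would allow,
# which is the "within group correlation" the students described.
school_means <- tapply(y, school, mean)
var(school_means)                   # inflated by the random effects u
p <- mean(y); p * (1 - p) / n_per   # what independence alone would give
```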

$\endgroup$
