
I am trying to pull a list of all users who have commented on an answer and tie each one back to their user profile, if one exists. However, in the Stack Exchange Data Explorer (SEDE) result set, not all records in the Comments table have a UserId. Those that do not have a UserDisplayName value instead. What I am curious about is how the Users table relates to the UserDisplayName field. Are these users who have not created an account on Stack Exchange? If they later create an account, is there a way to tie them back together?

Based on this link, I am aware that you cannot ask new questions without creating an account, but I am curious whether (and how) anonymous users who answer questions or comment on posts can be tied back.

To give some context for why I need this: for academic research, I am interested in using comments to explore the influence users have on answers on Stack Exchange.

Here is the query I am running on SEDE:

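-- Temp table holding a sample of question IDs
CREATE TABLE #questions (
    QuestionId INT
);

-- Temp table holding the answers to those questions
CREATE TABLE #answers (
    AnswerId INT,
    QuestionId INT
);

-- Take 1000 questions (PostTypeId = 1); Id is unique, so DISTINCT is not needed.
-- ORDER BY makes the sample deterministic.
INSERT INTO #questions (QuestionId)
SELECT TOP 1000 Posts.Id
FROM Posts
WHERE PostTypeId = 1
ORDER BY Posts.Id;

-- Collect the answers (PostTypeId = 2) to the sampled questions
INSERT INTO #answers (AnswerId, QuestionId)
SELECT
    Posts.Id,
    q.QuestionId
FROM #questions q
INNER JOIN Posts
    ON Posts.ParentId = q.QuestionId
        AND Posts.PostTypeId = 2;

-- One row per comment on a sampled answer.
-- VARCHAR avoids the trailing spaces that CHAR(20) would append to the links;
-- /a/ is the short link to an answer.
SELECT
    'https://stackoverflow.com/questions/' + CAST(a.QuestionId AS VARCHAR(20)) AS SO_Link_Question,
    'https://stackoverflow.com/a/' + CAST(a.AnswerId AS VARCHAR(20)) AS SO_Link_Answer,
    c.UserId,
    c.UserDisplayName
FROM Comments c
INNER JOIN #answers a
    ON a.AnswerId = c.PostId
ORDER BY a.QuestionId, a.AnswerId;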

1 Answer


Yes, users need to register (and have 50 reputation) in order to be able to comment. But if their account is later deleted, it is removed from the Users table and hence also from the various UserId columns, including the one in the table you're looking at. This is also true for questions and answers in the Posts table (the columns there are OwnerUserId and OwnerDisplayName, respectively).

Another (less frequent) cause of the UserId being NULL is migrated posts. Those are tied back once the user chooses to create an account on the target site, although the change only shows up in SEDE after its next refresh (once a week, on Sunday morning).
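As a rough illustration of the above, here is a minimal sketch of a SEDE query that keeps every comment on an answer and attaches the Users row only where one still exists. It assumes the standard SEDE columns (Comments.UserId, Comments.UserDisplayName, Users.Id, Users.DisplayName, Users.Reputation); the output aliases CommenterName and CommenterStatus are made up for this example, and the TOP 1000 limit is arbitrary.

```sql
-- Sketch only: LEFT JOIN keeps comments whose UserId is NULL
-- (deleted accounts or migrated posts), instead of dropping them.
SELECT TOP 1000
    c.Id AS CommentId,
    c.PostId AS AnswerId,
    c.UserId,
    -- Fall back to the stored display name when there is no Users row
    COALESCE(u.DisplayName, c.UserDisplayName) AS CommenterName,
    u.Reputation,
    CASE
        WHEN c.UserId IS NULL THEN 'no linked account'
        ELSE 'linked account'
    END AS CommenterStatus  -- hypothetical label, not a SEDE column
FROM Comments AS c
INNER JOIN Posts AS p
    ON p.Id = c.PostId
        AND p.PostTypeId = 2   -- comments on answers only
LEFT JOIN Users AS u
    ON u.Id = c.UserId
ORDER BY c.PostId, c.Id;
```

Rows labelled 'no linked account' are the ones that cannot be joined to a profile in the current data dump. The query cannot tell a deleted account apart from a migrated post on its own.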

  • So is XYZDisplayName randomly generated, or can it be used as a key to tie back to a deleted account? Commented Jan 27, 2023 at 12:58
  • It's not random; it's tied to the user's ID. See this post and related questions (I forgot the exact location).
    – Glorfindel Mod
    Commented Jan 27, 2023 at 14:43
