49

I think it would be useful if we could join across databases in the data-explorer.

For example, I'd like to query up a list of questions that I had a hand in migrating away from SO, and see what percentage of them were subsequently closed or re-migrated.

I realise I could download the various data dumps and do this on my own machine, but it would be nice if it were allowed on the online version.

Is this possible/practical?

2

3 Answers 3

36

You can already do this if you want to do a bit of light hacking in your queries. And, of course, you'll have to roll your own logic to do what you're specifically asking for.

SELECT * FROM sys.databases

(All SEDE activity is IP-logged, so please don't go poking into the system databases.)

When you switch sites in SEDE, all the web app does is direct the query to the appropriate database. Since all the databases are accessible through any connection, you have access to all the data at any time -- you just have to 3-part-name the database objects.

SELECT TOP 1 *
    FROM [StackOverflow]..[Users] u1
    INNER JOIN [ServerFault]..[Users] u2 ON u1.AccountId = u2.AccountId
1
  • 1
    Could this query be modified to search tags across every SE site?
    – Stevoisiak
    Commented Jul 28, 2017 at 20:42
17

Automatic cross-site queries have been made possible by senshin in this query (mind you, large queries will timeout) (current version https://data.stackexchange.com/space/query/355774/cross-site-query-template by Mark Hurd):

-- Create cursor for list of sites
DECLARE sites CURSOR FOR
  SELECT name
  FROM sys.databases
  WHERE name NOT IN ('master', 'tempdb', 'model', 'msdb', 'Data.StackExchange')
-- And declare some variables
DECLARE @sitedbname AS nvarchar(max)
DECLARE @sitehostname AS nvarchar(max)
DECLARE @ispersitemeta AS bit

DECLARE @query AS nvarchar(max)
CREATE TABLE #out (
  Site nvarchar(max) NOT NULL,
  -- ↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓
  -- COLUMN NAMES YOU WANT TO ADD SHOULD GO HERE   
  [User Count] int NOT NULL
  -- ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑
)
-- These variables are for the SPOOKY HOSTNAME GENERATION CODE
DECLARE @spooky_string AS nvarchar(max)
DECLARE @spooky_delimiter AS char(1) = '.'
DECLARE @spooky_xml AS xml
DECLARE @spooky_result AS nvarchar(max)

-- Step through cursor
OPEN sites
FETCH NEXT FROM sites INTO @sitedbname
WHILE @@FETCH_STATUS = 0
BEGIN
  -----------------------------------------------------------------------------
  -- BEGIN SPOOKY HOSTNAME GENERATION CODE ------------------------------------
  -- adapted from <http://data.stackexchange.com/stackoverflow/query/256747/> -
  -----------------------------------------------------------------------------
  SET @spooky_string = @sitedbname
  SET @spooky_xml = CAST(('<X>' + REPLACE(@spooky_string, @spooky_delimiter, '</X><X>') + '</X>') AS xml)
  SET @spooky_result = ''

  SELECT
    C.value('.', 'nvarchar(max)') AS [Piece],
    C.value('for $i in . return count(../*[. << $i]) + 1', 'int') AS [Index]
  INTO #spooky_pieces
  FROM @spooky_xml.nodes('X') AS X(C)

  SELECT @spooky_result = COALESCE(@spooky_result + '.', '') + [Piece]
  FROM #spooky_pieces
  ORDER BY [Index] DESC

  DROP TABLE #spooky_pieces

  SET @sitehostname = 'http://' + RIGHT(@spooky_result, LEN(@spooky_result)-1) + '.com'
  SET @ispersitemeta = (CASE WHEN @sitedbname LIKE '%Meta%' AND @sitedbname != 'StackExchange.Meta' THEN 1 ELSE 0 END)

  ----------------------------------------
  -- HERE COME THE SPOOKY SPECIAL CASES --
  ----------------------------------------
  -- Meta MathOverflow doesn't have a redirect; see <http://meta.stackexchange.com/q/215071/224428>
  IF @sitedbname = 'StackExchange.Mathoverflow.Meta' SET @sitehostname = 'http://Meta.MathOverflow.net'
  -- For some reason probably involving the AVP/Audio/Video/Sound hullabaloo, there is 
  -- still a StackExchange.Audio DB that's getting updated. http://audio.stackexchange.com/
  -- no longer exists, so we use Video.SE for the hostname instead.
  IF @sitedbname = 'StackExchange.Audio' SET @sitehostname = 'http://Video.StackExchange.com'
  -----------------------------------------------------------------------------
  -- END SPOOKY HOSTNAME GENERATION CODE --------------------------------------
  -----------------------------------------------------------------------------

  -- ↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓
  -- CODE YOU WANT TO ADD SHOULD GO HERE
  -- for example,
  SET @query = '
    USE [' + @sitedbname + ']

    INSERT INTO #out
    SELECT
      ''' + @sitehostname + '|' + @sitedbname + ''' AS [Site],
      (Select Count(*) From Users) As [User Count]
    '
  EXEC sp_executesql @query
  -- ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑

  FETCH NEXT FROM sites INTO @sitedbname
END
CLOSE sites
DEALLOCATE sites

-- Reap results (also optional)
SELECT
  *
FROM
  #out
2
  • 1
    Based upon the SQL posted here, not the query linked, I have "simplified" the query and highlighted the other place that needs changing when selecting some columns per site, on SEDE.
    – Mark Hurd
    Commented Aug 27, 2015 at 3:46
  • Can you give an example selecting multiple values per-site? It's unclear from this example if it's possible without repeating the select, since the "as" is outside the select.
    – Cascabel
    Commented Apr 18, 2017 at 21:34
8

Here's an improved cross-site version.

  • Performs up to 2 times faster.
  • Uses newer TSQL features for shorter, easier to understand code.
  • Gives more natural site names.
  • Provides https URL's.
  • Catches the oversight with StackExchange.Audio.Meta.

See the query in SEDE.

Or here's the code:

/*-- INSTRUCTIONS:
    1)  Set the columns of #AllSiteResults to what you need in the final query.
    2)  Set the @seSiteQuery text (inside the WHILE loop) to the query that will run on each site to build
        the #AllSiteResults table.
    3)  Comment out the `WHERE       (dadn.dbName = 'StackExchange.Meta'...` line if site metas are desired.
    4)  Adjust the final query if post processing is desired (optional).
*/
DECLARE @seDbName       AS NVARCHAR (max)
DECLARE @seSiteURL      AS NVARCHAR (max)
DECLARE @sitePrettyName AS NVARCHAR (max)
DECLARE @seSiteQuery    AS NVARCHAR (max)

CREATE TABLE #AllSiteResults (
      -- PUT THE COLUMNS YOU WILL USE, HERE
      -- vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
      [Site]            NVARCHAR(max)
      , [User Count]    INT
      -- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)

DECLARE seSites_crsr CURSOR FOR
WITH dbsAndDomainNames AS (
    SELECT      dbL.dbName
                , STRING_AGG (dbL.domainPieces, '.')    AS siteDomain
    FROM (
        SELECT      TOP 50000   -- Never be that many sites and TOP is needed for order by, below
                    name        AS dbName
                    , value     AS domainPieces
                    , row_number ()  OVER (ORDER BY (SELECT 0)) AS [rowN]
        FROM        sys.databases
        CROSS APPLY STRING_SPLIT (name, '.')
        WHERE       CASE    WHEN state_desc = 'ONLINE'
                            THEN OBJECT_ID (QUOTENAME (name) + '.[dbo].[PostNotices]', 'U') -- Pick a table unique to SE data
                    END
                    IS NOT NULL
        ORDER BY    dbName, [rowN] DESC
    ) AS dbL
    GROUP BY    dbL.dbName
)
SELECT      REPLACE (REPLACE (dadn.dbName, 'StackExchange.', ''), '.', ' ' )  AS [Site Name]
            , dadn.dbName
            , CASE  -- See https://meta.stackexchange.com/q/215071
                    WHEN dadn.dbName = 'StackExchange.Mathoverflow.Meta'
                    THEN 'https://meta.mathoverflow.net/'
                    -- Some AVP/Audio/Video/Sound kerfuffle?
                    WHEN dadn.dbName = 'StackExchange.Audio'
                    THEN 'https://video.stackexchange.com/'
                    -- Ditto
                    WHEN dadn.dbName = 'StackExchange.Audio.Meta'
                    THEN 'https://video.meta.stackexchange.com/'
                    -- Normal site
                    ELSE 'https://' + LOWER (siteDomain) + '.com/'
            END AS siteURL
FROM        dbsAndDomainNames dadn
WHERE       (dadn.dbName = 'StackExchange.Meta'  OR  dadn.dbName NOT LIKE '%Meta%')

-- Step through cursor
OPEN    seSites_crsr
FETCH   NEXT FROM seSites_crsr INTO @sitePrettyName, @seDbName, @seSiteURL
WHILE   @@FETCH_STATUS = 0
BEGIN
    -- QUERY THAT YOU WANT TO RUN ON EACH SITE, GOES HERE
    -- For example:
    -- vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
    SET @seSiteQuery = '
        USE [' + @seDbName + ']

        INSERT INTO #AllSiteResults
        SELECT
            ''' + @seSiteURL + '|' + @sitePrettyName + '''  AS [Site],  -- Creates a link
            (SELECT Count(*) FROM Users)                    AS [User Count]
    '
    -- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    EXEC sp_executesql @seSiteQuery

    FETCH NEXT FROM seSites_crsr INTO @sitePrettyName, @seDbName, @seSiteURL
END
CLOSE       seSites_crsr
DEALLOCATE  seSites_crsr

-- ADJUST THIS QUERY IF ANY POST PROCESSING IS DESIRED.
SELECT      *
FROM        #AllSiteResults
ORDER BY    [User Count] DESC, [Site]

Not the answer you're looking for? Browse other questions tagged .