35

I went over the Archive site where I could retrieve data from Astronomy.SE. After I downloaded the content, I found 8 XML files that came maybe from a dump process.

Now I am not able to assemble all the data inside a database to perform query over it. Even if I go to its SEDE site, it is hard to create a good query because I don't know the entity-relationship model.

Is there a place where we could see the ERD of Stack Exchange model? Is the complete diagram composed only by 8 tables? Does someone have used any tool to assembly those 8 tables?

The best post I found was this one but not solved the problem completely: Database schema documentation for the public data dump and SEDE

4
  • 2
    It would help to say in what way the post on database schema is incomplete.
    – user259867
    Commented Feb 28, 2015 at 19:50
  • it is not incomplete. It is not fitting the 8 dumped files. So how can we use this 18 tables post to create a schema of 8 tables dumped?
    – IgorAlves
    Commented Feb 28, 2015 at 19:54
  • 1
    Did the readme help any? Commented Feb 28, 2015 at 21:37
  • Some people used to maintain scripts to convert the data dumps into common database formats (example), but I don't know if any of those are still around.
    – Jeremy
    Commented Mar 1, 2015 at 21:42

2 Answers 2

53

As far as I know, there is no visual representation of the database schema of the Data Explorer maintained, so I think this will be the first one.

This is what I have after a couple of hours for writing a script and fiddling a little bit with draw.io. Remember this is not fully generated, I had to touch up the image to unclutter it a little bit (and I left out all the *Types tables).

ERD of SEDE

Click the image for larger variant.

The cardinalities are OK-ish, but I didn't bother to hunt down all zero/one base ends. And I did use SEDE as my starting-point which is a little bit richer in tables.

You can generate a working-copy that you can import in draw.io by running the following query. If you download the query result as CSV and open that file in your favorite editor, you will see something like this:

"<mxGraphModel dx=""835"" dy=""760"" grid=""1"" gridSize=""10"" guides=""1"" tooltips=""1"" connect=""1"" fold=""1"" page=""1"" pageScale=""1"" pageWidth=""826"" pageHeight=""1169"" style=""default-style2"" math=""0"">
    <root>
        <mxCell id=""0""/>
 .... many more lines skipped
    </root>
</mxGraphModel>"

Remove the first and the last double-quote (") and then do replace all on double-double-quotes to double-quotes. To get a result similar to this:

<mxGraphModel dx="835" dy="760" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" fold="1" page="1" pageScale="1" pageWidth="826" pageHeight="1169" style="default-style2" math="0">
    <root>
        <mxCell id="0"/>
        <mxCell id="1" parent="0"/>
        <mxCell 
            style="swimlane;html=1;childLayout=stackLayout;horizontal=1;startSize=32;horizontalStack=0;fillColor=#008cff;fontColor=#FFFFFF;rounded=1;fontSize=17;fontStyle=0;strokeWidth=2;resizeParent=0;resizeLast=1;shadow=0;dashed=0;align=center;;swimlaneLine=0"
            id="245575913"
            parent="1" 
            vertex="1" 
            value="Users">
                <mxGeometry 
                    as="geometry" 
                    height="392" 
                    width="150" 
                    y="100" 
                    x="2100"/>
        </mxCell>

If you save this file as XML, you can use the import function in draw.io to get the visual representation of that XML. After that, you can choose to Auto-layout (Organic) and then un-clutter some of the lines and entities by moving them around.

By no means this ERD or script is fully complete. Consider it as a starting point. I'm happy to take feature requests and see what I can do in the next couple of weeks.

0
13

The Schema Visualized SEDE ERD The Entity Relationship Diagram (ERD) above was constructed using the data found in the information_schema.columns table. (Thank you to ashleedawg for paring that table further down to its bare essentials). Python 3 was used to convert all the table, column and key information into a SQL script. The ERD was reverse engineered using dbschema.

Please find the script and other related files in my GitHub repository.

The interactive HTML version of the ERD is posted up here: https://sedeschema.github.io/

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .