I'm a newbie to SQL with a large dataset I need to manipulate. I've used Excel to analyze a small piece of this data, but now I need to look at the whole thing. I have imported it into SQL Server Management Studio 2016 and can view all of the tables using the "First 1000 rows" feature, but that's really all I know how to do.

There are only two tables that I care about. Each one has 100 million rows. They share a single overlapping column; let's call it EventID. Each table describes what occurred during the 'event.' So Table 1 has an entry for EventID #100001, #100002, etc., and Table 2 has an entry for EventID #100001, #100002, etc.

So it would be much simpler to have one big table, but c'est la vie...

I am trying to figure out two things:

1) Where do I go within SQL Server Management Studio 2016 to type in a query to ask questions of the database? (Ultimately, I'm looking to perform a consecutive values analysis, but I may need to re-sort the information before doing so.)

2) How do I deal with the issue of having two different tables? (Should I / Can I create another table that combines the two tables I currently have? Or should I deal with this issue in the query that I write?)

Thank you in advance for any help.

  • Click New Query on the toolbar to open a query pane, type in your query, then press the red exclamation mark (Execute, or F5).
    – Dave
    Commented Oct 7, 2016 at 7:58

2 Answers

I will provide an overview of pointers to get you started and refer you to other online tutorials. Then you should ask more specific questions as you encounter problems.

1. Import your data and set a primary key

You already imported your data, but here are more specific steps. If each row has a unique EventID in each of your tables, then that would be your primary key (important later), in which case you don't need "enable identity insert" to generate a new one.

Make sure EventID is set as the primary key in each table. Instructions here. If there are many rows with the same EventID, things will be more complicated, so I'm assuming they are unique.
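If you prefer to do this step in SQL rather than in the table designer, a rough sketch could look like the following. It assumes the table is literally named table1 and that EventID contains no NULLs; the constraint name PK_table1 is just a name I made up. On 100 million rows, expect both statements to take a while.

```sql
-- Check for duplicate EventIDs first; a primary key requires unique values.
select EventID, count(*) as RowsPerEvent
from table1
group by EventID
having count(*) > 1;

-- If the query above returns no rows, declare the key.
-- EventID must be defined as NOT NULL; PK_table1 is a made-up constraint name.
alter table table1
add constraint PK_table1 primary key (EventID);
```

Repeat the same two statements for table2.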

2. Define relationships

Define relationships between the 2 tables by linking them on their primary keys (EventID). Pick one table as the main one, and the primary key of the second table becomes the foreign key to the first. If there's a 1:1 match between EventIDs, then that's the simplest case to work with.

See tutorial here which shows both the GUI and SQL methods.
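For reference, the SQL form of such a relationship is roughly the following. This is a sketch that assumes table1 is the main table and already has EventID as its primary key; FK_table2_table1 is a made-up constraint name. The statement fails if table2 contains an EventID that does not exist in table1.

```sql
-- Make table2.EventID a foreign key that points at table1.EventID.
-- table1.EventID must already be a primary key (or have a unique constraint).
alter table table2
add constraint FK_table2_table1
foreign key (EventID) references table1 (EventID);
```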

3. Use the Query Designer

This is a tool to help you create a SQL query that will join (connect, or link) both tables and show the result. Follow-up tutorial here. Since you have defined the relationships already, the query designer will know how to join them.

In a query, you can pick the columns you want to see from each joined table, and the query will show them combined as one. You can add filters ("where" criteria) and sorting ("order by") here too.
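The query the designer builds for you would look something like the sketch below. StartTime and Outcome are hypothetical column names standing in for whatever your tables actually contain, and the EventID range is only an example filter.

```sql
-- Pick specific columns from each table, filter, and sort.
-- StartTime and Outcome are placeholder column names.
select t1.EventID, t1.StartTime, t2.Outcome
from table1 as t1
inner join table2 as t2
    on t1.EventID = t2.EventID
where t1.EventID between 100001 and 100100
order by t1.EventID;
```

With tables this large, filtering on a small range like this while you experiment keeps the results quick to come back.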

4. Create a view out of the query you made.

A view is the database term for a query that is saved and can be read as if it were a real table, although behind the scenes it's still just the SQL query you created, joining both tables as one. That should answer your second question.

Follow-up tutorial here.

5. Experiment

Once you get familiar with this, you may want to learn SQL in more depth. There are plenty of resources online. In fact, in SQL, steps 2-4 could be replaced by just this:

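-- A plain "select *" fails here: a view cannot contain two columns named EventID.
-- Col1 and Col2 are placeholders for the table2 columns you actually want.
create view MyBigTableView as
select table1.*, table2.Col1, table2.Col2
from table1
inner join table2
on table1.EventID = table2.EventID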
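Once the view exists, you can query it like any other table. A minimal sketch, assuming the view was created under the name MyBigTableView and exposes an EventID column:

```sql
-- Read from the view as if it were one big table.
select top (1000) *
from MyBigTableView
order by EventID;
```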

Addendum

If you need more than 1 column to form the primary key (the thing that uniquely identifies every single row), that is called a compound or composite key. It's easy to define this type of primary key in the table designer (see Stack Overflow answer here), and it will be used to define the relationships, as well as eventually for creating indexes for better query performance (not covered here).
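If you would rather define the composite key in SQL, a sketch might look like this. It assumes the combination of EventID and SubEventID is unique in table1 and that neither column allows NULLs; PK_table1 is a made-up constraint name.

```sql
-- Verify that the pair (EventID, SubEventID) is unique before declaring the key.
select EventID, SubEventID, count(*) as RowsPerPair
from table1
group by EventID, SubEventID
having count(*) > 1;

-- If no rows come back, declare the composite primary key.
alter table table1
add constraint PK_table1 primary key (EventID, SubEventID);
```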

In SQL (whether you have defined the primary key or not) an inner join on 2 tables that have both EventID and SubEventID would look like this:

select * from table1 
inner join table2
on table1.EventID = table2.EventID 
and table1.SubEventID = table2.SubEventID
  • mtone - thanks, and sorry for the generic question, but sometimes it's information overload when all you need is to figure out how to get to the starting line. I actually do have a situation where the EventID is not totally unique on one of the tables. But I have a column that's basically SubEventID, where if they both match, I'll always be looking at the same thing. Can you show me how to do an inner join so my output is only entries where EventID + SubEventID match up?
    – Zero Cool
    Commented Oct 7, 2016 at 20:57
  • @ZeroCool Added a paragraph in the answer. Hope that helps.
    – mtone
    Commented Oct 7, 2016 at 23:56
  • That worked, thank you! Okay, seriously last question - what if I don't want to display the entire table2, just a couple of the columns? (I actually don't need EventID and SubEventID, but I assume it's a good idea to keep them to verify the overlap)
    – Zero Cool
    Commented Oct 8, 2016 at 16:52
  • @ZeroCool select * means all fields (columns). Replace * by those you want, prefixed by their tablename to avoid ambiguities, and optionally give them a new name: select table1.col1, table2.field, table3.field1 as newname, etc... You can also do select table1.*, table2.field1, ...
    – mtone
    Commented Oct 8, 2016 at 21:26

The intersect operator would be easier to use in this instance.

select * from table1
intersect
select * from table2

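This will yield only the distinct rows that are common to both tables. Note that intersect requires both queries to return the same number of columns with compatible types, so if your tables have different columns you need to list the shared columns explicitly instead of using *. Here is a handy explanation for the intersect, union, and except operators.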
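When the two tables do not have identical columns, a variant restricted to the shared key column should still work. This is a sketch assuming EventID is the only column you want to compare; it returns just the matching EventID values, not the other columns from either table.

```sql
-- Both sides return exactly one column, so intersect is allowed.
select EventID from table1
intersect
select EventID from table2;
```

If you need the other columns alongside the matching keys, a join is the better fit.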

  • Thanks for the suggestion. When I tried this, it said couldn't be performed because the tables have a different number of values - currently trying @mtone's method
    – Zero Cool
    Commented Oct 8, 2016 at 16:11
