3

I have a large dataset of students and the classes they have taken. Each student has taken 12 to 18 of around 80 available classes. Using Excel (2013) I would like to find out, for any given pair of classes, how many students took both of them. I envisage a table with the 80 classes as both rows and columns, and then for each intersection I would see a count of how many students took that combination.

The data arrive as an Excel file with one row per student per class:

Student  Class
Smith    E101
Jones    E101
Parker   E101
Brown    E102
Green    E102
Smith    E201
Jones    E202
Parker   E201
Brown    E202
Green    E203
...

Envisaged output:

      E101  E102  E201  E202  E203  ...
E101        0     2     1     0
E102  0           0     1     1    
E201  2     0           0     0
E202  1     1     0           0
E203  0     1     0     0
...

(Obviously I only need a diagonal half of the above, as the other half mirrors it.)

I've used a pivot table to get the data into a table with students as rows, and all possible classes as columns, showing a 1 where a student took a given class.

        E101  E102  E201  E202  E203  ...
Smith   1           1           
Jones   1                 1         
Parker  1           1          
Brown         1           1     
Green         1                 1
...

But I'm then stuck on how to proceed, with as little manual intervention as possible, to my desired output.

Can anyone suggest a way to achieve the output I need in Excel? I've done fairly extensive searching but haven't found anything.

Or should I be looking to other software?

2
  • Welcome to Super User! Please note that superuser.com is not a free script/code writing service. If you tell us what you have tried so far (include the scripts/code you are already using) and where you are stuck then we can try to help with specific problems. You should also read How do I ask a good question?.
    – DavidPostill
    Commented Aug 8, 2018 at 14:30
  • 2
    I'm not using any scripts/code. I have got as far as using a pivot table, as above, to go from a flat list to a crosstabulation of student and class. I now do not know whether there is a way of achieving what I want (as per the post) in Excel, despite fairly extensive searching. My question is therefore can I obtain that information within Excel, and if so, how? I don't mean to be obtuse but I don't see what is wrong with my question and would welcome more explanation on that. I am not asking for code, but suggestions of approaches. Commented Aug 10, 2018 at 12:28

1 Answer 1

2

This is quite simple to do in Excel with a formula that operates on your pivot table.

With the two tables set up like this

Worksheet Screenshot showing Tables and Formula

enter the following formula in J2 and ctrl-enter/copy-paste/fill-down&right/auto-fill into the rest of the table's cells:

=
IF(
  J$1=$I2,
  "",
  COUNTIFS(
    INDEX($A$1:$F$6,ROW($A$1:$F$6),MATCH(J$1,$A$1:$F$1,0)):INDEX($A$1:$F$6,1+ROWS($A$1:$F$6)-ROW($A$1:$F$6),MATCH(J$1,$A$1:$F$1,0)),
    1,
    INDEX($A$1:$F$6,ROW($A$1:$F$6),MATCH($I2,$A$1:$F$1,0)):INDEX($A$1:$F$6,1+ROWS($A$1:$F$6)-ROW($A$1:$F$6),MATCH($I2,$A$1:$F$1,0)),
    1
  )
)


Explanation:

The first argument of the COUNTIFS() function is the dynamically generated column of the pivot table corresponding to the column header of the output table. It's a little easier to understand if we look at the intermediate evaluated steps (for cell L2):

INDEX($A$1:$F$6,ROW($A$1:$F$6),MATCH(L$1,$A$1:$F$1,0)):INDEX($A$1:$F$6,1+ROWS($A$1:$F$6)-ROW($A$1:$F$6),MATCH(L$1,$A$1:$F$1,0))
INDEX($A$1:$F$6,1,MATCH("E201",$A$1:$F$1,0)):INDEX($A$1:$F$6,6,MATCH("E201",$A$1:$F$1,0))
INDEX($A$1:$F$6,1,4):INDEX($A$1:$F$6,6,4)
$D$1:$D$6

(Note that the second arguments of each INDEX() are just the fully dynamic start and end rows, respectively, of the pivot table.)

Similarly for the third argument of the COUNTIFS() function, but this time the dynamically generated column of the pivot table corresponds to the row header of the output table. For cell L2 it evaluates to $B$1:$B$6.

Thus the COUNTIFS() function in L2 becomes

COUNTIFS($D$1:$D$6,1,$B$1:$B$6,1)

which is the standard way of counting the number of rows (students) where both columns contain a 1 (i.e. the student was enrolled in both classes).

The encapsulating IF() function is just there to ensure that the diagonal cells are blank.

2
  • 1
    Thank you @robinCTS ! Please accept huge apologies for the time it took to reply. I had some issues with logging in to StackExchange that meant I was not able to respond until now, but I did read and implement the solution. I simply adjusted the ranges for my own data and it works perfectly. I especially appreciate the detailed explanation of each element - it took some time to work through but I understand. I'm very grateful for your time! Commented Aug 22, 2018 at 8:23
  • @ermintrude75 Not a problem. Glad to help! I'm just sorry your question was initially closed and took so long to be re-opened.
    – robinCTS
    Commented Aug 22, 2018 at 8:35

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .