I'm trying to solve an assignment problem with the following characteristics:
I have two sets that need to be matched with each other: set A (Students) and set B (Class combinations). Sets A and B have the same number of rows, so every row is matched and there are no leftovers in either set.
Each row in Set A is a student with multiple categorical attributes (Gender, Age Group, Country of Residence, etc.). There can be any number of attributes, and each attribute can have any number of options (Age Group can be in bins of 5 or 10 years, etc.).
Each row in Set B is a set of classes that a student in Set A can attend, such as (Physics, Chemistry, Linguistics). Every row has the same number of classes (for example, 3 classes per row), drawn from Y distinct classes in total. No class repeats within a row, and the order of the classes doesn't matter ([Physics, Chemistry] is the same as [Chemistry, Physics]).
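Since the order of classes within a row is irrelevant, one way to handle set B in code is to store each row as a `frozenset`, which also makes it easy to reject rows with repeated classes or the wrong size. A minimal sketch; the helper name `normalize_rows` and the toy rows are hypothetical:

```python
def normalize_rows(rows, k):
    """Return each class combination as a frozenset, so [A, B] and [B, A] compare equal.

    Raises ValueError if a row repeats a class or does not contain exactly k classes.
    """
    combos = []
    for row in rows:
        combo = frozenset(row)
        if len(combo) != len(row):
            raise ValueError(f"row {row} repeats a class")
        if len(combo) != k:
            raise ValueError(f"row {row} does not have exactly {k} classes")
        combos.append(combo)
    return combos


# Hypothetical rows: the same combination written in two different orders.
combos = normalize_rows([["Physics", "Chemistry", "Linguistics"],
                         ["Linguistics", "Physics", "Chemistry"]], k=3)
print(combos[0] == combos[1])  # True
```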
What I'm trying to do is match each student in set A to a class combination in set B such that, for every class, the students taking it have the same attribute proportions as set A as a whole. For example, if Country of Residence in set A is split 70:30 between USA and Canada, the students taking each class should also be 70:30 USA:Canada. This should hold for every student attribute in set A and every class in set B.
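To make the target concrete, here is a small sketch with hypothetical toy data (pairs of classes instead of triples, for brevity) that computes each class's Country shares for a given assignment and the total absolute gap from the set-A shares. The sum of absolute share differences is just one possible error measure, assumed here for illustration:

```python
from collections import Counter

# Hypothetical toy data: 10 students, 7 from the USA and 3 from Canada.
country = ["USA"] * 7 + ["Canada"] * 3
# Hypothetical assignment: student i attends the classes in takes[i].
takes = [{"Physics", "Chemistry"}] * 5 + [{"Chemistry", "Linguistics"}] * 5


def shares(values):
    """Fraction of each attribute value in a list of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: n / total for v, n in counts.items()}


target = shares(country)  # USA 0.7, Canada 0.3
error = {}
for cls in ["Physics", "Chemistry", "Linguistics"]:
    in_class = [country[i] for i, t in enumerate(takes) if cls in t]
    got = shares(in_class)
    # Sum of absolute gaps between this class's shares and the set-A shares.
    error[cls] = sum(abs(got.get(v, 0.0) - p) for v, p in target.items())
    print(cls, got, round(error[cls], 3))
```

Here Chemistry (taken by everyone) matches 70:30 exactly, while Physics (all USA) and Linguistics (40:60) are each off by a total of 0.6.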
I understand that a perfect match is generally not possible, so this becomes a minimization problem: minimize an error term measuring how far each class's proportions are from the overall student proportions. However, I've never dealt with assignment problems where both sides have attributes that need matching (worker and job), I haven't found any resources online about such problems, and frankly I don't know where to begin formulating this as an LP. Any advice on how to start formulating this problem so that it can be passed to a solver like OR-Tools (with everything linearized as much as possible) would be extremely helpful.
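One common way to write such an error term linearly (a sketch, using notation introduced only here): let $p_v$ be the share of attribute value $v$ in set A, $n_{c,v}$ the number of students with value $v$ taking class $c$, and $N_c$ the number of students taking class $c$. Requiring $n_{c,v}/N_c = p_v$ is equivalent to $n_{c,v} = p_v N_c$, and the absolute deviation can be split into two nonnegative variables:

$$
\begin{aligned}
\min \quad & \sum_{c}\sum_{v} \left(e^{+}_{c,v} + e^{-}_{c,v}\right) \\
\text{s.t.} \quad & n_{c,v} - p_v N_c = e^{+}_{c,v} - e^{-}_{c,v} \quad \forall c, v \\
& e^{+}_{c,v},\, e^{-}_{c,v} \ge 0 \quad \forall c, v
\end{aligned}
$$

At an optimum, $e^{+}_{c,v} + e^{-}_{c,v} = |n_{c,v} - p_v N_c|$. Because every row of set B is used exactly once, $N_c$ is simply the number of rows containing class $c$, a constant, so the constraint stays linear as long as $n_{c,v}$ is linear in the assignment variables.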
EDIT 1:
Based on Sutanu's answer I've made some headway, but I'm still slightly confused about the matching part.
I have 1 optimization variable:
Binary $x_{s,r}$ = Student s is taking row r
1 intermediary variable dependent on the optimization variable:
Binary $y_{s,c}$ = Student s is taking class c (Dependent on $x_{s,r}$)
1 preset binary variable mapping rows to classes:
Binary $z_{r,c}$ = Row r contains class c
Functionally equivalent to $c_r$ in Sutanu's response
Preset binary variables for each categorical attribute and student
Binary $att_{s,a,b}$ = Student s has value a of categorical attribute b (a = US/Canada for b = Country, or a = Fresh/Soph/Jr/Sr for b = Year, etc.)
(Intended to be functionally equivalent to $d_s$ in Sutanu's response)
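The two preset inputs $z_{r,c}$ and $att_{s,a,b}$ can be built directly from the raw data. A minimal sketch with hypothetical classes, rows, and student records, storing both as dictionaries keyed by their indices:

```python
# Hypothetical set B: class-combination rows (order within a row does not matter).
classes = ["Physics", "Chemistry", "Linguistics", "Biology"]
rows = [{"Physics", "Chemistry", "Linguistics"},
        {"Chemistry", "Linguistics", "Biology"},
        {"Physics", "Biology", "Chemistry"}]

# Hypothetical set A: one record per student, one value per categorical attribute b.
students = [{"Country": "USA", "Year": "Fresh"},
            {"Country": "Canada", "Year": "Soph"},
            {"Country": "USA", "Year": "Soph"}]

# z[(r, c)] = 1 if row r contains class c, else 0.
z = {(r, c): int(c in row) for r, row in enumerate(rows) for c in classes}

# Observed options a for each attribute b, e.g. {"Country": ["Canada", "USA"], ...}.
options = {b: sorted({st[b] for st in students}) for b in students[0]}

# att[(s, a, b)] = 1 if student s has value a for attribute b, else 0.
att = {(s, a, b): int(st[b] == a)
       for s, st in enumerate(students)
       for b, opts in options.items()
       for a in opts}
```

For each student and attribute, exactly one of the $att_{s,a,b}$ entries is 1, which is what makes it a one-hot encoding of the categorical value.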
Overall Constraints:
$\sum_{s} x_{s,r} = 1 \;\; \forall r$, forcing each row to be taken by exactly one student
$\sum_{r} x_{s,r} = 1 \;\; \forall s$, forcing each student to take exactly one row
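With equal set sizes, these two constraints make $x$ a permutation matrix. A small checker for a candidate 0/1 matrix, useful for validating solver output; the helper `is_valid_assignment` and the matrices are hypothetical:

```python
def is_valid_assignment(x):
    """Check the two assignment constraints on a 0/1 matrix x indexed x[s][r]:
    every student takes exactly one row, and every row is taken by exactly one student."""
    n = len(x)
    if any(len(x_s) != n for x_s in x):
        return False  # set A and set B must have the same number of rows
    each_student_one_row = all(sum(x_s) == 1 for x_s in x)
    each_row_one_student = all(sum(x[s][r] for s in range(n)) == 1 for r in range(n))
    return each_student_one_row and each_row_one_student


# Hypothetical 3x3 examples.
x_good = [[0, 1, 0],
          [1, 0, 0],
          [0, 0, 1]]  # students 0, 1, 2 take rows 1, 0, 2
x_bad = [[1, 0, 0],
         [1, 0, 0],
         [0, 0, 1]]   # row 0 is taken twice and row 1 is never taken
print(is_valid_assignment(x_good), is_valid_assignment(x_bad))  # True False
```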
Linearization of $x_{s,r}$ & $z_{r,c}$ => $y_{s,c}$ :
$y_{s,c} \leqslant x_{s,r}$
$y_{s,c} \leqslant z_{r,c}$
$y_{s,c} \geqslant x_{s,r} + z_{r,c} - 1$
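For a fixed assignment $x$, the value $y_{s,c}$ should take can be checked numerically. A minimal sketch with hypothetical toy data: because $z_{r,c}$ is preset data rather than a decision variable and each student takes exactly one row, $y_{s,c} = \sum_r z_{r,c}\,x_{s,r}$, which is already linear in $x$:

```python
# Hypothetical data: 3 students, 3 rows, 4 classes.
classes = ["Physics", "Chemistry", "Linguistics", "Biology"]
z = [[1, 1, 1, 0],   # row 0: Physics, Chemistry, Linguistics
     [0, 1, 1, 1],   # row 1: Chemistry, Linguistics, Biology
     [1, 1, 0, 1]]   # row 2: Physics, Chemistry, Biology
x = [[0, 1, 0],      # student 0 takes row 1
     [0, 0, 1],      # student 1 takes row 2
     [1, 0, 0]]      # student 2 takes row 0

# y[s][c] = sum_r z[r][c] * x[s][r]; z is constant, so each term is data times x.
y = [[sum(z[r][c] * x[s][r] for r in range(len(z))) for c in range(len(classes))]
     for s in range(len(x))]
```

Each row of `y` is then just the class indicator of the row that student was assigned.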
My question at the moment is: how exactly do I link $att_{s,a,b}$ and $y_{s,c}$ to get a variable that counts the students with a given attribute value (e.g. US or Canada) in a given class? Is the only way to create another binary variable $yatt_{s,a,b,c}$ (student s with value a of categorical attribute b is taking class c) and linearize $att_{s,a,b}$ and $y_{s,c}$ => $yatt_{s,a,b,c}$?
I understand that this variable is what I should constrain to stay within the desired proportions (30% Canada / 70% USA, etc.), but I don't know whether there is a better way to create it than introducing $yatt_{s,a,b,c}$.
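A sketch of the count, assuming $att_{s,a,b}$ is preset input data as listed above (not a decision variable): the product $att_{s,a,b}\,y_{s,c}$ is then a constant times a variable, so $\sum_s att_{s,a,b}\,y_{s,c}$ is linear in $y$ without a separate product variable. The toy data and the helpers `count` and `share` are hypothetical:

```python
# Hypothetical preset data: one categorical attribute b = "Country".
country = ["USA", "USA", "Canada", "USA"]  # student s -> value
att = {(s, a, "Country"): int(country[s] == a)
       for s in range(len(country)) for a in ("USA", "Canada")}

# Hypothetical y[s][c] for classes c = 0 (Physics), 1 (Chemistry), 2 (Linguistics);
# each student takes 2 classes.
y = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1],
     [1, 1, 0]]
n_students, n_classes = len(y), len(y[0])


def count(a, b, c):
    """Students with value a of attribute b in class c.

    att is constant, so each term is a constant times y[s][c]: the sum is linear in y."""
    return sum(att[(s, a, b)] * y[s][c] for s in range(n_students))


def share(a, b):
    """Share of value a of attribute b in the whole of set A (a constant)."""
    return sum(att[(s, a, b)] for s in range(n_students)) / n_students


# Signed deviation (count - share * class size), the quantity a proportion
# constraint or error term would bound.
deviation = {}
for c in range(n_classes):
    size = sum(y[s][c] for s in range(n_students))
    for a in ("USA", "Canada"):
        deviation[(a, c)] = count(a, "Country", c) - share(a, "Country") * size
```

With a solver, the same sum over `s` would be built from the $y_{s,c}$ decision variables instead of the fixed numbers used here.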