I have created an LP function to help maximize a set of features. This is my first time using this library (PuLP), and also my first time doing linear programming.
Variables:
- Number of features => X
- Number of Categories => Y
Problem function: Maximize Z given changes in X and Y. If I add more features (X) from specific categories or from the pool of all categories (Y), Z should be at its maximum.
Constraints:
- a feature can be restricted to a specific category, though it does not have to be
- a feature may have a specific threshold (an upper bound on its sum), though it does not have to
- the total number of selected items, regardless of category, must be 5
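Read literally, the constraints above can be written as a 0-1 integer program. This is one interpretation, not a statement of what the code below does: it assumes a single binary variable $x_i$ per item (1 if item $i$ is selected) shared by every input feature. $F$ is the set of entries in input_features, $a_{ij}$ is the value of feature $j$'s column for item $i$, $G_j$ is the set of items in feature $j$'s constrained_Group (all items $I$ if it has none), $t_j$ is its sum_threshold, and $C$ is the union of the $G_j$ when every feature names a category (otherwise $C = I$). All of these symbols are introduced here for illustration.

$$
\begin{aligned}
\max_{x}\quad & Z = \sum_{j \in F} \sum_{i \in G_j} a_{ij}\, x_i \\
\text{subject to}\quad & \sum_{i \in G_j} a_{ij}\, x_i \le t_j \quad \text{for each } j \in F \text{ that has a threshold } t_j \\
& \sum_{i \in C} x_i = 5, \qquad x_i = 0 \ \text{ for } i \notin C \\
& x_i \in \{0, 1\} \quad \text{for all } i \in I
\end{aligned}
$$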
Code:
import pandas as pd
import numpy as np
from pulp import *
import random
data = [{'category': 'category 1',
'item_title': 'item 1',
'feature 1': 10.0,
'feature 2': 0.0,
'feature 3': 0.0,
'feature 4': 0.0,
'feature 5': 0.0,
'feature 6': 0.0,
'feature 7': 0.0,
'feature 8': 0.0,
'feature 9': 0.0,
'feature 10': 0.0,
'feature 11': 0.0,
'feature 12': 0.0,
'feature 13': 0.0,
'feature 14': 0.0,
'feature 15': 0.0,
'feature 16': 8.0,
'feature 17': 0.0},
{'category': 'category 1',
'item_title': 'item 2',
'feature 1': 0.0,
'feature 2': 0.0,
'feature 3': 0.0,
'feature 4': 0.0,
'feature 5': 0.0,
'feature 6': 0.0,
'feature 7': 10.0,
'feature 8': 30.0,
'feature 9': 0.0,
'feature 10': 0.0,
'feature 11': 0.0,
'feature 12': 0.0,
'feature 13': 0.0,
'feature 14': 0.0,
'feature 15': 0.0,
'feature 16': 9.0,
'feature 17': 0.0},
{'category': 'category 1',
'item_title': 'item 3',
'feature 1': 0.0,
'feature 2': 22.0,
'feature 3': 0.0,
'feature 4': 0.0,
'feature 5': 0.0,
'feature 6': 0.0,
'feature 7': 0.0,
'feature 8': 0.0,
'feature 9': 0.0,
'feature 10': 0.0,
'feature 11': 0.0,
'feature 12': 0.0,
'feature 13': 0.0,
'feature 14': 0.0,
'feature 15': 0.0,
'feature 16': 7.0,
'feature 17': 0.0},
{'category': 'category 1',
'item_title': 'item 4',
'feature 1': 0.0,
'feature 2': 36.0,
'feature 3': 0.0,
'feature 4': 0.0,
'feature 5': 0.0,
'feature 6': 0.0,
'feature 7': 0.0,
'feature 8': 0.0,
'feature 9': 0.0,
'feature 10': 0.0,
'feature 11': 0.0,
'feature 12': 0.0,
'feature 13': 0.0,
'feature 14': 0.0,
'feature 15': 0.0,
'feature 16': 18.0,
'feature 17': 0.0},
{'category': 'category 1',
'item_title': 'item 5',
'feature 1': 0.0,
'feature 2': 54.0,
'feature 3': 0.0,
'feature 4': 0.0,
'feature 5': 0.0,
'feature 6': 0.0,
'feature 7': 0.0,
'feature 8': 0.0,
'feature 9': 0.0,
'feature 10': 0.0,
'feature 11': 0.0,
'feature 12': 0.0,
'feature 13': 0.0,
'feature 14': 0.0,
'feature 15': 5.0,
'feature 16': 32.0,
'feature 17': 0.0},
{'category': 'category 1',
'item_title': 'item 6',
'feature 1': 0.0,
'feature 2': 0.0,
'feature 3': 0.0,
'feature 4': 20.0,
'feature 5': 0.0,
'feature 6': 0.0,
'feature 7': 0.0,
'feature 8': 0.0,
'feature 9': 0.0,
'feature 10': 0.0,
'feature 11': 0.0,
'feature 12': 0.0,
'feature 13': 0.0,
'feature 14': 0.0,
'feature 15': 0.0,
'feature 16': 15.0,
'feature 17': 0.0},
{'category': 'category 1',
'item_title': 'item 7',
'feature 1': 2.0,
'feature 2': 0.0,
'feature 3': 4.0,
'feature 4': 0.0,
'feature 5': 0.0,
'feature 6': 0.0,
'feature 7': 0.0,
'feature 8': 0.0,
'feature 9': 0.0,
'feature 10': 0.0,
'feature 11': 0.0,
'feature 12': 0.0,
'feature 13': 0.0,
'feature 14': 0.0,
'feature 15': 0.0,
'feature 16': 10.0,
'feature 17': 0.0},
{'category': 'category 1',
'item_title': 'item 8',
'feature 1': 8.0,
'feature 2': 0.0,
'feature 3': 2.0,
'feature 4': 0.0,
'feature 5': 0.0,
'feature 6': 0.0,
'feature 7': 0.0,
'feature 8': 0.0,
'feature 9': 0.0,
'feature 10': 0.0,
'feature 11': 0.0,
'feature 12': 0.0,
'feature 13': 0.0,
'feature 14': 0.0,
'feature 15': 20.0,
'feature 16': 0.0,
'feature 17': 0.0},
{'category': 'category 1',
'item_title': 'item 9',
'feature 1': 0.0,
'feature 2': 19.0,
'feature 3': 0.0,
'feature 4': 8.0,
'feature 5': 0.0,
'feature 6': 8.0,
'feature 7': 0.0,
'feature 8': 0.0,
'feature 9': 0.0,
'feature 10': 0.0,
'feature 11': 0.0,
'feature 12': 0.0,
'feature 13': 5.0,
'feature 14': 0.0,
'feature 15': 5.0,
'feature 16': 5.0,
'feature 17': 0.0},
{'category': 'category 2',
'item_title': 'item 10',
'feature 1': 0.0,
'feature 2': 0.0,
'feature 3': 0.0,
'feature 4': 0.0,
'feature 5': 0.0,
'feature 6': 0.0,
'feature 7': 0.0,
'feature 8': 0.0,
'feature 9': 0.0,
'feature 10': 0.0,
'feature 11': 55.0,
'feature 12': 0.0,
'feature 13': 0.0,
'feature 14': 0.0,
'feature 15': 5.0,
'feature 16': 0.0,
'feature 17': 0.0},
{'category': 'category 2',
'item_title': 'item 11',
'feature 1': 0.0,
'feature 2': 89.0,
'feature 3': 0.0,
'feature 4': 0.0,
'feature 5': 0.0,
'feature 6': 0.0,
'feature 7': 0.0,
'feature 8': 0.0,
'feature 9': 0.0,
'feature 10': 0.0,
'feature 11': 35.0,
'feature 12': 9.0,
'feature 13': 0.0,
'feature 14': 0.0,
'feature 15': 0.0,
'feature 16': 0.0,
'feature 17': 0.0},
{'category': 'category 2',
'item_title': 'item 12',
'feature 1': 0.0,
'feature 2': 12.0,
'feature 3': 0.0,
'feature 4': 7.0,
'feature 5': 0.0,
'feature 6': 0.0,
'feature 7': 0.0,
'feature 8': 0.0,
'feature 9': 0.0,
'feature 10': 0.0,
'feature 11': 27.0,
'feature 12': 50.0,
'feature 13': 0.0,
'feature 14': 0.0,
'feature 15': 0.0,
'feature 16': 0.0,
'feature 17': 0.0},
{'category': 'category 2',
'item_title': 'item 13',
'feature 1': 0.0,
'feature 2': 0.0,
'feature 3': 0.0,
'feature 4': 9.0,
'feature 5': 0.0,
'feature 6': 0.0,
'feature 7': 0.0,
'feature 8': 0.0,
'feature 9': 0.0,
'feature 10': 0.0,
'feature 11': 37.0,
'feature 12': 0.0,
'feature 13': 0.0,
'feature 14': 0.0,
'feature 15': 0.0,
'feature 16': 0.0,
'feature 17': 0.0},
{'category': 'category 2',
'item_title': 'item 14',
'feature 1': 0.0,
'feature 2': 0.0,
'feature 3': 0.0,
'feature 4': 110.0,
'feature 5': 0.0,
'feature 6': 0.0,
'feature 7': 0.0,
'feature 8': 0.0,
'feature 9': 0.0,
'feature 10': 0.0,
'feature 11': 29.0,
'feature 12': 6.0,
'feature 13': 0.0,
'feature 14': 0.0,
'feature 15': 0.0,
'feature 16': 0.0,
'feature 17': 0.0},
{'category': 'category 2',
'item_title': 'item 15',
'feature 1': 0.0,
'feature 2': 5.0,
'feature 3': 0.0,
'feature 4': 0.0,
'feature 5': 0.0,
'feature 6': 8.0,
'feature 7': 0.0,
'feature 8': 0.0,
'feature 9': 0.0,
'feature 10': 0.0,
'feature 11': 43.0,
'feature 12': 0.0,
'feature 13': 6.0,
'feature 14': 0.0,
'feature 15': 0.0,
'feature 16': 3.0,
'feature 17': 0.0},
{'category': 'category 3',
'item_title': 'item 16',
'feature 1': 0.0,
'feature 2': 0.0,
'feature 3': 0.0,
'feature 4': 64.0,
'feature 5': 12.0,
'feature 6': 0.0,
'feature 7': 0.0,
'feature 8': 52.0,
'feature 9': 0.0,
'feature 10': 0.0,
'feature 11': 0.0,
'feature 12': 0.0,
'feature 13': 0.0,
'feature 14': 0.0,
'feature 15': 0.0,
'feature 16': 0.0,
'feature 17': 0.0},
{'category': 'category 3',
'item_title': 'item 17',
'feature 1': 0.0,
'feature 2': 0.0,
'feature 3': 0.0,
'feature 4': 66.0,
'feature 5': 0.0,
'feature 6': 0.0,
'feature 7': 0.0,
'feature 8': 0.0,
'feature 9': 0.0,
'feature 10': 0.0,
'feature 11': 0.0,
'feature 12': 0.0,
'feature 13': 0.0,
'feature 14': 0.0,
'feature 15': 0.0,
'feature 16': 0.0,
'feature 17': 8.0},
{'category': 'category 3',
'item_title': 'item 18',
'feature 1': 0.0,
'feature 2': 0.0,
'feature 3': 0.0,
'feature 4': 8.0,
'feature 5': 0.0,
'feature 6': 0.0,
'feature 7': 0.0,
'feature 8': 0.0,
'feature 9': 0.0,
'feature 10': 0.0,
'feature 11': 0.0,
'feature 12': 0.0,
'feature 13': 0.0,
'feature 14': 0.0,
'feature 15': 0.0,
'feature 16': 0.0,
'feature 17': 18.0},
{'category': 'category 3',
'item_title': 'item 19',
'feature 1': 0.0,
'feature 2': 0.0,
'feature 3': 0.0,
'feature 4': 1.0,
'feature 5': 0.0,
'feature 6': 0.0,
'feature 7': 0.0,
'feature 8': 0.0,
'feature 9': 0.0,
'feature 10': 0.0,
'feature 11': 0.0,
'feature 12': 0.0,
'feature 13': 0.0,
'feature 14': 0.0,
'feature 15': 0.0,
'feature 16': 0.0,
'feature 17': 4.0},
{'category': 'category 3',
'item_title': 'item 20',
'feature 1': 0.0,
'feature 2': 0.0,
'feature 3': 0.0,
'feature 4': 0.0,
'feature 5': 0.0,
'feature 6': 0.0,
'feature 7': 0.0,
'feature 8': 0.0,
'feature 9': 0.0,
'feature 10': 0.0,
'feature 11': 0.0,
'feature 12': 9.0,
'feature 13': 0.0,
'feature 14': 0.0,
'feature 15': 5.0,
'feature 16': 0.0,
'feature 17': 4.0},
{'category': 'category 3',
'item_title': 'item 21',
'feature 1': 0.0,
'feature 2': 0.0,
'feature 3': 0.0,
'feature 4': 90.0,
'feature 5': 2.0,
'feature 6': 0.0,
'feature 7': 0.0,
'feature 8': 62.0,
'feature 9': 0.0,
'feature 10': 0.0,
'feature 11': 0.0,
'feature 12': 0.0,
'feature 13': 0.0,
'feature 14': 0.0,
'feature 15': 0.0,
'feature 16': 0.0,
'feature 17': 0.0},
{'category': 'category 3',
'item_title': 'item 22',
'feature 1': 0.0,
'feature 2': 17.0,
'feature 3': 0.0,
'feature 4': 19.0,
'feature 5': 0.0,
'feature 6': 0.0,
'feature 7': 0.0,
'feature 8': 42.0,
'feature 9': 0.0,
'feature 10': 0.0,
'feature 11': 0.0,
'feature 12': 0.0,
'feature 13': 0.0,
'feature 14': 0.0,
'feature 15': 0.0,
'feature 16': 0.0,
'feature 17': 0.0},
{'category': 'category 3',
'item_title': 'item 23',
'feature 1': 0.0,
'feature 2': 0.0,
'feature 3': 0.0,
'feature 4': 4.0,
'feature 5': 2.0,
'feature 6': 0.0,
'feature 7': 0.0,
'feature 8': 0.0,
'feature 9': 0.0,
'feature 10': 0.0,
'feature 11': 0.0,
'feature 12': 0.0,
'feature 13': 0.0,
'feature 14': 0.0,
'feature 15': 0.0,
'feature 16': 0.0,
'feature 17': 19.0},
{'category': 'category 3',
'item_title': 'item 24',
'feature 1': 0.0,
'feature 2': 0.0,
'feature 3': 0.0,
'feature 4': 45.0,
'feature 5': 20.0,
'feature 6': 0.0,
'feature 7': 0.0,
'feature 8': 0.0,
'feature 9': 0.0,
'feature 10': 0.0,
'feature 11': 0.0,
'feature 12': 0.0,
'feature 13': 0.0,
'feature 14': 0.0,
'feature 15': 0.0,
'feature 16': 0.0,
'feature 17': 0.0},
{'category': 'category 3',
'item_title': 'item 25',
'feature 1': 0.0,
'feature 2': 0.0,
'feature 3': 0.0,
'feature 4': 18.0,
'feature 5': 0.0,
'feature 6': 0.0,
'feature 7': 0.0,
'feature 8': 25.0,
'feature 9': 0.0,
'feature 10': 0.0,
'feature 11': 0.0,
'feature 12': 0.0,
'feature 13': 0.0,
'feature 14': 0.0,
'feature 15': 0.0,
'feature 16': 0.0,
'feature 17': 0.0},
{'category': 'category 4',
'item_title': 'item 26',
'feature 1': 0.0,
'feature 2': 0.0,
'feature 3': 0.0,
'feature 4': 0.0,
'feature 5': 0.0,
'feature 6': 0.0,
'feature 7': 0.0,
'feature 8': 0.0,
'feature 9': 0.0,
'feature 10': 0.0,
'feature 11': 0.0,
'feature 12': 0.0,
'feature 13': 6.0,
'feature 14': 0.0,
'feature 15': 6.0,
'feature 16': 0.0,
'feature 17': 0.0},
{'category': 'category 4',
'item_title': 'item 27',
'feature 1': 0.0,
'feature 2': 0.0,
'feature 3': 0.0,
'feature 4': 0.0,
'feature 5': 0.0,
'feature 6': 0.0,
'feature 7': 0.0,
'feature 8': 0.0,
'feature 9': 0.0,
'feature 10': 0.0,
'feature 11': 0.0,
'feature 12': 0.0,
'feature 13': 0.0,
'feature 14': 80.0,
'feature 15': 0.0,
'feature 16': 0.0,
'feature 17': 0.0},
{'category': 'category 4',
'item_title': 'item 28',
'feature 1': 90.0,
'feature 2': 0.0,
'feature 3': 0.0,
'feature 4': 0.0,
'feature 5': 0.0,
'feature 6': 0.0,
'feature 7': 0.0,
'feature 8': 0.0,
'feature 9': 0.0,
'feature 10': 0.0,
'feature 11': 0.0,
'feature 12': 0.0,
'feature 13': 0.0,
'feature 14': 0.0,
'feature 15': 40.0,
'feature 16': 0.0,
'feature 17': 0.0},
{'category': 'category 4',
'item_title': 'item 29',
'feature 1': 0.0,
'feature 2': 0.0,
'feature 3': 0.0,
'feature 4': 0.0,
'feature 5': 0.0,
'feature 6': 0.0,
'feature 7': 0.0,
'feature 8': 0.0,
'feature 9': 0.0,
'feature 10': 10.0,
'feature 11': 0.0,
'feature 12': 0.0,
'feature 13': 0.0,
'feature 14': 0.0,
'feature 15': 7.0,
'feature 16': 0.0,
'feature 17': 0.0},
{'category': 'category 4',
'item_title': 'item 30',
'feature 1': 0.0,
'feature 2': 10.0,
'feature 3': 0.0,
'feature 4': 0.0,
'feature 5': 0.0,
'feature 6': 0.0,
'feature 7': 0.0,
'feature 8': 0.0,
'feature 9': 0.0,
'feature 10': 0.0,
'feature 11': 0.0,
'feature 12': 0.0,
'feature 13': 0.0,
'feature 14': 0.0,
'feature 15': 9.0,
'feature 16': 0.0,
'feature 17': 0.0}]
df = pd.DataFrame(data)
input_features = [{'variable':'feature 1', 'sum_threshold':100, 'dType':"Integer", "constrained_Group":"category 1"},
{'variable':'feature 2', 'sum_threshold':49, 'dType':"Integer", "constrained_Group":"category 2"},
{'variable':'feature 8', 'sum_threshold':66, 'dType':"Integer", "constrained_Group":"category 3"},
]
categories = list(set(df.category)) # categories in data
problem = LpProblem("Best Features", LpMaximize) # initialise problem
indexes_for_categories = [] # to store the indexes of all categories that are used in input_features

# Loop through list of dictionaries to store conditions/constraints in problem
for dict_ in input_features:
    # Create index of items which will help to extract final features at the end
    items = df.index.tolist()
    # Create problem variables as dict - index of data frame and the column from desired variable
    problem_var = dict(zip(items, np.array(df[dict_['variable']].tolist())))
    # Need to create unique names for x so that pulp does not run into error of duplicates
    X = LpVariable.dicts(f"x_{random.uniform(1,7)}", indices=items, lowBound=0, upBound=1, cat=dict_['dType'], indexStart=[])
    # problem to solve. Maximize the sum of chosen variables
    problem += lpSum([X[i] * problem_var[i] for i in items])
    # if category is applied, must apply constraint - max sum must only be within this category
    if dict_['constrained_Group'] is not None:
        constrained_df = df[df['category'].str.contains(dict_['constrained_Group'])].fillna(0)
        constrained_df_items = constrained_df.index.tolist()
        constrained_df_problem_var = dict(zip(constrained_df_items, np.array(constrained_df[dict_['variable']].tolist())))
        problem += lpSum([X[i] * constrained_df_problem_var[i] for i in constrained_df_items])
        # if threshold provided when category is provided, max must be within this threshold
        if dict_['sum_threshold'] is not None:
            problem += lpSum([X[i] * constrained_df_problem_var[i] for i in constrained_df_items]) <= dict_['sum_threshold']
        # Range of indexes from categories selected - to be used if all input features explicitly state a category.
        # This will be the sample from which to select all 5 items.
        category_index = np.arange(constrained_df.index.min(), constrained_df.index.max()).tolist()
        indexes_for_categories.append(category_index)
    # if no category is provided
    else:
        # if threshold is provided when no category is provided, solution must be within this threshold
        if dict_['sum_threshold'] is not None:
            problem += lpSum([X[i] * problem_var[i] for i in items]) <= dict_['sum_threshold']

# If all input features (list of dicts) have categories, constrain the total number of items (5) to just those in the selected categories.
# If not, select the best 5 from the total pool of items.
only_constrained_gs = [dict_['constrained_Group'] for dict_ in input_features if dict_['constrained_Group'] is not None]
if len(only_constrained_gs) == len(input_features):
    sample_to_choose_from = np.concatenate(indexes_for_categories)
    problem += lpSum([X[i] for i in sample_to_choose_from]) == 5
else:
    problem += lpSum([X[i] for i in items]) == 5
# solve problem
problem.solve()

# store variables and extract indexes to then extract from original data
variables = []
values = []
for v in problem.variables():
    variable = v.name
    value = v.varValue
    variables.append(variable)
    values.append(value)
values = np.array(values).astype(int)
items_list = pd.DataFrame(np.array([variables, values]).T, columns=['Variable', 'Optimal Value'])
items_list['Optimal Value'] = items_list['Optimal Value'].astype(int)
items_list_opt = items_list[items_list['Optimal Value'] != 0]

res_df = []
for dict_ in input_features:
    index_pos = np.array([int(i) for i in items_list_opt["Variable"].str.split('_').str[-1].tolist()])
    items_attribute_vals = df[dict_['variable']].loc[index_pos].astype(int)
    items_names = df['item_title'].loc[index_pos] #.astype(int)
    result_optimize = pd.concat([items_names, items_attribute_vals], axis=1).T
    res_df.append(result_optimize)

df[df.index.isin(pd.concat(res_df, axis=1).T.drop_duplicates(subset="item_title").index)]
The current output:
      category  item_title  feature 1  feature 2  feature 3  feature 4  feature 5  feature 6  feature 7  feature 8  feature 9  feature 10  feature 11  feature 12  feature 13  feature 14  feature 15  feature 16  feature 17
0   category 1      item 1       10.0        0.0        0.0        0.0        0.0        0.0        0.0        0.0        0.0         0.0         0.0         0.0         0.0         0.0         0.0         8.0         0.0
1   category 1      item 2        0.0        0.0        0.0        0.0        0.0        0.0       10.0       30.0        0.0         0.0         0.0         0.0         0.0         0.0         0.0         9.0         0.0
10  category 2     item 11        0.0       89.0        0.0        0.0        0.0        0.0        0.0        0.0        0.0         0.0        35.0         9.0         0.0         0.0         0.0         0.0         0.0
11  category 2     item 12        0.0       12.0        0.0        7.0        0.0        0.0        0.0        0.0        0.0         0.0        27.0        50.0         0.0         0.0         0.0         0.0         0.0
20  category 3     item 21        0.0        0.0        0.0       90.0        2.0        0.0        0.0       62.0        0.0         0.0         0.0         0.0         0.0         0.0         0.0         0.0         0.0
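The thresholds in input_features can be checked directly against the five rows above. This small sketch copies only the columns that input_features uses from the printed output; `check_thresholds` and `output_rows` are hypothetical names introduced for illustration, not part of PuLP or the original code.

```python
# Rows copied from the output above; only the columns named in input_features are kept.
output_rows = [
    {"category": "category 1", "item_title": "item 1", "feature 1": 10.0, "feature 2": 0.0, "feature 8": 0.0},
    {"category": "category 1", "item_title": "item 2", "feature 1": 0.0, "feature 2": 0.0, "feature 8": 30.0},
    {"category": "category 2", "item_title": "item 11", "feature 1": 0.0, "feature 2": 89.0, "feature 8": 0.0},
    {"category": "category 2", "item_title": "item 12", "feature 1": 0.0, "feature 2": 12.0, "feature 8": 0.0},
    {"category": "category 3", "item_title": "item 21", "feature 1": 0.0, "feature 2": 0.0, "feature 8": 62.0},
]

# Same specs as input_features in the question.
input_features = [
    {"variable": "feature 1", "sum_threshold": 100, "dType": "Integer", "constrained_Group": "category 1"},
    {"variable": "feature 2", "sum_threshold": 49, "dType": "Integer", "constrained_Group": "category 2"},
    {"variable": "feature 8", "sum_threshold": 66, "dType": "Integer", "constrained_Group": "category 3"},
]


def check_thresholds(rows, specs):
    """Return (variable, group, total, threshold, within_threshold) for each spec."""
    report = []
    for spec in specs:
        group = spec["constrained_Group"]
        # Sum the feature over the selected rows that fall in the spec's category.
        total = sum(r[spec["variable"]] for r in rows
                    if group is None or r["category"] == group)
        threshold = spec["sum_threshold"]
        report.append((spec["variable"], group, total, threshold,
                       threshold is None or total <= threshold))
    return report


for entry in check_thresholds(output_rows, input_features):
    print(entry)
# ('feature 1', 'category 1', 10.0, 100, True)
# ('feature 2', 'category 2', 101.0, 49, False)
# ('feature 8', 'category 3', 62.0, 66, True)
```

With the rows as printed, feature 2 in category 2 sums to 89.0 + 12.0 = 101.0, which is above its sum_threshold of 49, while the other two sums stay within their thresholds.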
I built a quick codesandbox here if you wish to test it out.
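For inputs of this size, an exhaustive search over every possible selection is a cheap way to get a reference optimum to test an LP model against. The sketch below assumes the reading of the problem where one selection of `n_items` items is shared by all input features, and it mirrors the "only pick from the named categories when every feature names one" rule from the code. `brute_force_select`, `toy_rows` and `toy_specs` are hypothetical names, and the toy data is made up.

```python
from itertools import combinations


def brute_force_select(rows, input_features, n_items=5):
    """Score every n_items-subset exhaustively; only practical for small inputs."""
    groups = [spec.get("constrained_Group") for spec in input_features]
    # Same rule as the question: if every spec names a category, pick only from those.
    if all(g is not None for g in groups):
        candidates = [i for i, r in enumerate(rows) if r["category"] in groups]
    else:
        candidates = list(range(len(rows)))

    def feature_sum(subset, spec):
        # Sum one feature over the picked items that belong to the spec's category.
        group = spec.get("constrained_Group")
        return sum(rows[i][spec["variable"]] for i in subset
                   if group is None or rows[i]["category"] == group)

    best_score, best_subset = None, None
    for subset in combinations(candidates, n_items):
        sums = [feature_sum(subset, spec) for spec in input_features]
        # Skip subsets that exceed any provided sum_threshold.
        if any(spec.get("sum_threshold") is not None and s > spec["sum_threshold"]
               for spec, s in zip(input_features, sums)):
            continue
        score = sum(sums)
        if best_score is None or score > best_score:
            best_score, best_subset = score, subset
    if best_subset is None:
        return None, []  # no feasible subset
    return best_score, [rows[i]["item_title"] for i in best_subset]


# Hypothetical toy data in the same shape as `data` / `input_features` above.
toy_rows = [
    {"category": "A", "item_title": "a1", "feature 1": 10.0, "feature 2": 0.0},
    {"category": "A", "item_title": "a2", "feature 1": 8.0, "feature 2": 5.0},
    {"category": "A", "item_title": "a3", "feature 1": 1.0, "feature 2": 0.0},
    {"category": "B", "item_title": "b1", "feature 1": 0.0, "feature 2": 40.0},
    {"category": "B", "item_title": "b2", "feature 1": 0.0, "feature 2": 30.0},
    {"category": "C", "item_title": "c1", "feature 1": 50.0, "feature 2": 50.0},
]
toy_specs = [
    {"variable": "feature 1", "sum_threshold": 15, "dType": "Integer", "constrained_Group": "A"},
    {"variable": "feature 2", "sum_threshold": 50, "dType": "Integer", "constrained_Group": "B"},
]

print(brute_force_select(toy_rows, toy_specs, n_items=3))
# (51.0, ['a1', 'a3', 'b1'])
```

On the question's data, when every input feature names a category, categories 1 to 3 hold 25 items, so there are C(25, 5) = 53,130 subsets to score, which runs quickly in plain Python.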
Is this the right implementation for the problem I am trying to solve? I would appreciate some guidance on this implementation.
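A minimal sketch of one possible PuLP formulation, assuming every input feature should share a single set of binary pick variables (one per item) so that the 5-item limit and all thresholds apply to the same selection. This is an illustrative interpretation of the constraints listed at the top, not the code from the question; `select_items`, `toy_rows` and `toy_specs` are hypothetical names and the toy data is made up. It uses only standard PuLP calls (`LpProblem`, `LpVariable.dicts`, `lpSum`, `PULP_CBC_CMD`, `LpStatus`, `value`).

```python
from pulp import (LpBinary, LpMaximize, LpProblem, LpStatus, LpVariable,
                  PULP_CBC_CMD, lpSum, value)


def select_items(rows, input_features, n_items=5):
    """Sketch: one shared binary pick-variable per item and a single objective."""
    groups = [spec.get("constrained_Group") for spec in input_features]
    # If every spec names a category, only items from those categories may be picked.
    if all(g is not None for g in groups):
        candidates = [i for i, r in enumerate(rows) if r["category"] in groups]
    else:
        candidates = list(range(len(rows)))

    prob = LpProblem("best_features", LpMaximize)
    pick = LpVariable.dicts("pick", candidates, cat=LpBinary)

    def feature_expr(spec):
        # Exact category match, so "category 1" cannot also match "category 10".
        group = spec.get("constrained_Group")
        return lpSum(rows[i][spec["variable"]] * pick[i] for i in candidates
                     if group is None or rows[i]["category"] == group)

    # Build the whole objective once; adding a second bare expression with
    # `prob += expr` replaces the objective instead of extending it.
    prob += lpSum(feature_expr(spec) for spec in input_features)
    for k, spec in enumerate(input_features):
        if spec.get("sum_threshold") is not None:
            prob += feature_expr(spec) <= spec["sum_threshold"], f"threshold_{k}"
    prob += lpSum(pick[i] for i in candidates) == n_items, "item_count"

    prob.solve(PULP_CBC_CMD(msg=False))
    # Round rather than truncate: solvers may return values such as 0.9999999.
    chosen = [rows[i]["item_title"] for i in candidates
              if pick[i].varValue is not None and round(pick[i].varValue) == 1]
    return LpStatus[prob.status], value(prob.objective), chosen


# Hypothetical toy data in the same shape as `data` / `input_features` above.
toy_rows = [
    {"category": "A", "item_title": "a1", "feature 1": 10.0, "feature 2": 0.0},
    {"category": "A", "item_title": "a2", "feature 1": 8.0, "feature 2": 5.0},
    {"category": "A", "item_title": "a3", "feature 1": 1.0, "feature 2": 0.0},
    {"category": "B", "item_title": "b1", "feature 1": 0.0, "feature 2": 40.0},
    {"category": "B", "item_title": "b2", "feature 1": 0.0, "feature 2": 30.0},
    {"category": "C", "item_title": "c1", "feature 1": 50.0, "feature 2": 50.0},
]
toy_specs = [
    {"variable": "feature 1", "sum_threshold": 15, "dType": "Integer", "constrained_Group": "A"},
    {"variable": "feature 2", "sum_threshold": 50, "dType": "Integer", "constrained_Group": "B"},
]

print(select_items(toy_rows, toy_specs, n_items=3))
```

A few design choices in this sketch: the candidate items are taken directly from the matching rows rather than from `np.arange(min, max)`, which stops one short of `max`; binary variables replace the 0..1 `Integer` variables (the two are equivalent here); and because PuLP warns "Overwriting previously set objective" and keeps only the newest bare expression, the objective is assembled in one `lpSum` before any constraints are added.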