
I have created an LP model to help maximize a set of features. This is my first time working with this library (PuLP), and also my first time doing linear programming.

Variables:

  1. Number of features => X
  2. Number of categories => Y

Objective function: maximize Z given changes in X and Y. If I add more features (X) from specific categories or from the pool of categories (Y), then Z should be at its maximum.

Constraints:

  1. a feature can be restricted to a specific category, though it does not have to be
  2. a feature may have a specific sum threshold, though it does not have to
  3. the total number of features selected, regardless of category, must be 5
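
Putting these together, my rough attempt at writing the model down mathematically (my own notation, so treat it as a sketch: x_i in {0,1} selects item i, v_{i,f} is the value of feature f for item i, C_f is the category feature f is restricted to, and t_f is its optional sum threshold):

$$
\begin{aligned}
\max_x \quad & \sum_i \sum_f v_{i,f}\, x_i \\
\text{s.t.} \quad & \sum_i x_i = 5 \\
& \sum_{i \in C_f} v_{i,f}\, x_i \le t_f \quad \text{for each constrained feature } f \\
& x_i \in \{0, 1\} \quad \text{for each item } i
\end{aligned}
$$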

Code:

import pandas as pd
import numpy as np
from pulp import *
import random

data = [{'category': 'category 1',
  'item_title': 'item 1',
  'feature 1': 10.0,
  'feature 2': 0.0,
  'feature 3': 0.0,
  'feature 4': 0.0,
  'feature 5': 0.0,
  'feature 6': 0.0,
  'feature 7': 0.0,
  'feature 8': 0.0,
  'feature 9': 0.0,
  'feature 10': 0.0,
  'feature 11': 0.0,
  'feature 12': 0.0,
  'feature 13': 0.0,
  'feature 14': 0.0,
  'feature 15': 0.0,
  'feature 16': 8.0,
  'feature 17': 0.0},
 {'category': 'category 1',
  'item_title': 'item 2',
  'feature 1': 0.0,
  'feature 2': 0.0,
  'feature 3': 0.0,
  'feature 4': 0.0,
  'feature 5': 0.0,
  'feature 6': 0.0,
  'feature 7': 10.0,
  'feature 8': 30.0,
  'feature 9': 0.0,
  'feature 10': 0.0,
  'feature 11': 0.0,
  'feature 12': 0.0,
  'feature 13': 0.0,
  'feature 14': 0.0,
  'feature 15': 0.0,
  'feature 16': 9.0,
  'feature 17': 0.0},
 {'category': 'category 1',
  'item_title': 'item 3',
  'feature 1': 0.0,
  'feature 2': 22.0,
  'feature 3': 0.0,
  'feature 4': 0.0,
  'feature 5': 0.0,
  'feature 6': 0.0,
  'feature 7': 0.0,
  'feature 8': 0.0,
  'feature 9': 0.0,
  'feature 10': 0.0,
  'feature 11': 0.0,
  'feature 12': 0.0,
  'feature 13': 0.0,
  'feature 14': 0.0,
  'feature 15': 0.0,
  'feature 16': 7.0,
  'feature 17': 0.0},
 {'category': 'category 1',
  'item_title': 'item 4',
  'feature 1': 0.0,
  'feature 2': 36.0,
  'feature 3': 0.0,
  'feature 4': 0.0,
  'feature 5': 0.0,
  'feature 6': 0.0,
  'feature 7': 0.0,
  'feature 8': 0.0,
  'feature 9': 0.0,
  'feature 10': 0.0,
  'feature 11': 0.0,
  'feature 12': 0.0,
  'feature 13': 0.0,
  'feature 14': 0.0,
  'feature 15': 0.0,
  'feature 16': 18.0,
  'feature 17': 0.0},
 {'category': 'category 1',
  'item_title': 'item 5',
  'feature 1': 0.0,
  'feature 2': 54.0,
  'feature 3': 0.0,
  'feature 4': 0.0,
  'feature 5': 0.0,
  'feature 6': 0.0,
  'feature 7': 0.0,
  'feature 8': 0.0,
  'feature 9': 0.0,
  'feature 10': 0.0,
  'feature 11': 0.0,
  'feature 12': 0.0,
  'feature 13': 0.0,
  'feature 14': 0.0,
  'feature 15': 5.0,
  'feature 16': 32.0,
  'feature 17': 0.0},
 {'category': 'category 1',
  'item_title': 'item 6',
  'feature 1': 0.0,
  'feature 2': 0.0,
  'feature 3': 0.0,
  'feature 4': 20.0,
  'feature 5': 0.0,
  'feature 6': 0.0,
  'feature 7': 0.0,
  'feature 8': 0.0,
  'feature 9': 0.0,
  'feature 10': 0.0,
  'feature 11': 0.0,
  'feature 12': 0.0,
  'feature 13': 0.0,
  'feature 14': 0.0,
  'feature 15': 0.0,
  'feature 16': 15.0,
  'feature 17': 0.0},
 {'category': 'category 1',
  'item_title': 'item 7',
  'feature 1': 2.0,
  'feature 2': 0.0,
  'feature 3': 4.0,
  'feature 4': 0.0,
  'feature 5': 0.0,
  'feature 6': 0.0,
  'feature 7': 0.0,
  'feature 8': 0.0,
  'feature 9': 0.0,
  'feature 10': 0.0,
  'feature 11': 0.0,
  'feature 12': 0.0,
  'feature 13': 0.0,
  'feature 14': 0.0,
  'feature 15': 0.0,
  'feature 16': 10.0,
  'feature 17': 0.0},
 {'category': 'category 1',
  'item_title': 'item 8',
  'feature 1': 8.0,
  'feature 2': 0.0,
  'feature 3': 2.0,
  'feature 4': 0.0,
  'feature 5': 0.0,
  'feature 6': 0.0,
  'feature 7': 0.0,
  'feature 8': 0.0,
  'feature 9': 0.0,
  'feature 10': 0.0,
  'feature 11': 0.0,
  'feature 12': 0.0,
  'feature 13': 0.0,
  'feature 14': 0.0,
  'feature 15': 20.0,
  'feature 16': 0.0,
  'feature 17': 0.0},
 {'category': 'category 1',
  'item_title': 'item 9',
  'feature 1': 0.0,
  'feature 2': 19.0,
  'feature 3': 0.0,
  'feature 4': 8.0,
  'feature 5': 0.0,
  'feature 6': 8.0,
  'feature 7': 0.0,
  'feature 8': 0.0,
  'feature 9': 0.0,
  'feature 10': 0.0,
  'feature 11': 0.0,
  'feature 12': 0.0,
  'feature 13': 5.0,
  'feature 14': 0.0,
  'feature 15': 5.0,
  'feature 16': 5.0,
  'feature 17': 0.0},
 {'category': 'category 2',
  'item_title': 'item 10',
  'feature 1': 0.0,
  'feature 2': 0.0,
  'feature 3': 0.0,
  'feature 4': 0.0,
  'feature 5': 0.0,
  'feature 6': 0.0,
  'feature 7': 0.0,
  'feature 8': 0.0,
  'feature 9': 0.0,
  'feature 10': 0.0,
  'feature 11': 55.0,
  'feature 12': 0.0,
  'feature 13': 0.0,
  'feature 14': 0.0,
  'feature 15': 5.0,
  'feature 16': 0.0,
  'feature 17': 0.0},
 {'category': 'category 2',
  'item_title': 'item 11',
  'feature 1': 0.0,
  'feature 2': 89.0,
  'feature 3': 0.0,
  'feature 4': 0.0,
  'feature 5': 0.0,
  'feature 6': 0.0,
  'feature 7': 0.0,
  'feature 8': 0.0,
  'feature 9': 0.0,
  'feature 10': 0.0,
  'feature 11': 35.0,
  'feature 12': 9.0,
  'feature 13': 0.0,
  'feature 14': 0.0,
  'feature 15': 0.0,
  'feature 16': 0.0,
  'feature 17': 0.0},
 {'category': 'category 2',
  'item_title': 'item 12',
  'feature 1': 0.0,
  'feature 2': 12.0,
  'feature 3': 0.0,
  'feature 4': 7.0,
  'feature 5': 0.0,
  'feature 6': 0.0,
  'feature 7': 0.0,
  'feature 8': 0.0,
  'feature 9': 0.0,
  'feature 10': 0.0,
  'feature 11': 27.0,
  'feature 12': 50.0,
  'feature 13': 0.0,
  'feature 14': 0.0,
  'feature 15': 0.0,
  'feature 16': 0.0,
  'feature 17': 0.0},
 {'category': 'category 2',
  'item_title': 'item 13',
  'feature 1': 0.0,
  'feature 2': 0.0,
  'feature 3': 0.0,
  'feature 4': 9.0,
  'feature 5': 0.0,
  'feature 6': 0.0,
  'feature 7': 0.0,
  'feature 8': 0.0,
  'feature 9': 0.0,
  'feature 10': 0.0,
  'feature 11': 37.0,
  'feature 12': 0.0,
  'feature 13': 0.0,
  'feature 14': 0.0,
  'feature 15': 0.0,
  'feature 16': 0.0,
  'feature 17': 0.0},
 {'category': 'category 2',
  'item_title': 'item 14',
  'feature 1': 0.0,
  'feature 2': 0.0,
  'feature 3': 0.0,
  'feature 4': 110.0,
  'feature 5': 0.0,
  'feature 6': 0.0,
  'feature 7': 0.0,
  'feature 8': 0.0,
  'feature 9': 0.0,
  'feature 10': 0.0,
  'feature 11': 29.0,
  'feature 12': 6.0,
  'feature 13': 0.0,
  'feature 14': 0.0,
  'feature 15': 0.0,
  'feature 16': 0.0,
  'feature 17': 0.0},
 {'category': 'category 2',
  'item_title': 'item 15',
  'feature 1': 0.0,
  'feature 2': 5.0,
  'feature 3': 0.0,
  'feature 4': 0.0,
  'feature 5': 0.0,
  'feature 6': 8.0,
  'feature 7': 0.0,
  'feature 8': 0.0,
  'feature 9': 0.0,
  'feature 10': 0.0,
  'feature 11': 43.0,
  'feature 12': 0.0,
  'feature 13': 6.0,
  'feature 14': 0.0,
  'feature 15': 0.0,
  'feature 16': 3.0,
  'feature 17': 0.0},
 {'category': 'category 3',
  'item_title': 'item 16',
  'feature 1': 0.0,
  'feature 2': 0.0,
  'feature 3': 0.0,
  'feature 4': 64.0,
  'feature 5': 12.0,
  'feature 6': 0.0,
  'feature 7': 0.0,
  'feature 8': 52.0,
  'feature 9': 0.0,
  'feature 10': 0.0,
  'feature 11': 0.0,
  'feature 12': 0.0,
  'feature 13': 0.0,
  'feature 14': 0.0,
  'feature 15': 0.0,
  'feature 16': 0.0,
  'feature 17': 0.0},
 {'category': 'category 3',
  'item_title': 'item 17',
  'feature 1': 0.0,
  'feature 2': 0.0,
  'feature 3': 0.0,
  'feature 4': 66.0,
  'feature 5': 0.0,
  'feature 6': 0.0,
  'feature 7': 0.0,
  'feature 8': 0.0,
  'feature 9': 0.0,
  'feature 10': 0.0,
  'feature 11': 0.0,
  'feature 12': 0.0,
  'feature 13': 0.0,
  'feature 14': 0.0,
  'feature 15': 0.0,
  'feature 16': 0.0,
  'feature 17': 8.0},
 {'category': 'category 3',
  'item_title': 'item 18',
  'feature 1': 0.0,
  'feature 2': 0.0,
  'feature 3': 0.0,
  'feature 4': 8.0,
  'feature 5': 0.0,
  'feature 6': 0.0,
  'feature 7': 0.0,
  'feature 8': 0.0,
  'feature 9': 0.0,
  'feature 10': 0.0,
  'feature 11': 0.0,
  'feature 12': 0.0,
  'feature 13': 0.0,
  'feature 14': 0.0,
  'feature 15': 0.0,
  'feature 16': 0.0,
  'feature 17': 18.0},
 {'category': 'category 3',
  'item_title': 'item 19',
  'feature 1': 0.0,
  'feature 2': 0.0,
  'feature 3': 0.0,
  'feature 4': 1.0,
  'feature 5': 0.0,
  'feature 6': 0.0,
  'feature 7': 0.0,
  'feature 8': 0.0,
  'feature 9': 0.0,
  'feature 10': 0.0,
  'feature 11': 0.0,
  'feature 12': 0.0,
  'feature 13': 0.0,
  'feature 14': 0.0,
  'feature 15': 0.0,
  'feature 16': 0.0,
  'feature 17': 4.0},
 {'category': 'category 3',
  'item_title': 'item 20',
  'feature 1': 0.0,
  'feature 2': 0.0,
  'feature 3': 0.0,
  'feature 4': 0.0,
  'feature 5': 0.0,
  'feature 6': 0.0,
  'feature 7': 0.0,
  'feature 8': 0.0,
  'feature 9': 0.0,
  'feature 10': 0.0,
  'feature 11': 0.0,
  'feature 12': 9.0,
  'feature 13': 0.0,
  'feature 14': 0.0,
  'feature 15': 5.0,
  'feature 16': 0.0,
  'feature 17': 4.0},
 {'category': 'category 3',
  'item_title': 'item 21',
  'feature 1': 0.0,
  'feature 2': 0.0,
  'feature 3': 0.0,
  'feature 4': 90.0,
  'feature 5': 2.0,
  'feature 6': 0.0,
  'feature 7': 0.0,
  'feature 8': 62.0,
  'feature 9': 0.0,
  'feature 10': 0.0,
  'feature 11': 0.0,
  'feature 12': 0.0,
  'feature 13': 0.0,
  'feature 14': 0.0,
  'feature 15': 0.0,
  'feature 16': 0.0,
  'feature 17': 0.0},
 {'category': 'category 3',
  'item_title': 'item 22',
  'feature 1': 0.0,
  'feature 2': 17.0,
  'feature 3': 0.0,
  'feature 4': 19.0,
  'feature 5': 0.0,
  'feature 6': 0.0,
  'feature 7': 0.0,
  'feature 8': 42.0,
  'feature 9': 0.0,
  'feature 10': 0.0,
  'feature 11': 0.0,
  'feature 12': 0.0,
  'feature 13': 0.0,
  'feature 14': 0.0,
  'feature 15': 0.0,
  'feature 16': 0.0,
  'feature 17': 0.0},
 {'category': 'category 3',
  'item_title': 'item 23',
  'feature 1': 0.0,
  'feature 2': 0.0,
  'feature 3': 0.0,
  'feature 4': 4.0,
  'feature 5': 2.0,
  'feature 6': 0.0,
  'feature 7': 0.0,
  'feature 8': 0.0,
  'feature 9': 0.0,
  'feature 10': 0.0,
  'feature 11': 0.0,
  'feature 12': 0.0,
  'feature 13': 0.0,
  'feature 14': 0.0,
  'feature 15': 0.0,
  'feature 16': 0.0,
  'feature 17': 19.0},
 {'category': 'category 3',
  'item_title': 'item 24',
  'feature 1': 0.0,
  'feature 2': 0.0,
  'feature 3': 0.0,
  'feature 4': 45.0,
  'feature 5': 20.0,
  'feature 6': 0.0,
  'feature 7': 0.0,
  'feature 8': 0.0,
  'feature 9': 0.0,
  'feature 10': 0.0,
  'feature 11': 0.0,
  'feature 12': 0.0,
  'feature 13': 0.0,
  'feature 14': 0.0,
  'feature 15': 0.0,
  'feature 16': 0.0,
  'feature 17': 0.0},
 {'category': 'category 3',
  'item_title': 'item 25',
  'feature 1': 0.0,
  'feature 2': 0.0,
  'feature 3': 0.0,
  'feature 4': 18.0,
  'feature 5': 0.0,
  'feature 6': 0.0,
  'feature 7': 0.0,
  'feature 8': 25.0,
  'feature 9': 0.0,
  'feature 10': 0.0,
  'feature 11': 0.0,
  'feature 12': 0.0,
  'feature 13': 0.0,
  'feature 14': 0.0,
  'feature 15': 0.0,
  'feature 16': 0.0,
  'feature 17': 0.0},
 {'category': 'category 4',
  'item_title': 'item 26',
  'feature 1': 0.0,
  'feature 2': 0.0,
  'feature 3': 0.0,
  'feature 4': 0.0,
  'feature 5': 0.0,
  'feature 6': 0.0,
  'feature 7': 0.0,
  'feature 8': 0.0,
  'feature 9': 0.0,
  'feature 10': 0.0,
  'feature 11': 0.0,
  'feature 12': 0.0,
  'feature 13': 6.0,
  'feature 14': 0.0,
  'feature 15': 6.0,
  'feature 16': 0.0,
  'feature 17': 0.0},
 {'category': 'category 4',
  'item_title': 'item 27',
  'feature 1': 0.0,
  'feature 2': 0.0,
  'feature 3': 0.0,
  'feature 4': 0.0,
  'feature 5': 0.0,
  'feature 6': 0.0,
  'feature 7': 0.0,
  'feature 8': 0.0,
  'feature 9': 0.0,
  'feature 10': 0.0,
  'feature 11': 0.0,
  'feature 12': 0.0,
  'feature 13': 0.0,
  'feature 14': 80.0,
  'feature 15': 0.0,
  'feature 16': 0.0,
  'feature 17': 0.0},
 {'category': 'category 4',
  'item_title': 'item 28',
  'feature 1': 90.0,
  'feature 2': 0.0,
  'feature 3': 0.0,
  'feature 4': 0.0,
  'feature 5': 0.0,
  'feature 6': 0.0,
  'feature 7': 0.0,
  'feature 8': 0.0,
  'feature 9': 0.0,
  'feature 10': 0.0,
  'feature 11': 0.0,
  'feature 12': 0.0,
  'feature 13': 0.0,
  'feature 14': 0.0,
  'feature 15': 40.0,
  'feature 16': 0.0,
  'feature 17': 0.0},
 {'category': 'category 4',
  'item_title': 'item 29',
  'feature 1': 0.0,
  'feature 2': 0.0,
  'feature 3': 0.0,
  'feature 4': 0.0,
  'feature 5': 0.0,
  'feature 6': 0.0,
  'feature 7': 0.0,
  'feature 8': 0.0,
  'feature 9': 0.0,
  'feature 10': 10.0,
  'feature 11': 0.0,
  'feature 12': 0.0,
  'feature 13': 0.0,
  'feature 14': 0.0,
  'feature 15': 7.0,
  'feature 16': 0.0,
  'feature 17': 0.0},
 {'category': 'category 4',
  'item_title': 'item 30',
  'feature 1': 0.0,
  'feature 2': 10.0,
  'feature 3': 0.0,
  'feature 4': 0.0,
  'feature 5': 0.0,
  'feature 6': 0.0,
  'feature 7': 0.0,
  'feature 8': 0.0,
  'feature 9': 0.0,
  'feature 10': 0.0,
  'feature 11': 0.0,
  'feature 12': 0.0,
  'feature 13': 0.0,
  'feature 14': 0.0,
  'feature 15': 9.0,
  'feature 16': 0.0,
  'feature 17': 0.0}]

df = pd.DataFrame(data)

input_features = [{'variable':'feature 1', 'sum_threshold':100, 'dType':"Integer", "constrained_Group":"category 1"},
                  {'variable':'feature 2', 'sum_threshold':49, 'dType':"Integer", "constrained_Group":"category 2"},
                  {'variable':'feature 8', 'sum_threshold':66, 'dType':"Integer", "constrained_Group":"category 3"},  
             ] 
categories = list(set(df.category)) # categories in data

problem = LpProblem("Best Features", LpMaximize) # initialise problem

indexes_for_categories = [] #to store the indexes of all categories that are used in input_features
# Loop through list of dictionary to store conditions/constraints in problem
for dict_ in input_features:
    
    # Create index of items which will help to extract final features at the end
    items = df.index.tolist()
    # Create problem variables as dict - index of data frame and the column from desired variable
    problem_var = dict(zip(items, np.array(df[dict_['variable']].tolist())))
    # Need to create unique names for x so that pulp does not run into error of duplicates
    X = LpVariable.dicts(f"x_{random.uniform(1,7)}", indices=items, lowBound=0, upBound=1, cat=dict_['dType'], indexStart=[])

    # problem to solve. Maximize the sum of chosen variables 
    problem += lpSum( [X[i] * problem_var[i] for i in items]) 

    # if category is applied, must apply constraint - max sum must only be within this category
    if dict_['constrained_Group'] is not None:

        constrained_df = df[df['category'].str.contains(dict_['constrained_Group'])].fillna(0)
        constrained_df_items = constrained_df.index.tolist()
        constrained_df_problem_var = dict(zip(constrained_df_items, np.array(constrained_df[dict_['variable']].tolist())))
        problem += lpSum( [X[i] * constrained_df_problem_var[i] for i in constrained_df_items]) 

         # if threshold provided when category is provided, max must be within this threshold
        if dict_['sum_threshold'] is not None: 
            problem += lpSum([X[i] * constrained_df_problem_var[i] for i in constrained_df_items]) <= dict_['sum_threshold']

         # Range of indexes from categories selected - to be used if all input features explicitly state a category. This will be the sample from which to select all 5 items. 
        category_index = np.arange(constrained_df.index.min(),constrained_df.index.max()).tolist()
        indexes_for_categories.append(category_index)

    # if no category is provided 
    else:
        # if threshold is provided when no category is provided solution must be within this threshold
        if dict_['sum_threshold'] is not None: 
            problem += lpSum([X[i] * problem_var[i] for i in items]) <= dict_['sum_threshold']  

# if all input features (list of dicts) have categories, then constrain the total number of items (5) to just those in the selected categories. If not, select the best 5 from the total pool of items. 
only_constrained_gs = [dict_['constrained_Group'] for dict_ in input_features if dict_['constrained_Group'] != None ]
if len(only_constrained_gs) == len(input_features):
    sample_to_choose_from = np.concatenate(indexes_for_categories)
    problem += lpSum( [X[i] for i in sample_to_choose_from] ) == 5
else:
    problem += lpSum( [X[i] for i in items] ) == 5

# solve problem
problem.solve()

# store variables and extract indexes to then extract from original data
variables = []
values = [] 
for v in problem.variables():
    variable = v.name
    value = v.varValue 
    variables.append(variable)
    values.append(value)
values = np.array(values).astype(int)
items_list = pd.DataFrame(np.array([variables,values]).T,columns = ['Variable','Optimal Value'])
items_list['Optimal Value'] = items_list['Optimal Value'].astype(int)
items_list_opt = items_list[items_list['Optimal Value']!=0]

res_df = []
for dict_ in input_features:
    index_pos = np.array([int(i) for i in items_list_opt["Variable"].str.split('_').str[-1].tolist()])
    items_attribute_vals = df[dict_['variable']].loc[index_pos].astype(int)


    items_names = df['item_title'].loc[index_pos] #.astype(int)
    result_optimize = pd.concat([items_names, items_attribute_vals], axis=1).T
    res_df.append(result_optimize)

df[df.index.isin(pd.concat(res_df, axis=1).T.drop_duplicates(subset="item_title").index)]

The current output:

category    item_title  feature 1   feature 2   feature 3   feature 4   feature 5   feature 6   feature 7   feature 8   feature 9   feature 10  feature 11  feature 12  feature 13  feature 14  feature 15  feature 16  feature 17
0   category 1  item 1  10.0    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 8.0 0.0
1   category 1  item 2  0.0 0.0 0.0 0.0 0.0 0.0 10.0    30.0    0.0 0.0 0.0 0.0 0.0 0.0 0.0 9.0 0.0
10  category 2  item 11 0.0 89.0    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 35.0    9.0 0.0 0.0 0.0 0.0 0.0
11  category 2  item 12 0.0 12.0    0.0 7.0 0.0 0.0 0.0 0.0 0.0 0.0 27.0    50.0    0.0 0.0 0.0 0.0 0.0
20  category 3  item 21 0.0 0.0 0.0 90.0    2.0 0.0 0.0 62.0    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

I built a quick codesandbox here if you wish to test it out.

Is this the right implementation for the problem I am trying to solve? I would appreciate some guidance on this implementation.

  • I have updated my initial question with a code sandbox and the current output.
    – Kale
    Commented Jul 17, 2023 at 16:46
  • Ah, yes it does. But as I add more constraints, less and less so. For example, I wanted to add the constraint of making items from certain categories be a certain amount of the total variables. Example: category 1 == 3, and the last 2 come from other categories. I have included this constraint in the codesandbox. It was also indicated in the original question, but I was only able to somewhat model it earlier this afternoon. The idea is to constrain each category to a certain number, remove its sample from the original, then derive the remnants (2) from what is left. It does not work (>5 items in the results).
    – Kale
    Commented Jul 17, 2023 at 18:17

1 Answer


Don't import *.

Move your data to a CSV.
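
A one-time conversion from the data literal above would do it (the filename features.csv is just an assumed name, matching the load in the code below):

pd.DataFrame(data).to_csv('features.csv', index=False)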

Your constraint construction seems a little too generic - you've written code for cases that will never actually be evaluated. If you want to keep it that way, you need to write functions with well-defined parameters. I've demonstrated the opposite, and only show code for the constraints that are currently applied.

Don't list(set(...)) in this context: set iteration order is nondeterministic, and as written, categories is never used anyway.
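
If you do need the distinct categories, pandas already offers a deterministic way (order of first appearance):

categories = df['category'].unique().tolist()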

Best Features in its current format is going to have its space replaced with an underscore by PuLP anyway, so you might as well do that yourself and call it best_features.

random.uniform is... appalling on many dimensions. Think about your actual data and what index values would make your variable names meaningful and unique. In this case it will be based on item and feature.
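
In the question's own loop, that could look something like this (a sketch reusing the loop's variables; it assumes each entry in input_features names a distinct feature, so the names stay unique):

feature_name = dict_['variable'].replace(' ', '_')
X = LpVariable.dicts(f"x_{feature_name}", indices=items, cat='Binary')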

lpSum is not needed when the variables are in a dataframe. Even if you did want to use the pulp utility, it would be lpDot and not lpSum.
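
For reference, the lpDot form of the question's objective term would be (equivalent to the list-comprehension lpSum there):

problem += pulp.lpDot([X[i] for i in items], [problem_var[i] for i in items])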

lowBound=0, upBound=1, cat=dict_['dType'] - where dType is always "Integer" in your input - is equivalent to just cat=pulp.LpBinary.

It's critically important that you check the status after a solve. Otherwise you're outputting garbage if the solve fails.
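
A minimal guard, using pulp.LpStatus to translate the numeric status into a readable string:

problem.solve()
assert problem.status == pulp.LpStatusOptimal, pulp.LpStatus[problem.status]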

Your feature matrix is highly sparse, and I don't entirely understand why the zeros are still observed in your output. I demonstrate a way to optimize over only the non-zero values.

If I understand your code correctly, it could instead look like

import pandas as pd
import numpy as np
import pulp

# Load dataframe from CSV, moving category and item to index and removing redundant prefixes
df = pd.read_csv('features.csv')
df['category'] = df.category.str.removeprefix('category ')
df['item_title'] = df['item_title'].str.removeprefix('item ')
df.set_index(['category', 'item_title'], inplace=True)
df.index.names = 'category', 'item'
df.columns = df.columns.str.removeprefix('feature ')
df.columns.name = 'feature'

# Convert from a sparse rectangular frame to a dense triple-indexed column series,
# and only keep input features of interest
inputs = df[['1', '2', '8']].replace(0, np.nan).stack()
print('Input feature values:')
print(inputs.to_string(), end='\n\n')

# assignment variable names (omitting category as it is not needed for uniqueness)
var_names = (
    'i'
    + inputs.index.get_level_values('item')
    + '_f'
    + inputs.index.get_level_values('feature')
).to_series(index=inputs.index)

# binary assignments, again only for non-zero feature positions
assigns = var_names.apply(pulp.LpVariable, cat=pulp.LpBinary)
print('Assignment variables:')
print(assigns.to_string(), end='\n\n')

problem = pulp.LpProblem('best_features', pulp.LpMaximize)
problem.objective = assigns.dot(inputs)
problem.addConstraint(name='total', constraint=assigns.sum() == 5)

# For each input feature, and its constraint parameters
for (feature, feature_group), category, sum_threshold in zip(
    assigns.groupby(level='feature'),
    ('1', '2', '3'),
    (100, 49, 66),
):
    item = slice(None)  # all items within the category
    constrained_assigns = feature_group.loc[(category, item)]
    constrained_vals = inputs.loc[(category, item, feature)]

    # The dot product of the assignments and feature values is given an upper bound
    problem.addConstraint(
        name=f'sum_f{feature}_c{category}',
        constraint=constrained_assigns.dot(constrained_vals.values) <= sum_threshold,
    )

print(problem)
problem.solve()
assert problem.status == pulp.LpStatusOptimal

print('Assigned features:')
outputs = inputs[assigns.apply(pulp.LpVariable.value) > 0.5]
print(outputs.to_string())
The output:

Input feature values:
category  item  feature
1         1     1          10.0
          2     8          30.0
          3     2          22.0
          4     2          36.0
          5     2          54.0
          7     1           2.0
          8     1           8.0
          9     2          19.0
2         11    2          89.0
          12    2          12.0
          15    2           5.0
3         16    8          52.0
          21    8          62.0
          22    2          17.0
                8          42.0
          25    8          25.0
4         28    1          90.0
          30    2          10.0

Assignment variables:
category  item  feature
1         1     1           i1_f1
          2     8           i2_f8
          3     2           i3_f2
          4     2           i4_f2
          5     2           i5_f2
          7     1           i7_f1
          8     1           i8_f1
          9     2           i9_f2
2         11    2          i11_f2
          12    2          i12_f2
          15    2          i15_f2
3         16    8          i16_f8
          21    8          i21_f8
          22    2          i22_f2
                8          i22_f8
          25    8          i25_f8
4         28    1          i28_f1
          30    2          i30_f2

best_features:
MAXIMIZE
89.0*i11_f2 + 12.0*i12_f2 + 5.0*i15_f2 + 52.0*i16_f8 + 10.0*i1_f1 + 62.0*i21_f8 + 17.0*i22_f2 + 42.0*i22_f8 + 25.0*i25_f8 + 90.0*i28_f1 + 30.0*i2_f8 + 10.0*i30_f2 + 22.0*i3_f2 + 36.0*i4_f2 + 54.0*i5_f2 + 2.0*i7_f1 + 8.0*i8_f1 + 19.0*i9_f2 + 0.0
SUBJECT TO
total: i11_f2 + i12_f2 + i15_f2 + i16_f8 + i1_f1 + i21_f8 + i22_f2 + i22_f8
 + i25_f8 + i28_f1 + i2_f8 + i30_f2 + i3_f2 + i4_f2 + i5_f2 + i7_f1 + i8_f1
 + i9_f2 = 5

sum_f1_c1: 10 i1_f1 + 2 i7_f1 + 8 i8_f1 <= 100

sum_f2_c2: 89 i11_f2 + 12 i12_f2 + 5 i15_f2 <= 49

sum_f8_c3: 52 i16_f8 + 62 i21_f8 + 42 i22_f8 + 25 i25_f8 <= 66

VARIABLES
0 <= i11_f2 <= 1 Integer
0 <= i12_f2 <= 1 Integer
0 <= i15_f2 <= 1 Integer
0 <= i16_f8 <= 1 Integer
0 <= i1_f1 <= 1 Integer
0 <= i21_f8 <= 1 Integer
0 <= i22_f2 <= 1 Integer
0 <= i22_f8 <= 1 Integer
0 <= i25_f8 <= 1 Integer
0 <= i28_f1 <= 1 Integer
0 <= i2_f8 <= 1 Integer
0 <= i30_f2 <= 1 Integer
0 <= i3_f2 <= 1 Integer
0 <= i4_f2 <= 1 Integer
0 <= i5_f2 <= 1 Integer
0 <= i7_f1 <= 1 Integer
0 <= i8_f1 <= 1 Integer
0 <= i9_f2 <= 1 Integer

Welcome to the CBC MILP Solver 
Version: 2.10.3 
Build Date: Dec 15 2019 

At line 2 NAME          MODEL
At line 3 ROWS
At line 9 COLUMNS
At line 92 RHS
At line 97 BOUNDS
At line 116 ENDATA
Problem MODEL has 4 rows, 18 columns and 28 elements
Coin0008I MODEL read with 0 errors

Result - Optimal solution found

Objective value:                272.00000000
Enumerated nodes:               0
Total iterations:               0
Time (CPU seconds):             0.00
Time (Wallclock seconds):       0.00

Option for printingOptions changed from normal to all
Total time (CPU seconds):       0.00   (Wallclock seconds):       0.00

Assigned features:
category  item  feature
1         2     8          30.0
          4     2          36.0
          5     2          54.0
3         21    8          62.0
4         28    1          90.0

You ask:

I wanted to add the constraint of making items from certain categories be a certain amount of the total variables. Example: category 1 == 3, and the last 2 come from other categories.

Add this constraint:

problem.addConstraint(name='cat_1', constraint=assigns.loc['1'].sum() == 3)
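
Together with the existing total == 5 constraint, this forces the remaining two assignments to come from the other categories.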
  • Thank you for your input. From your code, it seems I was off by a vast distance. This looks great. The only question I have is: how do I limit the output to a certain amount, say 5? I looked around and only saw examples using prob += pulp.lpSum([x[j] for j in boundary_condition]) == const, where boundary_condition is the accumulated list of indexes in the sample data. How would this be done using your method? The aim would be to get the best 5 items regardless of their category, where each of these feature values is maximized given the above constraints.
    – Kale
    Commented Jul 18, 2023 at 18:32
  • So for example, given the current set of items in the output that all satisfy the constraints, I only want to see the 5 that satisfy all of them (the top 5), accounting for all features. So out of items 1,8,2,2,2,1... which five of these satisfy the objective for all the features and constraints? Instead of all of them, just five (or some discretionary value) as an output.
    – Kale
    Commented Jul 18, 2023 at 18:49
  • 1,8,2 are not items, they're features.
    – Reinderien
    Commented Jul 18, 2023 at 20:05
  • Apologies, I meant the values under the item column. The original purpose of the optimisation was to find 5 items from the sample 1,2,3,4,5,6,7... that maximize the selected features 1,8,2. The function you kindly created finds all items that maximize the selected features, but I just want the 5 that do so.
    – Kale
    Commented Jul 18, 2023 at 21:08
  • Only the three input features.
    – Kale
    Commented Jul 18, 2023 at 22:22
