
Main Questions:

  • Is my runtime analysis correct?
  • Is my approach optimal or close to it?
  • Is there a better way, or other improvements I could make?

Objective:

  • Write an algorithm to assign n jobs to k entities (servers, CPUs, whatever) in a round-robin fashion. The job that takes longest is assigned first, the shortest last. In the case of a tie, the job with the lowest id is assigned first.
  • Return the job ids assigned to each entity, where an entity can be represented as a list.

sample input:

  • jobs = [40, 80, 40, 4, 2]
  • entities = 2
  • where jobs[i] is the execution time of the task and i is its identifier.

sample output:

  • [[1, 2, 4], [0, 3]]

I came up with two algorithms and wanted to see whether my runtime analysis is correct, and whether there is a better way to approach this.

Implementation 1: Linear Search

Pseudocode:

  • linear search for the longest job, from left to right, saving its job id (index)
  • only update the index if we find a larger job in the list
  • once we find it, append its id to another list, which will contain the job ids in order
  • iterate over the ordered list, assigning each job to a server in round-robin fashion

Hypothesis for runtime:

  • time proportional to n^2, where n is the number of jobs

Code:


def findMax(jobs):
    ind_of_max = 0
    # iterate over jobs
    for k in range(len(jobs)):
        # if the current job is longer than the longest seen so far
        if jobs[k] > jobs[ind_of_max]:
            ind_of_max = k
    # synonymous with removing the job from the PQ
    # (assumes job times are non-negative)
    jobs[ind_of_max] = -1

    return ind_of_max

def determineJobOrder(jobs):

    # list of job ids
    order_of_jobs = []
    num_of_jobs = len(jobs)

    #### TOTAL RUNTIME ANALYSIS:
    # - runs in time proportional to n^2
    ####

    # the loop runs in time proportional to n
    while num_of_jobs > 0:

        # removing jobs from the list (i.e. performing dequeue operations until we run out of jobs);
        # each scan runs in time proportional to n
        ind_of_max = findMax(jobs)

        order_of_jobs.append(ind_of_max)
        num_of_jobs -= 1

    return order_of_jobs

def assignJobsToServers(servers, jobs):

    jobs = determineJobOrder(jobs)

    # create a list of lists, where the number of inner lists = number of servers
    assignment = [[] for i in range(servers)]

    num_jobs = len(jobs)
    job_ind = 0
    server_ind = 0

    #### RUNTIME ANALYSIS:
    # - runs in time proportional to n
    ####

    # while there are jobs left to assign
    while num_jobs > 0:

        assignment[server_ind % servers].append(jobs[job_ind])

        # update server index, job index, and number of jobs left to assign
        num_jobs -= 1
        job_ind += 1
        server_ind += 1

    return assignment
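
A quick sanity check against the sample input (note that this implementation destroys the jobs list, since findMax overwrites each extracted job with -1, so pass a copy if the original is still needed):

jobs = [40, 80, 40, 4, 2]
print(assignJobsToServers(2, jobs))   # [[1, 2, 4], [0, 3]]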

Implementation 2: Using a Heap

Pseudocode:

  • Iterate over all jobs, creating a Job object, Job(job_id, time)
  • Since Python's heapq module implements a min-heap, I use heapq.nlargest, passing the list of Jobs and a key for comparison

Hypothesis for runtime:

  • time proportional to n log n, where n is the number of jobs

Code:



import heapq


# To encapsulate the jobs and be able to compare by multiple attributes
class Job:
    def __init__(self, job_id, time):
        self.job_id = job_id
        self.time = time

    def __repr__(self):
        return 'Job(id: {}, time: {})'.format(self.job_id, self.time)


def assignJobsToServers(servers, jobs):

    for k, exec_time in enumerate(jobs):
        jobs[k] = Job(k, exec_time)

    # resource used to figure out how to sort by two attributes, one in natural order, one in reverse:
    # https://www.techiedelight.com/sort-list-of-objects-by-multiple-attributes-python/

    #### RUNTIME ANALYSIS:
    # - I am assuming this builds the max heap, then removes the n items to get them in max order.
    # - inserts into the heap take log n
    # - removals take log n
    # - enqueueing n items therefore takes n log n
    # - removing n items takes n log n for the removals
    # - hence: runtime should be proportional to n log n, ignoring constants
    ####

    # the key makes a longer time compare as larger; on a tie, the smaller
    # job id compares as larger, so it is assigned first
    jobs_ordered = heapq.nlargest(len(jobs), jobs, key=lambda job: (job.time, -job.job_id))


    # create a list of lists, where the number of inner lists = number of servers
    assignments = [[] for i in range(servers)]

    num_jobs = len(jobs)
    server_ind = 0
    job_ind = 0

    #### RUNTIME ANALYSIS:
    # - at this point we already did all the difficult computation; we now just
    #   iterate over the jobs, which are in order, and assign them
    # - this takes time proportional to n, the number of jobs
    ####
    while num_jobs > 0:

        assignments[server_ind % servers].append(jobs_ordered[job_ind].job_id)

        # update server index, job index, and number of jobs left to assign
        num_jobs -= 1
        server_ind += 1
        job_ind += 1

    return assignments
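
The same sanity check applies here (this version also mutates the jobs list, replacing each entry with a Job object):

jobs = [40, 80, 40, 4, 2]
print(assignJobsToServers(2, jobs))   # [[1, 2, 4], [0, 3]]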


1 Answer

Your runtime analysis seems fine and your heap-based approach is optimal in the narrow sense that it achieves O(n log n) performance. However, the software itself is needlessly complex.

The path to simplicity is found by remembering the most famous O(n log n) operation: sorting. You have a list of job durations that you want to sort from largest to smallest, breaking ties via their ID/index. There are a variety of reasonable ways to achieve that. Here's one illustration:

# Some job durations and a generator to produce (DURATION, ID) pairs.
durations = [40, 80, 40, 4, 2]
pairs = enumerate(durations)

# A function to take a pair and return a modified pair for sorting purposes.
sort_key = lambda pair: (-pair[1], pair[0])

# Sort the pairs and extract job IDs from them.
job_ids = [jid for jid, dur in sorted(pairs, key = sort_key)]

# Allocate the job IDs to N workers/entities in round-robin fashion.
n = 2
allocated_job_ids = [job_ids[i::n] for i in range(n)]
print(allocated_job_ids)   # [[1, 2, 4], [0, 3]]
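
As a side note, the negation in the key can be avoided by exploiting sort stability (just a variation on the same idea): Python's sort is stable, so sorting by duration alone with reverse=True keeps tied durations in their original ascending-id order.

# Variation relying on sort stability: ties keep their original id order.
durations = [40, 80, 40, 4, 2]
pairs = sorted(enumerate(durations), key = lambda pair: pair[1], reverse = True)
job_ids = [jid for jid, dur in pairs]

n = 2
print([job_ids[i::n] for i in range(n)])   # [[1, 2, 4], [0, 3]]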
  • Got it, not sure why that did not come to mind, but that would definitely be simpler in this case. Thank you. Would a PQ implemented with a heap be preferred in the case where our list of jobs is not static, meaning, say, two jobs can be added to the initial list, then one removed, followed by more additions and removals interleaved? It seems like in that case we couldn't achieve n log n for both removals and additions. What do you think?
    – MPC
    Commented Nov 16, 2021 at 12:17
  • @MPC Correct. The superpower of a heap is its log-n ability to maintain sorted order in the face of inserts/removals.
    – FMc
    Commented Nov 16, 2021 at 15:57
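
To illustrate that last point, here is a minimal sketch (assuming the same ordering rule, longest time first and lowest id on ties): heapq is a min-heap, so pushing (-time, id) pairs makes the longest job pop first, with inserts and removals each costing O(log n).

import heapq

pq = []
heapq.heappush(pq, (-40, 0))           # O(log n) insert
heapq.heappush(pq, (-80, 1))
heapq.heappush(pq, (-40, 2))

neg_time, job_id = heapq.heappop(pq)   # O(log n) removal -> job 1 (time 80)
heapq.heappush(pq, (-4, 3))            # additions and removals interleave freely
neg_time, job_id = heapq.heappop(pq)   # -> job 0 (time 40; lower id wins the tie)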
