
I wrote code to aggregate transactions into slips. I'm concerned about the performance of my code because it uses three nested loops, so the time complexity will be cubic. This code won't scale well: if the number of slips or transactions is very large, it will take a long time to run.

How can I improve my code in terms of performance and scalability?

It would also help if you described your methodology for solving this problem.

package main

import (
    "fmt"
    "strconv"
)

type Slip struct {
    TransactionIDs []int
}

type Transaction struct {
    ID     int
    Amount float64
    Payout bool
}

type AggregatedSlipData struct {
    // Number of transactions per slip
    TransactionCount int
    // The sum of the transaction amounts of this slip
    // (a payout must be subtracted instead of added!)
    TotalAmount float64
}

func main() {
    // Assume you have two distributed data sources. Your task is to
    // collect all data from these sources and return an aggregated result.
    slips := map[string]Slip{
        "slip_23": Slip{
            TransactionIDs: []int{123, 456},
        },
        "slip_42": Slip{
            TransactionIDs: []int{789},
        },
    }

    transactions := []Transaction{
        {
            ID:     123,
            Amount: 10.01,
            Payout: false,
        },
        {
            ID:     456,
            Amount: 5.01,
            Payout: true,
        },
        {
            ID:     789,
            Amount: 20.1,
            Payout: false,
        },
    }

    slipsWithTransactions := make(map[string][]Transaction)
    for k, v1 := range slips {
        slipsWithTransactions[(k[5:])] = make([]Transaction, 0, len(slips))
        for _, v2 := range v1.TransactionIDs {
            for _, v3 := range transactions {
                if v3.ID == v2 {
                    transaction := Transaction{
                        ID:     v3.ID,
                        Amount: v3.Amount,
                        Payout: v3.Payout,
                    }
                    slipsWithTransactions[(k[5:])] = append(slipsWithTransactions[k[5:]], transaction)
                }
            }
        }
    }


    
    fmt.Println("slipsWithTransactions: ", slipsWithTransactions)
    fmt.Println("")

    _ = slips
    _ = transactions
    
    aggregatedSlips := make(map[int]AggregatedSlipData)
    for k, v1 := range slipsWithTransactions {
        var sum float64 = 0
        
        for _, v2 := range v1 {
            if v2.Payout == false {
                sum += v2.Amount
            } else {
                sum -= v2.Amount
            }
        }
        n, err := strconv.Atoi(k)
        if err != nil {
            panic(err)
        }
        
        aggregatedSlips[n] = AggregatedSlipData{
            TransactionCount: len(v1),
            TotalAmount:      sum,
        } 
    }

    fmt.Printf("aggregatedSlips: %+v\n", aggregatedSlips)

    // Task: Use the two data sources above and create the following result:
    result := map[int]AggregatedSlipData{
        23: AggregatedSlipData{
            TransactionCount: 2,   // Number of transactions per slip
            TotalAmount:      5.0, // The sum of the transaction amounts of this slip (a payout must be subtracted instead of added!)
        },
        42: AggregatedSlipData{
            TransactionCount: 1,
            TotalAmount:      20.1,
        },
    }

    fmt.Printf("%+v\n", result)
}

1 Answer


How can I improve ... performance and scalability?

use an appropriate data structure

The inner loop makes an O(N) linear scan over the array of all transactions, once for each slip transaction ID. There are two ways out of that.

hash

Preprocess all transactions to turn the in-memory array into a hash map keyed by transaction ID. We visit each transaction once to add it to the map, and visit it a second time when we loop over slip transaction IDs, with a total of O(1) constant time complexity per slip transaction. If we naïvely copy array entries, we wind up with about double the RAM usage.
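
A minimal sketch of that idea, assuming the Slip and Transaction types and the slips and transactions variables from the question; only the inner lookup changes:

    // One O(N) preprocessing pass to index transactions by ID.
    txByID := make(map[int]Transaction, len(transactions))
    for _, t := range transactions {
        txByID[t.ID] = t
    }

    slipsWithTransactions := make(map[string][]Transaction)
    for key, slip := range slips {
        id := key[5:] // same magic number as the original; see the helper below
        for _, txID := range slip.TransactionIDs {
            if t, ok := txByID[txID]; ok { // O(1) map lookup replaces the O(N) inner scan
                slipsWithTransactions[id] = append(slipsWithTransactions[id], t)
            }
        }
    }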

sort

Perhaps we have a large number of transactions, which won't fit conveniently into RAM. Preprocess all transactions into a list sorted by transaction ID, which we serialize to an RDBMS or to a file. Similarly preprocess all slips so we have slip info sorted by transaction ID. Now do a 2-way merge to consume those sorted lists. That makes it easy to match up transaction IDs within a constant memory footprint. Total time complexity is O(N log N) due to sorting. Memory complexity is whatever you allocated to achieve a rapid sort; for example, /usr/bin/sort makes it easy to trade I/O cost against limited memory use.
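
A rough in-memory sketch of the merge step (uses the sort and fmt packages), assuming each transaction ID appears at most once on each side; the slipRef type is my own invention, and in practice the two sorted sequences would be streamed from files or a database rather than held in slices:

    // Flatten slips into (transaction ID, slip key) pairs, then sort both sides by ID.
    type slipRef struct {
        TxID    int
        SlipKey string
    }
    var refs []slipRef
    for key, slip := range slips {
        for _, id := range slip.TransactionIDs {
            refs = append(refs, slipRef{TxID: id, SlipKey: key})
        }
    }
    sort.Slice(refs, func(i, j int) bool { return refs[i].TxID < refs[j].TxID })
    sort.Slice(transactions, func(i, j int) bool { return transactions[i].ID < transactions[j].ID })

    // 2-way merge: advance whichever side is behind; aggregate on a match.
    i, j := 0, 0
    for i < len(transactions) && j < len(refs) {
        switch {
        case transactions[i].ID < refs[j].TxID:
            i++
        case transactions[i].ID > refs[j].TxID:
            j++
        default:
            fmt.Println(refs[j].SlipKey, transactions[i]) // matched pair: update the slip's aggregate here
            i++
            j++
        }
    }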

Consider computing the aggregate amount sum as you're looping, without waiting to post-process the result.

Consider deleting the payout boolean, in favor of just using +/- sign of amount.
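
Combining those two suggestions, the totals can be accumulated while the IDs are being resolved, so the intermediate slipsWithTransactions map disappears. A sketch (uses strconv and strings), where storing payouts as negative amounts is a proposed change to the Transaction struct rather than the question's definition:

    // Proposed shape: no Payout flag, the sign of Amount encodes it.
    type Transaction struct {
        ID     int
        Amount float64 // negative for a payout
    }

    func aggregate(slips map[string]Slip, txByID map[int]Transaction) (map[int]AggregatedSlipData, error) {
        out := make(map[int]AggregatedSlipData, len(slips))
        for key, slip := range slips {
            var agg AggregatedSlipData
            for _, id := range slip.TransactionIDs {
                if t, ok := txByID[id]; ok {
                    agg.TransactionCount++
                    agg.TotalAmount += t.Amount // the sign already handles payouts
                }
            }
            n, err := strconv.Atoi(strings.TrimPrefix(key, "slip_")) // or use a getSlipId helper, below
            if err != nil {
                return nil, err
            }
            out[n] = agg
        }
        return out, nil
    }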


break out a helper

        slipsWithTransactions[(k[5:])] = make( ... )
                    ...
                    slipsWithTransactions[(k[5:])] = append( ... )

Remove that magic number 5 by defining a getSlipId() helper.
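
One possible shape for that helper (uses fmt, strconv, and strings), assuming every key really does start with "slip_" followed by digits:

    func getSlipId(key string) (int, error) {
        const prefix = "slip_"
        if !strings.HasPrefix(key, prefix) {
            return 0, fmt.Errorf("unexpected slip key %q", key)
        }
        return strconv.Atoi(key[len(prefix):])
    }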

While you're at it, break out the occasional helper so the overly long main() function will fit in a single screenful of code.

  • I used a hashmap like you suggested. Is this correct: go.dev/play/p/RqgI1lQUU1b?
    – bit
    Commented Sep 18, 2023 at 20:01
  • I don't understand what you mean by "with total of O(1) constant time complexity per slip transaction". Please can you clarify? Maybe you mean that accessing the map of transactions is an O(1) constant time operation.
    – bit
    Commented Sep 18, 2023 at 20:04
  • Also, I'm not sure what you mean by "If we naïvely copy array entries we wind up with about double the RAM usage". Please can you give an example.
    – bit
    Commented Sep 18, 2023 at 20:05
  • Yes, that looks correct. And yes, map access happens in constant time, even if a pair of map accesses is needed for each transaction ID.
    – J_H
    Commented Sep 18, 2023 at 20:29
  • Suppose an implementer objected to the "double RAM" aspect of having both a list and a map of the same data in memory simultaneously. What might we do? Well, the array could be serialized out to a file, deallocated, and then we construct a map from the file. What if we should avoid using the filesystem? For an array that's hard, unless we can walk backward through it and deallocate entries as we go. If the list is a linked list, then it's easy to free each node as we insert map entries. If each entry is bigger than in OP, say a kilobyte, then a "small" map might go from ID to integer array index.
    – J_H
    Commented Sep 18, 2023 at 20:32
