
From the article JavaScript 2D Array – Two Dimensional Arrays in JS, I see one way to store data is to put all properties of each element into one array:

let dataRepresentation1 = [
    ['John Doe', 20, 60, 'A'],
    ['Jane Doe', 10, 52, 'B'],
    ['Petr Chess', 5, 24, 'F'],
    ['Ling Jess', 28, 43, 'A'],
    ['Ben Liard', 16, 51, 'B']
];

I wonder how that compares to an array of objects:

let dataRepresentation2 = [
    {
        "id": "id",
        "key2": "value2",
        "key3": "value3"
    },
    {
        "id": "id",
        "key2": "value2",
        "key3": "value3"
    }
]

and having each property in its own array:

let dataRepresentation3 = [
    [ 'John Doe', 'Jane Doe', 'Petr Chess', 'Ling Jess', 'Ben Liard' ],
    [ 20        , 10        , 5           , 28         , 16          ],
    [ 60        , 52        , 24          , 43         , 51          ],
    [ 'A'       , 'B'       , 'F'         , 'A'        , 'B'         ]
];

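Here is my attempt to compare them:

| Representation                                | 1    | 2    | 3   |
|-----------------------------------------------|------|------|-----|
| Easy to comprehend                            | OK   | Good | OK  |
| Easy to compute (modify, slice, filter, etc.) | Good | ?    | Bad |
| Easy to get a specific value                  | Good | Good | OK  |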

Are these evaluations correct? Are there other criteria I should include? I mostly use JS and Python, if that matters.

See also: Array of objects vs Object of Objects

4
  • 1
    How you store data often doesn't matter that much, because you expose views. You can have some use cases that prefer rows, some that prefer columns, and yet further that benefit from key:value labels, because any of those representations can be transformed into any other.
    – Caleth
    Commented Jul 14, 2023 at 8:19
  • Hmm, in that case, do you know of a list of snippets for transforming between representations? I'm not sure whether they would still be called design patterns.
    – Ooker
    Commented Jul 14, 2023 at 9:11
  • 1
    E.g. rows -> objects: data.map(row => ({name: row[0], key2: row[1], ...etc })). Columns to rows would use a transpose function, and so on.
    – Caleth
    Commented Jul 14, 2023 at 10:43
  • @Caleth do you know of any collection of data transformations like that?
    – Ooker
    Commented Jul 25, 2023 at 7:36
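
The conversions mentioned in these comments can be sketched roughly as follows. This is only an illustration: the column names in COLUMNS are an assumption (the question never labels its columns), and the helper names rowsToObjects, objectsToRows, and transpose are made up for this example.

```javascript
// Column names are an assumption; the question never labels the columns.
const COLUMNS = ['name', 'age', 'score', 'grade'];

const rows = [
    ['John Doe', 20, 60, 'A'],
    ['Jane Doe', 10, 52, 'B'],
    ['Petr Chess', 5, 24, 'F']
];

// Rows -> objects (representation 1 -> 2).
const rowsToObjects = (rows, keys) =>
    rows.map(row => Object.fromEntries(keys.map((key, i) => [key, row[i]])));

// Objects -> rows (representation 2 -> 1).
const objectsToRows = (objects, keys) =>
    objects.map(obj => keys.map(key => obj[key]));

// Rows <-> columns (representation 1 <-> 3).
// Transposing twice gives back the original layout.
const transpose = matrix =>
    matrix.length === 0 ? [] : matrix[0].map((_, i) => matrix.map(row => row[i]));

const objects = rowsToObjects(rows, COLUMNS);
const byColumn = transpose(rows);

console.log(objects[1]);   // { name: 'Jane Doe', age: 10, score: 52, grade: 'B' }
console.log(byColumn[0]);  // [ 'John Doe', 'Jane Doe', 'Petr Chess' ]
```

Since each conversion is a single pass over the data, it is usually cheap to store one layout and derive the others on demand.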

3 Answers

0

I would probably use #3 only for data-analytics code, specifically when most of my use cases are aggregate functions over columns (min, max, avg, sum). You could also binary search on a column if the data is sorted on that column. The primary advantage of #3 is that you don't have to extract the column; you can pass the raw array straight to the aggregate function.
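
As a rough sketch of that column-oriented use case: the variable names below are assumptions (the question does not say what the numeric columns mean), the records are reordered so that ages is sorted, and sum, avg, and binarySearch are hypothetical helpers written for this example.

```javascript
// Representation 3: one array per property, here sorted by the `ages` column.
// Names are assumptions; the question does not label its columns.
const names  = ['Petr Chess', 'Jane Doe', 'Ben Liard', 'John Doe', 'Ling Jess'];
const ages   = [5, 10, 16, 20, 28];
const scores = [24, 52, 51, 60, 43];

// Aggregates take the raw column; no extraction step is needed.
const sum = column => column.reduce((total, value) => total + value, 0);
const avg = column => sum(column) / column.length;

// Classic binary search; only valid because the column is sorted ascending.
function binarySearch(sortedColumn, target) {
    let low = 0;
    let high = sortedColumn.length - 1;
    while (low <= high) {
        const mid = (low + high) >> 1;
        if (sortedColumn[mid] === target) return mid;
        if (sortedColumn[mid] < target) low = mid + 1;
        else high = mid - 1;
    }
    return -1; // not found
}

console.log(Math.max(...scores));           // 60
console.log(avg(scores));                   // 46
console.log(names[binarySearch(ages, 16)]); // Ben Liard
```

The index returned by the search is what ties the columns together, so every column has to be kept in the same order.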

Both #1 and #2 have the advantage that it is easy to "extract" a single record. In most cases you will have a lot of functions that process a single record, so you don't need to pass in the entire structure plus an index.

#1 is more memory efficient than #2 since it doesn't repeat the keys with every record. The trade-off is that you need to maintain the mapping from column to column index, and make sure every place that accesses a column is updated if a column is inserted into or deleted from the middle of the array, so I would generally favor #2 over #1.
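
One way to keep that column-to-index mapping manageable in #1 is to define it once and never use bare numbers elsewhere. A minimal sketch, assuming a hypothetical column layout (COL and gradeOf are names invented here, not from the question):

```javascript
// Hypothetical column layout for representation 1;
// the question never names its columns.
const COL = Object.freeze({ NAME: 0, AGE: 1, SCORE: 2, GRADE: 3 });

const rows = [
    ['John Doe', 20, 60, 'A'],
    ['Jane Doe', 10, 52, 'B']
];

// Every access goes through COL, so inserting or removing a column
// means editing COL rather than hunting for magic numbers.
const gradeOf = row => row[COL.GRADE];

console.log(gradeOf(rows[1])); // B
```

With #2 the property name already plays this role, which is part of why it tends to be less error-prone.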

That said, in a strongly typed OOP language I would choose a List/Array of typed objects over both #1 and #2, since that is likely more memory efficient than both options and it also provides type safety.

Finally, if I were to switch to a language like C, I would choose an array of (ideally fixed-size **) structs. That way memory access is linear (there are no pointers to follow), so we gain the performance benefits of better cache locality.

** Large strings may still need pointers, to avoid creating huge structures.

4
  • How do you compare them for functions like checking whether an element is good and, if not, modifying or removing the whole element from the data? It seems to me that #3 is best for searching for a value, but awful for modifying or removing elements. It's hard to write a function to do that if you need a for loop outside.
    – Ooker
    Commented Jul 11, 2023 at 11:58
  • 1
    In most languages I would just use a filter function on the array/list. Most languages implement that as a copy operation (you get a new array/list back), so you don't need a loop, just the validation function. Typically, in an OOP language I would put that validation function (which returns a boolean) on the object itself, so the code would be something like newList = list.filter(::isValid)
    – DavidT
    Commented Jul 11, 2023 at 13:11
  • I see. In general is there a design pattern about this?
    – Ooker
    Commented Jul 11, 2023 at 13:49
  • 1
    Internally it's probably implemented as an Iterator. However, the interface the caller is presented with is probably closer to a Visitor, since the caller doesn't control the loop and the underlying structure is typically copied.
    – DavidT
    Commented Jul 14, 2023 at 8:14
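
The filter approach from these comments could look something like this in JavaScript. It is only a sketch: the validity rule is hypothetical, the column-oriented data is written as an object of arrays (rather than the question's array of arrays) so the columns have names, and filterColumns is a helper invented for this example.

```javascript
// Hypothetical validity rule: a record is valid when its grade is not 'F'.
const isValid = record => record.grade !== 'F';

// Representation 2: filter returns a new array and leaves the input untouched.
const records = [
    { name: 'John Doe', grade: 'A' },
    { name: 'Petr Chess', grade: 'F' },
    { name: 'Ling Jess', grade: 'A' }
];
const validRecords = records.filter(isValid);

// Representation 3: decide which indices survive, then apply that to every column.
const table = {
    name:  ['John Doe', 'Petr Chess', 'Ling Jess'],
    grade: ['A', 'F', 'A']
};

function filterColumns(table, predicate) {
    const keys = Object.keys(table);
    const length = keys.length === 0 ? 0 : table[keys[0]].length;
    const keep = [];
    for (let i = 0; i < length; i++) {
        // Rebuild one record at a time so the same predicate can be reused.
        const record = Object.fromEntries(keys.map(key => [key, table[key][i]]));
        if (predicate(record)) keep.push(i);
    }
    return Object.fromEntries(keys.map(key => [key, keep.map(i => table[key][i])]));
}
const validTable = filterColumns(table, isValid);

console.log(validRecords.map(r => r.name)); // [ 'John Doe', 'Ling Jess' ]
console.log(validTable.name);               // [ 'John Doe', 'Ling Jess' ]
```

The loop does not disappear for #3, but it can be hidden inside one helper so callers still only supply the validation function.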
0

dataRepresentation1 and dataRepresentation3 hold identical data; you are just swapping the indices when accessing them (data[x][y] vs data[y][x]). However, dataRepresentation1 is probably better than 3 since it keeps the values that belong to one record together, but realistically it doesn't make much of a difference.
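
To make the index swap concrete, here is a small sketch using the first two records from the question (byRow, byColumn, record, and field are names chosen for this example):

```javascript
// Representation 1: one inner array per record.
const byRow = [
    ['John Doe', 20, 60, 'A'],
    ['Jane Doe', 10, 52, 'B']
];
// Representation 3: one inner array per property.
const byColumn = [
    ['John Doe', 'Jane Doe'],
    [20, 10],
    [60, 52],
    ['A', 'B']
];

const record = 1; // second person
const field = 2;  // third property

// Same value, indices swapped.
console.log(byRow[record][field]);    // 52
console.log(byColumn[field][record]); // 52

// A whole record is a direct lookup in one layout and a gather in the other.
console.log(byRow[record]);                    // [ 'Jane Doe', 10, 52, 'B' ]
console.log(byColumn.map(col => col[record])); // [ 'Jane Doe', 10, 52, 'B' ]
```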

dataRepresentation2 is similar to dataRepresentation1, except that it uses a named key in an object instead of a numerical index in an array. The only difference from the other two is "developer experience": you can see at a glance what each column represents. Functionally it doesn't make a difference.

All 3 are easy to comprehend, with dataRepresentation2 probably the easiest since it is explicit about what each "column" is.

Easy to compute depends on what you are trying to do with them.

As for how easy it is to get values, as you already stated, they are all about as good as each other.

0

TL;DR

In the final analysis, this question boils down to forcing coherent objects to look like RDBMS data tables, so that there is a 1:1 mapping from object key:value pairs to DB table columns, flagrantly ignoring prudent RDBMS normalization.



This question is subtly conflating data and information. The thought experiment reorganizes information and asks whether we can guess the object members and their meaning.

There is no definition of what the data is, either in terms of RDBMS column headings or object key-value pairs. So how can we judge what is "easier" to read when we don't know what is what?

The explicit JS object notation has (requires?) an "ID", which is not a requirement for a coherent object.

The other representations fragment objects into positionally related groups that vaguely resemble table-organized data, from which we are to infer what is what, hence this thread's title.

Interestingly, there is no "ID" to coalesce the fragmented objects. An "ID" is not required per se for an RDBMS table, but the absence of a defined primary key (which does not have to be a GUID-like "ID") makes me think "one table to rule them all." The follow-up question in the genre of object-data mashups is, "Why is my database so slow?"
