1

It often happens that we have a set of structured data. Let's say our data is about charitable organizations. I could turn it into a list/set of objects/structs:

struct Charity {
   let name: String
   let yearFounded: Int
   let volunteerCount: Int
}

I could also think about it as a table (possibly using the Python Pandas library, relational databases or other tools to work with the table):

| name | year founded | volunteer count |
|---|---|---|
| Rainforest Charity | 2005 | 841 |
| Clean Water Charity | 1845 | 795 |
| Psychological Ade Charity | 2009 | 512 |

What would be the advantages and disadvantages of either approach? Are there any other notable approaches besides the object-oriented and relational models that would be useful in data analysis?

4 Answers

3

A data structure is not an object. It has nothing to do with object-orientation and you shouldn't think of it as an object either.

I know what you mean though: you can sort of write it down with an object-like syntax. Have a class with "properties", a bean, a DTO, whatever. And to be fair, a lot of developers do exactly that. However, object-like syntax does not make something object-oriented.

While a data structure is a set of data elements, an object is a set of (business-related) behaviors. They are completely different. There is a gray area of course, as with everything, but how you think about them is completely different, which I believe was your question.

Put another way: a data structure is a passive thing with which you can do anything you like. An object is an active agent with its own responsibilities, with which you collaborate to solve a problem.
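To make the distinction concrete, here is a minimal Python sketch. The language and every name in it (`CharityRecord`, `VolunteerRoster`, `enroll`, the capacity rule) are assumptions chosen for illustration, not something from the question. The record only holds data; the roster hides its data and exposes behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharityRecord:
    """Passive data structure: fields only, no behavior of its own."""
    name: str
    year_founded: int
    volunteer_count: int


class VolunteerRoster:
    """Object: hides its state and exposes behavior (hypothetical rules)."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._volunteers = set()

    def enroll(self, volunteer: str) -> bool:
        """Enroll a volunteer unless the roster is full; report success."""
        if len(self._volunteers) >= self._capacity:
            return False
        self._volunteers.add(volunteer)
        return True

    def is_full(self) -> bool:
        """Tell callers whether any places are left."""
        return len(self._volunteers) >= self._capacity


# Anyone may read or recombine the record's fields however they like.
record = CharityRecord("Rainforest Charity", 2005, 841)
age_in_2020 = 2020 - record.year_founded

# Callers collaborate with the roster through its methods only.
roster = VolunteerRoster(capacity=1)
first = roster.enroll("Ada")
second = roster.enroll("Grace")
```

Nothing stops code from computing on the record's fields directly, whereas the roster decides for itself whether an enrollment succeeds.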

2

You're asking about a mental image, which is inherently subjective.

You're also conflating two different visualisations. A class is equivalent to a table structure (not data!), and a class instance is equivalent to table data. I think this is leading to you comparing apples and oranges here.

Your tabular example leans heavily on the visual. In practice, when discussing a table as part of a data design, it is much more common to describe its structure using some logical syntax, e.g. SQL CREATE syntax:

CREATE TABLE table_name (
    name varchar(50),
    year_founded int,
    volunteer_count int
);  

Or, if still in the brainstorming stage, a SELECT statement will allow you to create an example mockup of a (future) table:

SELECT
    'My Company' as name,
    2005 as year_founded,
    123 as volunteer_count

The similarities with your class-based example are striking. Both simply list the available columns/properties and their types (although the SELECT leaves more of the typing up to inference).
And this is for good reason: ORMs exist specifically as an automated mapping between a table row and an object, because from a data perspective they are equivalent structures that can easily be mapped (for common data types).
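As a rough illustration of that equivalence, here is a hand-rolled sketch of what an ORM automates, using Python's standard `sqlite3` and `dataclasses` modules. The `charity` table and the `save` and `load_all` helpers are hypothetical names for this example; a real ORM does far more (identity tracking, relations, migrations).

```python
import sqlite3
from dataclasses import dataclass, astuple


@dataclass
class Charity:
    name: str
    year_founded: int
    volunteer_count: int


def save(conn: sqlite3.Connection, charity: Charity) -> None:
    """Object -> row: the fields become the column values, in order."""
    conn.execute("INSERT INTO charity VALUES (?, ?, ?)", astuple(charity))


def load_all(conn: sqlite3.Connection) -> list:
    """Row -> object: each column value becomes the matching field."""
    rows = conn.execute(
        "SELECT name, year_founded, volunteer_count FROM charity ORDER BY name"
    )
    return [Charity(*row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE charity (name TEXT, year_founded INTEGER, volunteer_count INTEGER)"
)
save(conn, Charity("Rainforest Charity", 2005, 841))
save(conn, Charity("Clean Water Charity", 1845, 795))
charities = load_all(conn)
```

The mapping is a one-liner in each direction because the field list and the column list carry the same information.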

"Working with objects is easier if computed properties are needed, as they can easily be added as methods to the object."

The same applies to the SQL examples though.

CREATE TABLE table_name (
    name varchar(50),          -- real data
    name_length AS LEN(name)   -- computed data
);  

It still contains the same information as you'd see in a class definition.
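A side-by-side sketch of the same idea in Python, under a few assumptions: the `name_length` property mirrors the computed column above, and because this example runs on SQLite rather than SQL Server, a view (hypothetically named `charity_with_length`) stands in for the `AS LEN(name)` computed column.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Charity:
    name: str  # real data

    @property
    def name_length(self) -> int:
        """Computed data: derived from `name`, never stored."""
        return len(self.name)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE charity (name TEXT)")
# A view stands in for the computed column; it is derived on every read.
conn.execute(
    "CREATE VIEW charity_with_length AS "
    "SELECT name, length(name) AS name_length FROM charity"
)
conn.execute("INSERT INTO charity VALUES ('Clean Water Charity')")

from_table = conn.execute(
    "SELECT name_length FROM charity_with_length"
).fetchone()[0]
from_object = Charity("Clean Water Charity").name_length
```

Both sides declare one stored field and one derived field, so neither representation has an inherent advantage here.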

"Working with tables might be faster (in the case of Python, the Pandas library can avoid creating costly Python objects and use more low-level data structures instead)."

That's IMO an overoptimization, but if you're that interested in optimizing, why not just directly use your low-level datastructures and use binary serialization to files? It will be significantly faster and not create any padded disk space.

Of course, there are a lot of usability issues with doing so, in terms of debugging or control over the data. And that's the point I'm trying to get to, because that same point applies to using low-level data structures as opposed to proper strongly typed class definitions.
It's not that it can't be done, but it comes at a significant drop in developer comfort.
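For a feel of that trade-off, here is a minimal sketch of binary serialization with Python's standard `struct` module. The record layout (a 32-byte name followed by two unsigned 32-bit integers) and the `encode`/`decode` helpers are assumptions made up for this example.

```python
import struct

# Hypothetical fixed layout: 32-byte name, then two little-endian uint32 fields.
RECORD = struct.Struct("<32sII")


def encode(name: str, year_founded: int, volunteer_count: int) -> bytes:
    """Pack one record; struct pads (or silently truncates) the name to 32 bytes."""
    return RECORD.pack(name.encode("utf-8"), year_founded, volunteer_count)


def decode(blob: bytes) -> tuple:
    """Unpack one record and strip the NUL padding from the name."""
    raw_name, year_founded, volunteer_count = RECORD.unpack(blob)
    return raw_name.rstrip(b"\x00").decode("utf-8"), year_founded, volunteer_count


blob = encode("Rainforest Charity", 2005, 841)
restored = decode(blob)
```

The result is compact and fast to read back, but the bytes are opaque: nothing in the file says which field is which, and a name longer than 32 bytes is cut off without warning. That is the kind of lost comfort meant here.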

Also, while I can't speak for Python specifically, in C# there is no significant size difference between an object and a collection of variables containing the same data, as an object is (in memory) essentially stored as a collection of its fields plus a small fixed header. The number of bits used for the data itself is the same; the main difference is stack vs heap.

While there are performance differences here, they are negligible in all but the most extreme cases. In those extreme cases you will be forced to sacrifice a disproportionate amount of developer comfort to achieve that performance, which only makes sense if your situation really demands performance at the cost of significantly longer development time (e.g. embedded software).

0

Use the abstraction that's most intuitive to reason about in the domain, and use something different at the edges where the abstraction fails to scale.

In my experience, for my particular needs, classes and objects are easy to reason about, easy to work with, and do a lot to organize business rules. But, with enough data, some questions are most efficiently answered by reading the raw data into a table and doing math on the fields. Your problem domain may differ.
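A small sketch of that split, using the sample data from the question. The `is_established` rule and its 25-year threshold are invented for illustration; the point is only that per-entity rules live on the object while aggregate math runs over whole columns.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Charity:
    name: str
    year_founded: int
    volunteer_count: int

    def is_established(self, current_year: int) -> bool:
        """Hypothetical business rule: established after 25 years."""
        return current_year - self.year_founded >= 25


charities = [
    Charity("Rainforest Charity", 2005, 841),
    Charity("Clean Water Charity", 1845, 795),
    Charity("Psychological Ade Charity", 2009, 512),
]

# Domain question: ask each object.
established = [c.name for c in charities if c.is_established(2024)]

# Aggregate question: pivot into columns and do math on whole fields.
columns = {
    "year_founded": [c.year_founded for c in charities],
    "volunteer_count": [c.volunteer_count for c in charities],
}
average_volunteers = mean(columns["volunteer_count"])
```

With three rows the pivot is pointless; with millions, the column form is what table libraries and databases are built to process.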

0

There are excellent answers here, but I'm not certain that they have captured the breadth of the original question. It seems the original question is purposefully open, so as not to limit the answers and to invite broad exposition.

For example, how to project the inheritance relations of structured data onto tabular form is still a controversial topic with no one-size-fits-all answer, and object-relational mapping has been called the Vietnam of Computer Science.

Let's say you have the following data structures (in pseudocode):

struct Organization {
   let name: String
   let yearFounded: Int
}

struct Charity extends Organization {
   let volunteerCount: Int
}

These can be mapped to tabular format using, for example, the Class Table Inheritance or Concrete Table Inheritance pattern from Martin Fowler's catalog of enterprise application patterns. One of the reasons NoSQL databases became popular in the 21st century is precisely to address the object-relational (or object-tabular) mismatch.
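Here is a rough sketch of how the two patterns differ, run against an in-memory SQLite database from Python. The table names (`organization`, `charity`, `charity_concrete`) and the `id` keys are assumptions for this example, not part of the patterns themselves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Class Table Inheritance: one table per class, joined by a shared key.
    CREATE TABLE organization (
        id INTEGER PRIMARY KEY,
        name TEXT,
        year_founded INTEGER
    );
    CREATE TABLE charity (
        id INTEGER PRIMARY KEY REFERENCES organization(id),
        volunteer_count INTEGER
    );

    -- Concrete Table Inheritance: one table per concrete class,
    -- repeating the inherited columns.
    CREATE TABLE charity_concrete (
        id INTEGER PRIMARY KEY,
        name TEXT,
        year_founded INTEGER,
        volunteer_count INTEGER
    );

    INSERT INTO organization VALUES (1, 'Rainforest Charity', 2005);
    INSERT INTO charity VALUES (1, 841);
    INSERT INTO charity_concrete VALUES (1, 'Rainforest Charity', 2005, 841);
    """
)

# Class table: reading a whole Charity needs a join.
joined = conn.execute(
    "SELECT o.name, o.year_founded, c.volunteer_count "
    "FROM charity AS c JOIN organization AS o ON o.id = c.id"
).fetchone()

# Concrete table: a single-table read, at the cost of duplicated columns.
flat = conn.execute(
    "SELECT name, year_founded, volunteer_count FROM charity_concrete"
).fetchone()
```

Both layouts return the same charity, but one pays with joins on every read and the other with column definitions repeated for every subclass.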

All these solutions involve serious tradeoffs. You can't usually see the consequences of these tradeoffs until much later in the development cycle.

And that is just one, albeit prominent, problem in choosing a data representation model. The original question asks whether we can step outside the object-tabular box altogether and whether there are other mental models besides these (I cannot think of any, but I'm sure my view is limited).
