2

I have a Python application using SQLAlchemy. It uses around 15 database tables.

For example, I have an image table that lists image files manipulated by the application. Most operations involving that image have side-effects both on the filesystem and on the database.

To keep this easy to test and de-couple the logic from the database's structure, I've decided to keep the ORM model separate, and to wrap it into a "manager" object when I need to make filesystem changes. Simplified, it looks like this:

import sqlalchemy
import sqlalchemy.orm
import numpy

Base = sqlalchemy.orm.declarative_base()

class ImageModel(Base):
    """Representation of an image in the database."""

    __tablename__ = "image"

    id = sqlalchemy.Column(sqlalchemy.Integer, primary_key=True)
    rows = sqlalchemy.Column(sqlalchemy.Integer)
    columns = sqlalchemy.Column(sqlalchemy.Integer)
    file_path = sqlalchemy.Column(sqlalchemy.String)
    write_count = sqlalchemy.Column(sqlalchemy.Integer, default=0)

    @classmethod
    def get_by_id(cls, db: sqlalchemy.orm.Session, search_id: int):
        """Get the image model having the given ID."""

        return db.query(ImageModel).filter(ImageModel.id == search_id).one()


class Image:
    """This class handles manipulation of an image on the filesystem."""

    def __init__(self, image_model: ImageModel):
        self.model = image_model

    def read_window(self, col_off: int, row_off: int, width: int, height: int):
        """Read part of the image."""

        # TODO read the image here

    def write_window(self, data: numpy.ndarray, col_off: int, row_off: int, width: int, height: int):
        """Write part of the image, and register that write in the database."""

        # TODO write to the image here
        self.model.write_count += 1

Here I count the write operations, in the real application it's doing more complicated stuff with the database model. Operations on other objects and tables involve other side-effects, like network requests.

Some things bother me in this design:

  1. Modules that use images will need to manipulate two classes, depending on what they want to do. If they want the image's properties, they need an ImageModel, if they want to write to the image, they need an Image. It would make more sense to expose only one Image object.
  2. I don't know where to implement more complicated queries, like "find all images with more than 3 writes, and whose path is listed in other_database_table":
    • If I put it in a method of ImageModel, then users will have to interact with the ImageModel class, which goes back to problem 1.
    • If I put it in Image, then Image will be dependent on SQLAlchemy and on the database's structure, and the database won't be abstracted, which was the goal. The ImageModel.get_by_id method has the same problem, by the way.
  3. There are lots of complicated queries (as in point 2.) that are very specific to one part of the application. I think that they have no place in the Image or ImageModel classes, which should stay general. But they need to be database queries for performance reasons: if I just called a generic ImageModel.get_all_images method and processed that myself, it would fetch way more data than needed, and I wouldn't benefit from the RDBMS' optimizations.

Ideally, I would like to only expose one Image class, which would abstract away all the filesystem and database operations. But it should allow me to easily change the database structure, and to test one aspect of the object separately (say, the filesystem write, without having to mock the database).

How can I do that while avoiding the problems listed above?

1

1 Answer 1

2

When trying to solve a problem like this, it is important to know the constraints that your current design has. SQLAlchemy seems to be an implementation of the Active Record Pattern, which combines business operations on data with data access. This pattern implies a few challenges, namely it mixes business logic with data access logic.

This is a complex enough problem that you might need to approach this in phases. The most important OOP principal here is Separation of Concerns. Business logic is one concern. Data access logic is a different concern. These should first be separated. This does not mean abandoning SQLAlchemy. It just means that you should start out by separating the CRUD operations of the database into a separate class or group of classes that specialize in this. You might find that these methods do little more than delegate to the SQLAlchemy API. This is ok, because you are pulling query logic out of your business classes. Query logic, as you've discovered, is frequently non-trivial.

You might find this separation is enough to tame the logic in these classes, but will not eliminate the need for objects to collaborate. Proper OOP design does not mean objects must represent things in the real world. They can represent more abstract concepts. Collaboration is a valid OOP concept — some would even argue collaboration is a key concept if you think of calling methods on an object as passing messages from object to another. Consider building classes specializing in certain kinds of image manipulations.

Since you are using python, you can utilize functional and procedural programming techniques as well. An object that does not need to maintain state might not need to be an object. It might just need to be a standalone function. A chain of manipulations can be created using functional composition in place of traditional polymorphic object oriented behavior.

Not the answer you're looking for? Browse other questions tagged or ask your own question.