20

I am writing a Java web application that consists mainly of a bunch of similar pages in which every page has several tables and a filter that applies to those tables. The data on these tables comes from an SQL database.

I am using myBatis as ORM, which may not be the best choice in my case, since the database is poorly designed and mybatis is a more database oriented tool.

I am finding that I am writing a lot of duplicate code because, due to the database's poor design, I have to write different queries for similar things as those queries can be very different. That is, I cannot easily parametricise the queries. This propagates into my code and instead of populating rows on columns in my table with a simple loop I have code like:

get A Data (p1, ..., pi);

get B Data (p1, ..., pi);

get C Data (p1, ..., pi);

get D Data (p1, ..., pi); ...

And this soon explodes when we have different tables with different columns.

It also adds to the complexity the fact that I'm using "wicket", which is, in effect a mapping of objects to html elements in the page. So my Java code becomes an adapter between the database and the front end, which has me creating a lot of wiring, boilerplate code with some logic intermingled in it.

Would the correct solution be wrapping the ORM mappers with an extralayer that presents a more homogeneous interface to the db or is there a better way to deal with this spaghetti code I'm writing?

EDIT: More info about the database

The database holds mainly phone calls information. The poor design consists of:

Tables with an artificial ID as primary key that has nothing to do with the domain knowledge.

No unique, triggers, checks or foreign keys whatsoever.

Fields with a generic name that match different concepts for different records.

Records that can be categorised only by crossing with other tables with different conditions.

Columns which should be numbers or dates stored as strings.

To sum it up, a messy / lazy design all around.

10
  • 8
    Is correcting the database design an option?
    – RMalke
    Commented Jul 30, 2013 at 14:10
  • 1
    Please explain how is the database poorly designed. Commented Jul 30, 2013 at 14:13
  • @Renan Malke Stigliani Unfortunately not, as there is legacy software that depends on it, however I have mirrored some of the tables with a slightly different design and populate them, which simplifies the code. However I'm not proud of this and I'd rather not duplicate tables indiscriminately
    – DPM
    Commented Jul 30, 2013 at 14:15
  • 1
    This book might give you som eideas of how you can start to fix the datbase problesm and keep the legacy code working: amazon.com/…
    – HLGEM
    Commented Jul 30, 2013 at 14:32
  • 5
    Most of the problems you list . . . aren't. The use of surrogate keys rather than natural keys is actually a pretty standard recommendation nowadays; not "poor design" at all. The lack of constraints and the use of inappropriate column-types is a better example as far as "poor design" goes, but it shouldn't actually affect your application code at all (unless you plan to abuse these problems?).
    – ruakh
    Commented Jul 30, 2013 at 14:37

4 Answers 4

55
+50

Object orientation is valuable specifically because these types of scenarios arise, and it gives you tools to reasonably design abstractions that allow you to encapsulate complexity.

The real question here is, where do you encapsulate that complexity?

So let me step back a moment and speak to what 'complexity' I'm referring to here. Your problem (as I understand it; correct me if I'm wrong) is a persistence model which is not an effectively usable model for the tasks you need to complete with the data. It may be effective and usable for other tasks, but not for your tasks.

So what do we do when we have data that does not present a good model for our means?

Translate. It's the only thing you can do. That translation is the 'complexity' I refer to above. So now that we accept we're going to translate the model, we need to decide on a couple of factors.

Do we need to translate both directions? Are both directions going to be translated the same, as in:

(Tbl A, Tbl B) -> Obj X (read)

Obj X -> (Tbl A, Tbl B) (write)

or do insertion/update/delete activities represent a different type of object such that you read data as Obj X, but data is inserted/updated from Obj Y? Which of these two ways you wish to go, or if no update/insert/delete is possible are important factors in where you want to put the translation.


Where do you translate?

Back to the first statement I made in this answer; OO allows you to encapsulate complexity and what I refer to here is the fact that not only should you, but you must encapsulate that complexity if you wish to ensure it doesn't leak out and seep into all of your code. At the same time, it's important to recognize you can't have a perfect abstraction so worry less about that than about having a very effective and usable one.

Again now; your problem is: Where do you put this complexity? Well you have choices.

You can do it in the database using stored procedures. This has the drawback of often not playing very well with ORMs but that's not always true. Stored procedures afford some benefits, including performance often. Stored procedures however can require a lot of maintenance, but it's up to you to analyze your particular scenario and say if the maintenance will be more or less than other choices. I personally am very skilled with stored procedures, and as such this fact of available talent reduces overhead; never underestimate the value of making decisions based on what you do know. Sometimes the suboptimal solution can be more optimal than the correct solution because you or your team know how to create and maintain it better than the optimal solution.

Another in-database option is views. Depending on your database server these may be highly optimal or sub-optimal or not even effective at all, one of the drawbacks can be query times depending on what indexing options are available in your database. Views become an even better choice if you never need to make any data modification (insert/update/delete).

Stepping past the database you have the old standby of using the repository pattern. This is a time-tested approach which can be very effective. Drawbacks tend to include boiler plate but well-factored repositories can avoid some amount of this, and even when these do result in unfortunate amounts of boiler plate, repository's tend to be simple code that's easy to understand and maintain as well as presenting a good API/abstraction. Also repositories can be good for their unit-testability which you lose with in-database options.

There are tools like auto-mapper out there that may make using an ORM plausible where they can do the translation between database-model from orm to usable models, but some of these tools can be tricky to maintain/understand behaving more like magic; though they create a minimum of overhead code resulting in less maintenance overhead when well understood.

Next you're stepping further and further from the database, which means there's going to be greater amounts of code that are going to deal with the un-translated persistence model, which is going to be genuinely unpleasant. In these scenarios you talk about putting the translation layer in your UI which it sounds like you may be doing now. This is generally a very bad idea, and decays terribly over time.


Now let's start talking crazy.

The Object is not the only end-all be-all abstraction that exists. There have been a profundity of abstractions developed over the many years that computer science has been studied and even before then from the study of math. If we're going to start getting creative, let's start talking about known abstractions available that have been studied.

There's the actor model. This is an interesting approach because it says all you do is send messages to other code which effectively delegates all the work to that other code, which is very effective in encapsulating the complexity away from all your code. This could work in so far as you send a message to an actor saying "I need Obj X sent to Y" and you have a receptacle waiting for a response at location Y which then processes Obj X. You could even send a message that instructs "I need Obj X and computation Y, Z done to it" and then you don't even need to wait; the translation occurs on the other side of that message pass and you can just move on if you don't need a read of it's result. This can be slight abuse of the actor model for your purposes, but it all depends; the main goal of Actor Model is asynchrony and concurrency to be handled easily and well, both of those things are however just abstractions which can act as boundaries to encapsulate the complexity we refer to here (or any form of complexity for that matter).

Another encapsulation boundary is process boundaries. These can be used for segregating complexity very effectively. You could create the translation code as a web service where the communication is simple HTTP, using SOAP, REST, or if you really want your own protocol (not suggested). STOMP isn't altogether a bad newer protocol. Or use a normal daemon service with a system-local publicized memory pipe for communicating very quickly again using whichever protocol you choose. This actually has some pretty good benefits:

  • You can have multiple processes running that do translation for older and newer version support at the same time allowing you to update the translation service to publicize an object model V2, and then separately at a later point update the consuming code to work with the new object model.
  • You can do interesting things like pinning the process to a core for performance, you also get an amount of security safety in this approach by making that the only process running with the security privileges to touch that data.
  • You will get a very strong boundary when you're talking about process boundaries that will stay fixed ensuring minimum leakage of your abstraction for a long time because writing code in the translation space will not be able to be called outside of the translation space since they won't share process scope, ensuring a fixed set of usage scenarios by contract.
  • Ability for asynchronous/non-blocking updates being simpler.

Drawbacks are obviously more maintenance than is commonly necessary, communication overhead affecting performance and the maintenance.


There is a great variety of ways to encapsulate complexity that may allow that complexity to be placed in ever more strange and curious places in your system. Using forms of higher order functions (often times faked using strategy pattern or various other odd forms of object patterns), you can do some very interesting things.

That's right, let's start talking about a monad. You could create this translation layer in a very independent fashion of small specific functions that do the independent translations necessary, but hide all of those translation functions away not visible so they are hardly accessible to outside code. This has the benefit of reducing reliance on them allowing them to change easily without effecting much external code. You then create a class that accepts higher order functions (anonymous functions, lambda functions, strategy objects, however you need to structure them) which work on any of the nice OO model type objects. You then let the underlying code which accepts those functions do the literal execution using the appropriate translation methods.

This creates a boundary where all the translation not only exists on the other side of the boundary away from all your code; it is only used on that side allowing the rest of your code to not even know anything about it other than where the entry point for that boundary is.

Ok, yeah that really is talking crazy, but who knows; you might just be that crazy (seriously, do not undertake monads with a craziness rating below 88%, there is real risk of bodily injury).

3
  • 4
    Wow, what an extraordinarily comprehensive answer. I'd upvote this more than once if only SE would let me. Commented Jul 30, 2013 at 19:37
  • 11
    When is the movie version coming out?
    – yannis
    Commented Jul 30, 2013 at 21:10
  • 3
    @JimmyHoffa Bravo sir!!! I'm going to bookmark this answer and show my daughter when she gets older.
    – Tombatron
    Commented Aug 7, 2013 at 15:34
4

My suggestion:

Create database views that:

  1. Give meaningful names to columns
  2. Do the "crossing with other tables with different conditions" so you can have that complexity hidden away.
  3. Convert numbers or dates stored as strings to numbers and dates respectively.
  4. Create uniqueness where there is none, according to some criteria.

The idea is create a façade that emulates a better design on top of the bad one.

Then make the ORM relate to that façade instead of the real tables.

This doesn't simplify insertions though.

1
  • Using database views looks like a great idea and the most elegant course of actions abstracting away the ugliness at the lowest level, for some reason I hadn't considered it. Thank you.
    – DPM
    Commented Jul 30, 2013 at 14:45
3

I can see how your existing database schema causes you to write more specific code and queries for tasks which might otherwise be abstracted with a better-designed schema, but it shouldn't hamper your ability to write good object-oriented code.

  • Remember the SOLID principles.
  • Write code that can easily unit-tested (which often comes through following SOLID principles).
  • Keep your business logic separate from your display logic.
  • Read the Apache Wicket documentation and examples - this framework can probably save you more boilerplate code than you think, so learn how to use it effectively.
  • Keep the logic that has to deal with the database in a separate layer that provides a clean interface that your business logic can work with. This way, if you (or a future maintainer) ever get a chance to improve the schema, they can do so without too many changes to the business logic.

When you find yourself working with a database schema that isn't perfect, it's easy to gripe about all the ways that it makes your job harder, but at some point you have to set aside those complaints and make the best of it.

Think of it as an opportunity to use your creativity to write clean, reusable, easily maintainable code in spite of the imperfect schema.

2

Answering your initial question about better object-oriented code, I'd suggest using SQL-speaking objects. ORM intrinsically goes against object-oriented principles, since it operates upon an object, and object in OOP is a self-sufficient entity, who has all resources for accomplishing its goal. I'm sure this approach could make your code simpler.

Talking about problem space, i.e., your domain, I'd try to identify aggregate roots. These are consistency boundaries of your domain. Boundaries that absolutely must hold its consistency at all times. Aggregates communicate via domain events. If you have a system big enough, probably you should start splitting it on subsystems (call it SOA, Microservice, Self-contained systems, etc)

I'd also consider using CQRS -- it can greatly simplify both your write and read side. Make sure to read Udi Dahan's article about this topic.

Not the answer you're looking for? Browse other questions tagged or ask your own question.