9

HSQLDB is great. It (also) has an embedded mode (no dedicated server needed), which allows for quick prototyping of stuff like Proof of Concepts, and it can also be great in production-ready applications, as a quick and simple storage of various data.

However, at least 90% of the projects I've worked on in the past years, if they deal in any way with an SQL database, they will unavoidably have unit tests that use an embedded HSQLDB.

Sure-enough, these projects (which use the standard Maven structure most of the times), have a ton of "unit tests" (or, at least, some kind of tests, but which are located in the "unit-test" area of the project (like "src/test/java")) which use one or more embedded instances of HSQLDB in order to do some CRUD operations and check the results.

My question is: is this an anti-pattern? Does the fact that HSQLDB is easy to integrate and the embedded server is very lightweight make it so that it can be overlooked as "mock-up" of an actual database, when it shouldn't be the case? Shouldn't such tests be treated more like integration-tests (since each of them is composed of "something" AND that "something"'s integration with a database server)? Or am I missing something in the "unit test" definition, and we can infact say that using HSQLDB like this simply adheres to the "mock the dependencies, and test only the unit" part of the "unit test" definition?

5
  • 3
    Isn't this just mocking applied to the Database layer? I wouldn't say it was an anti-pattern as long as it's a simple and appropriate way of getting those modules that depend on it unit-tested. Commented Feb 11, 2014 at 11:47
  • 2
    @KilianFoth yes but is it still mocking when the "mock" (the db) actually takes an active role in the testing process? For example, when you have a class with some dependency (mandatory for the instantiation of the class), if you want to unit test that class, you mock that dependency, but then you simply test the methods of the class, you don't care anymore about the mocked dependency, it doesn't have such a big role in the whole unit-testing of the class thing, as HSQLDB has in these type of unit tests... Commented Feb 11, 2014 at 11:58
  • possible duplicate of Staying OO and Testable while working with a database
    – gnat
    Commented Feb 11, 2014 at 12:00
  • 1
    @gnat well, i personally see that other question as more like "how to unit test stuff at the above-data-access layer - should we mock the data-access layer or what", whilst mine is "how to i unit test suff at the data-access layer itself, is HSQLD and the like a valid way to mock things?"... Commented Feb 11, 2014 at 12:06
  • 1
    Related: how to test the data access layer
    – user40980
    Commented Feb 11, 2014 at 13:49

6 Answers 6

8
+100

I consider using something like HSQLDB (or DBUnit) when testing my DAOs as a necessary step to guarantee quality over the code I write when touching a data layer. As already pointed out, it is not a bullet-proof solution but in combination with its dialects it can cover a reasonable part of your cases.

My only point would be: let's be careful when we talk about unit-tests. When I test a class interacting with an external component that is sort of out of my control (i.e. it can produce non-deterministic behaviour) I consider it automatically as an integration test.

Assuming that you've factored out the logic where you interact directly with your datastore (via JDBC drivers, ORM, etc.) in a concrete class hidden by an interface, a proper unit-test for your DAO would be developed by passing to it a concrete mocked implementation of the interface that returns a set of well-known values against which you will test it.

5

IMO:

  • If a test requires DB, it is not the unit test. It is either integration or acceptance test.
  • To avoid the need of using DB in a test I should follow inversion of control principle by using dependency injection (or service locator).
  • If I need to test something with DB, I should create a VM with actual database (with fake or real data, depending on the task). Vagrant and Docker are great tools for the job.
  • Mocking the DB is a bad practice. It is the same as doing it wrong (mocking DB) and then redoing it right (making it work with real DB).
  • It is OK for proof-of-concept projects, but still it is then followed by redoing (if concept is proved).
  • It is perfectly fine to use HSQLDB (or whatever) by itself if it is a proper tool for the job.
4

Use the same database for (unit-)tests as you are using in your production environment.

The reason is simple: Databases behave differently. Each database has their own proprietary functions. Even if you use an ORM layer for database access, your database can behave differently, also performance-wise.

Of course, the advantage of an embedded database for unit tests is that you do not have to configure anything. But installing a database on every developer machine and configure it to be used for unit tests is not that hard.

I had the same situation in a project some time ago. We used JDBC to run SQL-statements on the database. When we decided to write tests for our application, we used an embedded database. That lead to situations where some bugs could not be reproduced with unit tests, and some parts of the application could not be tested because the embedded database did not support the functions of our production database. So every developer installed the same database we also used in the production environment on the PC to run the tests, which worked much better.

10
  • That's a lot of Oracle instances you are suggesting installing in some environments - not cheap at all. Don't forget the build server either.
    – user40980
    Commented Feb 11, 2014 at 13:51
  • @MichaelT I know, but what is the alternative?
    – Uooo
    Commented Feb 12, 2014 at 5:57
  • 2
    One could use HSQLDB working with an oracle dialect (see sql.syntax_ora in hsqldb.org/doc/guide/dbproperties-chapt.html ) - its not perfect, but its good enough for most things. As you note its not an everything solution, but it does solve the issues of multiple licenses needed and allows for stand alone builds that don't depend on another service running. Another approach would be dbunit. There are indeed tradeoffs with each approach.
    – user40980
    Commented Feb 12, 2014 at 18:08
  • @MichaelT Oracle is free for development/testing. So there's no cost to install another Oracle instance for testing. There is of course the cost of installing more hardware to run all those instances.
    – jwenting
    Commented Feb 17, 2014 at 9:19
  • 2
    I wouldn't consider it a unit test if it was accessing a non-embedded DB
    – herman
    Commented Feb 17, 2014 at 18:27
4

I've found & fixed some really bad problems and greatly speeded up my development by testing against HSQL database. Often the lowest level of adaptor classes (i.e. your DAOs) can go untested in unit testing because it can be hard to decouple them from an actual round-trip to a live database. I have mocked Connection and JDBC artifacts in the past, but it was actually much harder than testing against an in-memory database.

On one project, I developed & tested against HSQL, ran locally against MySql, and deployed to production against SQL-Server. Amazingly, it worked just fine. The biggest integration issue I encountered was due to my own misunderstanding of the spec (seconds vs. milliseconds, lol).

While the persistence framework can mask the underlying database (like Hibernate does), the persistence framework itself can add enough complicated semantics that exercising it in isolation can shake out a lot of quirks before you get to full-blown integration testing.

Consider the case where you're using Hibernate or JPA to build a complicated result set with a lot of joins. If it's nasty and takes a lot of tries, you could develop and test that against a live database. But you could also develop and test it against an in-memory database with less overhead. You also don't have to worry about setup/teardown scripts for your database fixtures, or coordinating tests and builds with others. And if you test against the in-memory database, you enjoy the benefits of having those tests around for regression, just like other unit tests.

I've heard the term "Component Testing" for tests that are more than unit test, but still done in isolation.

Unit tests, Component Tests (if that's really their name), and Integration Tests all provide value. Teams can dispense with one or another (yes, lots of teams do no unit testing & rely solely on Integration testing), but if you do that, you're losing the benefits that the other testing tiers bring.

In the case of tests that are isolated by virtue of integrating in memory, the benefit is rapid validation of code, and then fast regression -- the benefits of isolated testing and no need for additional installations, licenses or hardware. It allows you to validate more levels of code in isolation than if you just punted on testing your DAOs until the database was provisioned.

So there's practical value. It's not required, but I've been more happy rather than less when I've done it.

3

I use approaches like that in the past, i'm not sure if this is an "anti-pattern" (some people try to elevate his personal feelings or experiences to universal rules, i'm not), but i prefer another approach:

  • For unit test, real unit test don't use external things like an in-memory database (not really lightweight if you compare to a hashMap or a stub using some stub library). If you use DI and a simple patterns like Repository or DAO for your data-access this is very easy to achieve. My objective its to have a lot of unit test that execute really quick, and with quick i'm thinking in 3k test in less than a minute.

  • For integration test, i prefer to use the real data-storage software in this test, the goal of this test its to prove the integration between my software and the real data-storage, use a different data-storage here breaks the intention of this test. I try to write the less integration test as possible to demonstrate this integration, and usually this test are executed outside of the normal build (for example only in nightly builds)

I use HSQLDB sometimes to build simple demos or proof-of concepts, its a great tool, but i don't see anymore what its the place for this tool in my normal development workflow.

0

For unit testing data access products like MyBatis, tests using embedded HSQLDB are perfectly fine in my opinion (and I know at least at MyBatis they are using HSQLDB exactly for this).

For most other projects I would consider this overkill. I would make some simple integration tests (these are tests that potentially have side effects on other tests) that access the real database, ensuring that each query/statement is used at least once. There should be far less integration tests than unit tests in the project.

For the unit tests, I would mock the DAO or whatever accesses the DB (the 'adaptor').

Not the answer you're looking for? Browse other questions tagged or ask your own question.