Ramblings: Should unit tests talk to a data source?
Last week I published an article on test driven development. One person who read my article (briefly) suggested I did not know that unit tests shouldn't talk to their data source. I plan on covering how to mock JDBC connections later this week; first, however, I wanted to do some research to see whether what my critic says is an industry standard or a philosophical choice.
Surprisingly, there seems to be relatively little information on this subject, or I have been incapable of finding it. From what I have gathered, though, the theory that data sources should be mocked is sound, but it may not always be practical to implement. Below are two reasons why data source connections should be mocked and three reasons why they should not.
Why you should mock connections
Speed – Talking to a data source is one of the primary operations that slow an application's performance. Most data sources require some sort of network connection, and retrieving data is often a processor-intensive task. Obviously, performing these tasks takes time. Unit tests are most useful when they are run automatically and frequently to ensure that the changes being made are not breaking the application. Unit tests that are run automatically should be reasonably speedy, and mocking the data source connection can considerably reduce the amount of time it takes to run them.
Remove the reliance on the data source logic – Unit tests should be testing relatively small parts of code: individual methods and/or small collections of methods. By connecting to a data source you are depending not only on the logic of how you are connecting, but also on how the data source operates. This goes beyond the scope of what a unit test should be testing.
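To make the two reasons above concrete, here is a minimal sketch of a unit that is tested without touching a database. All of the names (CustomerDao, GreetingService, InMemoryCustomerDao) are hypothetical and exist only for illustration; the assumption is that the real application would have a JDBC-backed implementation of the same interface.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical data-access interface. In production, an implementation
// of this interface would run a JDBC query.
interface CustomerDao {
    Optional<String> findNameById(int id);
}

// The unit under test. It depends only on the interface, so it neither
// knows nor cares whether a real database sits behind it.
class GreetingService {
    private final CustomerDao dao;

    GreetingService(CustomerDao dao) {
        this.dao = dao;
    }

    String greet(int customerId) {
        return dao.findNameById(customerId)
                  .map(name -> "Hello, " + name + "!")
                  .orElse("Hello, guest!");
    }
}

// Test-only stand-in backed by a map: no network, no driver, no schema.
class InMemoryCustomerDao implements CustomerDao {
    private final Map<Integer, String> names = new HashMap<>();

    InMemoryCustomerDao with(int id, String name) {
        names.put(id, name);
        return this;
    }

    @Override
    public Optional<String> findNameById(int id) {
        return Optional.ofNullable(names.get(id));
    }
}
```

A test can then build new GreetingService(new InMemoryCustomerDao().with(1, "Ada")) and check the result of greet(1) and greet(2). Such a test only exercises the greeting logic, which is the point: how the connection is made and how the data source behaves are left out of scope.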
Why you shouldn't mock connections
It's time consuming – Writing mocks can be quite a laborious task. I often avoid writing mocks like the plague, because even relatively simple mocks can require a considerable amount of time to get running, and running correctly. Even my very simple example of running a single query against a five-column table took several hours to set up. Granted, this was my first time creating a JDBC mock and future mocks would be easier and quicker to write, but it would still take longer than simply connecting to my database.
It's ugly – It takes a lot of typing to get even relatively simple mocks working. A lot of the rules that apply to how real business code should be written need not be followed when writing unit tests. That said, looking at the code of a unit test shouldn't make your eyes bleed. Unit tests that cannot be easily maintained often become ignored when they break, and broken unit tests have no value (if ignored).
It's not real – Mocks don't really care what you put into or take out of them. Real data sources are often not so forgiving. If I misspelled a column or table name in a query, a mock would not pick up on this (you can verify the SQL statements that were run, but the expected statement could be misspelled too), whereas a real database would. By actually connecting to the real data source you can be more confident that the application will perform its intended tasks.
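The "not real" problem can be sketched with nothing but the JDK. The example below is an illustration under stated assumptions, not the JDBC mock from my own tests: PermissiveJdbcMock and CustomerQueries are hypothetical names, and the mock is built with java.lang.reflect.Proxy rather than a mocking library. It only supports the handful of calls this one query needs.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical hand-rolled JDBC mock: it returns one canned row for ANY query.
class PermissiveJdbcMock {

    // Creates a dynamic proxy that routes every call on 'type' to 'handler'.
    @SuppressWarnings("unchecked")
    private static <T> T proxy(Class<T> type, InvocationHandler handler) {
        return (T) Proxy.newProxyInstance(
                PermissiveJdbcMock.class.getClassLoader(), new Class<?>[] {type}, handler);
    }

    static Connection connectionReturning(String cannedValue) {
        boolean[] consumed = {false};
        ResultSet rows = proxy(ResultSet.class, (p, m, args) -> {
            switch (m.getName()) {
                case "next": {               // exactly one row is available
                    boolean hasRow = !consumed[0];
                    consumed[0] = true;
                    return hasRow;
                }
                case "getString": return cannedValue; // column name is ignored
                case "close": return null;
                default: throw new UnsupportedOperationException(m.getName());
            }
        });
        Statement statement = proxy(Statement.class, (p, m, args) -> {
            switch (m.getName()) {
                case "executeQuery": return rows;     // SQL text is never inspected
                case "close": return null;
                default: throw new UnsupportedOperationException(m.getName());
            }
        });
        return proxy(Connection.class, (p, m, args) -> {
            switch (m.getName()) {
                case "createStatement": return statement;
                case "close": return null;
                default: throw new UnsupportedOperationException(m.getName());
            }
        });
    }
}

// Hypothetical code under test, with deliberate typos in the query.
class CustomerQueries {
    static final String MISSPELLED_SQL = "SELECT nmae FROM custmers WHERE id = 1";

    static String firstName(Connection connection) throws SQLException {
        try (Statement statement = connection.createStatement();
             ResultSet rows = statement.executeQuery(MISSPELLED_SQL)) {
            return rows.next() ? rows.getString("nmae") : null;
        }
    }
}
```

Calling CustomerQueries.firstName(PermissiveJdbcMock.connectionReturning("Ada")) returns the canned value even though neither the table nor the column exists as spelled. A test built on this mock stays green, while a real database would reject the statement.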
Why you should use your head
Even within the same project there will be instances where mocking a connection to a data source is the right choice and instances where you should actually connect to a real data source. In the early stages of a project, using mocks can be more practical, as the structure of the data source is more abstract and subject to change. Rewriting a mock is often quicker and easier than restructuring a data source. However, as a project matures and the unit tests become more complex, connecting to the real data source may be more practical: it is not only (more) well defined, but writing the mocks also begins to require more time. As with all practices, you need to decide which one is best to follow (or not) based upon your requirements and restrictions. That said, I think it is worth noting that standards are standards for a reason. Feel free to weigh in with your own thoughts on the subject.
Additional reading:
http://www.javaranch.com/journal/2003/12/UnitTestingDatabaseCode.html
http://www.buunguyen.net/blog/unit-testing-the-data-access-layer.html
November 5th, 2009 - 08:28
I’ve just had an idea. Having your mocks fully support various database isolation levels can occupy you for months and years to come. That’s probably one scenario when even buying an extra test database server would be better than wasting time and money on a developer trying to implement this support in a mock.
November 9th, 2009 - 13:36
Go check out the untils.org site and use it; they have a lot of guidelines and help for testing against a database. A “unit test” that hits a database is what I call an integration test. Integration tests are very useful, especially when you want to test that your database interaction is correct. So if you are developing a new database query, what you want is to test that your ORM mapping is correct, and you can only do that by hitting the real database; a mock is no substitute. When you have functionality to test, you will find it easier to mock the database away at the persistence layer. By mocking CustomerPersistence.findCustomer(int id) you can create a unit test for the code that depends on it. In that case you are isolating just the unit you are working on, which is what I call a “unit test”.