a better mousetrap #4: integrating on top of CouchDB
I’ve recently been writing about Apache CouchDB and the features that make it interesting in our environment, and I will keep doing so. Working with this platform left me with a bunch of thoughts I wanted to pin down quickly, either to remember them or to eventually get some discussion going on the topic, as I still consider myself a learner as far as both CouchDB and architecture on top of CouchDB are concerned.
Pretty straightforward: Application integration has been one of my special interests for the last couple of years, all the more so as in our environment we still struggle with how best to integrate an existing “legacy” platform with more up-to-date tools. I tend to write “legacy” in quotes because, in the end, this system ain’t really “legacy”. After a closer and more mature look, the main problem turns out to be that we use the system for something it was never meant to be used for. The system, a highly specialized document management platform, surely has its selling points and does rather well when used as a highly customizable document and workflow management facility. Given its customization abilities (using a proprietary language only used in this system and apparently no longer really maintained, seeing that it still lacks features like unit testing or XML schema support), it can easily be extended to serve as a sort of “application server”, even though it was never supposed to be one. Ultimately, you end up with a platform that has grown way too much and in way too uncontrolled a manner, yet you lack any refactoring tooling or integration interfaces to cleanly rebuild and remodel your structure without rendering most of the functionality unusable.
Bottom line: In such an environment, you end up looking for integration and communication facilities that are low-level enough to work even with the limited technical possibilities at hand. The old platform is itself a distributed environment built on a proprietary protocol that runs over TCP/IP, so HTTP and REST come to mind: these approaches are common, backed by usable tooling and libraries on most development platforms, and still easy enough to handle in situations where you are left writing client library code of your own (I definitely do not feel the need to implement a fully fledged SOAP stack in some arcane language that doesn’t even do XML). We did so, and although our integration still has a bunch of unfixed flaws, this approach so far seems to be the best we have tried. Along with the technical results, I have gained a rather strong opinion on “technology-agnostic” tooling, or at least tooling that is easily accessible from most platforms, ideally on different levels of abstraction.
Data integration
And this is where CouchDB comes into play. Given its very nature, CouchDB has proven to be integrable in a rather flexible way, depending upon your current demands and needs:
- With a bunch of Java client libraries of varying maturity and quality available, integrating CouchDB with a fully fledged Java EE application stack is a breeze. From there, CouchDB can be used as a straightforward persistence layer backing your EJB components, transparently accessible to your native Java clients without them even knowing it’s CouchDB working at the bottom of the stack.
- If you don’t feel like making your project depend on one of the pre-built client libraries for Java (or for one of the other languages for which clients are available), you can still use some HTTP client class, even just the one shipped with the JDK, to work with data objects living inside CouchDB. This allows for writing code with few external dependencies and quite a light framework environment, provided you don’t need more sophisticated functionality such as that offered by current JPA implementations.
- While having this choice is nice on platforms with dedicated client libraries, its real value lies in also enabling data access from platforms that can’t work with other technologies without a lot of ado, or that don’t support your current environment. Making the SQL layer in your proprietary system work with an unsupported RDBMS data source is definitely way more complicated than building a bare-bones HTTP client when basic TCP support is available in your environment.
- In the same way, of course, a Unix/Linux administrator can access data within these stores using, for example, curl from within Unix shell scripts, which makes ad hoc data manipulation, extraction and processing with Unix tools not necessarily a breeze, yet somewhat easier than with other systems.
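To make the “plain HTTP client” option a bit more tangible, here is a minimal sketch in Python using nothing but the standard library. The base URL (a local node on port 5984), the database and document names, and the helper functions are assumptions made for illustration only; authentication, which a real installation will most likely require, is left out entirely.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: a CouchDB node reachable on localhost, default port.
BASE_URL = "http://localhost:5984"


def doc_url(db, doc_id, base_url=BASE_URL):
    """Build the resource URL of a single document."""
    return f"{base_url}/{quote(db, safe='')}/{quote(doc_id, safe='')}"


def build_get(db, doc_id, base_url=BASE_URL):
    """Prepare a GET request that reads one document."""
    return urllib.request.Request(doc_url(db, doc_id, base_url), method="GET")


def build_put(db, doc_id, doc, base_url=BASE_URL):
    """Prepare a PUT request that creates or updates one document.

    To update an existing document, `doc` has to carry the current "_rev".
    """
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        doc_url(db, doc_id, base_url),
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def send(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Against a running server, `send(build_put("orders", "order-1", {"total": 10}))` would store a document and `send(build_get("orders", "order-1"))` would read it back. The same two requests could just as well come from curl or from a hand-written client on the old platform, which is pretty much the point.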
Following this path wasn’t all too difficult, yet in the end one key understanding remained: It’s the same data. It’s not just an approach to having some kind of “central” platform accessible through different technologies; it is also an easy way to provide a modestly formal common understanding of what data should look like and how it should be used: your Unix shell scripts read and write the same data as your EJB client applications do, and vice versa. It is (though your mileage may vary here) more convenient and straightforward than using, for example, your favorite RDBMS command line client. In some situations (think of caching or transactions in EJB environments) it might be thoroughly dangerous, but at least it is possible, after all. This is more than you might get in other environments.
Messaging integration
Another neat integration “approach” with CouchDB, whose full extent I discovered only recently, is working with change notifications. The idea is pretty simple: Make use of the _changes API to figure out which documents in your database were changed, added and the like. Libraries like the Python couchdbkit provide built-in functionality to wait for and process any changes inside a CouchDB database. But in the end, as mentioned above, you don’t really need these clients, as _changes is accessible the same way any other resource in CouchDB is: via plain HTTP (GET). You end up building a bit of message-oriented middleware for asynchronous communication on top of CouchDB, with all the advantages (and drawbacks, of course) arising from that. You can easily attach virtually any client that can be automated in some way: you can use stock Unix tools to send out mails automatically, or use some Perl wrappers around TeX to create invoice documents from order events in your system. You can (talking Java EE again) attach this to JMS or any other existing messaging solution, and of course add all the bells and whistles of current web and rich client user interface frameworks. Of course it’s a pretty bare-bones MoM approach, as in the end it’s up to you to deal with quite a few things other MoM frameworks provide out of the box (like adapter concepts, routing, or point-to-point vs. publish/subscribe communication). But, again as outlined above, it is possible, after all, and it limits the technical obstacles you have to deal with in order to build a solution in an existing brownfield environment.
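As a rough illustration of how little is needed, the following Python sketch follows the _changes feed of a database using long polling and hands each change to a callback. The base URL, the database name, the timeout value and the function names are assumptions for the sake of the example, authentication and error handling are left out, and it is certainly not the only way to consume the feed.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

# Assumption: a CouchDB node reachable on localhost, default port.
BASE_URL = "http://localhost:5984"


def changes_url(db, since="0", base_url=BASE_URL, timeout_ms=30000):
    """Build a long-polling _changes URL starting after sequence `since`."""
    query = urlencode({"feed": "longpoll", "since": since, "timeout": timeout_ms})
    return f"{base_url}/{quote(db, safe='')}/_changes?{query}"


def fetch_changes(db, since="0", base_url=BASE_URL):
    """Block until the server reports changes (or the timeout expires)."""
    with urllib.request.urlopen(changes_url(db, since, base_url)) as response:
        return json.load(response)


def dispatch(feed, handler):
    """Pass every change in one feed response to `handler`.

    Calls handler(doc_id, deleted) per change and returns the sequence
    value to resume from on the next request.
    """
    for change in feed.get("results", []):
        handler(change["id"], change.get("deleted", False))
    return feed["last_seq"]


def follow(db, handler, since="0", base_url=BASE_URL):
    """Endless consumer loop: fetch, dispatch, resume from last_seq."""
    while True:
        since = dispatch(fetch_changes(db, since, base_url), handler)
```

Calling something like `follow("orders", lambda doc_id, deleted: print(doc_id, deleted))` would then print every change as it arrives. Whatever the handler does next (sending mail, rendering an invoice, forwarding to JMS) is where the actual integration work starts, and persisting the last sequence value between runs is one of those things a real MoM product would do for you.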
And now?
Nothing much else, I guess. I learnt that, way beyond “just” providing lightweight (“agile?”) persistence in prototype applications or backing dynamic web projects, CouchDB offers some interesting features for application integration as well. And, obviously, it is rather good at it and makes it easy, getting some fundamental obstacles out of the way yet leaving you with a lot of freedom to build your application in a sane and reasonably clean manner. The next thing to try will be figuring out how to enhance these features with things CouchDB traditionally is good at, clustering and replication, to mention just two. Let’s see.