Tuesday, January 22, 2008

memcached, the best choice for a distributed Hibernate second-level cache

After a painful evaluation of Terracotta, I finally decided to drop it.

The Terracotta concept is good and attractive, but during testing it caused us a lot of trouble with class loading.

After many workarounds, the Terracotta server was up and running. But CPU usage was pretty high under load, and GC pauses killed the application, since response time is the key metric for our web application.

Finally, I decided to write a memcached provider for Hibernate, using memcached as the second-level cache. The client API I used is http://www.whalin.com/memcached/. I have not optimized the code yet, but CPU and memory usage are already much lower than with the Terracotta server.
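To give an idea of what such a provider delegates to, here is a minimal sketch of setting up the whalin client and doing basic get/set/delete calls. The server address, region prefix, and key format are made-up placeholders for illustration, not the actual provider code.

import com.danga.MemCached.MemCachedClient;
import com.danga.MemCached.SockIOPool;

public class MemcachedSketch {
    public static void main(String[] args) {
        // Assumed server address, for illustration only.
        String[] servers = { "127.0.0.1:11211" };

        // The whalin client uses a shared socket pool; initialize it once.
        SockIOPool pool = SockIOPool.getInstance();
        pool.setServers(servers);
        pool.initialize();

        MemCachedClient client = new MemCachedClient();

        // A Hibernate cache provider would typically prefix keys with the
        // cache region name; "region" and the key format here are made up.
        String key = "region:com.example.User#42";
        client.set(key, "cached entity data");   // put into the cache
        Object cached = client.get(key);          // read it back
        System.out.println("cached value: " + cached);

        client.delete(key);                       // evict on update/remove
    }
}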

I also evaluated other Java cache solutions; all of them have their advantages. But none of them can bypass the heap-size limitation, and all of them cause long GC pauses.

Compared to those Java cache solutions, memcached's get/set operations are slower, but compared to a database operation, the difference in get/set time between memcached and Ehcache is negligible.

If there is more and more data to cache down the road, Hibernate with memcached as the second-level cache is the way to go.

I am also researching a way to use Ehcache as a local, short-term cache and memcached as the long-term, distributed cache.
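A rough sketch of that two-tier lookup might look like the following. The cache name, key type, and the loadFromDatabase call are assumptions for illustration; this is not the final design.

import java.io.Serializable;
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import com.danga.MemCached.MemCachedClient;

public class TwoTierLookup {
    private final Cache local;                  // short-term, in-process Ehcache
    private final MemCachedClient distributed;  // long-term, shared memcached

    public TwoTierLookup(CacheManager manager, MemCachedClient client) {
        // "shortTerm" is a hypothetical cache configured in ehcache.xml.
        this.local = manager.getCache("shortTerm");
        this.distributed = client;
    }

    public Serializable lookup(String key) {
        // 1. Try the local Ehcache first (cheapest, no network hop).
        Element element = local.get(key);
        if (element != null) {
            return element.getValue();
        }
        // 2. Fall back to memcached (network hop, but shared across nodes).
        Serializable value = (Serializable) distributed.get(key);
        if (value == null) {
            // 3. Finally load from the database; loadFromDatabase is a stand-in.
            value = loadFromDatabase(key);
            distributed.set(key, value);
        }
        local.put(new Element(key, value));
        return value;
    }

    private Serializable loadFromDatabase(String key) {
        return "value-for-" + key;  // placeholder for the real DB call
    }
}

In this sketch the local cache is only a read-through copy: entries age out of Ehcache on their own short TTL while memcached holds them longer and is shared across application nodes.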

I will post more information about my memcached cache provider for hibernate.

6 comments:

Anonymous said...

Hank,

I don't want to come across as arguing or trying to dictate what you should be doing with your own app. If you found Terracotta hard to use, I would love to touch base and get detailed feedback.

As for memcached, it always starts out attractive because it is not a native Java implementation, so its cache sidesteps JVM GC tuning quite easily. But it has its trade-offs:
1. memcached partitions are quite static (unless you change or customize them). This can be a big problem as your site grows.
2. memcached partitions can lose data. If you restart a memcached server, the data is lost.
3. memcached is not coherent with your app. This is why it is not usually used with Hibernate. Hibernate implies to me that you have a DB, and having your cache drift over time from your DB is not a good thing, IMHO.

Now, Terracotta does not have the three problems memcached does, but it does have the GC tuning challenge you came across. I wish we were completely easy to tune, but that is coming and getting better over the next few months.

Again, not arguing...just want to make sure the product gets your input. Let me know if you are interested in helping. ari AT terracotta DOT org.

--Ari Zilka
CTO, Terracotta Inc.

Hank Li said...

I agree with the comments about memcached. Here is my understanding of the issues.

1. memcached partitions are quite static (unless you change or customize them). This can be a big problem as your site grows.

A: The simple partitioning works well for our current requirements. We are using the cache as a cache, not as another fancy in-memory DB.

2. memcached partitions can lose data. If you restart a memcached server, the data is lost.

A: Correct, but the cache is just for caching; the database is the place for storage.

3. memcached is not coherent with your app. This is why it is not usually used with Hibernate. Hibernate implies to me that you have a DB, and having your cache drift over time from your DB is not a good thing, IMHO.

A: Not really; I will post more details about this later.

Anonymous said...

Hank


"I also evaluate other java cache solutions, all of them have their advantages. But still none of them can bypass the limitation of heap size and will cause big GC time."


Have you considered the GigaSpaces second-level cache?

You should note that the Hibernate second-level cache is not as trivial as it may sound - it assumes certain locking and behavior, so I'm wondering how far you have tested your own memcached implementation.

Based on what you describe, I believe our free community edition plus second-level cache should be sufficient for your needs. If you did give it a try, I'll be interested to know the results; if you didn't, I think it's worth looking at.


Nati S.
GigaSpaces

Anonymous said...

In addition to the comments above:
memcached is just a place to put and get chunks of memory. It offers no policy whatsoever about where the data is stored, or whether you actually get back the data you put into a group of servers. This is all the responsibility of the calling client.

For example, if a memcached server in a group goes down and is restarted after a few minutes, then depending on the client logic you may get older copies of the data you put for a certain key. This means that if you put multiple values for the same key, you may not get the last one (since the values end up on multiple servers, there are multiple entries for that key). This puts all the logic for detecting the validity of the data on the client, and from reading their Java client code, it does nothing in that area.

So if you want to use memcached with more than one server, you are in for some heavy coding and algorithmic thinking. Some people have done some of that, but as far as I can tell, not with the kind of accuracy required for an OR mapper.
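To illustrate the point, here is a hypothetical sketch of the naive hash-modulo mapping a simple client might use. The server names and the remapping-on-failure behavior are assumptions for illustration, not the behavior of any particular client.

import java.util.Arrays;
import java.util.List;

public class NaiveKeyDistribution {
    public static void main(String[] args) {
        // Hypothetical server lists; addresses are made up for illustration.
        List<String> allUp = Arrays.asList("cacheA:11211", "cacheB:11211", "cacheC:11211");
        List<String> oneDown = Arrays.asList("cacheA:11211", "cacheC:11211"); // cacheB removed

        String key = "com.example.User#42";

        // A naive client maps a key to a server by hash modulo the server count,
        // so removing one server shifts many keys to different servers.
        System.out.println("all servers up : " + pick(key, allUp));
        System.out.println("one server down: " + pick(key, oneDown));
        // When the missing server comes back, the key maps to its original server
        // again, which may still hold a value written before the outage -> stale read.
    }

    private static String pick(String key, List<String> servers) {
        int index = Math.abs(key.hashCode() % servers.size());
        return servers.get(index);
    }
}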

There is an inherent assumption in the memcached use case: in a very large web application, a certain level of data inaccuracy is tolerable. OR mappers like Hibernate have a different notion.

Anonymous said...

But the key question:

Did you really get the benefit of caching? Or was going to the DB good enough anyway?

Hank Li said...

First, memcached is used in our production environment and it works great.


Hibernate itself provides good logic to store and expire the data, so there is no need to worry about that part. For multiple memcached servers, the key-hashing logic makes sure a key always goes to the same server, even when a server is down; the client library provides support for this.
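As a rough idea of what I mean, here is a sketch of configuring the whalin client's socket pool with several servers and failover. The addresses and weights are placeholders, and the exact setter names (setFailover in particular) may differ between client releases.

import com.danga.MemCached.SockIOPool;

public class PoolSetup {
    public static void main(String[] args) {
        // Placeholder addresses and weights, not our real topology.
        String[] servers = { "cache1:11211", "cache2:11211", "cache3:11211" };
        Integer[] weights = { 1, 1, 1 };

        SockIOPool pool = SockIOPool.getInstance();
        pool.setServers(servers);   // keys are hashed across this fixed list
        pool.setWeights(weights);   // relative share of keys per server
        pool.setInitConn(5);
        pool.setMinConn(5);
        pool.setMaxConn(50);
        pool.setMaintSleep(30);     // maintenance thread interval; see the client docs for units
        pool.setNagle(false);
        pool.setFailover(true);     // route a key to another server while its host is down
        pool.initialize();
    }
}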

We have more and more applications using it to handle core logic, such as replacing a large HashMap (10+ million records in the HashMap :-) ).

One disadvantage is that each stored object has to be less than 1 MB. For a single entity, that is fine, but when I try to cache search results from Lucene, it can be an issue, since a result set may exceed the limit.
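A simple guard, sketched below, is to serialize the value first and skip memcached when it is over the limit; the 1 MB constant and the fall-back-to-the-source behavior are illustrative assumptions, not what I have shipped.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SizeGuard {
    // memcached's default per-item limit is 1 MB; this constant mirrors that.
    private static final int MAX_ITEM_BYTES = 1024 * 1024;

    /** Returns true if the serialized form of the value fits under the limit. */
    public static boolean fitsInMemcached(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(value);
            out.close();
            return bytes.size() < MAX_ITEM_BYTES;
        } catch (IOException e) {
            return false;  // if we cannot serialize it, do not try to cache it
        }
    }
}

A caller would check fitsInMemcached before calling set, and simply go back to Lucene or the database when the result set is too large.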