We have one section of our application that accounts for a lot of intense database access over a short, defined period of time (5 minutes). Each user hits different records, but once they start hitting them, they hit the same records repeatedly. The records are also read/write: the first hit loads everything into memory, and the subsequent hits are all updates.
So we are adding a cache, which I'm pretty sure will save our database from thrashing as we add users. The read is complex and I/O intensive (4 tables for the object graph, with record counts across the four of roughly 1 : 2 : 50 : 200, out of tens of millions for the last table), but the update should be relatively cheap (2 or 3 updates). Under most circumstances the data is only accessed by one user at a time, and even when another user is accessing it, that will be for reporting purposes only and not for updating, so being a few minutes stale is OK.
So we are going to try the simple thing and add Ehcache for this one parent object (I need to verify that all of its collections are actually cached too once initialized). The biggest question I have is about the different cache concurrency strategies. Not being a concurrency expert, the definition of read/write makes sense to me, but I can't really find any practical examples of the downside.
If I use this strategy in my app (not clustered; a single server per data instance), what problems can I run into? I'm guessing the biggest one is two threads accessing the same data at the same time for updating. This could be an issue for us, but we are going to test to see what the real effect of this strategy is. Hopefully there is enough of a delay between an individual user's updates that two of them don't cross or get performed at the same time. Typically there will be at least 2-3 seconds between a user's updates, which should be plenty of time to finish the transaction.
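If crossed updates do turn out to be a real problem, one standard safeguard (not something we have configured yet; this is just a sketch with a made-up entity name) is Hibernate's optimistic locking via a @Version column, which makes the second of two conflicting commits fail instead of silently overwriting the first:

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

// Hypothetical entity, for illustration only. The @Version field gives
// us optimistic locking: Hibernate increments it on every update, and a
// commit based on a stale version throws StaleObjectStateException
// rather than clobbering the other thread's write.
@Entity
public class ParentRecord {

    @Id
    private Long id;

    @Version
    private int version; // managed by Hibernate; don't set it yourself

    private String payload;

    // getters/setters omitted
}
```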
The other thing I need to verify (I believe this is how it works): whenever a cached object in the second-level cache is updated, the actual update is flushed to the database. (BTW, this environment uses Hibernate, and all requests are wrapped in transactions.) So whenever the request is over, the update should be flushed naturally to the database by Hibernate.
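In other words, the per-request flow should look roughly like this (a sketch using the classic Hibernate Session API; the entity, field, and variable names here are made up for illustration):

```java
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

// Sketch of the per-request flow: the first get() populates the
// second-level cache; later requests for the same id read from the
// cache, and commit() flushes the dirty entity back to the database.
void updateRecord(SessionFactory sessionFactory, Long recordId) {
    Session session = sessionFactory.openSession();
    Transaction tx = session.beginTransaction();
    try {
        ParentRecord record = (ParentRecord) session.get(ParentRecord.class, recordId);
        record.setPayload("updated value"); // dirty-checked; no explicit save needed
        tx.commit();                        // flush happens here: the UPDATE hits the database
    } catch (RuntimeException e) {
        tx.rollback();
        throw e;
    } finally {
        session.close();
    }
}
```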
So, as a quick guide, here is our minimal configuration for our first cache:
Added to our hibernate.xml
<property name="hibernate.cache.provider_class">net.sf.ehcache.hibernate.SingletonEhCacheProvider</property>
<property name="hibernate.cache.provider_configuration">/ehcache.cfg.xml</property>
<property name="hibernate.cache.use_second_level_cache">true</property>
<property name="hibernate.cache.use_structured_entries">true</property>
Added a new ehcache.xml file
<ehcache>
    <diskStore path="java.io.tmpdir"/>
    <defaultCache
        maxElementsInMemory="10000"
        eternal="false"
        timeToIdleSeconds="300"
        timeToLiveSeconds="300"
        overflowToDisk="true"
    />
</ehcache>
Added to our entity that we want to cache
@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
@Table(name="table_name")
We then added this to our servlet container init so that we could see cache statistics
CacheManager manager = CacheManager.getInstance();
MBeanServer mBeanServer = ManagementFactory.getPlatformMBeanServer();
ManagementService.registerMBeans(manager, mBeanServer, true, true, true, true);
Here are some useful links that I used to compile all of this:
Update: ok, it turns out things were a little more complicated than I hoped. Basically, relationships are not cached. So I had to add a cache for each relationship (on the collection or the entity, depending on the relation type). Once I did that (and added in some FetchType.LAZYs), I'm down to what I expected: one query for everything up front and 2-3 updates for each subsequent hit after that. Next step is to install into QA and test, and then production. Keeping our fingers crossed that the read/write strategy works for us…
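For reference, caching a relationship looks something like this (a sketch with hypothetical entity names; the point is that the @Cache annotation has to go on the collection itself, not just on the owning entity):

```java
import java.util.Set;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.Id;
import javax.persistence.OneToMany;
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

// Hypothetical parent/child entities, for illustration only.
@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class ParentRecord {

    @Id
    private Long id;

    // The collection needs its own @Cache annotation; caching the
    // parent entity alone does not cache the relationship. LAZY
    // fetching keeps the initial load down to what is actually used.
    @OneToMany(mappedBy = "parent", fetch = FetchType.LAZY)
    @Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
    private Set<ChildRecord> children;
}
```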