Friday, November 11, 2011

Fewer deadlocks, higher throughput

Here's the problem: the first transaction (T1) writes to keys a and b, in this order. The second transaction (T2) writes to keys b and a - again, the order is relevant. With the "right" timing, T1 manages to acquire the lock on a and T2 acquires the lock on b. Each of them then waits for the other to release its lock so that it can progress. This is what is called a deadlock, and it is really bad for your system's throughput - but I won't insist on this aspect, as I've mentioned it a lot in my previous posts.

What I want to talk about though is a way to solve this problem. Quite a simple way: just force an order on your transaction's writes and you're guaranteed not to deadlock. If both T1 and T2 write to a and then b (lexicographical order), there won't be any deadlock. Ever.
But there's a catch. It's not always possible to define this order, simply because you can't or because you don't know all your keys at the very beginning of the transaction.
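
To make the ordering idea concrete, here is a minimal sketch in plain Java. It is not Infinispan code: OrderedLocker and its methods are hypothetical names made up for this illustration, and it assumes all keys are known before any lock is taken. Both threads ask for the same two keys in opposite orders, but the helper always acquires the locks in sorted order, so the circular wait described above cannot form.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper (not an Infinispan class): acquires per-key locks in sorted key order.
class OrderedLocker {
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // The order in which locks are acquired: lexicographical, duplicates removed.
    static List<String> lockOrder(List<String> keys) {
        return new ArrayList<>(new TreeSet<>(keys));
    }

    // Locks every key in sorted order, runs the work, then unlocks in reverse order.
    void runLocked(List<String> keys, Runnable work) {
        List<ReentrantLock> held = new ArrayList<>();
        try {
            for (String key : lockOrder(keys)) {
                ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
                lock.lock();
                held.add(lock);
            }
            work.run();
        } finally {
            for (int i = held.size() - 1; i >= 0; i--) {
                held.get(i).unlock();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        OrderedLocker locker = new OrderedLocker();
        int[] counter = {0}; // only touched while both locks are held

        // T1 names the keys as (a, b), T2 as (b, a); both end up locking a first.
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                locker.runLocked(Arrays.asList("a", "b"), () -> counter[0]++);
            }
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                locker.runLocked(Arrays.asList("b", "a"), () -> counter[0]++);
            }
        });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(counter[0]); // prints 2000
    }
}
```

If the sort in lockOrder is removed and each thread locks in the order it was given, the same program can hang, which is exactly the T1/T2 scenario from the beginning of this post.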

Now here's the good news: Infinispan orders the keys touched in a transaction for you. It even defines the order, so you won't have to do that either. Actually you don't have to do anything, not even enable this feature, as it is already enabled by default.

Does it sound too good to be true? That's because it's only partially true. That is, lock reordering only works if you're using optimistic locking. For pessimistic locking you still have to do it the old way: order your locks yourself (if you can, of course).

Wanna know more about it? Read this.

Stay tuned!
Mircea

Wednesday, November 9, 2011

Single lock owner: an important step forward for Infinispan's transactions

The single lock owner is a highly requested Infinispan improvement. The basic idea behind it is that, when writing to a key, locks are no longer acquired on all the nodes that own that key, but only on a single designated node (named "main owner").

How does it help me?


Short version: if you use transactions that concurrently write to the same keys, this improvement significantly increases your system's throughput.


Long version: if you're using Infinispan with transactions that modify the same key(s) concurrently, then you can easily end up in a deadlock. A deadlock can occur even if two transactions modify a single key at the same time, which is both inefficient and counter-intuitive. Such a deadlock means that at least one transaction (or both) eventually rolls back, but also that the lock on the key is held for the duration of the lockAcquisitionTimeout config option (defaults to 10 seconds). These deadlocks reduce throughput significantly, as transaction threads are held inactive for the duration of the deadlock. On top of that, other transactions that want to operate on that key are also delayed, potentially resulting in a cascade effect.

What's the added performance penalty?


The only performance penalty encountered is during cluster topology changes. At that point the cluster needs to perform some additional computation (no RPC involved) to fail over the acquired locks from the previous owners to the new ones.
Another noticeable aspect is that locks are now released asynchronously, after the transaction commits. This doesn't add anything to the transaction duration, but it means that locks are held slightly longer. That's not something to be concerned about unless your transactions compete for the same locks.
We plan to benchmark this feature using the Radargun benchmark tool - we'll report back!

Want to know more?


You can read the single lock owner design wiki and/or follow the JIRA discussions.

Transaction remake in Infinispan 5.1

If you have ever used Infinispan in a transactional way you might be very interested in this article, as it describes some very significant improvements in version 5.1 "Brahma" (released with 5.1.Beta1):
  • starting with this release an Infinispan cache can be accessed either transactionally or non-transactionally. The mixed access mode is no longer supported (backward compatibility is still maintained, see below). There are several reasons for going down this path, but one of the most important results of this decision is cleaner semantics for how concurrency is managed between multiple requestors of the same cache entry.

  • starting with 5.1 the supported transaction models are optimistic and pessimistic. The optimistic model is an improvement over the existing default transaction model: it completely defers lock acquisition to transaction prepare time. That reduces the time locks are held and increases throughput; it also avoids deadlocks. With the pessimistic model, cluster-wide locks are acquired on each write and are only released after the transaction has completed (see below).


Transactional or non transactional cache?


It's up to you as a user to decide whether you want to define a cache as transactional or not. By default, Infinispan caches are non-transactional. A cache can be made transactional by changing the transactionMode attribute:

transactionMode can only take two values: TRANSACTIONAL and NON_TRANSACTIONAL. The same thing can also be achieved programmatically:

Important: for transactional caches it is required to configure a TransactionManagerLookup.

Backward compatibility


The autoCommit attribute was added in order to assure backward compatibility. If a cache is transactional and autoCommit is enabled (defaults to true) then any call that is performed outside of a transaction's scope is transparently wrapped within a transaction. In other words Infinispan adds the logic for starting a transaction before the call and committing it after the call.

So if your code accesses a cache both transactionally and non-transactionally, all you have to do when migrating to Infinispan 5.1 is mark the cache as transactional and enable autoCommit (that's actually enabled by default, so just don't disable it :)
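
The wrapping that autoCommit performs can be pictured with a toy, single-threaded model. This is only a sketch of the idea: AutoCommitCache and its methods are invented for this illustration and are not Infinispan classes, and rejecting the call when autoCommit is off is a choice made by the sketch, not a statement about what Infinispan does in that case.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model (not Infinispan code) of a transactional cache with autoCommit.
class AutoCommitCache {
    private final Map<String, String> committed = new HashMap<>();
    private Map<String, String> pending; // non-null only while a transaction is in scope
    private final boolean autoCommit;
    int commits = 0; // number of committed transactions, explicit or implicit

    AutoCommitCache(boolean autoCommit) {
        this.autoCommit = autoCommit;
    }

    void begin() {
        pending = new HashMap<>();
    }

    void commit() {
        committed.putAll(pending);
        pending = null;
        commits++;
    }

    void put(String key, String value) {
        if (pending != null) {
            // Called inside a transaction's scope: just record the write.
            pending.put(key, value);
            return;
        }
        if (!autoCommit) {
            // Assumption of this sketch: without autoCommit the call is rejected.
            throw new IllegalStateException("no transaction in scope");
        }
        // Called outside a transaction: start one before the call, commit it after.
        begin();
        try {
            pending.put(key, value);
            commit();
        } finally {
            pending = null; // drop the implicit transaction if the write failed
        }
    }

    String get(String key) {
        return committed.get(key);
    }
}
```

With new AutoCommitCache(true), a bare put("k", "v") is immediately visible through get("k") and counts as one commit, while a put between begin() and commit() only becomes visible once commit() is called.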

The autoCommit feature can be managed through configuration:

or programmatically:


Optimistic Transactions


With optimistic transactions, locks are acquired at transaction prepare time and are only held up to the point the transaction commits (or rolls back). This is different from the 5.0 default locking model, where local locks are acquired on writes and cluster locks are acquired during prepare time.

Optimistic transactions can be enabled in the configuration file:

or programmatically:

By default, a transactional cache is optimistic.

Pessimistic Transactions


From a lock acquisition perspective, pessimistic transactions obtain locks on keys at the time the key is written. E.g.

When cache.put(k1,v1) returns, k1 is locked and no other transaction running anywhere in the cluster can write to it. Reading k1 is still possible. The lock on k1 is released when the transaction completes (commits or rolls back).

Pessimistic transactions can be enabled in the configuration file:

or programmatically:


What do I need - pessimistic or optimistic transactions?


From a use case perspective, optimistic transactions should be used when there's not a lot of contention between multiple transactions running at the same time. That is because optimistic transactions roll back if data has changed between the time it was read and the time it was committed (writeSkewCheck).
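
The rollback condition can be sketched with a version number per key. This is a simplified, single-JVM illustration and not how Infinispan implements writeSkewCheck: VersionedStore and its methods are hypothetical names, and the per-key version counter is an assumption of the sketch. A transaction remembers the version it saw when it read the key; at commit time the write only goes through if that version is still current.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not Infinispan code) of a version-based write skew check.
class VersionedStore {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> versions = new HashMap<>();

    // Version the caller should remember when it reads the key (0 if never written).
    synchronized long versionOf(String key) {
        return versions.getOrDefault(key, 0L);
    }

    synchronized String read(String key) {
        return values.get(key);
    }

    // Applies the write only if nobody changed the key since versionSeen was read.
    // Returns false when the check fails, in which case the caller rolls back.
    synchronized boolean commitIfUnchanged(String key, long versionSeen, String newValue) {
        if (versionOf(key) != versionSeen) {
            return false;
        }
        values.put(key, newValue);
        versions.put(key, versionSeen + 1);
        return true;
    }
}
```

If T1 and T2 both read key k at version 0 and T1 commits first, T1's commitIfUnchanged("k", 0, ...) returns true and bumps the version to 1, so T2's commitIfUnchanged("k", 0, ...) returns false and T2 has to roll back. The more transactions compete for the same keys, the more often this happens, which is why the optimistic model suits low-contention workloads.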

On the other hand, pessimistic transactions might be a better fit when there is high contention on the keys and transaction rollbacks are less desirable. Pessimistic transactions are more costly by their nature: each write operation potentially involves an RPC for lock acquisition.

The path ahead


This major transaction rework has opened the way for several other transaction related improvements:

  • Single node locking model is a major step forward in avoiding deadlocks and increasing throughput by only acquiring locks on a single node in the cluster, regardless of the number of redundant copies (numOwners) on which data is replicated

  • Lock acquisition reordering is a deadlock avoidance technique that will be used for optimistic transactions

  • Incremental locking is another technique for minimising deadlocks.




Stay tuned!
Mircea

Friday, June 24, 2011

Me @ jazoon

I've just returned from Jazoon, where I spoke about in-memory data grids/Infinispan and how they can be used to complement or even replace databases. There was a good and enthusiastic crowd, and the discussions ended late in the night - of course cooled down by cold Swiss beer :)
Infinispan was also present in Hardy Ferentschik's presentation about Hibernate OGM: the brand new JBoss project which exposes the grid (read: Infinispan) through Hibernate's API.

Thanks to the Jazoon organizers for an excellent conference and the chance to meet other enthusiasts from all over the world!

Cheers,
Mircea

Wednesday, February 2, 2011

Consistent hashing and performance

Most of today's in-memory data grids rely on consistent hashing (CH) for achieving scalability.
From a performance perspective it is generally preferable to have CH functions that spread the data evenly between the nodes in the grid.
Uneven data distribution causes increased stress on the more heavily loaded nodes. That slows down the cluster and also increases the risk of those nodes crashing (e.g. OOM).
In order to measure the performance penalty caused by uneven distribution I enhanced Radargun with an "ideal" consistent hash that guarantees an equal number of keys per node. Then I benchmarked Infinispan using Radargun. The benchmark was run twice, once using Infinispan's built-in CH and once using the "ideal" CH.
The following graph shows the result of these runs (cluster size on X and throughput on Y):

The actual number of keys per node is:
Configuration : dist-sync.xml
Cluster size: 3 -> ( 2175 494 331)
Cluster size: 5 -> ( 500 660 1855 585 1400)
Cluster size: 7 -> ( 1775 487 37 239 1396 457 2609)
Cluster size: 9 -> ( 1548 3 545 2854 201 2025 7 1609 208)

Configuration : idealdistribution/dist-sync-ideal-distribution.xml
Cluster size: 3 -> ( 1000 1000 1000)
Cluster size: 5 -> ( 1000 1000 1000 1000 1000)
Cluster size: 7 -> ( 1000 1000 1000 1000 1000 1000 1000)
Cluster size: 9 -> ( 1000 1000 1000 1000 1000 1000 1000 1000 1000)
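
One simple way to put a number on how uneven the built-in distribution is: divide the key count of the busiest node by the average key count per node. The snippet below is only a sketch of that arithmetic over the figures listed above; LoadImbalance and maxOverMean are names invented here and are not part of Radargun or Infinispan.

```java
// Hypothetical helper (not part of Radargun): measures how far the busiest node is from the average.
class LoadImbalance {
    // Ratio between the largest per-node key count and the mean per-node key count.
    // 1.0 means perfectly even; 2.0 means the busiest node holds twice its fair share.
    static double maxOverMean(int[] keysPerNode) {
        long total = 0;
        int max = 0;
        for (int keys : keysPerNode) {
            total += keys;
            max = Math.max(max, keys);
        }
        double mean = (double) total / keysPerNode.length;
        return max / mean;
    }

    public static void main(String[] args) {
        // Per-node key counts for dist-sync.xml, copied from the listing above.
        int[][] builtIn = {
            {2175, 494, 331},
            {500, 660, 1855, 585, 1400},
            {1775, 487, 37, 239, 1396, 457, 2609},
            {1548, 3, 545, 2854, 201, 2025, 7, 1609, 208}
        };
        for (int[] cluster : builtIn) {
            System.out.println(cluster.length + " nodes -> " + maxOverMean(cluster));
        }
    }
}
```

Each run stores an average of 1000 keys per node, so the ratios work out to 2.175, 1.855, 2.609 and 2.854 for 3, 5, 7 and 9 nodes respectively, against 1.0 for the "ideal" distribution in every case.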


Observations:
  1. For a 3-node cluster the uneven distribution outperformed the "ideal" one. This is explained by the fact that one of the nodes holds most of the data, so it needed to make very few RPC calls to other cluster members, drastically increasing the average performance.
  2. For clusters made up of 5, 7 and 9 nodes the "ideal" distribution is 5-20% better.
  3. The discrepancy tends to increase together with the cluster size.
It would be interesting to see the difference on a 30+ node cluster!

More...
- vendor-specific approaches to data partitioning: Infinispan, Coherence, Gigaspaces.
- project Radargun