
Resolving Deadlock in HBase


In this post, I will walk you through how to resolve a tricky deadlock scenario in HBase. The scenario relates to HBASE-13651, which tried to handle the case where one region server removes the compacted hfiles, leading to FileNotFoundExceptions on another machine; take a look at that JIRA for the full background.

Unlike the deadlocks I have resolved in the past, this one happens rarely: it occurs when one thread tries to obtain the write lock of a ReentrantReadWriteLock while another thread holds a read lock of the same lock.
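
Before getting into the HBase specifics, here is a minimal standalone sketch (not HBase code; class and variable names are purely illustrative) of the ReentrantReadWriteLock behavior involved: a thread calling writeLock().lock() parks until every outstanding read lock has been released.

import java.util.concurrent.locks.ReentrantReadWriteLock;

// Minimal sketch: the writer parks for as long as another thread
// still holds the read lock of the same ReentrantReadWriteLock.
public class ReadBlocksWrite {
    public static void main(String[] args) throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        Thread reader = new Thread(() -> {
            lock.readLock().lock();
            try {
                Thread.sleep(2000);              // hold the read lock for a while
            } catch (InterruptedException ignored) {
            } finally {
                lock.readLock().unlock();
            }
        });
        reader.start();
        Thread.sleep(100);                       // let the reader acquire first

        long start = System.nanoTime();
        lock.writeLock().lock();                 // parks until the reader releases
        try {
            System.out.printf("writer waited %d ms%n",
                    (System.nanoTime() - start) / 1_000_000);
        } finally {
            lock.writeLock().unlock();
        }
    }
}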

Understanding the Scenario

To fully understand this scenario, let's take a look at a concrete example.

For handler 12, HRegion#refreshStoreFiles() obtains a lock on writestate (line 4919), and then tries to get the write lock of updatesLock (a ReentrantReadWriteLock) in dropMemstoreContentsForSeqId():

"B.defaultRpcServer.handler=12,queue=0,port=16020" daemon prio=10 tid=0x00007f205cf8d000nid=0x8f0b waiting on condition [0x00007f203ea85000]    java.lang.Thread.State: WAITING (parking) 	at sun.misc.Unsafe.park(Native Method) - parking to wait for<0x00000006708113c8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197) at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945) at org.apache.hadoop.hbase.regionserver.HRegion.dropMemstoreContentsForSeqId(HRegion.java:4568) at org.apache.hadoop.hbase.regionserver.HRegion.refreshStoreFiles(HRegion.java:4919) - locked <0x00000006707c3500> (a org.apache.hadoop.hbase.regionserver.HRegion$WriteState) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.handleFileNotFound(HRegion.java:6104) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5736) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5875)

For handler 24, HRegion$RegionScannerImpl.next() gets a read lock, and tries to obtain a lock on writestate in handleFileNotFound():

"B.defaultRpcServer.handler=24,queue=0,port=16020" daemon prio=10 tid=0x00007f205cfa6000nid=0x8f17 waiting for monitor entry [0x00007f203de79000]   java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hbase.regionserver.HRegion.refreshStoreFiles(HRegion.java:4887) - waiting to lock <0x00000006707c3500> (a org.apache.hadoop.hbase.regionserver.HRegion$WriteState) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.handleFileNotFound(HRegion.java:6104) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5736) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5875) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5653) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:5630) - locked <0x00000007130162c8> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:5616) at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6810) at org.apache.hadoop.hbase.regionserver.HRegion.getIncrementCurrentValue(HRegion.java:7673) at org.apache.hadoop.hbase.regionserver.HRegion.applyIncrementsToColumnFamily(HRegion.java:7583) at org.apache.hadoop.hbase.regionserver.HRegion.doIncrement(HRegion.java:7480) at org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:7440)

As you can see, the two handlers get into a deadlock: handler 12 holds the monitor on writestate and waits for the write lock of updatesLock, while handler 24 holds a read lock and waits for the monitor on writestate, so neither can make progress.
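
To make the wait-for cycle easier to see, here is a minimal standalone sketch (not HBase code; the names only mirror those in the stack traces) that reproduces the same lock-order inversion: one thread takes a monitor and then asks for the write lock, while the other thread takes the read lock and then asks for the same monitor.

import java.util.concurrent.locks.ReentrantReadWriteLock;

// Minimal reproduction of the lock-order inversion described above.
// "writestate" and "updatesLock" are stand-ins mirroring the stack traces.
public class TwoLockDeadlock {
    static final Object writestate = new Object();
    static final ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();

    public static void main(String[] args) {
        Thread handler12 = new Thread(() -> {
            synchronized (writestate) {              // monitor first, as in refreshStoreFiles()...
                pause(100);
                updatesLock.writeLock().lock();      // ...then the write lock, as in dropMemstoreContentsForSeqId()
                updatesLock.writeLock().unlock();
            }
        }, "handler-12");

        Thread handler24 = new Thread(() -> {
            updatesLock.readLock().lock();           // read lock first, as on the scan path...
            try {
                pause(100);
                synchronized (writestate) {          // ...then the monitor, as in handleFileNotFound()
                    // never reached: handler-12 holds the monitor and is parked on the write lock
                }
            } finally {
                updatesLock.readLock().unlock();
            }
        }, "handler-24");

        handler12.start();
        handler24.start();                           // with this timing, both threads block forever
    }

    static void pause(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
    }
}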


Fixing the Deadlock

So, how can you fix this? The fix breaks the deadlock (on the handler 12 side) by remembering the tuples to be passed to dropMemstoreContentsForSeqId(), releasing the read lock, and only then calling dropMemstoreContentsForSeqId().
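
The post does not include the patch itself, so the following is only a hypothetical sketch of that "remember now, lock later" shape; every class, field, and method name (for example regionLock and DropRequest) is illustrative rather than the actual HBase code. The work that needs the write lock is collected while the read lock is held and performed only after the read lock has been released, so the wait-for cycle can no longer form.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the fix's shape: collect the tuples under the read lock,
// release it, and only then call the method that takes the write lock.
public class DeferredDropSketch {
    private final ReentrantReadWriteLock regionLock = new ReentrantReadWriteLock();   // hypothetical outer lock
    private final ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();

    // illustrative stand-in for the "tuples" passed to dropMemstoreContentsForSeqId()
    private static final class DropRequest {
        final long seqId;
        final String family;
        DropRequest(long seqId, String family) { this.seqId = seqId; this.family = family; }
    }

    public void refreshStoreFiles() {
        List<DropRequest> toDrop = new ArrayList<>();

        regionLock.readLock().lock();
        try {
            // ...detect the removed hfiles and the affected sequence ids...
            // Instead of calling dropMemstoreContentsForSeqId() here, remember the tuples.
            toDrop.add(new DropRequest(42L, "cf1"));   // placeholder for the real bookkeeping
        } finally {
            regionLock.readLock().unlock();            // the read lock is released first...
        }

        // ...and only now is the write-lock-taking call made, outside the read lock.
        for (DropRequest req : toDrop) {
            dropMemstoreContentsForSeqId(req.seqId, req.family);
        }
    }

    private void dropMemstoreContentsForSeqId(long seqId, String family) {
        updatesLock.writeLock().lock();
        try {
            // drop memstore contents up to seqId for the given column family
        } finally {
            updatesLock.writeLock().unlock();
        }
    }
}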

Mingmin, the reporter of the bug, kindly deployed a patched JAR on his production cluster, and deadlocks of this form no longer occur.

I hope my experience with this tricky situation will be of some help if you ever run into a similar scenario.

About the author

Ted Yu is a staff engineer at Hortonworks. He has also been an HBase committer/PMC member for 5 years. His work on HBase covers various components: security, backup/restore, load balancer, MOB, and so on. He has provided support for customers at eBay, Micron, PayPal, and JPMC. He is also a Spark contributor.
