


Sometimes we use caches to speed things up by alleviating database load.
And Memcached is the best-known in-memory key-value store (cache). To use Memcached you need a client, and many clients already exist, including several based on Java.
Though there are already good Memcached clients whose operations have been optimized over a long time, I would like to introduce GrizzlyMemcached, which is based on the Grizzly framework, very scalable and high-performance.

Main features

Improving and supporting bulk operations such as getMulti and setMulti, as well as Memcached's basic operations

  • Using high performance connection pool
  • Using Grizzly Framework for I/O
  • Using only Memcached binary protocol
  • Supporting setMulti, deleteMulti, getsMulti and casMulti as well as getMulti

Supporting failover/failback of Memcached

  • Using consistent hashing
  • Allowing the Memcached server list to change dynamically
  • Providing an option for enabling/disabling failover/failback

Synchronizing many clients automatically to prevent stale cache data when Memcached servers fail or are removed or added dynamically

  • Using ZooKeeper
  • Using a barrier for synchronizing the Memcached server list


I/O Model

A stable and robust I/O base is as important for clients as it is for servers.
The Grizzly NIO framework has high performance, scalability and stability, and it can be integrated into various modules easily.
So GrizzlyMemcached uses the Grizzly NIO framework for sending, parsing and receiving packets of Memcached's binary protocol.
The Grizzly NIO framework also provides several I/O strategies.
I chose the same-thread IOStrategy as the default, and it showed good results in my benchmark because GrizzlyMemcached is a client, not a server (but you can change the strategy as needed in the configuration).

Connection Model

Some Memcached clients such as SpyMemcached and XMemcached use only one connection for the requests of multiple threads.
If multiple threads share one connection, the client can use the request queue to optimize a series of single get/set operations into a bulk operation like getMulti, because one bulk operation is much faster and more efficient than many single operations.
But one connection can also lack scalability if many or large requests from many threads are queued concurrently.
So some Memcached clients such as JavaMemcached use many connections (a connection per thread) and a pool of connections.
This is a trade-off (more scalable but less efficient than the one-connection model).

Finally, I chose the "connection per thread" model because our company (Kakao) has already experienced an overloaded connection. In most of those cases, hundreds of threads had requested many different kinds of keys simultaneously.
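
The request-queue coalescing described above can be sketched with plain JDK classes (illustrative only; the class and method names are made up and this is not GrizzlyMemcached's internal code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: a shared-connection writer drains queued single "get" requests
// and coalesces them into one bulk getMulti packet.
public class CoalescingSketch {
    // hypothetical pending-request queue holding just the keys
    static final BlockingQueue<String> pending = new LinkedBlockingQueue<>();

    // drain whatever is queued right now into a single bulk request
    static List<String> drainBatch(int maxBatch) {
        List<String> batch = new ArrayList<>();
        pending.drainTo(batch, maxBatch);
        return batch;
    }

    public static void main(String[] args) {
        // three threads each queued one single get...
        pending.addAll(Arrays.asList("user:1", "user:2", "user:3"));
        // ...and the writer turns them into one getMulti with three keys
        List<String> batch = drainBatch(200);
        System.out.println("coalesced=" + batch.size());
    }
}
```

This is why the shared-connection model is efficient: one network round trip replaces three.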

Stale cache data

Sometimes Memcached servers fail or are added or removed, or some Memcached clients hit temporary network failures.
Of course, clients use a consistent hashing algorithm for choosing Memcached servers, which minimizes the side effects of such changes: if a specific Memcached fails, only the keys of the failed server are redistributed to the living servers.
Then, is the consistent hashing algorithm enough?
If you are using many clients with Memcached, you can't avoid the stale-cache-data issue. If you need to add Memcached servers in a real environment, all Memcached clients should share the same server-list configuration at the same time in order to minimize stale data.
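
The consistent-hashing behavior described above can be sketched with a simple hash ring (illustrative only; the virtual-node count and hashing details are assumptions, not GrizzlyMemcached's actual implementation):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Minimal consistent-hash ring: a key maps to the first server clockwise
// on an MD5 ring, so removing one server only remaps that server's keys.
public class HashRing {
    private final SortedMap<Long, String> ring = new TreeMap<>();

    static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            return ((long) (d[0] & 0xff) << 24) | ((d[1] & 0xff) << 16) | ((d[2] & 0xff) << 8) | (d[3] & 0xff);
        } catch (Exception e) { throw new IllegalStateException(e); }
    }

    void add(String server) {  // 160 virtual nodes per server (assumed number)
        for (int i = 0; i < 160; i++) ring.put(hash(server + "-" + i), server);
    }
    void remove(String server) {
        for (int i = 0; i < 160; i++) ring.remove(hash(server + "-" + i));
    }
    String locate(String key) {
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }

    public static void main(String[] args) {
        HashRing r = new HashRing();
        r.add("memcachedA"); r.add("memcachedB"); r.add("memcachedC");
        Map<String, String> before = new HashMap<>();
        for (int i = 0; i < 100; i++) before.put("key" + i, r.locate("key" + i));
        r.remove("memcachedC"); // simulate a failed server
        for (int i = 0; i < 100; i++) {
            String now = r.locate("key" + i);
            // only keys previously owned by the failed server may move
            if (!now.equals(before.get("key" + i)) && !before.get("key" + i).equals("memcachedC"))
                throw new AssertionError("unrelated key moved");
        }
        System.out.println("ok=true");
    }
}
```

The check at the end demonstrates the claim in the text: keys on the surviving servers keep their placement, and only the failed server's keys are redistributed.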

Assume that A and B are Memcached servers and there are hundreds of Memcached clients which know only A and B.
If a new Memcached C joins the existing configuration, some clients will know A, B and C while others still know only A and B while the new configuration is being applied.
To prevent this issue, I chose a central configuration based on ZooKeeper.
When the central configuration changes, all GrizzlyMemcacheds detect and receive it (phase 1, the prepare stage). If all GrizzlyMemcacheds receive it successfully, it is applied simultaneously at a specific system time (phase 2, the commit stage).
(I assumed all clients' system times are synchronized.)
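
The prepare/commit scheme above can be sketched with plain JDK scheduling (illustrative only; the real client coordinates phase 1 through ZooKeeper, and all names here are made up):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of the two-phase apply: phase 1 distributes the new server list plus
// an agreed commit timestamp; phase 2 applies it when the clock reaches that
// time, so every client (with a synchronized clock) switches at the same instant.
public class TimedCommitSketch {
    static volatile String serverList = "A,B";

    // phase 2: schedule the switch for the agreed wall-clock instant
    static void commitAt(long commitTimeMillis, String newList, ScheduledExecutorService timer) {
        long delay = Math.max(0, commitTimeMillis - System.currentTimeMillis());
        timer.schedule(() -> serverList = newList, delay, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        long commitTime = System.currentTimeMillis() + 200; // agreed in the prepare stage
        commitAt(commitTime, "A,B,C", timer);
        System.out.println("before=" + serverList); // still the old list
        Thread.sleep(400);
        System.out.println("after=" + serverList);  // switched at commit time
        timer.shutdown();
    }
}
```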


Test Information

  • Memcached and client machines
    • CPU: Intel Xeon 3.3G, 8 Processors
    • Memory: 16G
    • OS: Linux CentOS
    • JDK: 1.6
    • Network: 1Gbit
  • Server/Clients versions 
    • Memcached(v1.4.13)
    • GrizzlyMemcached, SpyMemcached(v2.7.3), JavaMemcached(v2.6.0) and XMemcached(v1.3.5)


  • packets
    • 32, 64, 128, 256 and 512 bytes
  • operations 
    • get, set, getMulti and setMulti(which is supported by only GrizzlyMemcached)
  • threads 
    • 1, 50, 100, 200 and 400
  • Etc 
    • multi keys are 200, loop count is 200 (loops per thread)


You can see the benchmark code and results here.

Examples of Use

Simple usecase

// creates a singleton CacheManager
final GrizzlyMemcachedCacheManager manager = new GrizzlyMemcachedCacheManager.Builder().build();

// gets the cache builder
final GrizzlyMemcachedCache.Builder<String, String> builder = manager.createCacheBuilder("user");
// initializes Memcached's list
final Set<SocketAddress> memcachedServers = new HashSet<SocketAddress>();
memcachedServers.add(memcachedAddress);
builder.servers(memcachedServers);
// creates the cache
final MemcachedCache<String, String> userCache = builder.build();

// if you need to add more Memcached servers
userCache.addServer(anotherMemcachedAddress);

// cache operations
final boolean result = userCache.set("name", "foo", expirationTimeoutInSec, false);
final String value = userCache.get("name", false);

// clean
manager.removeCache("user");
manager.shutdown();

ZooKeeper usecase

// gets the cache manager builder
final GrizzlyMemcachedCacheManager.Builder managerBuilder = new GrizzlyMemcachedCacheManager.Builder();

// setup the zookeeper server
final ZooKeeperConfig zkConfig = ZooKeeperConfig.create("cache-manager", DEFAULT_ZOOKEEPER_ADDRESS);
managerBuilder.zooKeeperConfig(zkConfig);

// create a cache manager
final GrizzlyMemcachedCacheManager manager = managerBuilder.build();
final GrizzlyMemcachedCache.Builder<String, String> cacheBuilder = manager.createCacheBuilder("user");
// setup memcached servers
final Set<SocketAddress> memcachedServers = new HashSet<SocketAddress>();
memcachedServers.add(memcachedAddress);
cacheBuilder.servers(memcachedServers);

// create a user cache
final GrizzlyMemcachedCache<String, String> cache = cacheBuilder.build();

// ZooKeeperSupportCache's basic operations
if (cache.isZooKeeperSupported()) {
    final String serverListPath = cache.getZooKeeperServerListPath();
    final String serverList = cache.getCurrentServerListFromZooKeeper();
    // ...
}

// clean
manager.removeCache("user");
manager.shutdown();

You can also see the various unit tests for more GrizzlyMemcached examples here.



GrizzlyMemcached v1.0 was released on 2012/03/21. It lives in a separate repository from the Grizzly project.

Here are the sources and git information.

git:// (read-only)

Just check out the sources and try it.
Any feedback, questions and thoughts/opinions are all welcome!

Grizzly mailing: or

This page introduces the Grizzly-Thrift server/client modules and shares various benchmarking results.

Java's object serialization/deserialization is expensive. To improve on this weakness, we sometimes use RPC frameworks such as Protobuf and Thrift, which support various programming languages, RPC and their own data structures.

Especially, Thrift already provides various transport types. Basically, there are TSimpleServer, TThreadPoolServer, TNonblockingServer and THsHaServer on the server side, and TSocket on the client side.

But Thrift's transport layer can be replaced with other NIO frameworks for performance improvement. So I tried to build alternative transports based on Grizzly and benchmark them experimentally. The result is the Grizzly-Thrift server/client module.

The Grizzly framework is for building scalable and robust servers using NIO, and it also offers extended framework components: Web Framework (HTTP/S), Bayeux Protocol, Servlet, HttpService OSGi and Comet.

Therefore it was not difficult to support a Thrift server/client using Grizzly.

- Grizzly-Thrift Server/Client Modules -

Grizzly-Thrift was included in Grizzly version 2.2, which was released on 2011/12/20, but it was moved into a different repository after Grizzly v2.2.2.

You can review and download the sources as follows.

git:// (read-only)

To use the Grizzly-Thrift server, you should add a ThriftFrameFilter and a ThriftServerFilter to the Grizzly transport. To use the Grizzly-Thrift client, you should add a ThriftFrameFilter and a ThriftClientFilter.

ThriftFrameFilter encodes/decodes Thrift's TFramedTransport format, which is composed of a frame-length header (4 bytes) and a body. And ThriftServerFilter/ThriftClientFilter interconnect the user's processor/handler with TGrizzlyServerTransport/TGrizzlyClientTransport, which extend Thrift's TTransport.
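
The framing that ThriftFrameFilter handles can be sketched with plain ByteBuffers (a stdlib illustration of TFramedTransport's wire format, not the actual filter code):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// TFramedTransport wire format: a 4-byte big-endian length header
// followed by the message body.
public class FrameSketch {
    static ByteBuffer encode(byte[] body) {
        ByteBuffer buf = ByteBuffer.allocate(4 + body.length);
        buf.putInt(body.length);   // frame-length header (4 bytes, big-endian)
        buf.put(body);             // frame body
        buf.flip();
        return buf;
    }

    static byte[] decode(ByteBuffer buf) {
        int len = buf.getInt();    // read the 4-byte header first
        byte[] body = new byte[len];
        buf.get(body);             // then read exactly that many body bytes
        return body;
    }

    public static void main(String[] args) {
        ByteBuffer frame = encode("thrift-call".getBytes(StandardCharsets.UTF_8));
        System.out.println("decoded=" + new String(decode(frame), StandardCharsets.UTF_8));
    }
}
```

The length prefix is what lets a non-blocking filter know when a complete Thrift message has arrived before handing it to the next filter.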

Here are examples for Thrift server/client based on Grizzly.

--- Grizzly-Thrift Server ---

final FilterChainBuilder serverFilterChainBuilder = FilterChainBuilder.stateless();

// "user-generated" stands for your Thrift-generated service classes
final user-generated.thrift.Processor tprocessor = new user-generated.thrift.Processor(new user-generated.thrift.Handler());

serverFilterChainBuilder.add(new TransportFilter()).add(new ThriftFrameFilter()).add(new ThriftServerFilter(tprocessor));

final TCPNIOTransport grizzlyTransport = TCPNIOTransportBuilder.newInstance().build();
grizzlyTransport.setProcessor(serverFilterChainBuilder.build());
grizzlyTransport.bind(port);
grizzlyTransport.start();

--- Grizzly-Thrift Client ---

final FilterChainBuilder clientFilterChainBuilder = FilterChainBuilder.stateless();

clientFilterChainBuilder.add(new TransportFilter()).add(new ThriftFrameFilter()).add(new ThriftClientFilter());

final TCPNIOTransport grizzlyTransport = TCPNIOTransportBuilder.newInstance().build();
grizzlyTransport.setProcessor(clientFilterChainBuilder.build());
grizzlyTransport.start();

final Future<Connection> future = grizzlyTransport.connect(ip, port);
final Connection connection = future.get(10, TimeUnit.SECONDS);

final TTransport tGrizzlyTransport = TGrizzlyClientTransport.create(connection);
final TProtocol tprotocol = new TBinaryProtocol(tGrizzlyTransport); // or TCompactProtocol

// "user-generated" stands for your Thrift-generated service classes
final user-generated.thrift.Client client = new user-generated.thrift.Client(tprotocol);

// ... user-specific client calls

If you are already familiar with Thrift, this is easy. If you aren't, I recommend that you review Thrift's tutorial examples first.

The Grizzly-Thrift modules already include basic unit tests based on Thrift's tutorial.

If you are using Maven in your project, here are the pom.xml dependencies.

--- pom.xml ---
In addition, Grizzly provides various IO strategies such as worker-thread, same-thread and leader-follower, so I also tested the Grizzly-Thrift modules with each IO strategy. If you would like to know more about Grizzly's IO strategies, please see this.

- Benchmarking -

I also benchmarked various Thrift server/client modules: TSocketServer/Client, TThreadPoolServer, TNonblockingServer, Netty server/client and Grizzly server/client. I used business operations based on Thrift's tutorial for the test, but modified the logic a bit to control packet size.

Test Information

  • Server Type/Client Type: TServer-TSocketClient vs TServer-NettyClient vs TServer-GrizzlyClient vs GrizzlyServer-TSocketClient vs GrizzlyServer-GrizzlyClient vs etc...
  • Message Size: About 3M Bytes, 3K Bytes, 300 Bytes
  • Thrift Protocol: Binary, Compact
  • Client Connections: 20~1000
  • Server and Client Test Machine Information 
  • Scenario 
    • After 1min warming-up, testing 5min and collecting total results
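
The warm-up/measure scenario above can be sketched as a simple harness (illustrative only; the operation and timing values are placeholders, not the actual benchmark code):

```java
// Warm-up then measure: run the operation for a warm-up window and discard the
// results, then count completed operations during the measurement window.
public class BenchHarness {
    static long runFor(long millis, Runnable op) {
        long end = System.currentTimeMillis() + millis;
        long count = 0;
        while (System.currentTimeMillis() < end) { op.run(); count++; }
        return count;
    }

    public static void main(String[] args) {
        Runnable op = () -> { /* placeholder for a client call */ };
        runFor(50, op);               // warm-up (1 min in the real test)
        long total = runFor(100, op); // measure (5 min in the real test)
        System.out.println("measured=" + (total > 0));
    }
}
```

Discarding the warm-up window avoids counting JIT compilation and connection setup in the totals.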

Benchmarking Results

  • 3M + Compact + 40 Connections
    (results chart: TSocket, Netty and Grizzly clients per server type)
    • The Netty client unfortunately had a performance issue, so I excluded it from the next benchmarks.
  • 3M + Binary + 40 Connections
    (results chart: TSocket and Grizzly clients per server type)

In the 3M test, the Compact/Binary and server/client comparisons showed no meaningful performance differences.

  • 3K + Compact + 40 Connections
    (results chart: Grizzly client per server type)
    • TNonblockingServer had a performance issue. And the Netty and Grizzly results were better than the Thrift server modules'.
  • 3K + Binary + 40 Connections
    (results chart: TSocket and Grizzly clients per server type)
    • The Grizzly client module performed better than the TSocket client, so I used only the Grizzly client for the next benchmarks.

In the 3K test, the Compact protocol was better than the Binary protocol. And the Netty and Grizzly results were better than the Thrift server modules', so I used only the Netty and Grizzly servers for the next benchmarks.

  • 300Bytes + Compact + 20 Connections
  • 300Bytes + Compact + 40 Connections
  • 300Bytes + Compact + 60 Connections
  • 300Bytes + Compact + 80 Connections
  • 300Bytes + Compact + 100 Connections
  • 300Bytes + Compact + 120 Connections
  • 300Bytes + Compact + 150 Connections
  • 300Bytes + Compact + 500 Connections
  • 300Bytes + Compact + 1000 Connections
    (results charts: Grizzly client per server type, one chart per connection count)

With many connections (more than 120), most servers didn't receive requests properly because the client machine in this environment used too many resources (e.g. very high CPU usage). So I think more client machines would be needed to produce meaningful data at higher connection counts. At 100 connections, the throughput of Netty and of Grizzly's same-thread IO strategy decreased, but the throughput of Grizzly's worker-thread and leader-follower IO strategies increased.

In my test cases and environments, the worker-thread and leader-follower IO strategies were more effective than the same-thread IO strategy when servers had more than 100 connections.

- Conclusion -


  • Results of 300Bytes + Compact + 40 Connections (Netty client excluded)
    Server Types         TSocket Client   Grizzly Client
    TServer              741,417          604,558
    TThreadPoolServer    14,731,560       12,747,230
    TNonblockingServer   6,060,111        6,723,402
    Netty                14,749,519       14,569,820
  • Results of 3KBytes + Compact + 40 Connections (Netty client excluded)
    Server Types         TSocket Client   Grizzly Client
    TServer              631,300          526,341
    TThreadPoolServer    7,708,088        8,283,705
    TNonblockingServer   5,264,995        5,801,319
    Netty                8,372,804        9,058,550
  • 300Bytes + Compact + 100 Connections
    (results chart: Grizzly client per server type)
  • Server Module 
    • Grizzly's same-thread IO strategy was best with a few connections. Grizzly's leader-follower IO strategy was best with many connections.
    • CPU usage: Netty == GrizzlySameIO < GrizzlyLeaderFollowerIO < GrizzlyWorkerIO
  • Client Module 
    • For small packets, TSocket was best. For larger packets, the Grizzly client was best.
  • Thrift Protocol 
    • In this scenario, the Compact protocol was best.

Finally, I decided that our company, Kakao, would use the worker-thread IO strategy for the Grizzly-Thrift server in production because it was very stable. For the client, I decided to use the same-thread IO strategy because of its efficiency.

If you are already using Thrift or plan to use Thrift for RPC, just try applying Grizzly-Thrift!

(These results are only for reference; you could see different results depending on your environment and benchmarking logic.)