
Sometimes we use caches to speed things up by alleviating database load, and Memcached is the best-known in-memory key-value store (cache). To use Memcached you need a client, and many clients already exist, including several based on Java.
Although there are already good Memcached clients whose operations have been optimized over a long time, I would like to introduce GrizzlyMemcached, a client based on the Grizzly framework that is very scalable and gives high performance.

Main features

Improved bulk operations such as getMulti and setMulti, in addition to the basic Memcached operations

  • Using a high-performance connection pool
  • Using the Grizzly framework for I/O
  • Using only the Memcached binary protocol
  • Supporting setMulti, deleteMulti, getsMulti and casMulti as well as getMulti

Supporting failover/failback of Memcached

  • Using consistent hashing
  • Allowing the Memcached server list to change dynamically
  • Providing an option for enabling/disabling failover/failback

Synchronizing many clients automatically to prevent stale cache data when Memcached servers fail or are removed/added dynamically

  • Using ZooKeeper
  • Using a barrier to synchronize the Memcached server list

Considerations

I/O Model

It is very important that clients, as well as servers, have a stable and robust I/O base.
The Grizzly NIO framework provides high performance, scalability and stability, and it can be integrated into various modules easily.
So GrizzlyMemcached uses the Grizzly NIO framework for sending/parsing/receiving packets of Memcached's binary protocol.
The Grizzly NIO framework also provides several I/O strategies.
I chose the same-thread IOStrategy as the default, and it showed good results in my benchmark because GrizzlyMemcached is a client, not a server (you can change the strategy in the configuration as needed).

Connection Model

Some Memcached clients such as SpyMemcached and XMemcached use only one connection for the requests of multiple threads.
If multiple threads share one connection, the client can optimize a series of consecutive single get/set operations into a bulk operation like getMulti by using the request queue, because one bulk operation is much faster and more efficient than many single operations.
But one connection can also lack scalability if many or large requests from many threads are queued concurrently.
So some Memcached clients such as JavaMemcached use many connections (a connection per thread) and a connection pool.
This is a trade-off (more scalable, but less efficient than the one-connection model).

Finally, I chose the "connection per thread" model because our company (Kakao) has already experienced an overloaded connection. In most of those cases, hundreds of threads were requesting many different kinds of keys simultaneously.

Stale cache data

Sometimes Memcached servers fail or are added/removed, and sometimes Memcached clients hit temporary network failures.
Of course, clients use a consistent hashing algorithm to choose Memcached servers, which minimizes the side effects of such changes: if a specific Memcached fails, only the keys of the failed server are redistributed to the surviving servers.
Then, is the consistent hashing algorithm enough?
If you are using many clients with Memcached, you can't avoid the stale-cache-data issue. If you need to add Memcached servers in a real environment, all Memcached clients should share the same server-list configuration at the same time in order to minimize stale data.
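
To make the redistribution behavior concrete, here is a minimal consistent-hashing sketch (an illustration only, not GrizzlyMemcached's actual implementation): servers are placed on a hash ring, a key is owned by the first server clockwise from its hash, and removing a server only remaps that server's keys.

import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHashRing {
    // virtual nodes per server smooth out the key distribution
    private static final int REPLICAS = 160;
    private final TreeMap<Integer, String> ring = new TreeMap<Integer, String>();

    public void addServer(String server) {
        for (int i = 0; i < REPLICAS; i++) {
            ring.put(hash(server + "-" + i), server);
        }
    }

    public void removeServer(String server) {
        for (int i = 0; i < REPLICAS; i++) {
            ring.remove(hash(server + "-" + i));
        }
    }

    // the key belongs to the first server at or after its hash; wrap around if none
    public String selectServer(String key) {
        if (ring.isEmpty()) {
            return null;
        }
        SortedMap<Integer, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    private static int hash(String s) {
        return s.hashCode(); // placeholder; a real ring would use a stronger hash such as Ketama/MD5
    }
}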

Assume that A and B are Memcached servers and there are hundreds of Memcached clients which know only A and B.
If a new Memcached C joins the existing configuration, some clients will know A, B and C while others still know only A and B while the new configuration is being rolled out.
To prevent this issue, I chose a central configuration based on ZooKeeper.
When the central configuration changes, all GrizzlyMemcached clients detect and receive it (phase 1, the prepare stage). If all GrizzlyMemcached clients receive it successfully, it is applied simultaneously at a specific system time (phase 2, the commit stage).
(I assumed all clients' system times are synchronized.)
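
The prepare/commit idea can be sketched roughly like this (a hypothetical illustration only; the class and method names and the ZooKeeper wiring are not the actual GrizzlyMemcached code):

import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class TwoPhaseServerListApplier {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    // phase 1 (prepare): every client receives the new server list from the central configuration
    // and schedules it; phase 2 (commit): every client applies it at the same agreed system time
    public void onServerListChanged(final Set<String> newServerList, final long commitTimeMillis) {
        final long delay = Math.max(0, commitTimeMillis - System.currentTimeMillis());
        scheduler.schedule(new Runnable() {
            public void run() {
                applyServerList(newServerList);
            }
        }, delay, TimeUnit.MILLISECONDS);
    }

    private void applyServerList(Set<String> servers) {
        // rebuild the consistent-hash ring with the new server list
    }
}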

Benchmark

Test Information

  • Memcached and client machines
    • CPU: Intel Xeon 3.3GHz, 8 processors
    • Memory: 16GB
    • OS: Linux CentOS
    • JDK: 1.6
    • Network: 1Gbit
  • Server/client versions
    • Memcached (v1.4.13)
    • GrizzlyMemcached, SpyMemcached (v2.7.3), JavaMemcached (v2.6.0) and XMemcached (v1.3.5)

Scenario

  • packets
    • 32, 64, 128, 256 and 512 bytes
  • operations
    • get, set, getMulti and setMulti (setMulti is supported only by GrizzlyMemcached)
  • threads
    • 1, 50, 100, 200 and 400
  • Etc
    • 200 keys per multi operation, 200 loops per thread

Result

You can see the benchmark code and results here.

Examples of Use

Simple usecase

// creates a singleton CacheManager
final GrizzlyMemcachedCacheManager manager = new GrizzlyMemcachedCacheManager.Builder().build();

// gets the cache builder
final GrizzlyMemcachedCache.Builder<String, String> builder = manager.createCacheBuilder("user");
// initializes Memcached's list
builder.servers(initialServerSet);
// creates the cache
final MemcachedCache<String, String> userCache = builder.build();

// if you need to add more Memcached
//userCache.addServer(ADDITIONAL_MEMCACHED_ADDRESS);

// cache operation
final boolean result = userCache.set("name", "foo", expirationTimeoutInSec, false);
final String value = userCache.get("name", false);
//...

// clean
manager.removeCache("user");
manager.shutdown();
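
The simple usecase above only issues single get/set calls. GrizzlyMemcached also exposes the bulk operations listed earlier; here is a minimal sketch reusing userCache and expirationTimeoutInSec from the snippet above (before the cleanup calls). The exact getMulti/setMulti signatures are assumptions here, so please check the javadoc and unit tests.

// bulk operations (sketch; assumed signatures)
final Map<String, String> users = new HashMap<String, String>();
users.put("user:1", "foo");
users.put("user:2", "bar");

// store several entries with far fewer round trips than per-key set calls
userCache.setMulti(users, expirationTimeoutInSec);

// fetch several keys at once; the result map contains only the keys that were found
final Map<String, String> found = userCache.getMulti(users.keySet());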

ZooKeeper usecase

// gets the cache manager builder
final GrizzlyMemcachedCacheManager.Builder managerBuilder = new GrizzlyMemcachedCacheManager.Builder();

// setup zookeeper server
final ZooKeeperConfig zkConfig = ZooKeeperConfig.create("cache-manager", DEFAULT_ZOOKEEPER_ADDRESS);
zkConfig.setRootPath(ROOT);
zkConfig.setConnectTimeoutInMillis(3000);
zkConfig.setSessionTimeoutInMillis(30000);
zkConfig.setCommitDelayTimeInSecs(60);
managerBuilder.zooKeeperConfig(zkConfig);

// create a cache manager
final GrizzlyMemcachedCacheManager manager = managerBuilder.build();
final GrizzlyMemcachedCache.Builder<String, String> cacheBuilder = manager.createCacheBuilder("user");
// setup memcached servers
final Set<SocketAddress> memcachedServers = new HashSet<SocketAddress>();
memcachedServers.add(MEMCACHED_ADDRESS1);
memcachedServers.add(MEMCACHED_ADDRESS2);
cacheBuilder.servers(memcachedServers);

// create a user cache
final GrizzlyMemcachedCache<String, String> cache = cacheBuilder.build();

// ZooKeeperSupportCache's basic operations
if (cache.isZooKeeperSupported()) {
   final String serverListPath = cache.getZooKeeperServerListPath();
   final String serverList = cache.getCurrentServerListFromZooKeeper();
   cache.setCurrentServerListOfZooKeeper("localhost:11211,localhost:11212");
}
// ...

// clean
manager.removeCache("user");
manager.shutdown();

You can also see the various unit tests for more GrizzlyMemcached examples here.

Pom.xml

<dependency>
    <groupId>org.glassfish.grizzly</groupId>
    <artifactId>grizzly-memcached</artifactId>
    <version>1.0</version>
</dependency>

GrizzlyMemcached v1.0 was released on 2012/03/21, and it lives in a separate repository from the Grizzly project.

Here are the sources and git information.

http://java.net/projects/grizzly/sources/memcached/show

git://java.net/grizzly~memcached (read-only)

Just check out the sources and try it out.
Any feedback, questions and thoughts/opinions are all welcome!

Grizzly mailing: users@grizzly.java.net or dev@grizzly.java.net

This page introduces the Grizzly-Thrift server/client modules and shares various benchmarking results.

Java's built-in object serialization/deserialization is expensive. To improve on this, we sometimes use other RPC frameworks such as Protobuf and Thrift, which support various programming languages, RPC and their own data structures.

In particular, Thrift already provides various transport types: basically TSimpleServer, TThreadPoolServer, TNonblockingServer and THsHaServer on the server side, and TSocket on the client side.

But Thrift's transport layer can be replaced with other NIO frameworks for better performance. So I tried to build alternative transports based on Grizzly and benchmark them experimentally. These are the Grizzly-Thrift server/client modules.

The Grizzly framework is for building scalable and robust servers using NIO, and it also offers extended framework components: Web Framework (HTTP/S), Bayeux Protocol, Servlet, HttpService OSGi and Comet.

Therefore it was not difficult for me to support Thrift server/client using Grizzly.

- Grizzly-Thrift Server/Client Modules -

Grizzly-Thrift was included in Grizzly version 2.2, which was released on 2011/12/20, but it was moved into a different repository after Grizzly v2.2.2.

You can review and download the sources as follows.

http://java.net/projects/grizzly/sources/thrift/show

git://java.net/grizzly~thrift (read-only)

To use the Grizzly-Thrift server, you should add ThriftFrameFilter and ThriftServerFilter to a Grizzly transport. To use the Grizzly-Thrift client, you should add ThriftFrameFilter and ThriftClientFilter to a Grizzly transport.

ThriftFrameFilter encodes/decodes Thrift's TFramedTransport framing, which is composed of a 4-byte frame-length header and a body. ThriftServerFilter/ThriftClientFilter connect the user's processor/handler with TGrizzlyServerTransport/TGrizzlyClientTransport, which extend Thrift's TTransport.
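
Conceptually, one framed message on the wire is just a 4-byte frame-length header followed by the serialized Thrift payload. A plain-NIO sketch of that layout (an illustration only, not the filter's actual code):

import java.nio.ByteBuffer;

public class ThriftFraming {
    // frame = 4-byte length header + Thrift payload, as in Thrift's TFramedTransport
    static ByteBuffer frame(byte[] thriftPayload) {
        final ByteBuffer buffer = ByteBuffer.allocate(4 + thriftPayload.length);
        buffer.putInt(thriftPayload.length); // big-endian frame length
        buffer.put(thriftPayload);
        buffer.flip();
        return buffer;
    }
}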

Here are examples for Thrift server/client based on Grizzly.

--- Grizzly-Thrift Server ---

final FilterChainBuilder serverFilterChainBuilder = FilterChainBuilder.stateless();

// "user-generated.thrift" stands for your generated Thrift service classes
final user-generated.thrift.Processor tprocessor = new user-generated.thrift.Processor(new user-generated.thrift.Handler());
serverFilterChainBuilder.add(new TransportFilter()).add(new ThriftFrameFilter()).add(new ThriftServerFilter(tprocessor));

final TCPNIOTransport grizzlyTransport = TCPNIOTransportBuilder.newInstance().build();
grizzlyTransport.setProcessor(serverFilterChainBuilder.build());
grizzlyTransport.bind(port);
grizzlyTransport.start();

--- Grizzly-Thrift Client ---

final FilterChainBuilder clientFilterChainBuilder = FilterChainBuilder.stateless();
clientFilterChainBuilder.add(new TransportFilter()).add(new ThriftFrameFilter()).add(new ThriftClientFilter());

final TCPNIOTransport grizzlyTransport = TCPNIOTransportBuilder.newInstance().build();
grizzlyTransport.setProcessor(clientFilterChainBuilder.build());
grizzlyTransport.start();

// connect to the Thrift server and wrap the Grizzly connection in a Thrift TTransport
final Future<Connection> future = grizzlyTransport.connect(ip, port);
final Connection connection = future.get(10, TimeUnit.SECONDS);
final TTransport tGrizzlyTransport = TGrizzlyClientTransport.create(connection);
final TProtocol tprotocol = new TBinaryProtocol(tGrizzlyTransport); // or TCompactProtocol

// "user-generated.thrift" stands for your generated Thrift service classes
final user-generated.thrift.Client client = new user-generated.thrift.Client(tprotocol);
// ... user specific client calls

If you are already familiar with Thrift, this is easy. If you aren't, I recommend reviewing Thrift's tutorial examples first. (See JavaServer.java and JavaClient.java in Thrift's tutorial.)
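
For example, with the tutorial's Calculator service, the last lines of the client snippet above would look roughly like this (a sketch, assuming the generated tutorial classes are on the classpath):

// using the Thrift tutorial's generated Calculator service (sketch)
final Calculator.Client client = new Calculator.Client(tprotocol);
client.ping();
final int sum = client.add(1, 1);
System.out.println("1 + 1 = " + sum);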

Grizzly-Thrift modules already include basic unit tests based on Thrift's tutorial. (See the ThriftTutorialTest.java in Grizzly-Thrift.)

If you are using Maven in your project, here are the pom.xml dependencies.

--- pom.xml ---

...
<dependency>
    <groupId>org.glassfish.grizzly</groupId>
    <artifactId>grizzly-framework</artifactId>
    <version>2.2.3</version>
</dependency>
<dependency>
    <groupId>org.glassfish.grizzly</groupId>
    <artifactId>grizzly-thrift</artifactId>
    <version>1.0</version>
</dependency>
...

In addition, Grizzly provides various I/O strategies such as worker-thread, same-thread and leader-follower, so I also tried to test the Grizzly-Thrift modules with each I/O strategy. If you would like to know more about Grizzly's I/O strategies, please see this.
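
Switching strategies only requires configuring the transport builder before building the transport used in the server/client examples above, for example (a sketch; the strategy classes live in org.glassfish.grizzly.strategies):

import org.glassfish.grizzly.nio.transport.TCPNIOTransport;
import org.glassfish.grizzly.nio.transport.TCPNIOTransportBuilder;
import org.glassfish.grizzly.strategies.SameThreadIOStrategy;

// build the transport with an explicit I/O strategy
final TCPNIOTransport grizzlyTransport = TCPNIOTransportBuilder.newInstance()
        .setIOStrategy(SameThreadIOStrategy.getInstance()) // or WorkerThreadIOStrategy / LeaderFollowerNIOStrategy
        .build();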

- Benchmarking -

I also benchmarked various Thrift server/client modules: TSocket server/client, TThreadPoolServer, TNonblockingServer, Netty server/client and Grizzly server/client. I used business operations based on Thrift's tutorial for the test, but modified the logic a bit to control packet size.

Test Information

  • Server Type/Client Type: TServer-TSocketClient vs TServer-NettyClient vs TServer-GrizzlyClient vs GrizzlyServer-TSocketClient vs GrizzlyServer-GrizzlyClient vs etc...
  • Message Size: About 3M Bytes, 3K Bytes, 300 Bytes
  • Thrift Protocol: Binary, Compact
  • Client Connections: 20~1000
  • Server and Client Test Machine Information 
  • Scenario 
    • After a 1-minute warm-up, run for 5 minutes and collect the total results

Benchmarking Results

  • 3M + Compact + 40 Connections
    Server Types       | TSocket Client | Netty Client | Grizzly Client
    TServer            | 8,637          | 478          | 8,510
    TThreadPoolServer  | 11,221         | 2,273        | 11,220
    TNonblockingServer | 11,223         | 1,832        | 11,221
    Netty              | 11,220         | 2,311        | 11,220
    Grizzly            | 11,221         | 1,765        | 11,225
    • The Netty client unfortunately had a performance issue, so I excluded it from the following benchmarks.
  • 3M + Binary + 40 Connections
    Server Types       | TSocket Client | Grizzly Client
    TThreadPoolServer  | 11,219         | 11,215
    TNonblockingServer | 11,221         | 11,221
    Netty              | 11,213         | 11,221
    Grizzly            | 11,220         | 11,222

In the 3M test, the Compact/Binary and server/client comparisons were meaningless with regard to performance, probably because the 1Gbit network was already saturated.

  • 3K + Compact + 40 Connections
    Server Types       | Grizzly Client
    TThreadPoolServer  | 8,283,705
    TNonblockingServer | 5,801,319
    Netty              | 9,058,550
    Grizzly            | 8,964,358
    Grizzly(SameIO)    | 9,098,590
    • TNonblockingServer had a performance issue, and the Netty and Grizzly results were better than those of the Thrift server modules.
  • 3K + Binary + 40 Connections
    Server Types       | TSocket Client | Grizzly Client
    TThreadPoolServer  | 7,619,693      | 8,163,692
    TNonblockingServer | 5,444,630      | 6,032,290
    Netty              | 8,254,168      | 8,930,896
    Grizzly            | 8,204,097      | 8,833,978
    Grizzly(SameIO)    | 8,257,918      | 8,960,497
    • The Grizzly client module performed better than the TSocket client, so I used only the Grizzly client in the following benchmarks.

In the 3K test, the Compact protocol was better than the Binary protocol, and the Netty and Grizzly results were better than those of the Thrift server modules, so I used only the Netty and Grizzly servers in the following benchmarks.

  • 300Bytes + Compact + 20 Connections
    Server Types       | Grizzly Client
    Netty              | 10,269,876
    Grizzly(SameIO)    | 10,349,440
    Grizzly(LeaderF)   | 9,654,216
  • 300Bytes + Compact + 40 Connections
    Server Types       | Grizzly Client
    Netty              | 14,569,820
    Grizzly(SameIO)    | 14,770,452
    Grizzly(LeaderF)   | 13,674,641
  • 300Bytes + Compact + 60 Connections
    Server Types       | Grizzly Client
    Netty              | 15,783,774
    Grizzly(SameIO)    | 15,962,425
    Grizzly(LeaderF)   | 15,227,426
  • 300Bytes + Compact + 80 Connections
    Server Types       | Grizzly Client
    Netty              | 16,964,578
    Grizzly(SameIO)    | 16,712,315
    Grizzly(Worker)    | 15,890,537
    Grizzly(LeaderF)   | 16,252,280
  • 300Bytes + Compact + 100 Connections
    Server Types       | Grizzly Client
    Netty              | 15,879,803
    Grizzly(SameIO)    | 15,781,153
    Grizzly(Worker)    | 16,136,977
    Grizzly(LeaderF)   | 16,437,650
  • 300Bytes + Compact + 120 Connections
    Server Types       | Grizzly Client
    Netty              | 15,904,968
    Grizzly(SameIO)    | 15,985,106
    Grizzly(Worker)    | 16,097,609
    Grizzly(LeaderF)   | 16,164,636
  • 300Bytes + Compact + 150 Connections
    Server Types       | Grizzly Client
    Netty              | 15,952,442
    Grizzly(SameIO)    | 16,109,154
    Grizzly(Worker)    | 16,261,584
    Grizzly(LeaderF)   | 15,923,040
  • 300Bytes + Compact + 500 Connections
    Server Types       | Grizzly Client
    Netty              | 12,463,442
    Grizzly(SameIO)    | 12,499,963
    Grizzly(Worker)    | 12,461,131
    Grizzly(LeaderF)   | 12,532,517
  • 300Bytes + Compact + 1000 Connections
    Server Types       | Grizzly Client
    Netty              | 11,867,630
    Grizzly(SameIO)    | 11,903,400
    Grizzly(Worker)    | 11,906,507
    Grizzly(LeaderF)   | 11,812,262

With many connections (more than 120), most servers did not receive a proper request load from the clients because the client machine in this environment used too many resources (very high CPU usage). So I think more client machines would be needed to produce meaningful data for higher connection counts. At 100 connections, the throughput of Netty and of Grizzly's same-thread I/O strategy decreased, but the throughput of Grizzly's worker-thread and leader-follower I/O strategies increased.

In my test cases and environments, the worker-thread and leader-follower I/O strategies were more effective than the same-thread I/O strategy when servers had to handle more than 100 connections.

- Conclusion -

 

  • Results of 300Bytes + Compact + 40 Connections
    Server Types       | TSocket Client | Netty Client | Grizzly Client
    TServer            | 741,417        |              | 604,558
    TThreadPoolServer  | 14,731,560     |              | 12,747,230
    TNonblockingServer | 6,060,111      |              | 6,723,402
    Netty              | 14,749,519     |              | 14,569,820
    Grizzly(SameIO)    | 14,931,745     | 9,066,525    | 14,770,452
  • Results of 3KBytes + Compact + 40 Connections
    Server Types       | TSocket Client | Netty Client | Grizzly Client
    TServer            | 631,300        |              | 526,341
    TThreadPoolServer  | 7,708,088      |              | 8,283,705
    TNonblockingServer | 5,264,995      |              | 5,801,319
    Netty              | 8,372,804      |              | 9,058,550
    Grizzly(SameIO)    | 8,381,352      | 3,718,431    | 9,098,590
  • 300Bytes + Compact + 100 Connections
    Server Types       | Grizzly Client
    Netty              | 15,879,803
    Grizzly(SameIO)    | 15,781,153
    Grizzly(Worker)    | 16,136,977
    Grizzly(LeaderF)   | 16,437,650
  • Server Module
    • Grizzly's same-thread I/O strategy was best with few connections; Grizzly's leader-follower I/O strategy was best with many connections.
    • CPU usage: Netty == Grizzly(SameIO) < Grizzly(LeaderFollower) < Grizzly(Worker)
  • Client Module
    • With small packets, TSocket was best. With larger packets, the Grizzly client was best.
  • Thrift Protocol
    • In this scenario, the Compact protocol was best.

Finally, I decided that our company, Kakao, would use the worker-thread I/O strategy for the Grizzly-Thrift server in production because it is very stable. For the client, I decided to use the same-thread I/O strategy for efficiency.

If you are already using Thrift or have a plan to use Thrift for RPC, just try Grizzly-Thrift!

(These results are only for reference; you may see different results depending on your environment and benchmarking logic.)