Resource Consumption Management with WebLogic Multitenant [Article]

Version 7

    A premium feature in WebLogic Server Multitenant 12.2.1, Resource Consumption Management (RCM) provides resource isolation and helps to ensure that resources are allocated fairly to the partitions. This article by members of the WebLogic product team presents details on what RCM is, what it can do, and how to make it work for you.


     

    Article written by:

     

    • Rahul Srivastava
    • Jagadish Ramu
    • Kshitiz Saxena
    • Naman Mehta
    • Sivakumar Thyagarajan
    • Larry Feigen

     

    What is WebLogic Server Multitenant?

     

    Multi-tenancy (MT) in WebLogic Server (WLS) provides a sharable infrastructure for use by multiple organizations. These organizations are a conceptual grouping of your own choosing, which you can think of as tenants. By allowing one domain to support multiple tenants, WebLogic-MT improves density and achieves a more efficient use of resources.

     

    WebLogic-MT provides resource isolation within a domain partition, an administrative and runtime slice of a WebLogic domain that is dedicated to running application instances and related resources for a tenant. Domain partitions achieve greater density by allowing application instances and related resources to share the domain, WebLogic itself, the Java virtual machine (JVM), and the operating system, while isolating tenant-specific application data, configuration, and runtime traffic.


    Figure 1: Collocated partitions in WebLogic-MT

     

    What is Resource Consumption Management?

     

    A premium feature in WebLogic-MT 12.2.1, Resource Consumption Management (RCM) provides resource isolation and tries to ensure that resources are allocated fairly to the partitions. It provides a policy infrastructure to limit usage of the shared resources and take appropriate actions when those specified limits are breached. It can also help maximize resource utilization in consolidated deployments.

     

    Why is RCM important?

     

    As we saw, WebLogic-MT can run one or more co-located partitions in a single JVM. Co-located partitions may compete for the low-level resources offered by the OS and the JVM, and those resources are often limited. Overconsumption by one partition can adversely affect the others. It is therefore important in WebLogic-MT to isolate co-located partitions and the resources they consume.

     

    For example, suppose 100 file descriptors are available on an OS running WebLogic-MT with two co-located partitions. One partition may end up consuming most of the available file descriptors, leaving little or nothing for the other, which then cannot function as expected. The affected partition bears the cost of being co-located with a resource-hogging partition.


    Figure 2: Collocated tenants - Red and Blue, waiting to consume the shared 100 File Descriptors

     


    Figure 3: Red tenant already consumed 99 File Descriptors, leaving just 1 File Descriptor for the Blue tenant

     


    Figure 4: Not enough File Descriptors left for the Blue tenant because the Red tenant has already consumed most of the shared File Descriptors

     

    As we can see, the Blue tenant is adversely affected because the Red tenant consumed most of the shared resources. The solution is to enforce policies through RCM, so that one partition does not end up consuming all the low-level resources. With RCM, the system administrator can define policies so that resource consumption by one partition does not adversely affect the other co-located partitions.


    Figure 5: RCM policy defining the resource consumption threshold for the collocated partitions and the corresponding recourse action.

     

    Resources Controlled by RCM

     

    Here are the resources controlled by RCM:

     

    • Heap Retained: Heap Retained (in MB) for a partition is the heap consumed by the partition and all the apps running within that partition. This figure might not be accurate if the garbage collector (GC) has not run for a while: the apps deployed in the partition could be generating garbage all the time, so the GC has to run before the heap retained can be determined accurately. This creates a dependency on the GC used in the JDK; for heap, the Garbage First Garbage Collector (G1GC) is a requirement. Notifications for heap retained are issued post-consumption, after the GC has run, so there might be a lag between the consumption threshold being breached and the recourse action being taken.

    • CPU (Utilization): CPU Utilization is the percentage of CPU consumed by a partition relative to the CPU available to the WLS process. The calculation takes into account the CPU cores, load factor, etc.

    • FileOpen: FileOpen tracks open files, a limited and bounded OS resource that is consumed when files are opened for read/write and relinquished when they are closed. Per the Javadoc of java.io.FileDescriptor, instances of the class “serve as an opaque handle to the underlying machine-specific structure representing an open file, an open socket, or another source or sink of bytes.” FileOpen is related to FileDescriptor, but is not the same: the FileOpen resource accounts only for the number of open files, not for other resources (e.g., sockets, pipes, etc.).
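
    To make the open/close lifecycle behind FileOpen concrete, here is a minimal Java sketch. It uses only standard JDK classes and does not call any RCM API; the class name FileOpenDemo and its helper method are hypothetical names chosen for illustration. It shows that a file holds a valid descriptor while it is open and relinquishes it on close.

```java
import java.io.FileDescriptor;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative only: shows when an open file holds a descriptor.
// FileOpenDemo is a hypothetical name; no RCM API is involved.
final class FileOpenDemo {

    // Returns { valid while open, valid after close } for the file's descriptor.
    static boolean[] descriptorValidity(Path file) throws IOException {
        FileDescriptor fd;
        boolean validWhileOpen;
        try (FileInputStream in = new FileInputStream(file.toFile())) {
            fd = in.getFD();              // handle to the underlying open file
            validWhileOpen = fd.valid();
        }                                 // close() relinquishes the open file here
        return new boolean[] { validWhileOpen, fd.valid() };
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("fileopen-demo", ".txt");
        try {
            boolean[] validity = descriptorValidity(tmp);
            System.out.println("valid while open:  " + validity[0]);
            System.out.println("valid after close: " + validity[1]);
        } finally {
            Files.delete(tmp);
        }
    }
}
```

    Code that leaks streams (never calls close) keeps such descriptors alive, which is the kind of consumption a FileOpen policy is meant to bound.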

     

    Types of RCM Policy

     

    There are two types of policies available in RCM: FairShare and Capping.

    • FairShare: In FairShare, one defines the share of a partition relative to other partitions, and WLS self-tunes to process the incoming requests so that, when there is contention, the specified FairShare across partitions is achieved and maintained over a period of time. A FairShare value is always relative to the other partitions. For example, if there are two partitions with a FairShare of 50 each, each partition's share is computed as 50/100 = 0.5. If there are three partitions with a FairShare of 50 each, then each partition's share is computed as 50/150 ≈ 0.33. The FairShare of a partition may be affected when partitions (with a FairShare) are added or deleted. When there is only one partition in the domain, there is no contention, and the partition gets all the available resources.

    • Capping: The capping policy limits the maximum usage of a resource by a partition. It applies to a specific resource type and follows a trigger-and-action format (i.e., when this trigger is breached, take this action; when the consumption goes below the trigger, withdraw the action taken, when possible). A policy can define multiple such triggers and actions. For example: when the HeapRetained for the Red partition crosses 4GB, slow down the partition. In this example, 4GB is the trigger, and “slow” is the action for resource type HeapRetained of the Red partition.
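
    The FairShare arithmetic can be sketched in a few lines of Java. This is only an illustration of the normalization described above, not how WLS implements its self-tuning; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative arithmetic only; FairShareDemo is a hypothetical name,
// not part of any WebLogic API.
final class FairShareDemo {

    // Converts configured FairShare values into relative shares that sum to 1.
    static Map<String, Double> normalizedShares(Map<String, Integer> fairShares) {
        int total = 0;
        for (int value : fairShares.values()) {
            total += value;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("FairShare values must sum to a positive number");
        }
        Map<String, Double> shares = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : fairShares.entrySet()) {
            shares.put(entry.getKey(), entry.getValue() / (double) total);
        }
        return shares;
    }

    public static void main(String[] args) {
        Map<String, Integer> configured = new LinkedHashMap<>();
        configured.put("Partition-Red", 50);
        configured.put("Partition-Blue", 50);
        System.out.println(normalizedShares(configured));

        // Adding a third partition changes every partition's relative share.
        configured.put("Partition-Green", 50);
        System.out.println(normalizedShares(configured));
    }
}
```

    Because the shares are relative, adding or removing a partition changes the result for every other partition, which is why the text notes that a partition's FairShare may be affected by such changes.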

     

    Recourse Actions

     

    With a capping policy, when a trigger is breached, one of the following recourse actions can be taken:

     

    1. Notify
    2. Slow
    3. Fail
    4. Shutdown

     

    • Notify: Generates a log notification. One could then use these notifications to perform more meaningful tasks (e.g., text the system administrator about the threshold being breached).

    • Slow: Slows down the entire partition (i.e., the processing of the requests submitted to this partition would be slowed down as compared to the other partitions).

    • Fail: Fails the operation performed on behalf of the request to open a file. An exception is thrown to the application trying to open the file. Fail applies only to the FileOpen resource.

    • Shutdown: Shuts down the partition. This is the extreme action one could take when the resource consumption of a partition has gone way beyond the specified quota. However, it might be prudent to at least give the partition an opportunity to slow down before shutting it down.

     

    Note: Fail and Shutdown actions cannot be specified together.

     

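    The following domain configuration excerpt defines a FileOpen capping policy for each tenant and references the Red tenant's policy from its partition:

    <domain>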
    ...
       <!--Define RCM Configuration -->
       <resource-management>
    
            <resource-manager>
                <name>PolicyForRedTenant</name>
                <file-open>
                    <trigger>
                        <name>SlowAt_50</name>
                        <value>50</value><!-- in units-->
                        <action>slow</action>
                    </trigger>
                </file-open>
            </resource-manager>
    
            <resource-manager>
                <name>PolicyForBlueTenant</name>
                <file-open>
                    <trigger>
                        <name>ShutdownAt_50</name>
                        <value>50</value><!-- in units-->
                        <action>shutdown</action>
                    </trigger>
                </file-open>
            </resource-manager>
            ...
        </resource-management>
    
        <partition>
            <name>Partition-Red</name>
            <resource-group>
                <name>ResourceTemplate-0_group</name>
                <resource-group-template>ResourceTemplate-0</resource-group-template>
            </resource-group>
            ...
            <partition-id>1741ad19-8ca7-4339-b6d3-78e56d8a5858</partition-id>
            <!-- RCM managers are
                then targeted to partitions during partition creation time or later
                by system administrators -->
            <resource-manager-ref>PolicyForRedTenant</resource-manager-ref>
        ...
        </partition>
    
    ...
    </domain>
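
    The trigger-and-action behavior of a capping policy can be modeled with a short Java sketch. Everything in it is hypothetical and for illustration only: the CappingPolicySketch, Trigger, and Action types are not WebLogic APIs, and treating a trigger as breached once usage reaches its value is an assumption. It shows actions taking effect as usage rises, being withdrawn as usage falls, and the rule that Fail and Shutdown cannot be specified together.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a capping policy; these types are not WebLogic APIs.
final class CappingPolicySketch {

    enum Action { NOTIFY, SLOW, FAIL, SHUTDOWN }

    static final class Trigger {
        final String name;
        final long value;
        final Action action;

        Trigger(String name, long value, Action action) {
            this.name = name;
            this.value = value;
            this.action = action;
        }
    }

    private final List<Trigger> triggers;

    CappingPolicySketch(List<Trigger> triggers) {
        Set<Action> used = EnumSet.noneOf(Action.class);
        for (Trigger t : triggers) {
            used.add(t.action);
        }
        // Fail and Shutdown cannot be specified together in one policy.
        if (used.contains(Action.FAIL) && used.contains(Action.SHUTDOWN)) {
            throw new IllegalArgumentException("Fail and Shutdown cannot be specified together");
        }
        this.triggers = new ArrayList<>(triggers);
    }

    // Assumption: a trigger counts as breached once usage reaches its value.
    // Actions whose triggers are no longer breached drop out of the result,
    // which models withdrawing the action when consumption goes back down.
    Set<Action> actionsInEffect(long usage) {
        Set<Action> active = EnumSet.noneOf(Action.class);
        for (Trigger t : triggers) {
            if (usage >= t.value) {
                active.add(t.action);
            }
        }
        return active;
    }

    public static void main(String[] args) {
        CappingPolicySketch policy = new CappingPolicySketch(Arrays.asList(
            new Trigger("NotifyAt_30", 30, Action.NOTIFY),
            new Trigger("SlowAt_50", 50, Action.SLOW)));
        for (long openFiles : new long[] {20, 40, 60, 40}) {
            System.out.println(openFiles + " open files -> " + policy.actionsInEffect(openFiles));
        }
    }
}
```

    In a real deployment the server evaluates triggers and applies the actions itself; the sketch only mirrors the decision logic so the policy semantics are easier to reason about.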
    
    
    
    
    
    

     

     

     

    Resource Recourse Action Matrix

     

    The following matrix shows the actions supported for each of the RCM resource types.

    figure-04.png

     

    Policy Configuration

     

    There are many tools available in WebLogic-MT to configure an RCM policy:

     

     

    Enable RCM/RM in JDK

     

    The WebLogic RCM feature is built on top of the JDK Resource Management (RM) API, which was introduced in Oracle JDK 8u40. Therefore, to use the RCM feature, start WebLogic with Oracle JDK 8u40 (or later) and with JDK RM enabled.

     

    The JDK RM API is not enabled by default. To enable RM in 8u40, start the JVM with the following options:
      -XX:+UnlockCommercialFeatures -XX:+ResourceManagement -XX:+UseG1GC

     

    In JDK 8u40, G1GC is mandatory when RM is enabled. However, in JDK 8u60, G1GC is mandatory only for the HeapRetained resource. All other resources would work with any other supported garbage collector.
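
    As a quick sanity check, a running JVM can report the arguments it was started with. The sketch below is a hypothetical helper, not part of WebLogic or the JDK RM API: it only looks for the three flags quoted above in the list returned by the standard RuntimeMXBean, so it will not notice a collector or feature enabled by some other means.

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: checks only for the explicit flags named in this article.
final class RmFlagCheck {

    static final List<String> RM_FLAGS = Arrays.asList(
        "-XX:+UnlockCommercialFeatures",
        "-XX:+ResourceManagement");

    static final String G1_FLAG = "-XX:+UseG1GC";

    // True when both RM flags appear in the given JVM arguments.
    static boolean rmFlagsPresent(List<String> jvmArgs) {
        return jvmArgs.containsAll(RM_FLAGS);
    }

    // HeapRetained additionally needs G1GC, per the text above.
    static boolean heapRetainedFlagsPresent(List<String> jvmArgs) {
        return rmFlagsPresent(jvmArgs) && jvmArgs.contains(G1_FLAG);
    }

    public static void main(String[] args) {
        // Arguments this JVM was actually started with.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("RM flags present:           " + rmFlagsPresent(jvmArgs));
        System.out.println("HeapRetained flags present: " + heapRetainedFlagsPresent(jvmArgs));
    }
}
```

    Running this inside the server JVM (for example from a small diagnostic application) gives a first indication of whether the startup options were picked up.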

     

    Limitations

     

    Here are some of the limitations in RCM in WebLogic-MT 12.2.1:

     

    1. Heap resource consumption tracking and management is supported only when run with the G1 garbage collector (there is no RCM support for other JDK collectors).

    2. There is no support to measure and account for resource consumption metrics for activities happening in JNI/native code.

    3. Measurements of Retained Heap and CPU Utilization are performed asynchronously and hence do not represent a current (point-in-time) value.

    4. Discriminating heap usage for objects in static fields, and for singleton objects of classes loaded from system and shared classloaders, is problematic, and such usage may not be accurately represented in the final accounting values. If an instance of a class loaded from system and shared classloaders is loaded by a partition, the instance's use of heap is accounted against that partition.

    5. Garbage collection activity is not isolated to specific domain partitions in WLS 12.2.1 with Oracle JDK 8u40.

    6. There is a performance impact to enabling the WLS RCM feature due to the additional tracking and management of resource consumption in the server instance.

     

     


     

     

     

     

    About the Authors

     

    • Rahul Srivastava, Principal Member of Technical Staff, Oracle
    • Jagadish Ramu, Principal Member of Technical Staff, Oracle
    • Kshitiz Saxena, Principal Member of Technical Staff, Oracle
    • Naman Mehta, Principal Member of Technical Staff, Oracle
    • Sivakumar Thyagarajan, Consulting Member of Technical Staff, Oracle
    • Larry Feigen, WebLogic Server Architect, Oracle

     

     


      Note: This article has been reviewed by the relevant Oracle product team and found to be in compliance with standards and practices for the use of Oracle products.