IBM Sterling


Status Delivered
Created by Guest
Created on Mar 9, 2020

Control Center 6.1 + improve/speed up when monitored servers get temporarily assigned to another EP

When an EP is stopped, it can take several minutes for the monitored servers to get reassigned to another EP. If there is any way that your company can improve and/or speed up this process, it will be greatly appreciated.

What is your industry? Non-Industry Specific
How will this idea be used?

When an EP is stopped, it can take several minutes for the monitored servers to get reassigned to another EP. If there is any way that your company can improve and/or speed up this process, it will be greatly appreciated.

  • Guest
    Oct 23, 2020

    Hello!

    What you are asking for is achievable today, but let me explain a few things first.

    There are two aspects to server reassignment:

    • The first is detecting that an EP has stopped running.

    • The second is reassigning that EP's servers to other EPs in the cluster.

    Control Center 6.1.3.0 has significant performance improvements over 6.1.2, including one in the area of server reassignment. That said, there can still be a delay of more than one minute, but less than two minutes, between when an EP actually stops and when the CEP decides it has stopped.

    This delay can be shortened, but at a cost to Control Center performance and with a risk of the CEP falsely deciding that an EP has stopped.

    Details:

    • All running EPs are expected to update the LAST_CHECKIN value for their row in the CC_SERVER table at the rate dictated by their HEARTBEAT_INTERVAL value.

    • The CEP periodically checks CC_SERVER.LAST_CHECKIN for each EP. If it determines the value is too old (more on that algorithm below), it sets the status of that server to DOWN and reassigns its servers according to the policies set. (Note that before actually changing the status of an EP to DOWN, the CEP attempts to communicate with it, and changes the status to DOWN only if that attempt also fails.)

    • The CC_SERVER.HEARTBEAT_INTERVAL value is set when you install and configure an EP. For EPs, the default HEARTBEAT_INTERVAL is 30000 (the unit is milliseconds, so this is 30 seconds).

    • You may change this value manually via SQL (while all EPs are stopped). Making it smaller makes the CEP decide sooner that an EP is down, so server reassignment starts sooner.

    • Making this value smaller also increases the risk of the CEP mistakenly concluding that an EP has stopped when a temporary condition has merely prevented the EP from updating its LAST_CHECKIN value in time. The CEP would then initiate unnecessary, unwanted server reassignments.

    • The CEP checks the CC_SERVER.LAST_CHECKIN value for other EPs at most every 30 seconds (this is a hard-coded value and cannot be changed). Because of this, and to avoid mistakenly believing that a running EP has stopped, the CEP actually waits double the HEARTBEAT_INTERVAL value (plus an extra 5 seconds for good measure) for the LAST_CHECKIN value to be updated before it initiates the last-ditch communication attempt.
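
To make the arithmetic in the details above concrete, here is a rough Python sketch of the timing, not Control Center code. The function and constant names are made up for illustration. It assumes the CEP polls exactly every 30 seconds (the post only says "at most every 30 seconds"), measures the delay from the EP's last LAST_CHECKIN update, and ignores however long the final communication attempt takes.

```python
# Illustrative model of the timing described in this comment.
# All names here are hypothetical; only the numbers come from the post.

CEP_POLL_INTERVAL_MS = 30_000  # assumed: CEP checks LAST_CHECKIN every 30 s
GRACE_MS = 5_000               # the "extra 5 seconds for good measure"


def stale_threshold_ms(heartbeat_interval_ms: int) -> int:
    """Age of LAST_CHECKIN after which the CEP stops waiting for an update."""
    return 2 * heartbeat_interval_ms + GRACE_MS


def detection_window_ms(heartbeat_interval_ms: int) -> tuple[int, int]:
    """Approximate (earliest, latest) delay after the last check-in.

    Earliest: a CEP poll lands just as the threshold is crossed.
    Latest: the threshold is crossed just after a poll, so the CEP
    only notices on the next poll one full interval later.
    """
    threshold = stale_threshold_ms(heartbeat_interval_ms)
    return threshold, threshold + CEP_POLL_INTERVAL_MS


if __name__ == "__main__":
    for heartbeat_ms in (30_000, 10_000):
        earliest, latest = detection_window_ms(heartbeat_ms)
        print(
            f"HEARTBEAT_INTERVAL={heartbeat_ms} ms: "
            f"CEP notices between {earliest / 1000:.0f} s and {latest / 1000:.0f} s "
            f"after the last check-in"
        )
```

Under these assumptions the default of 30000 ms gives a threshold of 65 seconds and a worst case of about 95 seconds, which lines up with the "more than one minute, but less than two" figure quoted earlier. A smaller HEARTBEAT_INTERVAL shrinks the threshold, but the 30-second poll stays fixed, so it takes up a growing share of the total delay.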

    So you can cause the CEP to initiate server reassignments for a downed EP faster than what happens now, but I'm not sure you really want to, and I would advise against it.

  • Guest
    Oct 23, 2020

    Improvements have been made to the server reassignment process.