IBM Sterling Ideas

Control Center 6.1 + improve/speed up when monitored servers get temporarily assigned to another EP

When an EP is stopped, it can take several minutes for the monitored servers to be reassigned to another EP. If there is any way your company can improve or speed up this process, it would be greatly appreciated.

  • Guest
  • Mar 9 2020
  • Delivered
What is your industry? Non-Industry Specific

  • Guest commented
    23 Oct, 2020 11:01pm

    Hello!

    What you want is achievable now, but please let me explain a few things first.

    There are two aspects to server reassignment.

    • The first is ascertaining that an EP has stopped running, and

    • The second is to reassign servers to other EPs in the cluster when one EP has stopped.

    Note that Control Center 6.1.3.0 has significant performance improvements over 6.1.2, including one in the area of server reassignment. That said, there can still be a delay of more than one minute, but less than two minutes, between when an EP actually stops and when the CEP decides it has stopped.

    This delay can be reduced, but at a cost to Control Center performance, and with a risk of the CEP falsely deciding that an EP has stopped.

    Details:

    • All running EPs are expected to update the LAST_CHECKIN value for their row in the CC_SERVER table at the rate dictated by their HEARTBEAT_INTERVAL value.

    • The CEP periodically checks CC_SERVER.LAST_CHECKIN for each EP. If it determines the value is too old (more on that algorithm below), it sets the status of that server to DOWN and reassigns its servers according to the policies set. (Note: before actually changing the status of an EP to DOWN, the CEP attempts to communicate with it, and changes the status only if that attempt also fails.)

    • The CC_SERVER.HEARTBEAT_INTERVAL value is set when you install and configure an EP. The default value for EPs is 30000 (the unit is milliseconds, so this is 30 seconds).

    • You may change this value manually via SQL (while all EPs are stopped). Making it smaller causes the CEP to decide sooner that an EP is down, so server reassignment starts sooner.

    • Making this value smaller increases the risk of the CEP mistakenly concluding that an EP has stopped when a temporary condition has merely prevented the EP from updating its LAST_CHECKIN value in time. The CEP would then initiate unnecessary and unwanted server reassignments.

    • The CEP checks the CC_SERVER.LAST_CHECKIN value for other EPs at most every 30 seconds (this is a hard-coded value and cannot be changed). Because of this, and to avoid mistakenly believing a running EP has stopped, the CEP waits double the HEARTBEAT_INTERVAL value (plus an extra 5 seconds for good measure) for the LAST_CHECKIN value to be updated before it initiates the last-ditch communication attempt.
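
    The timing rules in the details above can be sketched as follows. This is only an illustration of the arithmetic, not Control Center code: the function names are hypothetical, the strict "older than" comparison is an assumption, and the sketch assumes the CEP polls exactly every 30 seconds and ignores the time taken by the final communication attempt.

```python
from typing import Tuple

# Constants taken from the comment text; everything else is hypothetical.
CEP_CHECK_INTERVAL_MS = 30_000  # CEP polling period (assumed to be exactly 30 s)
GRACE_MS = 5_000                # the "extra 5 seconds for good measure"


def stale_threshold_ms(heartbeat_interval_ms: int) -> int:
    """Age of LAST_CHECKIN beyond which the CEP treats an EP as suspect."""
    return 2 * heartbeat_interval_ms + GRACE_MS


def is_suspect(now_ms: int, last_checkin_ms: int, heartbeat_interval_ms: int) -> bool:
    """True if LAST_CHECKIN is older than the threshold (strict comparison assumed)."""
    return now_ms - last_checkin_ms > stale_threshold_ms(heartbeat_interval_ms)


def detection_window_ms(heartbeat_interval_ms: int) -> Tuple[int, int]:
    """Shortest and longest time from an EP's last check-in until the CEP notices.

    The longest case adds one full CEP polling period, because the threshold
    may be crossed just after a poll.
    """
    shortest = stale_threshold_ms(heartbeat_interval_ms)
    return shortest, shortest + CEP_CHECK_INTERVAL_MS


if __name__ == "__main__":
    low, high = detection_window_ms(30_000)  # default HEARTBEAT_INTERVAL
    print(f"default window: {low / 1000:.0f} s to {high / 1000:.0f} s")
```

    With the default HEARTBEAT_INTERVAL of 30000, this gives a window of 65 to 95 seconds after the last check-in, which is consistent with the "more than one minute, less than two" delay mentioned earlier. Halving the interval shrinks the threshold but leaves the 30-second polling period unchanged.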

    So you can cause the CEP to initiate server reassignments for a downed EP faster than what happens now, but I'm not sure you really want to, and I would advise against it.

  • Guest commented
    23 Oct, 2020 08:02pm

    Improvements have been made to the server reassignment process.
