
How to identify a session stickiness problem on a load balancer in a PeopleSoft environment


Simple example of load balancer with session stickiness problem

Suppose you have a load balancer in front of two WebLogic webservers. In this simple example, we'll call them webserver A and webserver B. When an end user logs into PeopleSoft through the load balancer, the end user connects to one webserver. Suppose the load balancer routes an end user to webserver A. This end user should remain with webserver A during the entire session. If, while working in PeopleSoft, this end user is inadvertently routed to webserver B, the load balancer is not maintaining session stickiness. End users may experience intermittent problems such as getting kicked back to the search page, getting kicked out of PeopleSoft, premature timeouts, seeing "Page is no longer available", images disappearing, 403 errors, the appserver showing that certain UserIDs are logging into PeopleSoft multiple times (looping), and other strange intermittent problems in PeopleSoft.

To determine if session stickiness may be a problem

1. Test without load balancer

If the problem does not appear without the load balancer, the reported problem is likely caused by the load balancer.

2. Test with one webserver running behind load balancer. 

This is a good test, but the results may be unreliable. The intermittent problem may still appear with just one webserver, because various load balancer configuration options can cause it, so a failure here may still point at the load balancer.

3. Collect browser headers while replicating problem

Ask some end users to collect browser headers from the time they log into PeopleSoft until the intermittent problem appears. Free tools such as iehttpheaders, Live HTTP Headers, and Fiddler can do this; you can find them with a simple web search. Once you have the header logs, look at all the POST/GET requests and cookie values before the problem appears, at the exact time the problem appears, and after the problem appears. Pay close attention to the WebLogic webserver cookie called *PORTAL-PSJSESSIONID. This webserver cookie value (a long encrypted string) should remain constant during the end user's entire PeopleSoft session. If you see that the PORTAL-PSJSESSIONID webserver cookie value is randomly changing, your load balancer is not maintaining session stickiness.
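
As a rough illustration of the check described above, here is a minimal Python sketch that walks through captured requests in order and reports every point where the session cookie value changes. It assumes you have already extracted a timestamp and the raw Cookie header for each request from your header tool; the function names, the "hostA-80-" cookie prefix, and the sample values are made up for this example. The cookie name suffix is a parameter so it can be matched to whatever name appears in your own capture.

```python
def session_cookie_values(cookie_header, suffix="PORTAL-PSJSESSIONID"):
    """Return {cookie_name: value} for cookies whose name ends with `suffix`."""
    found = {}
    for part in cookie_header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name.endswith(suffix):
            found[name] = value
    return found


def find_session_changes(requests, suffix="PORTAL-PSJSESSIONID"):
    """requests: iterable of (timestamp, cookie_header) in capture order.

    Returns a list of (timestamp, cookie_name, old_value, new_value) for
    every request where a tracked cookie differs from its previous value.
    """
    last_seen = {}
    changes = []
    for timestamp, header in requests:
        for name, value in session_cookie_values(header, suffix).items():
            previous = last_seen.get(name)
            if previous is not None and previous != value:
                changes.append((timestamp, name, previous, value))
            last_seen[name] = value
    return changes


# Made-up sample capture: the session value changes on the third request.
capture = [
    ("10:00:01", "SignOnDefault=JSMITH; hostA-80-PORTAL-PSJSESSIONID=abc123"),
    ("10:00:09", "SignOnDefault=JSMITH; hostA-80-PORTAL-PSJSESSIONID=abc123"),
    ("10:00:15", "SignOnDefault=JSMITH; hostA-80-PORTAL-PSJSESSIONID=zzz999"),
]

for change in find_session_changes(capture):
    print(change)
```

With the sample capture, the only change reported is the one at 10:00:15. An empty result for a real capture that covers the moment the problem appeared suggests the session cookie stayed constant.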

4.  Turn on WebLogic extended access logging in both webservers behind the load balancer

For instructions on turning on extended access logging, please see Document 644602.1. This is the most time-consuming option and may impact performance in production, but it may provide the most valuable information.

The extended logging collects all cookies in the same domain that are forwarded from browser to webserver for all end users logging into PeopleSoft.  These include PeopleSoft cookies, WebLogic cookies, 3rd party cookies, and any custom cookies.

Example 1: Session stickiness problems can cause the following behavior:
  • End user is randomly kicked out of PeopleSoft in load balanced environment.
  • End user kicked back to search page in load balanced environment.
  • Premature timeout in load balanced environment.
  • Random "Page is no longer available" in load balanced environment
  • 403 errors

Here are some troubleshooting tips for the above behavior:
a) Turn on extended access logs in both webservers 
b) Ask end users to attempt to replicate problem.  When problem appears, ask for PeopleSoft UserID and exact time.
c) In webserver A, in the extended access logs, search for that UserID. You'll see a PeopleSoft cookie called SignOnDefault that will match that UserID. For that UserID, look at all the POST/GET requests and cookie values before the problem appears, at the exact time the problem appears, and after the problem appears. Pay close attention to the WebLogic webserver cookie called *PORTAL-PSJSESSIONID. This webserver cookie value (a long encrypted string) should remain constant during the end user's entire PeopleSoft session. If you see that the PORTAL-PSJSESSIONID webserver cookie value is randomly changing, your load balancer is not maintaining session stickiness.
d) In webserver B, in extended access logs, do the same as step c.

Remember, if an end user logs into PeopleSoft and connects to webserver A (you should see this UserID in the access log for webserver A) and you start to see entries in the webserver B access logs for this UserID, that's a clear sign that the load balancer is not maintaining session stickiness. There is also a possibility that the end user stays on webserver A, but PORTAL-PSJSESSIONID is randomly changing. This also indicates a session stickiness problem.
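
Steps c and d can be partly automated. The following Python sketch is only an illustration under stated assumptions: it assumes each access log line starts with a date and a time and contains the raw cookie string somewhere on the line. The actual layout depends on how extended logging was configured, so the sample lines, server names, cookie prefixes, and function names here are hypothetical.

```python
import re


def trace_user(logs, user_id, suffix="PORTAL-PSJSESSIONID"):
    """logs: {server_name: iterable of access-log lines}.

    Returns a time-sorted list of (timestamp, server, session_value) for
    every line that carries SignOnDefault=<user_id>.
    """
    # The lookahead keeps JSMITH from also matching JSMITH2.
    user_pat = re.compile(r"SignOnDefault=" + re.escape(user_id) + r"(?=[;\s\"]|$)")
    sess_pat = re.compile(r"[^\s;=\"]*" + re.escape(suffix) + r"=([^;\s\"]+)")
    hits = []
    for server, lines in logs.items():
        for line in lines:
            if not user_pat.search(line):
                continue
            # Assumption: the first two fields are the date and the time.
            timestamp = " ".join(line.split()[:2])
            match = sess_pat.search(line)
            hits.append((timestamp, server, match.group(1) if match else None))
    return sorted(hits, key=lambda hit: hit[0])


def summarize(hits):
    """Collect the distinct servers and session values seen for the user."""
    return {
        "servers": sorted({server for _, server, _ in hits}),
        "session_values": sorted({value for _, _, value in hits if value}),
    }


# Made-up log lines in the assumed layout.
logs = {
    "webserverA": [
        '2024-01-15 10:00:01 GET /page1 200 "SignOnDefault=JSMITH; hostA-80-PORTAL-PSJSESSIONID=abc123"',
        '2024-01-15 10:00:09 POST /page2 200 "SignOnDefault=JSMITH; hostA-80-PORTAL-PSJSESSIONID=abc123"',
    ],
    "webserverB": [
        '2024-01-15 10:00:12 GET /page1 200 "SignOnDefault=JSMITH2; hostB-80-PORTAL-PSJSESSIONID=qqq555"',
        '2024-01-15 10:00:15 POST /page3 200 "SignOnDefault=JSMITH; hostB-80-PORTAL-PSJSESSIONID=zzz999"',
    ],
}

hits = trace_user(logs, "JSMITH")
for hit in hits:
    print(hit)
print(summarize(hits))
```

More than one server, or more than one session value, in the summary for a single PeopleSoft session is the pattern described above. The time-sorted list shows when the switch happened.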

The extended access logs from the webserver are evidence that the load balancer is not maintaining session stickiness. Customers should follow up with their load balancer vendor to properly set up session stickiness on the load balancer.

Example 2: Session stickiness problems can cause the following behavior:
In a PeopleSoft load balanced environment, a UserID is unknowingly logging into PeopleSoft multiple times (looping) and using up all webserver and appserver resources.

a) Turn on extended access logs in both webservers
b) Make sure appserver psappsrv.cfg has LogFence=3 or higher. Save and bounce appserver
c) In appserver psadmin, select 3 "Domain status menu". Select 2 "Client status". If you see the same UserID appear multiple times, it could be an indication of random looping behavior where the UserID is unknowingly logging into PeopleSoft hundreds of times. Check the appserver log file and search for that UserID to see if it is authenticating multiple times in a short time period. Find the timestamp when this UserID first logged into PeopleSoft in the appserver log file.

d) In webserver A, in the extended access logs, search for that UserID. You'll see a PeopleSoft cookie called SignOnDefault that will match that UserID. For that UserID, look at all the POST/GET requests and cookie values before the problem appears, at the exact time the problem appears, and after the problem appears. Pay close attention to the WebLogic webserver cookie called *PORTAL-PSJSESSIONID. This webserver cookie value (a long encrypted string) should remain constant during the end user's entire PeopleSoft session. If you see that the PORTAL-PSJSESSIONID webserver cookie value is randomly changing, your load balancer is not maintaining session stickiness.

e) In webserver B, in extended access logs, do the same as step d.
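
The check in step c, a UserID authenticating many times within a short period, can be sketched as a sliding-window count. This Python example assumes you have already pulled (timestamp, UserID) authentication events out of the appserver log; that extraction is left out because the log line layout is not described here. The function name, the one-minute window, and the threshold of 5 are arbitrary illustrative choices, not recommended values.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def find_login_bursts(events, window=timedelta(minutes=1), threshold=5):
    """events: iterable of (datetime, user_id) authentication events.

    Returns {user_id: (start_of_burst, logins_in_window)} for the first
    window in which a user reaches `threshold` logins.
    """
    recent = defaultdict(deque)  # per-user timestamps inside the window
    bursts = {}
    for when, user in sorted(events):
        stamps = recent[user]
        stamps.append(when)
        # Drop logins that have fallen out of the sliding window.
        while when - stamps[0] > window:
            stamps.popleft()
        if len(stamps) >= threshold and user not in bursts:
            bursts[user] = (stamps[0], len(stamps))
    return bursts


# Made-up events: JSMITH logs in every 5 seconds, ADOE twice in 10 minutes.
start = datetime(2024, 1, 15, 10, 0, 0)
events = [(start + timedelta(seconds=5 * i), "JSMITH") for i in range(6)]
events += [(start, "ADOE"), (start + timedelta(minutes=10), "ADOE")]

for user, (first_seen, count) in find_login_bursts(events).items():
    print(user, first_seen.isoformat(), count)
```

The start time reported for a burst is a reasonable place to begin searching the webserver extended access logs in steps d and e.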

Remember, if an end user logs into PeopleSoft and connects to webserver A (you should see this UserID in the access log for webserver A) and you start to see entries in the webserver B access logs for this UserID, that's a clear sign that the load balancer is not maintaining session stickiness. There is also a possibility that the end user stays on webserver A, but PORTAL-PSJSESSIONID is randomly changing. This also indicates a session stickiness problem.

The extended access logs from the webserver are evidence that the load balancer is not maintaining session stickiness. Customers should follow up with their load balancer vendor to properly set up session stickiness on the load balancer.

Example 3: Other random problems that only appear in load balanced environment
This last section just provides some general tips to troubleshoot issues in a load balanced environment. 

a) Turn on extended access logs in both webservers
b) Make sure appserver psappsrv.cfg has LogFence=3 or higher. Save and bounce appserver
c) Ask end user to report when problem appears. Get UserID and time if possible.
d) Review the extended access log, the appserver log, and all webserver log files for the time period when the problem appears. You may need to cross-reference the different log files to see if a certain error or action may have caused the reported problem.

Since extended access logs contain all POST/GET requests and all cookies, it may give a clue what the end user was doing when the problem first appeared. 
  • Are there certain POST/GET requests that seem to trigger the problem?
  • Do you see multiple PeopleSoft webserver cookies *PORTAL-PSJSESSIONID? Perhaps the end user was visiting other PeopleSoft sites, or certain navigation may have caused the problem.
  • What values do PS_LOGINLIST and ExpirePage contain?
  • Do you see other custom or third-party cookies, such as load balancer cookies or SiteMinder cookies, that could be causing problems in load balanced environments? Are these custom cookies supposed to stay constant or change values?
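
Cross-referencing several log files around the reported time, as step d suggests, amounts to merging their entries into one timeline. The Python sketch below assumes each log has already been parsed into (timestamp, source, text) entries; the parsing itself depends on each log's format and is not shown. The function name, the two-minute margin, and the sample entries are made up for illustration.

```python
from datetime import datetime, timedelta


def timeline_around(entries, reported_at, margin=timedelta(minutes=2)):
    """entries: iterable of (datetime, source, text) from any number of logs.

    Returns the entries within `margin` of `reported_at`, oldest first.
    """
    earliest, latest = reported_at - margin, reported_at + margin
    nearby = [entry for entry in entries if earliest <= entry[0] <= latest]
    return sorted(nearby, key=lambda entry: entry[0])


# Made-up entries from three different logs.
reported_at = datetime(2024, 1, 15, 10, 0, 15)
entries = [
    (datetime(2024, 1, 15, 9, 30, 0), "appserver", "user signed on"),
    (datetime(2024, 1, 15, 10, 0, 15), "access-B", "POST with new session value"),
    (datetime(2024, 1, 15, 10, 0, 9), "access-A", "POST with original session value"),
    (datetime(2024, 1, 15, 10, 0, 16), "appserver", "user signed on again"),
]

for when, source, text in timeline_around(entries, reported_at):
    print(when.isoformat(), source, text)
```

Reading the merged timeline top to bottom makes it easier to spot which request or error came immediately before the reported problem.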
