Analysis of Long-Running UBEs

Example Issue
UBE runs for three days and never finishes.
The problem question: WHERE IS THE CODE/SYSTEM SPENDING ITS TIME?
Need a valid profile to answer that question:
- How to get that profile?
- How to “get your arms around the process”?
In JDE.INI, under [DEBUG], set DumpLPDS=0. These settings:
- Reduce log file size; the debug log grows less quickly
- Give a more accurate profile by stripping out material that does not help with performance analysis
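For reference, a minimal sketch of the relevant JDE.INI fragment (only the DumpLPDS entry is taken from the text above; a real [DEBUG] section will contain other site-specific entries):

```ini
[DEBUG]
; Suppress data structure dumps in jdedebug.log, keeping the log smaller
; and easier to profile
DumpLPDS=0
```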
Approaches
- Run the unmodified production use case with debug on
- Run the UBE with Reduced Data Selection
- Run for 30-60 minutes, then terminate job
- Run for a long period, collecting log samples at intervals
- Capture runtime call stacks over an extended interval
Run the unmodified production use case with debug on
Pros:
- This is the ideal case: the entire run is captured from beginning to end
- No compromise to the data sample
Cons:
- For UBEs longer than two hours (absolute maximum), this method is almost certainly not viable
- Logs get too large too fast: several GB per hour
- Some customers have a 2 GB file size limit on the operating system
- Note that this limit is usually modifiable (a quick check is sketched after this list)
- Debug logging adds a factor of 2-3 to the runtime
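On the file size limit point, a generic check of the per-process limit in effect for the user that launches the UBE might look like the sketch below (plain Python, nothing JDE-specific; on your platform the limit may instead come from ulimit settings or the filesystem):

```python
import resource

# Query the per-process file size limit (RLIMIT_FSIZE).
# A ~2 GB soft limit means a large jdedebug.log will be cut off
# (the writing process typically receives SIGXFSZ).
soft, hard = resource.getrlimit(resource.RLIMIT_FSIZE)

def fmt(limit):
    return "unlimited" if limit == resource.RLIM_INFINITY else f"{limit / 2**30:.1f} GB"

print(f"File size limit: soft={fmt(soft)}, hard={fmt(hard)}")
```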
Run the UBE with Reduced Data Selection
Pros:
- Shorter, manageable run
- Smaller log size
- Job can finish, so a complete start-to-finish picture results
- No need to terminate the job
Cons:
- Under-sampling
- The short time frame skews the profile
- Fixed-cost / one-time hits at startup will be exaggerated
- A 10-minute run of a 10-hour job will NOT give a reliable profile (for example, a 5-minute startup cost looks like half the runtime in a 10-minute sample, but less than 1% of the real 10-hour run)
- Avoidance of the problems you are trying to observe
- If the problem is related to a specific data range, it may be missed
- The job may be stuck in an infinite loop
Run for 30-60 minutes, then terminate job
Pros:
- Less risk from under-sampling
- A reasonably sized log in the few-GB range
Cons:
- The UBE may have multiple sections, and the real problems may occur in a section that is never reached in the 30-60 minutes
- Get a listing of the UBE’s ER to help get a clearer picture
- Look at the UBE in the RDA tool
- What the UBE is doing in the first hour may NOT be the same as what it’s doing in the third hour
- Remember: a one-hour run with debug logging on probably represents only 20-30 minutes of runtime without it
- If the job is running the same code for a long period, but slowing down gradually, this will also be missed by capturing just the first hour
- Killing the job prematurely means not seeing the complete picture
- The graceful end of a UBE contains important indications of cache memory leaks:
- Such as jdeCacheInit() calls not matched with jdeCacheTerminateAll()
- Failure to call jdeCacheTerminateAll() results in jdeCacheDestroyAllUserCaches() being called instead
- It is difficult to count and match every jdeCacheInit() call by hand, since caches may be initialized and terminated across many different BSFNs (a rough counting script is sketched after this list)
- Multiple sections in UBEs may never be reached
- Problem data ranges in UBEs may never be reached
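To take some of the pain out of matching jdeCacheInit() with its terminates, a rough log-scanning sketch is shown below. It only counts occurrences of the cache call names mentioned above in a jdedebug log; how (and whether) each call shows up in the log varies by release and log level, so treat this as an approximation to adapt rather than a definitive tool:

```python
import sys
from collections import Counter

# Rough scan of a jdedebug log for JDE cache API activity.
# Usage: python cache_scan.py jdedebug_12345.log
# Counts are approximate: "jdeCacheInit" also matches longer call names that
# begin with the same prefix, and the log format differs between releases.
NAMES = ("jdeCacheInit", "jdeCacheTerminateAll", "jdeCacheDestroyAllUserCaches")

counts = Counter()
with open(sys.argv[1], errors="replace") as log:
    for line in log:
        for name in NAMES:
            counts[name] += line.count(name)

for name in NAMES:
    print(f"{name:30s} {counts[name]}")
```

A jdeCacheInit count far above the jdeCacheTerminateAll count, or a jdeCacheDestroyAllUserCaches entry near the end of the log, points at caches that were never explicitly terminated.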
Run for a long period, collecting log samples at intervals
Pros:
- Obtain profiles for a much longer period
- Perhaps collect a 30-minute log every 2-4 hours
Cons:
- The run needs monitoring and babysitting
- Process is a bit of a “kludge”
- Takes a long time, requires machine resources during that time
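A rough sketch of the "30-minute log every few hours" loop follows. How debug logging is switched on and off for a running job is site-specific (Server Manager, logging settings, and so on), so the enable/disable hooks below are hypothetical placeholders, and the log path is made up; only the timing and file-copy bookkeeping is meant literally:

```python
import shutil
import time
from datetime import datetime
from pathlib import Path

def enable_debug():
    """Placeholder: turn debug logging on for the running job (site-specific)."""

def disable_debug():
    """Placeholder: turn debug logging off again (site-specific)."""

LOG = Path("/jde/logs/jdedebug_12345.log")    # hypothetical path to the job's debug log
SAMPLE_DIR = Path("/jde/logs/samples")
SAMPLE_MINUTES = 30                           # collect a 30-minute log...
INTERVAL_HOURS = 3                            # ...every 2-4 hours

SAMPLE_DIR.mkdir(parents=True, exist_ok=True)
while True:                                   # stop with Ctrl-C once the UBE ends
    enable_debug()
    time.sleep(SAMPLE_MINUTES * 60)
    disable_debug()
    stamp = datetime.now().strftime("%Y%m%d_%H%M")
    shutil.copy2(LOG, SAMPLE_DIR / f"jdedebug_sample_{stamp}.log")
    time.sleep(INTERVAL_HOURS * 3600)
```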
Capture runtime call stacks over an extended interval
DO NOT enable EOne debug logging for this approach
Pros:
- Capture a set of snapshots at different intervals over a long run
- Can combine this method with other monitoring
- A poor man’s “manual” sampling approach, but it can be effective
- Use existing operating system commands
- Debug code NOT required
- Can help to spot infinite looping behavior
Cons:
- Raw call stacks are a bit obscure and not as intuitive to read
- The run needs monitoring, babysitting
- Takes a long time, requires machine resources during that time
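A minimal sketch of this "manual sampling", assuming a Unix-style enterprise server where a stack-dumping command such as pstack (Linux/Solaris) or procstack (AIX) is available and the PID of the UBE's runbatch process is known; the command name, PID, and intervals are all assumptions to adapt:

```python
import subprocess
import sys
import time
from datetime import datetime

# Usage: python stack_sample.py <pid> [interval_seconds] [count]
# Writes one timestamped call-stack snapshot per interval for the given process.
PSTACK = "pstack"   # swap for "procstack" on AIX, etc.

pid = sys.argv[1]
interval = int(sys.argv[2]) if len(sys.argv) > 2 else 60
count = int(sys.argv[3]) if len(sys.argv) > 3 else 30

for _ in range(count):
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    result = subprocess.run([PSTACK, pid], capture_output=True, text=True)
    with open(f"stack_{pid}_{stamp}.txt", "w") as f:
        f.write(result.stdout or result.stderr)
    time.sleep(interval)
```

If snapshot after snapshot shows essentially the same stack, that is a strong hint of the looping (or gradually slowing) behaviour described above.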
Recommendations
- If the use case is less than two hours:
- Run the unmodified production use case with debug on
- In general, avoid reducing data selection
- But this MAY help in identifying cache leaks – since the UBE finishes gracefully
- Can view the end of the log file for missing cache terminates
- Perhaps a reduced data selection case could be run in addition to one of the longer use cases
- This would allow the end of the job to be captured
- Try a one-hour terminated run first
- In general, one single long-running SELECT should NOT drive the analysis
- Next, try log samples throughout the run
- Finally, try call stack samples