Workflow Deferred Agent Listener not starting
Understanding Workflow Deferred Agent
WF_DEFERRED is the main queue for the deferred events that need to be processed offline. "Workflow Deferred Agent Listener" must be running to clear the events from this queue. Several EBS modules have subscribers to various Workflow BES events with a phase >100 which means that the subscribers will be deferred for the agent listeners to process later. Unless these deferred events are processed by agent listeners, QMON or any other module-specific cleanup event will not remove any events from the WF_DEFERRED queue and hence the backlog. You should configure agent listeners to run by default in Oracle Applications even if you do not use Oracle Workflow.
Trouble shooting
1) Take a snapshot of events in WF_DEFERRED at the beginning of troubleshooting
This is important so as to compare with the results of the same select during the course of troubleshooting.
select wfd.user_data.event_name EVENT_NAME,
decode(wfd.state,
0, '0 = Ready',
1, '1 = Delayed',
2, '2 = Retained',
3, '3 = Exception',
to_char(substr(wfd.state,1,12))) State,
count(*) COUNT
from applsys.wf_deferred wfd
group by wfd.user_data.event_name, wfd.state
order by 3 desc, 1 asc;
select corrid,
decode(state,
0, '0 = Ready',
1, '1 = Delayed',
2, '2 = Retained',
3, '3 = Exception',
to_char(state)) State,
count(*) COUNT
from wf_deferred
group by corrid, state;
Take note of the results of this query and make sure there are agent listeners running that will include these events or the parent of the events, for example, oracle.apps.fnd.% for all FND events.
This select will provide information on all Agents Listeners, including Thread Counts, Correlation_Id, Startup_Mode, and current Status:
Check the status of the Workflow Mailer(s) and their thread counts and when they started
select fsc.COMPONENT_ID, fcq.USER_CONCURRENT_QUEUE_NAME Container_Name,
fsc.COMPONENT_NAME,
DECODE(fcp.OS_PROCESS_ID,NULL,'Not Running',fcp.OS_PROCESS_ID) PROCID,
fcq.MAX_PROCESSES TARGET,
fcq.RUNNING_PROCESSES ACTUAL,
v.PARAMETER_VALUE "#THREADS",
fcq.ENABLED_FLAG ENABLED,
fsc.CORRELATION_ID,
fsc.STARTUP_MODE,
fsc.component_type,
fsc.COMPONENT_STATUS,
to_char(fsc.last_update_date,'DD-MON-YYYY HH24:MI:SS') "STARTED", round(sysdate-fsc.last_update_date,0) "DAYS RUNNING"
from APPS.FND_CONCURRENT_QUEUES_VL fcq, APPS.FND_CP_SERVICES fcs,
APPS.FND_CONCURRENT_PROCESSES fcp, apps.fnd_svc_components fsc, apps.FND_SVC_COMP_PARAM_VALS_V v
where v.COMPONENT_ID=fsc.COMPONENT_ID
and fcq.MANAGER_TYPE = fcs.SERVICE_ID
and fcs.SERVICE_HANDLE = 'FNDCPGSC'
and fsc.concurrent_queue_id = fcq.concurrent_queue_id(+)
and fcq.concurrent_queue_id = fcp.concurrent_queue_id(+)
and fcq.application_id = fcp.queue_application_id(+)
and fcp.process_status_code(+) = 'A'
and v.PARAMETER_NAME = 'PROCESSOR_IN_THREAD_COUNT'
and fsc.component_type = 'WF_MAILER'
order by fsc.COMPONENT_STATUS desc;
2) Restart the Workflow Agent Listener Service
Bounce the AQ Time Manager and try to restart the services.
SQL > exec dbms_aqadm.stop_time_manager;
SQL> exec dbms_aqadm.start_time_manager;
In some cases, restarting the Service Containers and Service Components gets the processing going again. Where possible, this should be tried first and monitor the queue to see if events start to process.
3) Create additional generic agent listeners for WF_DEFERRED agent. (Optional)
Sometimes the performance problem is with some specific events. Creating additional generic agent listeners (Correlation_ID is null) may help as it would make other threads available to continue processing other events in the Queue. Monitor if events start processing by re-running the SQL in Step 1 and comparing the count of each correlation with the results of the initial run of this SQL.
The same result could be achieved by multithreading the generic agent listener by increasing the thread count for Workflow Deferred Agent Listener to 5 from 1. Threads can be increased using one of the following methods:
- Using the admin script $FND_TOP/sql/afsvcpup.sql by changing the value of parameter PROCESSOR_IN_THREAD_COUNT from 1 to 5.
- From OAM, Workflow Manager:
Select "Agent Listeners"
Select the radio button for the "Workflow Deferred Agent Listener" then click on "Edit"
Next to step 2 of 4
Set Inbound Thread Count to 5 from the current of 1
Next
Next
Finish
Restart the Agent Listener.
4) Create specific agent listeners for each of the correlation in READY mode(Optional)
If additional generic agent listeners do not alleviate the problem, then consider creating specific listeners using a correlation_id for each of the event_names in READY mode and monitor if events start processing.
Creating specific agent listeners to process events or groupings of events such as one for oracle.apps.fnd% could help in isolating the events that are causing the performance. The events causing the problem would be those whose number of events in READY(0) state are not reducing as compared to the others.
The following select statement lists the Correlation of existing Agents listeners and makes it easy to get a picture of how existing Agent listeners are setup instead of going through OAM:
select component_id, component_name, correlation_id
from apps.fnd_svc_components;
Additional Agent listeners can be created in OAM with the following steps:
Log in to OAM and select the "Site Map" link
Select "Service Components" link under Workflow
Select "Create" button in the Service Components page.
Select "Workflow Agent Listener' as the type in the Pick Component Type page
Select "Continue" button
Enter the Listener details on each of the 4 pages and submit.
5) Run the problematic events listener from SQLPlus with trace enabled
After isolating the problem to specific events, the next step would be to run a listener from SQLPlus for those specific events with trace enabled in order to capture more information on the problem area. This can be done from SQLplus while connected to APPS schema:
alter session set max_dump_file_size=unlimited;
alter session set events '10046 trace name context forever, level 12';
execute WF_EVENT.Listen(p_agent_name=>'WF_DEFERRED', p_correlation=> '<the correlation of the poorly performing listener>');
This will generate the trace files in user_dump_dest which can be located with the following select:
select value from V$PARAMETER where name like 'user_dump%';
6) Change retention time for WF_DEFERRED to 0
If during the agent's listeners outage the WF_DEFERRED queue has grown to an unmanageable size, set the retention time for WF_DEFERRED to 0 from the default value of 86400. This will ensure that events that have been processed are removed from the queue immediately by QMON instead of remaining in the queue for 1 day which is the default. The following SQL can be used for updating the queue retention while logged into APPS schema:
dbms_aqadm.alter_queue
( queue_name => 'APPLSYS.WF_DEFERRED',
max_retries => 5,
retry_delay => 3600,
retention_time => 0
);
Please note that the recommendation is having retention of 1 day for debugging purposes and this should be reverted as soon as the queue is back to a manageable size.
We can manually also purge the processed events:
All the WF's with "Retained" Status should not be there in your WF_DEFERRED table, it needs to purge no matter what the dates are. There should be no records with the status "Retained" as per the WF's functionality.
Stop the WF services and run below
declare
po dbms_aqadm.aq$_purge_options_t;
begin
po.block := true;
dbms_aqadm.purge_queue_table(queue_table => 'APPLSYS.WF_DEFERRED',
purge_condition => NULL,
purge_options => po);
end;
/
7) Process events in batches in SQL
If for any reason the agent listener is not able to process the events without erroring, the following can be used to process the messages in small batches while logged into SQLplus as APPS:
set serveroutput on size 1000000;
declare
msgCount integer := 10000;
errCount integer := 10000;
begin
wf_event.listen(p_agent_name => 'WF_DEFERRED',
p_correlation => '<correlation_to_processed>',
p_message_count => msgCount,
p_max_error_count => errCount);
dbms_output.put_line('MESAGES: '||to_char(msgCount)||' '||
'ERRORED: '||to_char(errCount));
end;
Post a Comment
Post a Comment