There is nothing more frustrating for a Database Administrator than flying blind during a critical database crisis. Imagine staring at a production session consuming 100% CPU. You know the standard
EXPLAIN PLAN output is lying to you because of bind variable peeking or adaptive cursor sharing. You desperately need the ground truth of what the execution engine is actually doing. Naturally, you reach for the DBA’s ultimate weapon: DBMS_SQLTUNE.REPORT_SQL_MONITOR.
However, instead of rendering a beautiful, interactive HTML report showing the granular execution flow and wait events, the Oracle database throws a massive curveball. The function fails unexpectedly, throwing cryptic internal errors like ORA-27196 or ORA-06512. This isn’t just a simple error; it’s a complete deadlock in your troubleshooting workflow. You are trying to diagnose a severe performance bottleneck, but the primary diagnostic tool itself is broken due to a memory allocation failure during XML serialization. Below is the definitive guide to bypassing this internal bug using a surgical event trace.
- 1. My Troubleshooting Logs: The Discovery
- 2. Architectural Deep Dive: Why XML Serialization Fails
- 3. The Fix: Enabling Event Trace 31156
- 4. Strategic Execution: Forcing the Monitor Hint
- 5. Precision Retrieval: Extracting the SQL_ID
- 6. The Payoff: Generating the Active HTML Report
- 7. The Cleanup and FAQ
1. My Troubleshooting Logs: The Discovery
I vividly recall working on an Oracle 12.1 legacy system (running on Linux x86-64) during a critical month-end close. A complex aggregation query was hanging the system. I executed the SQL Monitor report command, fully expecting the standard CLOB output. Instead, my SQL*Plus session spat this out:
ERROR at line 1:
ORA-27196: skgfcls: write error
ORA-06512: at "SYS.DBMS_SQLTUNE", line 13524
ORA-06512: at line 1
My very first instinct was to check the filesystem space, as ORA-27xxx errors often mimic underlying OS-level I/O issues. However, the disk mount points had plenty of free space. Next, I assumed it was a PGA memory exhaustion issue caused by an excessively large execution plan. I wasted 45 minutes tweaking the PGA_AGGREGATE_TARGET parameter with absolutely no luck. Finally, the realization hit me: this wasn’t a resource configuration issue; it was an internal C-code bug within Oracle’s XML DB engine.
2. Architectural Deep Dive: Why XML Serialization Fails
To understand the fix, you must understand how REPORT_SQL_MONITOR operates under the hood. When you request an Active HTML report, Oracle doesn’t just pull text from a view. It aggregates massive amounts of granular data from V$SQL_MONITOR (the header information) and V$SQL_PLAN_MONITOR (the step-by-step execution details, including parallel skew and I/O waits).
Once this raw data is collected from the SGA (System Global Area), the database attempts to serialize it into a massive, highly structured XML object. This XML is then wrapped in HTML and JavaScript to create the interactive UI. The ORA-27196 (which technically translates to an “OS Dependent Error”) occurs because a specific memory pointer fails during the construction of this large XML buffer. The Oracle XML DB kernel essentially panics during the memory allocation phase of the XML write process, crashing the entire PL/SQL function call.
3. The Fix: Enabling Event Trace 31156
Since we cannot apply a binary patch to a production database in the middle of a business day to fix an XML bug, we need a session-level workaround. We are going to use a diagnostic event trace. The specific event is 31156. This is not a parameter you will find in standard Oracle tuning documentation; it interacts directly with the lowest levels of the Oracle XML DB kernel.
-- Apply the surgical trace to your current troubleshooting session only
ALTER SESSION SET EVENTS '31156 trace name context forever, level 0x400';
By setting this event to level 0x400, we are essentially instructing the database engine to bypass the specific integrity checks and memory management code paths that are triggering the crash. Activating this flag allows the XML serializer to finish its job without hitting the invalid pointer. It is a bypass, not a permanent fix, but it works perfectly for emergency data extraction. Without this, similar to the complex issues I discussed in my guide on DBMS_HPROF PL/SQL Bottlenecks, you simply cannot gather the required diagnostics.
4. Strategic Execution: Forcing the Monitor Hint
Now that the safety trace is active on our session, we need to run the problematic query. However, we cannot run it blindly; we must inject intelligence into it.
First, we must use the /*+ MONITOR */ hint. Oracle has an internal parameter called _sqlmon_threshold (usually defaulting to 5 seconds). If your query is running extremely fast but executing thousands of times per minute (a high-frequency execution issue), it will not trigger the monitor automatically. We want to force monitoring regardless of the duration.
Second, we add a unique “fingerprint” comment, such as /* kolee_test_20250526_1 */. In a production system with tens of thousands of active cursors, finding your specific execution in V$SQL is incredibly difficult. This tag allows us to programmatically retrieve the exact ID later.
SELECT /*+ monitor */ /* kolee_test_20250526_1 */
SUBSTR(join_dt, 1, 4) AS join_yyyy,
COUNT(*) AS cnt
FROM tuner.tb_cust
GROUP BY SUBSTR(join_dt, 1, 4)
ORDER BY cnt DESC;
5. Precision Retrieval: Extracting the SQL_ID
The query has executed, and the statistics are currently sitting in the transient SGA memory. Now we need the key to unlock them: the SQL_ID. We cannot pass the raw SQL text to the reporting function; it requires the internal hash identifier.
The script below searches V$SQL using our unique tag. Crucially, we include exclusions like AND a.sql_text NOT LIKE '%v$sql%'. If we do not exclude the search query itself, we might accidentally retrieve the ID of our search operation instead of the target application query!
SELECT a.sql_id,
a.plan_hash_value,
a.child_number,
SUBSTR(a.sql_text, 1, 50) AS sql_text_50
FROM v$sql a
WHERE a.sql_text LIKE '%kolee_test_20250526_1%'
AND a.sql_text NOT LIKE '%v$sql%'
AND a.sql_text NOT LIKE '%explain plan%';
Automate this retrieval. Do not eyeball it. I once had a junior DBA copy a SQL_ID from an email sent the previous day. He didn’t realize that due to nightly statistics gathering, the execution plan had changed, and the SQL_ID mapped to a completely new child cursor. We spent hours analyzing a plan that wasn’t even running in production anymore. Always verify the
PLAN_HASH_VALUE alongside the SQL_ID.
6. The Payoff: Generating the Active HTML Report
This is what we came for. We call DBMS_SQLTUNE.REPORT_SQL_MONITOR, passing the exact SQL_ID we just retrieved, and we set type=>'ACTIVE' and event_detail=>'YES'. Setting event details is mandatory; without it, you see the total time spent, but you are blind to what the database was actually waiting for (e.g., ‘db file scattered read’ vs ‘latch free’).
SET TRIMSPOOL ON
SET TRIM ON
SET PAGES 0
SET LINESIZE 1000
SET LONG 1000000
SET LONGCHUNKSIZE 1000000
SPOOL /tmp/sql_monitor_report.html
SELECT DBMS_SQLTUNE.REPORT_SQL_MONITOR(
sql_id => '3509ydhmfxrhj',
type => 'ACTIVE',
event_detail => 'YES'
)
FROM dual;
SPOOL OFF
Because we have the 31156 trace enabled, the internal XML serializer completely bypasses the memory check and successfully builds the CLOB. The resulting HTML file is the “Smoking Gun.” I once dealt with a third-party vendor who aggressively insisted their code was perfectly optimized. I sent them the HTML file generated by this exact process. It visually displayed a bright red bar indicating 99% DB Time spent on a “Table Access Full” of a 50GB table, executed 20,000 times inside a nested loop. There was no arguing with the red bar.
7. The Cleanup and FAQ
We successfully extracted our report, but we must act as professionals. Leaving diagnostic events enabled in a session is dangerous, especially if your application uses connection pooling where this modified session might be handed back to another user.
-- Disable the trace and return the session to normal
ALTER SESSION SET EVENTS '31156 trace name context off';
Frequently Asked Questions (FAQ)
Q: Is this ORA-27196 bug fixed in newer versions of Oracle?
A: Yes. This issue is formally documented under Oracle Bug 22843562 (“IMPORT OF A XML FILE WITH A COMMENT AT THE END FAILS WITH ORA-27163”). While it heavily impacts 11g and 12cR1, the underlying XML DB memory allocation logic has been rewritten and permanently resolved in Oracle 19c (specifically 19.18 RU and higher). Until you upgrade, this trace is your lifeline.
Q: Does the /*+ MONITOR */ hint add overhead to the query execution?
A: Yes, but it is incredibly minimal (usually less than 1-2%). The execution engine must perform extra memory allocations in the SGA to track the row counts and timings for each step of the execution plan. For a troubleshooting session, this overhead is absolutely negligible compared to the value of the diagnostic data you receive.
Q: My report generated, but the ‘Plan’ section is completely blank. Why?
A: This typically happens if the cursor has already aged out of the shared pool, or if you did not properly set the SET LONG and SET LONGCHUNKSIZE parameters in SQL*Plus before spooling. The XML gets truncated mid-stream, causing the browser’s JavaScript engine to fail when rendering the graphical plan. Always ensure your SQL*Plus environment variables are configured to handle massive CLOB data.
