- 1. the core concept: why smart scan changes everything
- 2. the hidden killers: checking your environment parameters
- 3. the dangerous trap: storage full versus real offloading
- 4. the silent blocker: tracking down non-offloadable functions
- 5. empirical verification: proving io savings with hard numbers
1. the core concept: why smart scan changes everything
if your company is paying the massive premium for oracle exadata hardware, you expect blazing fast io performance in return. yet i constantly see large enterprise environments running oracle 19c where the smart scan feature is silently and completely failing.
in a traditional, non-exadata environment, when you execute a query against a massive table, the storage array sends every single data block across the network to the database server. the database server cpu then has to filter out the millions of rows that do not match your query conditions. this legacy model is inefficient and clogs the network.
oracle exadata smart scan flips that model on its head. it pushes the sql filtering logic (predicate evaluation) and column projection down to the storage layer, the cell servers. each storage cell filters the raw data locally on its own processors and returns only the matching rows and columns back across the network. this reduction in interconnect traffic is exactly what we call cell offloading.
i clearly remember a migration project where we proudly moved a fifty-terabyte data warehouse from a legacy unix system straight onto a brand new exadata machine. management expected a ten times performance boost right out of the box. instead, the very first nightly batch run was actually slower. why? because the migration team had imported the database with data pump but never refreshed the table statistics and physical attributes. the system was doing generic buffered multiblock reads and no smart table scans at all. smart scan is not automatic just because you bought the expensive hardware; it requires specific access paths, like direct path read, to actually trigger.
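before touching any parameters, a quick instance-level sanity check tells you whether smart scans are firing at all. a minimal sketch against v$sysstat (values are cumulative since instance startup; statistic names as they appear in 19c):

-- if "cell scans" stays flat while your workload runs,
-- smart scan is not triggering at all
select name, value
from   v$sysstat
where  name in (
         'cell scans',
         'cell physical IO bytes eligible for predicate offload',
         'cell physical IO interconnect bytes returned by smart scan'
       );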
2. the hidden killers: checking your environment parameters
the single most common reason smart scan silently fails is the loss of direct path reads. smart scan relies entirely on reading data from disk directly into the private program global area (pga), completely bypassing the shared buffer cache. if the database decides to read data via the buffer cache instead, which we call a conventional path read, smart scan becomes physically impossible. that read-path decision is controlled by the hidden parameter _serial_direct_read. furthermore, if the master switch cell_offload_processing is set to false, the entire offloading feature is globally disabled across the machine. before i ever start deep diving into execution plans, i always run this script to sanity check the core environment parameters.
-- check the parameters controlling smart scan behavior
-- (querying x$ fixed tables requires a sys connection)
-- 1. cell_offload_processing must be true: the global offload switch
-- 2. _serial_direct_read controls the physical read decision for serial queries
select a.ksppinm  "parameter",
       b.ksppstvl "session value",
       c.ksppstvl "instance value"
from   x$ksppi a, x$ksppcv b, x$ksppsv c
where  a.indx = b.indx
and    a.indx = c.indx
and    a.ksppinm in (
         'cell_offload_processing',
         '_serial_direct_read',
         'cell_offload_plan_display'
       );
if _serial_direct_read is set to never or false, your serial queries fall back to buffered reads through the cache and smart scan never fires. in modern 19c environments the default is auto. we once had a critical nightly batch job that suddenly degraded right after a minor patch. the execution plan looked perfectly fine, but the actual runtime went from twenty minutes to three agonizing hours. i ran the script above and found that _serial_direct_read was set to false at the system level. a junior admin had changed it to fix a small online transaction issue, unknowingly sabotaging the entire data warehouse batch. setting it back to auto instantly restored smart scan and the twenty-minute runtime.
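for the record, the fix itself was a one-liner. a hedged sketch of what we ran (underscore parameters should only ever be changed with oracle support's blessing):

-- restore the default read-path decision at the system level
alter system set "_serial_direct_read" = auto scope = both;

-- or force direct path reads in a single test session to prove
-- that smart scan comes back once the read path is fixed
alter session set "_serial_direct_read" = always;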
3. the dangerous trap: storage full versus real offloading
this is exactly where most database administrators get confused. you generate a standard execution plan, you happily see the words table access storage full in the operation column, and you instantly assume smart scan is working perfectly. that assumption is not safe.
the keyword storage in that column only means the plan is eligible to send the work to the cell servers. it does not confirm that your where clause predicates were actually pushed down at runtime. to definitively confirm success, you must inspect the predicate information section at the very bottom of the execution plan. you are looking for the word storage acting as a wrapper around your filter conditions. if you only see the word filter without that storage wrapper, all the heavy filtering is still happening on the database server cpu, completely wasting the exadata hardware.
-- pull the full execution plan to see where predicates are evaluated
-- look closely for the "storage(...)" wrapper in the predicate information section
-- note: 'allstats last' only shows real row counts if the query ran with the
-- gather_plan_statistics hint or with statistics_level = all
select * from table(dbms_xplan.display_cursor(
  sql_id          => 'your_sql_id_here',
  cursor_child_no => 0,
  format          => 'advanced allstats last -rows'
));
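for illustration, here is roughly what a healthy predicate section looks like (an invented fragment; your object names and operation ids will differ). the same condition appearing inside storage(...) is the confirmation you want:

-- illustrative dbms_xplan output fragment, not a runnable query:
-- predicate information (identified by operation id):
-- ---------------------------------------------------
--    2 - storage("T"."AMOUNT">10000)
--        filter("T"."AMOUNT">10000)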
4. the silent blocker: tracking down non-offloadable functions
the exadata storage cell runs a stripped-down, specialized layer of the oracle database software. it knows how to do basic math, simple string comparisons, and standard predicate logic. it does not know how to execute custom pl/sql code, heavy xml parsing, or certain obscure analytical functions. if your query where clause contains a custom function written by your developers, the storage cell simply gives up and sends all the raw data up to the database kernel to handle the logic, completely killing your offloading efficiency. for built-in sql functions, the v$sqlfn_metadata view tells you exactly which ones are offloadable; a user-defined pl/sql function will never appear there, which is precisely why it can never be offloaded.
-- check whether a built-in function is supported for offloading
-- note: the name and offloadable columns store uppercase values,
-- so lowercase literals like 'yes' or '%upper%' would match nothing
select name, version, offloadable
from   v$sqlfn_metadata
where  offloadable = 'YES'
and    name like '%UPPER%'
order  by name;
we once had a massive fraud detection system that filtered millions of transactions using a custom developer function. the developers proudly thought wrapping the logic in a function was clean code. on an exadata machine, it was a disaster. because it was a user-defined function, the cells had to return one hundred percent of the table rows to the database server. we rewrote the logic to use standard sql case expressions and plain predicates instead. this simple change allowed the storage cells to filter ninety-nine percent of the clean transactions locally, reducing the physical network payload by several terabytes.
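the rewrite pattern is worth sketching. the table and column names below are invented for illustration, but the shape of the change is exactly this: inline the pl/sql logic as plain sql that the cells can evaluate.

-- before: a user-defined pl/sql function is never offloadable,
-- so every row travels to the database server to be evaluated
select *
from   transactions t
where  is_suspicious(t.amount, t.country_code) = 'Y';

-- after: the same logic as plain sql predicates, fully offloadable,
-- so the cells discard clean transactions locally
select *
from   transactions t
where  (t.amount > 10000 and t.country_code not in ('US', 'CA'))
or     (t.amount > 1000 and t.risk_score > 80);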
5. empirical verification: proving io savings with hard numbers
ultimately, the only proof that matters to management is hard numbers. the dynamic performance view v$sql carries two critical columns. io_cell_offload_eligible_bytes tells you exactly how much physical data qualified for offloading, that is, how much you would otherwise have read from disk and shipped to the server. io_cell_offload_returned_bytes tells you exactly how much data actually crossed the network wire after the storage cells did their filtering job. a healthy smart scan shows a massive, undeniable gap between these two numbers. if they are exactly equal, you are not saving a single byte of network bandwidth, and if eligible is zero, smart scan never engaged at all.
-- the money script: quantify smart scan efficiency for one cursor
-- returns the exact megabytes saved by predicate offloading on the cells
select sql_id,
       round(physical_read_bytes / 1024 / 1024, 1)            as phy_rd_mb,
       round(io_cell_offload_eligible_bytes / 1024 / 1024, 1) as eligible_mb,
       round(io_cell_offload_returned_bytes / 1024 / 1024, 1) as returned_mb,
       round((io_cell_offload_eligible_bytes - io_cell_offload_returned_bytes) /
             nullif(io_cell_offload_eligible_bytes, 0) * 100, 2) as savings_pct
from   v$sql
where  sql_id = 'your_sql_id_here'
and    child_number = 0;
during a stressful quarterly performance audit, the network engineering team flagged high bandwidth usage on our core infiniband switches. i ran this exact script and found a terrible query with one terabyte eligible and exactly one terabyte returned, meaning zero percent savings. the developer's query asked for all rows where a specific column was not null. that predicate is perfectly offloadable, and the cells dutifully evaluated it, but since the column was physically defined as not null in the table structure, every single row matched, so the cells had to send every row back across the wire anyway. offloading can only save bandwidth when the predicate actually eliminates rows. we advised the business team to add a simple date filter, which immediately boosted the network savings to ninety-five percent and finally silenced the angry network alarms.
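when you are testing interactively rather than hunting a cursor in v$sql, the same eligible-versus-returned gap is visible for your own session. a small sketch joining v$mystat to v$statname:

-- per-session offload numbers: run your query, then check the gap
select sn.name,
       round(ms.value / 1024 / 1024, 1) as mb
from   v$mystat   ms
join   v$statname sn on sn.statistic# = ms.statistic#
where  sn.name in (
         'cell physical IO bytes eligible for predicate offload',
         'cell physical IO interconnect bytes returned by smart scan'
       );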
