jump to navigation

Bitmap Index On Column With 212552698 Distinct Values, What Gives? (I’d Rather Be High) April 23, 2019

Posted by Richard Foote in Advanced Index Compression, Autonomous Data Warehouse, Autonomous Database, Index Compression, Oracle Indexes.
add a comment

The Next Day

In my previous post on Indexing The Autonomous Warehouse, I highlighted how it might be necessary to create indexes to improve the performance and scalability of highly selective queries, as it might on any Data Warehouse running on an Exadata platform. In the post, I created a Bitmap Index and showed how improve SQL performance insured.

I’ve had two subsequent correspondences asking why I decided to use a Bitmap Index in my example, when the column in question had a (relatively) massive number of distinct values, some 212552698. What gives, wouldn’t it have been far more efficient to have used a standard B-Tree index instead, as Bitmap Indexes are only suitable for columns that have low-medium numbers of distinct values? There’s nothing particularly low-medium cardinality about 212552698 distinct values.

It’s a good question and as 2 people asked basically the same question, thought it worthy of a separate post to try to address it.

Firstly, I must admit the only calculation I performed mentally when deciding which type of index to use was to look at the number of distinct values, 212552698 and the number of rows in the table, 1 billion and determine that there’s approximately 5 rows per distinct value. With 5 repeated values on average, this was sufficient for the Bitmap Index to likely be the more efficient type of index and so I created a Bitmap Index in my demo, considering this was a Data Warehouse environment where potential locking issues relating to concurrent DMLs was not an issue.

As I’ve discussed previously, as with this classic post from way back in 2010, “So What Is A Good Cardinality Estimate For A Bitmap Index Column“, it’s not the number of distinct values that’s important, it’s the column cardinality and the average number of repeated column values that’s important when deciding if a Bitmap Index might be appropriate in a Data Warehouse environment.

As above post discusses, a Bitmap Index might only need the one index entry per column value and as the resultant index bitmap string can be very very highly compressed if there are few actual occurrences per value, a column with as many as 5 repeated values of average would likely be more efficient than a corresponding B-Tree Index. A B-Tree Index with 5 repeated values would require the value to be stored 5 times, with their 5 corresponding rowids, whereas the Bitmap Index might only need to store the indexed value once with its 2 corresponding rowids and a trivial amount for the highly compressed bitmap string.

If we create an equivalent B-Tree Index to the previously created Bitmap Index:

SQL> create index big_ziggy_lo_orderkey_i2 on big_ziggy(lo_orderkey) invisible;

SQL> select index_name, index_type, leaf_blocks, compression from user_indexes
     where table_name = 'BIG_ZIGGY';

INDEX_NAME                     INDEX_TYPE LEAF_BLOCKS COMPRESSION
------------------------------ ---------- ----------- -------------
BIG_ZIGGY_LO_ORDERKEY_I        BITMAP          843144 DISABLED
BIG_ZIGGY_LO_ORDERKEY_I2       NORMAL         2506552 DISABLED

We notice the Bitmap Index at 843144 leaf blocks is indeed substantially smaller than the equivalent B-Tree Index at 2506552 leaf blocks.

However, this is the Oracle Autonomous Data Warehouse environment where I have accessed to the Advanced Compression option and the capability to potentially create highly compressed B-Tree index structures. See various previous posts on Index Advanced Compression.

Perhaps, I can create a smaller structure to the Bitmap Index by using Index Advanced Compress. Let’s try initially with Advanced Compression Low:

SQL> drop index big_ziggy_lo_orderkey_i2;

Index dropped.

SQL> create index big_ziggy_lo_orderkey_i3 on big_ziggy(lo_orderkey) compress advanced low invisible;
...

SQL> select index_name, index_type, leaf_blocks, compression from user_indexes
     where table_name = 'BIG_ZIGGY';

INDEX_NAME                     INDEX_TYPE LEAF_BLOCKS COMPRESSION
------------------------------ ---------- ----------- -------------
BIG_ZIGGY_LO_ORDERKEY_I        BITMAP          843144 DISABLED
BIG_ZIGGY_LO_ORDERKEY_I3       NORMAL         1921296 ADVANCED LOW

We notice the B-Tree Index is indeed now smaller at just 1921296 leaf blocks, down from 2506552 leaf blocks, but still not as small as the 843144 leaf blocks of the Bitmap Index.

However, this is a Data Warehouse environment where the DML overheads associated with the maintenance of Advanced Compression High indexes may not be of such concern. Additionally, I might have ample CPU resources to cater for any additional CPU requirements of accessing such Advanced Compression High indexes. Therefore, let’s create a B-Tree Index with Advanced Compression High:

SQL> drop index big_ziggy_lo_orderkey_i3;

Index dropped.

SQL> create index big_ziggy_lo_orderkey_i4 on big_ziggy(lo_orderkey) compress advanced high invisible;
...

SQL> select index_name, index_type, leaf_blocks, compression from user_indexes
     where table_name = 'BIG_ZIGGY';

INDEX_NAME                     INDEX_TYPE LEAF_BLOCKS COMPRESSION
------------------------------ ---------- ----------- -------------
BIG_ZIGGY_LO_ORDERKEY_I        BITMAP          843144 DISABLED
BIG_ZIGGY_LO_ORDERKEY_I4       NORMAL          786537 ADVANCED HIGH

We notice the index has significantly reduced further in size and at just 786537 leaf blocks in now indeed smaller than the corresponding Bitmap Index.

Note that although it might be beneficial to use Advanced Compressed High on Bitmap Indexes, such an option is not currently supported.

Of course your mileage will vary on which Index Type and Compression Option will be most appropriate depending on the characteristics of your data, however it might be worth considering these options in environments where you have access to these indexing options.

But yes, a Bitmap Index might indeed be the better option, even if there are billions of distinct values (if there are say 10s of billions of rows) for the columns to be indexed…

Advertisements

Indexing The Oracle Autonomous Data Warehouse (Autobahn) April 2, 2019

Posted by Richard Foote in Autonomous Data Warehouse, Autonomous Database, Oracle Indexes.
2 comments

autobahn

When the Autonomous Data Warehouse was initially released in March 2018, one of the notable restrictions was that it did not support typical Data Warehouse performance related structures such as Indexes, Partitioning or Materialized Views. The platform was deemed efficient enough to not require these features, which would unnecessarily complicate the autonomous environment.

In September 2018, these restrictions were finally lifted with the (for now) manual creation of Indexes, Partitioning and Materialized Views now all supported. Oracle however states that “Oracle recommends that you do not manually create these structures, and leave performance optimizations to the Autonomous Data Warehouse. If you’re a highly skilled Oracle Database tuning expert and decide to manually create these access structures, please be advised to test the impact of your manual tuning efforts on your full workload.

If you’re not a “highly skilled Oracle Database tuning expert”, let me give you some advice on when to create indexes in the Autonomous Data Warehouse (on Oracle Database 18c where these features are NOT automatically created and maintained by autonomous processes).

I’ll initially focus on two of the key capabilities available in the Autonomous Data Warehouse environment that might indeed make some index structures unnecessary, although neither is unique to autonomous databases.

Let’s begin by creating a relatively large table based on the huge LINEORDER table found in the provided SSB schema. The following BIG_ZIGGY table is based on 1 billion rows from the LINEORDER table:

SQL> create table big_ziggy select * from ssb.lineorder where rownum<=1000000000;

Frustratingly, if you have a session that performs a database operation for an extended period, the session looses its connection and hangs indefinitely. You need to start a separate session to confirm when the task actually completes and track progress by say looking a v$session_longops or querying dba_segments.

So we have a decently sized BIG_ZIGGY table at around 27GB:

SQL> select table_name, num_rows, blocks, round(blocks/128) MB
from user_tables where table_name='BIG_ZIGGY';

TABLE_NAME     NUM_ROWS     BLOCKS         MB
------------ ---------- ---------- ----------
BIG_ZIGGY    1000000000    3534714      27615

Here are a couple of key points.

Firstly, if you run a query on a table using filtering columns for the first time, then none of the smarts in the autonomous database environments are going to help you here. Things will initially run slowly, because Oracle would not have detected yet there is anything that needs to be tuned. The tuning process is “reactive”, in that Oracle needs to find an issue first before it can potentially address it.

The second key point is that if only a small fraction of the overall data is required or need be accessed, if the data being accessed is very highly filtered, then a database index is likely going to be highly effective, even within the Autonomous Data Warehouse. This is precisely why indexes are typically required in databases.

Although very often, large volumes of data are indeed accessed in typical Data Warehouse workloads, this is not always the case. It’s certainly not uncommon for ad-hoc queries and the such to only access a small fraction of an available data set.

The following query on the large BIG_ZIGGY table only returns 9 rows:

SQL> select * from big_ziggy
where lo_orderkey = 2294059393;

9 rows selected.

Elapsed: 00:06:23.99

Execution Plan
----------------------------------------------------------
Plan hash value: 2097953955

---------------------------------------------------------------------------------------------------------
| Id | Operation                   | Name                       | Rows | Bytes | Cost (%CPU)| Time      |
---------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                            |    5 |   505 |  67250 (12)|  00:00:03 |
|  1 |  RESULT CACHE               | 2u5qwsq68ghc5bqn5spqhyqjyj |      |       |            |           |
|* 2 |   TABLE ACCESS STORAGE FULL | BIG_ZIGGY                  |    5 |   505 |  67250 (12)|  00:00:03 |
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - storage("LO_ORDERKEY"=2294059393)
     filter("LO_ORDERKEY"=2294059393)

Result Cache Information (identified by operation id):
------------------------------------------------------

1 - column-count=17; dependencies=(BOWIE.BIG_ZIGGY); name="select * from big_
ziggy where lo_orderkey = 2294059393"

Note
-----

- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
      1 recursive calls
      0 db block gets
3530785 consistent gets
3530770 physical reads
      0 redo size
   2116 bytes sent via SQL*Net to client
    552 bytes received via SQL*Net from client
      2 SQL*Net roundtrips to/from client
      0 sorts (memory)
      0 sorts (disk)
      9 rows processed

The query takes nearly 6.5 minutes to complete which is clearly a long time to return a few rows.

Note that the Autonomous Data Warehouse platform is Exadata and Exadata has a number of key capabilities, such a Hybrid Columnar Compression (HCC), Smarts Scans, Storage Indexes and the such that can be extremely useful in Data Warehouse environments.

For example, the table is smaller and can be accessed more efficiently thanks to HCC:

SQL> select table_name, compression, compress_for
from user_tables where table_name='BIG_ZIGGY';

TABLE_NAME   COMPRESS COMPRESS_FOR
------------ -------- ------------------------------
BIG_ZIGGY     ENABLED QUERY HIGH ROW LEVEL LOCKING

Exadata Smart Scans and associated Storage Indexes can result in extremely efficient Full Table Scans by potentially skipping large sections of the physical data from having to be accessed and by returning just the required data set back to the database. I’ve blogged previously a number of times on the power of Storage Indexes and Exadata related smarts.

However, if a query is executed using specific filtering columns for the first time, then Storage Indexes will not have yet been created. If we look at the effectiveness of Storage Indexes when running the previous query:

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s
where n.statistic# = s.statistic# and n.name in ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                             0
cell physical IO interconnect bytes returned by smart scan       7.79956818

Note that initially, no bytes are saved by storage indexes as they don’t initially exist. Although only a relatively small amount of data is actually returned to the database server, having to read the entire table is expensive and why the query is taking so long to complete.

If we run a number of different queries with a filter predicate on the same LO_CUSTKEY column and eventually run the following query:

SQL> select lo_orderkey, lo_linenumber, lo_custkey, lo_partkey, lo_quantity
from big_ziggy where lo_orderkey = 2294059491;

6 rows selected.

Elapsed: 00:01:41.95

Execution Plan
----------------------------------------------------------
Plan hash value: 2097953955

---------------------------------------------------------------------------------------------------------
| Id | Operation                   | Name                       | Rows | Bytes | Cost (%CPU)| Time      |
---------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                            |    5 |   125 |  67250 (12)|  00:00:03 |
|  1 |  RESULT CACHE               | 9ycjfft9fx7gy506tr9d98djdj |      |       |            |           |
|* 2 |   TABLE ACCESS STORAGE FULL | BIG_ZIGGY                  |    5 |   125 |  67250 (12)|  00:00:03 |
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - storage("LO_ORDERKEY"=2294059491)
     filter("LO_ORDERKEY"=2294059491)

Result Cache Information (identified by operation id):
------------------------------------------------------

1 - column-count=5; dependencies=(BOWIE.BIG_ZIGGY); name="select lo_orderkey,
lo_linenumber, lo_custkey, lo_partkey, lo_quantity from big_ziggy where lo_orde
rkey = 2294059491"

Note
-----

- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
      1 recursive calls
      0 db block gets
3530785 consistent gets
3530770 physical reads
      0 redo size
    760 bytes sent via SQL*Net to client
    552 bytes received via SQL*Net from client
      2 SQL*Net roundtrips to/from client
      0 sorts (memory)
      0 sorts (disk)
      6 rows processed

We note that the overall elapsed times have now decreased, taking only 1 minute, 45 seconds to complete.

If we now look at the effectiveness of storage indexes:

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s
where n.statistic# = s.statistic# and n.name in ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                     731.78125
cell physical IO interconnect bytes returned by smart scan       7.59265137

The storage index is now providing some benefit but as I’ve discussed previously, there are limitations to their effectiveness depending on data distribution and the physical clustering of the data.  So your mileage will vary on the effectiveness of smart scans and no, Oracle does not (currently) automatically re-cluster data to improve smart scan performance when most effective.

If you re-run the same query multiple times, another performance feature kicks in:

SQL> select lo_orderkey, lo_linenumber, lo_custkey, lo_partkey, lo_quantity from big_ziggy where lo_orderkey = 2294059491;

6 rows selected.

Elapsed: 00:00:01.74

Execution Plan
----------------------------------------------------------
Plan hash value: 2097953955

---------------------------------------------------------------------------------------------------------
|  Id | Operation                   | Name                       | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                            |    5 |   125 |  67250 (12)| 00:00:03 |
|   1 |  RESULT CACHE               | 9ycjfft9fx7gy506tr9d98djdj |      |       |            |          |
| * 2 |   TABLE ACCESS STORAGE FULL | BIG_ZIGGY                  |    5 |   125 |  67250 (12)| 00:00:03 |
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - storage("LO_ORDERKEY"=2294059491)
     filter("LO_ORDERKEY"=2294059491)

Result Cache Information (identified by operation id):
------------------------------------------------------

1 - column-count=5; dependencies=(BOWIE.BIG_ZIGGY); name="select lo_orderkey,
lo_linenumber, lo_custkey, lo_partkey, lo_quantity from big_ziggy where lo_orderkey = 2294059491"

Note
-----

- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  0 consistent gets
  0 physical reads
  0 redo size
760 bytes sent via SQL*Net to client
552 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  6 rows processed

Response times are now under 2 seconds (Australia is a long way from Ashburn, USA), with Oracle automatically using the Result Cache to directly access the necessary result and hence not requiring any table related consistent reads. This is as efficient as it gets, but it’s dependant on similar result sets being returned and no data changes that would invalidate the results cache such as a new data load into the data warehouse:

SQL> insert into big_ziggy select * from big_ziggy where rownum  commit;

Commit complete.

SQL> select lo_orderkey, lo_linenumber, lo_custkey, lo_partkey, lo_quantity from big_ziggy where lo_orderkey = 2294059491;

6 rows selected.

Elapsed: 00:01:44.79

Execution Plan
----------------------------------------------------------
Plan hash value: 2097953955

---------------------------------------------------------------------------------------------------------
|  Id | Operation                   | Name                       | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                            |    5 |   125 |  67250 (12)| 00:00:03 |
|   1 |  RESULT CACHE               | 9ycjfft9fx7gy506tr9d98djdj |      |       |            |          |
| * 2 |   TABLE ACCESS STORAGE FULL | BIG_ZIGGY                  |    5 |   125 |  67250 (12)| 00:00:03 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - storage("LO_ORDERKEY"=2294059491)
     filter("LO_ORDERKEY"=2294059491)

Result Cache Information (identified by operation id):
------------------------------------------------------

1 - column-count=5; dependencies=(BOWIE.BIG_ZIGGY); name="select lo_orderkey,
lo_linenumber, lo_custkey, lo_partkey, lo_quantity from big_ziggy where lo_orde
rkey = 2294059491"

Note
-----

- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
      1 recursive calls
      0 db block gets
3530808 consistent gets
3530786 physical reads
      0 redo size
    760 bytes sent via SQL*Net to client
    552 bytes received via SQL*Net from client
      2 SQL*Net roundtrips to/from client
      0 sorts (memory)
      0 sorts (disk)
      6 rows processed

After new data gets loaded into the table, we’re back to being reliant on the effectiveness of the Exadata smart scan.

So we have a scenario here where we either have excellent performance via the result cache or just average performance is we want data for specific data based on a moderately effective storage index and resultant smart scan (in a different session, the following query is run):

SQL> select lo_orderkey, lo_linenumber, lo_custkey, lo_partkey, lo_quantity from big_ziggy where lo_orderkey = 2294059492;

Elapsed: 00:02:03.99

Execution Plan
----------------------------------------------------------
Plan hash value: 2097953955

---------------------------------------------------------------------------------------------------------
|  Id | Operation                   | Name                       | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                            |    5 |   125 |  67250 (12)| 00:00:03 |
|   1 |  RESULT CACHE               | 4s0z6mkqpufc33mxb9ddvuc69r |      |       |            |          |
| * 2 |   TABLE ACCESS STORAGE FULL | BIG_ZIGGY                  |    5 |   125 |  67250 (12)| 00:00:03 |
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - storage("LO_ORDERKEY"=2294059492)
     filter("LO_ORDERKEY"=2294059492)

Result Cache Information (identified by operation id):
------------------------------------------------------

1 - column-count=5; dependencies=(BOWIE.BIG_ZIGGY); name="select lo_orderkey,
lo_linenumber, lo_custkey, lo_partkey, lo_quantity from big_ziggy where lo_orde
rkey = 2294059492"
Note
-----
- automatic DOP: Computed Degree of Parallelism is 1
Statistics
----------------------------------------------------------
      1 recursive calls
      0 db block gets
3530785 consistent gets
3530770 physical reads
      0 redo size
    695 bytes sent via SQL*Net to client
    552 bytes received via SQL*Net from client
      2 SQL*Net roundtrips to/from client
      0 sorts (memory)
      0 sorts (disk)
      2 rows processed

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                     731.78125
cell physical IO interconnect bytes returned by smart scan       7.59265137

To get both consistent and excellent performance when running a query that performs a high level of filtering, we need to create an appropriate index as we would in any other regular data warehouse database environment:

SQL> create bitmap index big_ziggy_lo_orderkey_i on big_ziggy(lo_orderkey);
SQL> select index_name, leaf_blocks, blevel, distinct_keys, status from user_indexes where table_name='BIG_ZIGGY';

INDEX_NAME                LEAF_BLOCKS     BLEVEL DISTINCT_KEYS STATUS
------------------------- ----------- ---------- ------------- --------
BIG_ZIGGY_LO_ORDERKEY_I        843144          3     212552698 VALID

If we now run random queries based on the highly selective LO_ORDERKEY column:

SQL> select lo_orderkey, lo_linenumber, lo_custkey, lo_partkey, lo_quantity from big_ziggy where lo_orderkey = 2294059489;

Elapsed: 00:00:01.67

Execution Plan
----------------------------------------------------------
Plan hash value: 1677294303

------------------------------------------------------------------------------------------------------------
| Id | Operation                      | Name                       | Rows | Bytes | Cost (%CPU)| Time      |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT               |                            |    5 |   125 |       5 (0)|  00:00:01 |
|  1 |  RESULT CACHE                  | 7tn121mujfw3kfg39z5pg4duwy |      |       |            |           |
|  2 |   TABLE ACCESS BY INDEX ROWID  | BIG_ZIGGY                  |    5 |   125 |       5 (0)|  00:00:01 |
|  3 |    BITMAP CONVERSION TO ROWIDS |                            |      |       |            |           |
|* 4 |     BITMAP INDEX SINGLE VALUE  | BIG_ZIGGY_LO_ORDERKEY_I    |      |       |            |           |
------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("LO_ORDERKEY"=2294059489)

Result Cache Information (identified by operation id):
------------------------------------------------------

1 - column-count=5; dependencies=(BOWIE.BIG_ZIGGY); attributes=(ordered); nam
e="select lo_orderkey, lo_linenumber, lo_custkey, lo_partkey, lo_quantity from b
ig_ziggy where lo_orderkey = 2294059489"
Note
-----
- automatic DOP: Computed Degree of Parallelism is 1 because of no expensive
parallel operation

Statistics
----------------------------------------------------------
  1 recursive calls
  0 db block gets
  8 consistent gets
  0 physical reads
  0 redo size
647 bytes sent via SQL*Net to client
553 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  1 rows processed

We now get the required, consistent, high performance that we require, without having to resort to consuming additional CPU resources by increasing say the number of available CPUs and degree of parallelism to reduce response times.

In short, you index the Autonomous Data Warehouse as you would any database running on an Exadata platform…

Remember, this is all effectively version one of the Autonomous Data Warehouse. The capability of automatically creating necessary indexes is coming very soon…

Intro: Initial Thoughts On Oracle Autonomous Database Cloud Services (Automatic For The People) March 22, 2019

Posted by Richard Foote in 18c, 19c, 19c New Features, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Oracle Indexes.
4 comments

automatic for the people

I’m currently writing up a few blog pieces on indexing in the Oracle Autonomous Database environments, but I thought I’ll begin by introducing what exactly are Oracle Autonomous Database Cloud Services and some of my initial thoughts, as there’s still some confusion on all this.

Introduced by Uncle Larry at Oracle OpenWorld 2017, Oracle Autonomous Databases are positioned as “self-driving” Cloud- Based database environments, that automatically address many of the tasks usually performed by DBAs such as patching, backups, security and what is of most interest to me, database tuning.

The first of these cloud environments, the Oracle Autonomous Data Warehouse Cloud Service  (ADW) was first released in March 2018, with the Oracle Autonomous Transaction Processing Cloud Service (ATP) released in August 2018.

So key point number one. These are all Oracle Autonomous Database Cloud Services, there are no actual Autonomous Databases as such. These environments currently consist of the following key components:

  • Oracle Database 18c (and above)
  • Oracle Cloud Environment
  • Oracle Exadata Infrastructure
  • Oracle Policy-Driven Automation

So the Oracle Database alone is not an autonomous database. An Oracle 18c database running on your On-Premises Exadata is still not enough to be an autonomous database platform. It requires all the above components, which is why this is only available on the Oracle Cloud environment (and will likely be soon available on the Oracle Cloud at Customer environment, where all these components can also be replicated).

Now having a database environment is which many of the mundane DBA tasks (such as database provisioning, patching, taking backups, etc.) can be automated can only be a good thing and the Autonomous Database Cloud services delivers on all these. Creating an Autonomous Database environment truly just takes a few minutes and I could easily connect via SQL*PLUS and SQL Developer on my laptop a few minutes later.

However, my key interest here is in database tuning (hey, I’m a database tuning guy) and the capability of the Autonomous Database Cloud Services in being able to self-tune and self-repair itself when database performance issues and inefficiencies are encountered.

So, are we there yet?

So my second key point is that is many ways, we already have a “self-tuning” database with Oracle and have had so for many many years. I’m old enough to remember the Rule-Based Optimizer, when the structure of the SQL was critical to performance. I remember tuning Rollback Segments and manually setting extent sizes to make sure no segment got too close to 121 extents else it couldn’t grow any further. I remember manually setting Freelists, I remember having to write my own jobs to run maintenance tasks and setting tablespaces to be in backup mode etc. etc. etc.

The Oracle Database has evolved over the years, where Oracle DBAs don’t have to worry about these things. If you wish, you can now configure the Oracle database to also automatically adjust memory components, automatically collect necessary schema statistics, automatically create baselines and to tune SQL queries, etc. etc. etc.

All of these Oracle database capabilities are crucial in the new Oracle autonomous database environments, as are new Oracle 18c features and as will be new Oracle 19c features (especially in relation to self-tuning the Autonomous Transaction Processing Cloud Service). The newer autonomous database capabilities are just part of this Oracle database self-tuning evolution, with in the future some new policy-driven based automation thrown into the mix.

So are we indeed there yet with a completely self-tuning database with these new autonomous database services? In a word, no.

Which brings me to my next key point. This is all very very new. All these autonomous database services are effectively at “Edition One” status. This will all take time to eventually evolve to be truly, fully self-tuning database environments. There’ll be some new cool capabilities and features which will assist in some areas but be initially deficient in other areas. But clearly this is the future and clearly in future releases, more and more self-tuning capabilities will be added that will make things easier to both manage and tune Oracle database environments.

Note Oracle Corporation itself (depending on who you talk to) is quite clear that it isn’t there yet, with the web-page on the Autonomous Transaction Processing Cloud services clearly stating that “Workload Optimization* (coming soon)“, but with lots of clues on what’s to come with features such as “Database tunes itself (indexes, memory, partitions, SQL plans, parallelism, optimizer stats) for the incoming workload as data changes over time“.

Do many of these upcoming features sound familiar? If you’re not sure, check out the Oracle Database 19c New Features manual.

Which brings me to my final key point here. Even if you’re not particularly interested in the Cloud, if you view managing On-Premises environments as being your foreseeable future, some the best things that has happened to you in relation to the Oracle Database comes courtesy to you as a result of Oracle’s strategic direction with the cloud. Many of the best new features introduced in the past few Oracle Database releases, especially in relation to the CBO and much of the online stuff such moving tables and partitions online, moving data files online, converting tables to be partitioned online, converting partitioned tables differently online, merging/splitting partitions online, etc. etc. are clearly going to be critical in a self-managing/tuning database. As a DBA of an On-Premises database environment, you can also take advantage of these new capabilities.

It will enable Oracle in its autonomous environments for example to automatically convert that table to be partitioned in this specific manner to improve overall performance, all behind the scenes, without anyone necessarily knowing it’s done so.

Is it there yet? No. Is it coming? You bet.

That said, some newer Oracle Database 19c features that will clearly be critical to a self-tuning autonomous databases moving forward such as Real-Time Statistics, SQL Quarantine and Automatic Indexing will only be available in the Oracle Cloud and Exadata platforms. So take note…

Which brings me to indexing.

When the first Oracle Autonomous Data Warehouse cloud service was announced in March 2018, one of the key “features” of the new autonomous platform was that indexing was disabled (as were other traditional Data Warehouse database capabilities such as Partitioning, Materialized Views, etc.). But how can you effectively “tune” a database when you take away some of the key tuning capabilities? The short answer is that you can’t, unless the database has somewhat simplistic data workloads and data structures.

Remember, this was “Version One” with the first autonomous database environment and the initial strategy was to make the Oracle database not smarter, but dumber. Simplify it such that DBAs can’t easily “break” things and by simply making the Exadata platform do all the heavy lifting with regards to database tuning. A more simplified environment makes things a little easier to “self-tune” as there are fewer moving parts to worry about.

For more simplistic Data Warehouse environments, this might all be adequate. For those of you who follow my blog and my previous discussions on indexing on Exadata, dropping all indexes on Exadata, even on Data Warehouse environments was generally a bad idea. So providing an Oracle database platform, even an autonomous one running on an Exadata platform, where all indexing was effectively disabled, was always going to have performance shortfalls in some scenarios. Try running ad-hoc queries on the supplied SSB LINEORDER table for example. Smart Scans, Storage Indexes, HCC, Result Caches, etc. will only take you so far.

So as I said, Oracle evolves and improves and now allows database indexes to be created in the Autonomous Data Warehouse cloud service.

Which will be the focus on upcoming blog posts, so stay tuned.

Oracle Database 19c Automatic Indexing: Predicted Back In 2008 (A Better Future) March 19, 2019

Posted by Richard Foote in Automatic Indexing, Autonomous Database, Oracle Indexes.
add a comment

heathen

I’ve recently received a number of correspondences regarding one of my most popular blog posts, dating back to February 2008: Index Rebuild vs. Coalesce vs. Shrink Space (Pigs – 3 Different Ones).

In the comments section, there’s an interesting discussion where I mention:

If Oracle19 does everything for you and all the various indexes structures get automatically created, used, applied, tuned, maintained, partitioned, etc. etc., then I’ll have nothing to write about 🙂.

Actually, when I think about it, it may not take as long as Oracle19 for Oracle to get there.

The fact this was stated some 10 years before the release of Oracle Database 19c with indeed the new Automatic Indexing capability is kinda funny.

I would however like to state a few points.

Firstly, congratulations for the impressive timely predication should go to Robert who made the initial reference to Oracle 19c in the comments.

Although the changing in Oracle database versioning helped in getting the predicted Oracle version correct, I did at least make the comment that I thought some form Automatic Indexing was likely before Oracle 19c.

Although not all the predicted capabilities are there yet (no auto index maintenance, no auto index partitioning, not all index structures are yet supported with auto indexing, etc.), I do predict that these will all be implemented within a few Oracle database releases (early 20s).

I stated there would be nothing to blog about if automatic indexing became a reality, which of course is not true either. The more one understands how Oracle actually works, the more one can proactively identify and address any potential issues and deficiencies. Automatic Indexing is ONLY going to be available on Exadata and the Oracle Cloud platform, so the majority of Oracle On-Premises database users will still need to manually manage indexes. Understanding how Automatic Indexing capability works means one can potentially highlight and learn how to more appropriately manage indexes manually.

So finally, yes I will of course be blogging about Automatic Indexing and about indexing in general in the new Autonomous Database cloud environments.

I’ll start with some initial thoughts and examples of indexing the Autonomous Data Warehouse Database.

Stay tuned.