Quiz Time. Why Do Deletes Cause An Index To Grow ? (Solution) June 29, 2015

Posted by Richard Foote in Oracle Indexes.

OK, time to reveal how a couple of simple deletes can cause an index to double in size.

If we go back and look at the tree dump before the delete operation:

—– begin tree dump
branch: 0x180050b 25167115 (0: nrow: 19, level: 1)
leaf: 0x180050c 25167116 (-1: row:540.540 avs:4)
leaf: 0x180050d 25167117 (0: row:533.533 avs:10)
leaf: 0x180050e 25167118 (1: row:533.533 avs:11)
leaf: 0x180050f 25167119 (2: row:533.533 avs:10)
leaf: 0x1800510 25167120 (3: row:533.533 avs:10)
leaf: 0x1800511 25167121 (4: row:533.533 avs:11)
leaf: 0x1800512 25167122 (5: row:533.533 avs:10)
leaf: 0x1800513 25167123 (6: row:533.533 avs:10)
leaf: 0x1800514 25167124 (7: row:533.533 avs:11)
leaf: 0x1800515 25167125 (8: row:533.533 avs:10)
leaf: 0x1800516 25167126 (9: row:533.533 avs:10)
leaf: 0x1800517 25167127 (10: row:533.533 avs:11)
leaf: 0x1800519 25167129 (11: row:533.533 avs:10)
leaf: 0x180051a 25167130 (12: row:533.533 avs:10)
leaf: 0x180051b 25167131 (13: row:533.533 avs:11)
leaf: 0x180051c 25167132 (14: row:533.533 avs:10)
leaf: 0x180051d 25167133 (15: row:533.533 avs:10)
leaf: 0x180051e 25167134 (16: row:533.533 avs:11)
leaf: 0x180051f 25167135 (17: row:399.399 avs:2019)
—– end tree dump

We notice there are only 10 or 11 bytes of free space available (the avs) in most of the leaf blocks, as we created the index with a pctfree of 0.
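
As an aside, to produce the partial block dumps that follow, the dba listed in the tree dump can be converted into a file and block number via DBMS_UTILITY and then dumped to the session trace file. A quick sketch using the first leaf block above; note the dba yields the relative file number, which in databases with many files may differ from the absolute file number expected by the dump command:

SQL> select dbms_utility.data_block_address_file(25167116) file#, dbms_utility.data_block_address_block(25167116) block# from dual;

     FILE#     BLOCK#
---------- ----------
         6       1292

SQL> alter system dump datafile 6 block 1292;

System altered.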

If we look at a partial block dump of the first leaf block after the deletes:

Block header dump:  0x0180050c
Object id on Block? Y
seg/obj: 0x18244  csc: 0x00.4b9940  itc: 2  flg: E  typ: 2 - INDEX
brn: 0  bdba: 0x1800508 ver: 0x01 opc: 0
inc: 0  exflg: 0

Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0009.000.0000097f  0x0140010a.0186.01  -BU-    1  fsc 0x0000.004b9956
0x02   0x0005.001.00000a03  0x01400ba7.0156.28  ----    0  fsc 0x0000.00000000
Leaf block dump
===============
header address 37170788=0x2372e64
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 1
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 278
kdxcofbo 592=0x250
kdxcofeo 4523=0x11ab
kdxcoavs 3931
kdxlespl 0
kdxlende 0
kdxlenxt 25167140=0x1800524
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[4523] flag: -------, lock: 0, len=12
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  01 80 04 e7 00 00
row#1[4535] flag: -------, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 04 e7 00 01

.....

The key thing to note here is that the leaf block has two Interested Transaction List (ITL) slots, each of which uses 24 bytes. Two is the default number of ITL slots in an index leaf block (index branch blocks only have one by default); they're used by transactions to store vital information such as the transaction id, locking information, the location of undo and SCN details. However, the first slot (No. 1) is only used by recursive transactions, such as those required to perform index block splits, and can't be used for standard user-based transactions. I discuss this in my (in)famous Rebuilding The Truth presentation.

Now my quiz demo had two concurrent delete transactions occurring within the same leaf block(s), but with effectively just the one free ITL slot available for the two transactions. Ordinarily, Oracle would just allocate another ITL slot so that both transactions could concurrently delete different index entries within the same leaf block. However, Oracle is unable to simply add another ITL slot in this scenario as it requires 24 bytes of free space and there are only 10 or 11 bytes free in our leaf blocks.

In a similar scenario with a table segment, being unable to allocate another ITL slot like this would result in a nasty ITL wait event for one of the transactions. But for indexes, there is a “naturally occurring” event that results in plenty of additional free space as required.

The index block split.

So rather than have one transaction hang and wait for an ITL slot to become available (i.e. for the other transaction to complete), Oracle simply performs a 50-50 block split and allocates the additional ITL slot as necessary, if both transactions still occur within the same leaf block after the block split.
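
As an aside, had this been a table block with insufficient space for an extra ITL slot, the blocked session would typically be seen waiting on the "enq: TX - allocate ITL entry" wait event. A quick sketch of how such waits can be spotted system-wide:

SQL> select event, total_waits from v$system_event where event = 'enq: TX - allocate ITL entry';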

In my quiz demo, both delete transactions were actually performed on index entries that ended up in the other half of the block split (the new leaf block). Therefore, the number of ITL slots in the first leaf block remains at just the default two, and the kdxlende value, which denotes deleted index entries, remains at 0.

If we look at a partial block dump of the other half of the leaf block split, now the second logical leaf block (as identified by kdxlenxt 25167140=0x1800524):

Block header dump:  0x01800524
Object id on Block? Y
seg/obj: 0x18244  csc: 0x00.4b9956  itc: 3  flg: E  typ: 2 - INDEX
brn: 1  bdba: 0x1800518 ver: 0x01 opc: 0
inc: 0  exflg: 0

Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0009.000.0000097f  0x0140010c.0186.01  CB--    0  scn 0x0000.004b9956
0x02   0x0005.001.00000a03  0x01400ba7.0156.28  ----    1  fsc 0x000f.00000000
0x03   0x0003.01b.00000944  0x01400dfd.01f1.19  --U-    1  fsc 0x000e.004b9970
Leaf block dump
===============
header address 37170812=0x2372e7c
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 262
kdxcofbo 560=0x230
kdxcofeo 4609=0x1201
kdxcoavs 4049
kdxlespl 0
kdxlende 2
kdxlenxt 25167117=0x180050d
kdxleprv 25167116=0x180050c
kdxledsz 0
kdxlebksz 8012
row#0[4609] flag: -------, lock: 0, len=13
col 0; len 3; (3):  c2 03 50
col 1; len 6; (6):  01 80 04 e7 01 16
row#1[4622] flag: -------, lock: 0, len=13
col 0; len 3; (3):  c2 03 51
col 1; len 6; (6):  01 80 04 e7 01 17

....

row#219[7454] flag: -------, lock: 0, len=13
col 0; len 3; (3):  c2 05 63
col 1; len 6; (6):  01 80 04 e3 00 93
row#220[7467] flag: ---DS--, lock: 2, len=13
col 0; len 3; (3):  c2 05 64
col 1; len 6; (6):  01 80 04 e3 00 94
row#221[7480] flag: ---D---, lock: 3, len=12
col 0; len 2; (2):  c2 06
col 1; len 6; (6):  01 80 04 e3 00 95

....

The first thing we notice is that this leaf block has three ITL slots, not the default two. As both concurrent delete transactions deleted entries from this particular leaf block, an additional ITL slot was allocated, there now being plenty of free space to accommodate it.

The kdxlende value is set to 2 as we now have the two index entries marked as deleted (these are index entries 220 and 221 within the block). Index entry 220 was deleted by the transaction logged in ITL slot 2 and index entry 221 was deleted by the transaction logged in the new ITL slot 3.

So having two concurrent transactions wanting to delete from the same “full” leaf block resulted in the leaf block performing a 50-50 block split with a new leaf block being added to the index in order to accommodate the additional required ITL slot.

I was very careful when deleting rows from the table to cause maximum "damage" to the index. Both delete transactions in my demo effectively deleted every 500th pair of index entries. As there were previously approx. 533 index entries per leaf block, every leaf block in the index had two index entries deleted and so split in this exact manner: 18 full leaf blocks each splitting in two, plus the final partly empty leaf block, gives the 37 leaf blocks we see in the tree dump below. This is why the deletes resulted in the index practically doubling in size.

The only index leaf block that didn't have to split was the very last leaf block, as it had plenty of free space (2019 bytes) to accommodate the additional ITL slot. This last leaf block only had 1995 bytes of free space after the deletes, having lost 24 bytes to the additional ITL slot being allocated. You can see these numbers in the tree dumps (following is the tree dump after the delete operations):

—– begin tree dump
branch: 0x180050b 25167115 (0: nrow: 37, level: 1)
leaf: 0x180050c 25167116 (-1: row:278.278 avs:3931)
leaf: 0x1800524 25167140 (0: row:262.260 avs:4049)
leaf: 0x180050d 25167117 (1: row:271.271 avs:3938)
leaf: 0x1800525 25167141 (2: row:262.260 avs:4048)
leaf: 0x180050e 25167118 (3: row:271.271 avs:3938)
leaf: 0x1800526 25167142 (4: row:262.260 avs:4049)
leaf: 0x180050f 25167119 (5: row:271.271 avs:3937)
leaf: 0x1800527 25167143 (6: row:262.260 avs:4049)
leaf: 0x1800510 25167120 (7: row:271.271 avs:3938)
leaf: 0x1800520 25167136 (8: row:262.260 avs:4048)
leaf: 0x1800511 25167121 (9: row:271.271 avs:3938)
leaf: 0x1800521 25167137 (10: row:262.260 avs:4049)
leaf: 0x1800512 25167122 (11: row:271.271 avs:3937)
leaf: 0x1800522 25167138 (12: row:262.260 avs:4049)
leaf: 0x1800513 25167123 (13: row:271.269 avs:3914)
leaf: 0x1800523 25167139 (14: row:262.262 avs:4072)
leaf: 0x1800514 25167124 (15: row:271.269 avs:3914)
leaf: 0x1800529 25167145 (16: row:262.262 avs:4073)
leaf: 0x1800515 25167125 (17: row:271.269 avs:3913)
leaf: 0x180052a 25167146 (18: row:262.262 avs:4073)
leaf: 0x1800516 25167126 (19: row:271.269 avs:3914)
leaf: 0x180052e 25167150 (20: row:262.262 avs:4072)
leaf: 0x1800517 25167127 (21: row:271.269 avs:3914)
leaf: 0x180052f 25167151 (22: row:262.262 avs:4073)
leaf: 0x1800519 25167129 (23: row:271.269 avs:3913)
leaf: 0x180052b 25167147 (24: row:262.262 avs:4073)
leaf: 0x180051a 25167130 (25: row:271.269 avs:3914)
leaf: 0x180052c 25167148 (26: row:262.262 avs:4072)
leaf: 0x180051b 25167131 (27: row:271.269 avs:3914)
leaf: 0x180052d 25167149 (28: row:262.260 avs:4049)
leaf: 0x180051c 25167132 (29: row:271.271 avs:3937)
leaf: 0x1800534 25167156 (30: row:262.260 avs:4049)
leaf: 0x180051d 25167133 (31: row:264.264 avs:4042)
leaf: 0x1800535 25167157 (32: row:269.267 avs:3944)
leaf: 0x180051e 25167134 (33: row:271.271 avs:3938)
leaf: 0x1800536 25167158 (34: row:262.260 avs:4049)
leaf: 0x180051f 25167135 (35: row:399.397 avs:1995)
—– end tree dump
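
As a sketch of an alternative way to confirm these numbers without a tree dump, the index structure can be validated (noting that ANALYZE ... VALIDATE STRUCTURE locks the underlying table, so use with care outside of a test environment):

SQL> analyze index bowie_id_i validate structure;

Index analyzed.

SQL> select lf_blks, lf_rows, del_lf_rows from index_stats;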

Be very careful allocating a pctfree of 0 to indexes as it may not ultimately help in keeping the indexes as compact as you might have hoped, even if you don’t insert new index entries into the existing full portions of the index.
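
If concurrent transactions regularly hit the same tightly packed leaf blocks, one possible mitigation (a sketch only, to be tested against your own workload) is to rebuild the index with a little free space and/or extra pre-allocated ITL slots. Remembering that the first ITL slot in a leaf block is reserved for recursive operations, an INITRANS of 4 leaves 3 slots usable by concurrent user transactions:

SQL> alter index bowie_id_i rebuild pctfree 5 initrans 4;

Index altered.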

Thanks to all of those that had a go at the quiz and well done to those that got it right :)

Quiz Time. Why Do Deletes Cause An Index To Grow ? (Up The Hill Backwards) June 25, 2015

Posted by Richard Foote in Oracle Indexes.

OK, time for a little quiz.

One of the things I’ve seen at a number of sites is the almost fanatical drive to make indexes as small as possible because indexes that are larger than necessary both waste storage and hurt performance.

Or so the theory goes …   :)

In many cases, this drives DBAs to create or rebuild indexes with a PCTFREE set to 0 as this will make the index as compact and small as possible.

Of course, this is often the very worst setting for keeping an index small, because the insert of a new index entry is likely to cause a 50-50 block split, resulting in two 1/2 empty leaf blocks (unless the index entry is the current maximum value). Before very long, the index is back to a bloated state and in some sad scenarios, the process is repeated again and again.

A point that is often missed though is that it doesn't even take an insert to cause the index to expand. A few delete statements are all that's required.

To illustrate, I create my favorite little table and populate it with a few rows:

SQL> create table bowie (id number, name varchar2(42));

Table created.

SQL> insert into bowie select rownum, 'DAVID BOWIE' from dual connect by level <=10000;

10000 rows created.

SQL> commit;

Commit complete.

OK, time to create an index but because I’m paranoid about having indexes larger than necessary, I’ll create it with a pctfree of 0:

SQL> create index bowie_id_i on bowie(id) pctfree 0;

Index created.

If we look at a tree dump of the index:

SQL> select object_id from dba_objects where object_name='BOWIE_ID_I';

OBJECT_ID
----------
98884

SQL> alter session set events 'immediate trace name treedump level 98884';

Session altered.

—– begin tree dump
branch: 0x180050b 25167115 (0: nrow: 19, level: 1)
leaf: 0x180050c 25167116 (-1: row:540.540 avs:4)
leaf: 0x180050d 25167117 (0: row:533.533 avs:10)
leaf: 0x180050e 25167118 (1: row:533.533 avs:11)
leaf: 0x180050f 25167119 (2: row:533.533 avs:10)
leaf: 0x1800510 25167120 (3: row:533.533 avs:10)
leaf: 0x1800511 25167121 (4: row:533.533 avs:11)
leaf: 0x1800512 25167122 (5: row:533.533 avs:10)
leaf: 0x1800513 25167123 (6: row:533.533 avs:10)
leaf: 0x1800514 25167124 (7: row:533.533 avs:11)
leaf: 0x1800515 25167125 (8: row:533.533 avs:10)
leaf: 0x1800516 25167126 (9: row:533.533 avs:10)
leaf: 0x1800517 25167127 (10: row:533.533 avs:11)
leaf: 0x1800519 25167129 (11: row:533.533 avs:10)
leaf: 0x180051a 25167130 (12: row:533.533 avs:10)
leaf: 0x180051b 25167131 (13: row:533.533 avs:11)
leaf: 0x180051c 25167132 (14: row:533.533 avs:10)
leaf: 0x180051d 25167133 (15: row:533.533 avs:10)
leaf: 0x180051e 25167134 (16: row:533.533 avs:11)
leaf: 0x180051f 25167135 (17: row:399.399 avs:2019)
—– end tree dump

We note the index only has 19 leaf blocks and that most leaf blocks have 533 index entries and only an avs (available free space) of some 10 or 11 bytes. Only the last leaf block is partly full with some 2019 free bytes.

That's fantastic, the index really is as small as can be. Trying to use index compression would be futile as the indexed values are effectively unique.

I’m now going to delete just a few rows. Surely deleting rows from the table (and hence entries from the index) can only have a positive impact (if any) on the index structure.

In one session, I delete some 20 odd rows:

SQL> delete bowie where id in (select rownum*499 from dual connect by level<=20);

20 rows deleted.

Meanwhile, in another session, I delete another 20 different rows:

SQL> delete bowie where id in (select rownum*500 from dual connect by level<=20); 

20 rows deleted.

I then commit the delete statement in both sessions.

So I've just deleted some 40 rows in total out of the 10000 rows in the table.

If I look at a tree dump of the index now:

—– begin tree dump
branch: 0x180050b 25167115 (0: nrow: 37, level: 1)
leaf: 0x180050c 25167116 (-1: row:278.278 avs:3931)
leaf: 0x1800524 25167140 (0: row:262.260 avs:4049)
leaf: 0x180050d 25167117 (1: row:271.271 avs:3938)
leaf: 0x1800525 25167141 (2: row:262.260 avs:4048)
leaf: 0x180050e 25167118 (3: row:271.271 avs:3938)
leaf: 0x1800526 25167142 (4: row:262.260 avs:4049)
leaf: 0x180050f 25167119 (5: row:271.271 avs:3937)
leaf: 0x1800527 25167143 (6: row:262.260 avs:4049)
leaf: 0x1800510 25167120 (7: row:271.271 avs:3938)
leaf: 0x1800520 25167136 (8: row:262.260 avs:4048)
leaf: 0x1800511 25167121 (9: row:271.271 avs:3938)
leaf: 0x1800521 25167137 (10: row:262.260 avs:4049)
leaf: 0x1800512 25167122 (11: row:271.271 avs:3937)
leaf: 0x1800522 25167138 (12: row:262.260 avs:4049)
leaf: 0x1800513 25167123 (13: row:271.269 avs:3914)
leaf: 0x1800523 25167139 (14: row:262.262 avs:4072)
leaf: 0x1800514 25167124 (15: row:271.269 avs:3914)
leaf: 0x1800529 25167145 (16: row:262.262 avs:4073)
leaf: 0x1800515 25167125 (17: row:271.269 avs:3913)
leaf: 0x180052a 25167146 (18: row:262.262 avs:4073)
leaf: 0x1800516 25167126 (19: row:271.269 avs:3914)
leaf: 0x180052e 25167150 (20: row:262.262 avs:4072)
leaf: 0x1800517 25167127 (21: row:271.269 avs:3914)
leaf: 0x180052f 25167151 (22: row:262.262 avs:4073)
leaf: 0x1800519 25167129 (23: row:271.269 avs:3913)
leaf: 0x180052b 25167147 (24: row:262.262 avs:4073)
leaf: 0x180051a 25167130 (25: row:271.269 avs:3914)
leaf: 0x180052c 25167148 (26: row:262.262 avs:4072)
leaf: 0x180051b 25167131 (27: row:271.269 avs:3914)
leaf: 0x180052d 25167149 (28: row:262.260 avs:4049)
leaf: 0x180051c 25167132 (29: row:271.271 avs:3937)
leaf: 0x1800534 25167156 (30: row:262.260 avs:4049)
leaf: 0x180051d 25167133 (31: row:264.264 avs:4042)
leaf: 0x1800535 25167157 (32: row:269.267 avs:3944)
leaf: 0x180051e 25167134 (33: row:271.271 avs:3938)
leaf: 0x1800536 25167158 (34: row:262.260 avs:4049)
leaf: 0x180051f 25167135 (35: row:399.397 avs:1995)
—– end tree dump

I notice the index has basically doubled in size (37 leaf blocks up from the original 19), with most index leaf blocks now 1/2 empty.

So the key question. How could just deleting a handful of rows result in my beautifully compact index bloating to double the size ???

I’ve given away one clue within the post. I’ll post the answer in the coming days :)

Empty Leaf Blocks After Rollback Part II (Editions of You) June 24, 2015

Posted by Richard Foote in Block Dumps, Index Internals, Oracle Indexes, Tree Dumps, Unique Indexes, Update Indexes.

In my last post, I discussed how both 1/2 empty and totally empty leaf blocks can be generated by rolling back a bulk update operation.

An important point I made within the comments of the previous post is that almost exactly the same scenario would have taken place had the transaction committed rather than rolled back. A commit would also have resulted in the leaf blocks being 1/2 empty in the first example (with the previous index entries now all marked as deleted) and in effectively empty leaf blocks in the second example (with the previous leaf blocks all now containing index entries marked as deleted). The important aspect here is not the rollback but the fact that update statements result in the deletion of the previous indexed value and the re-insertion of the new value. (BTW, it's always a useful exercise to read through the comments on this blog as this is often where some of the best learning takes place due to some of the really nice discussions) :)

That said, the previous post used a Non-Unique index. Let’s now repeat the same scenario but this time use a Unique Index instead.

So let’s start with another table with the same data but this time with a unique index on the ID column:

SQL> create table ziggy (id number, name varchar2(42));

Table created.

SQL> insert into ziggy select rownum, 'DAVID BOWIE' from dual connect by level <=10000;

10000 rows created.

SQL> commit;

Commit complete.

SQL> create unique index ziggy_id_i on ziggy(id);

Index created.

OK, let’s have a look at a tree dump of this index:

—– begin tree dump
branch: 0x180017b 25166203 (0: nrow: 20, level: 1)
leaf: 0x180017c 25166204 (-1: row:520.520 avs:824)
leaf: 0x180017d 25166205 (0: row:513.513 avs:823)
leaf: 0x180017e 25166206 (1: row:513.513 avs:823)
leaf: 0x180017f 25166207 (2: row:513.513 avs:823)
leaf: 0x18004c8 25167048 (3: row:513.513 avs:823)
leaf: 0x18004c9 25167049 (4: row:513.513 avs:823)
leaf: 0x18004ca 25167050 (5: row:513.513 avs:823)
leaf: 0x18004cb 25167051 (6: row:513.513 avs:824)
leaf: 0x18004cc 25167052 (7: row:513.513 avs:823)
leaf: 0x18004cd 25167053 (8: row:513.513 avs:823)
leaf: 0x18004ce 25167054 (9: row:513.513 avs:823)
leaf: 0x18004cf 25167055 (10: row:513.513 avs:823)
leaf: 0x18004d1 25167057 (11: row:513.513 avs:823)
leaf: 0x18004d2 25167058 (12: row:513.513 avs:823)
leaf: 0x18004d3 25167059 (13: row:513.513 avs:824)
leaf: 0x18004d4 25167060 (14: row:513.513 avs:823)
leaf: 0x18004d5 25167061 (15: row:513.513 avs:823)
leaf: 0x18004d6 25167062 (16: row:513.513 avs:823)
leaf: 0x18004d7 25167063 (17: row:513.513 avs:823)
leaf: 0x18004d8 25167064 (18: row:246.246 avs:4559)
—– end tree dump

Now we notice a bit of a difference already. Here, the index consists of 20 leaf blocks with 513 index entries in most leaf blocks, whereas the non-unique index had 21 leaf blocks and just 479 index entries per leaf block. This is one of the advantages of unique indexes over non-unique indexes, as I've discussed previously.

Let’s now perform our first bulk update where I increment the ID of each value by 1:

SQL> update ziggy set id=id+1;

10000 rows updated.

SQL> rollback;

Rollback complete.

Now with the non-unique index, this resulted in the index doubling in size as we created an additional index entry for each and every row. After the rollback, we were effectively left with an index that was not only twice the size but full of 1/2 empty leaf blocks.

With a unique index though, things differ. The most important characteristic of a unique index of course is that each index value can only ever exist once, each index entry must be unique. So for a unique index, the rowid is not actually part of the indexed column list, but treated as additional “overhead” or metadata associated with the index entry.

When we perform our update here, we're effectively replicating each value, except for the very last ID value where 10001 doesn't yet exist. With the first row, when ID=1 becomes 2 after the update, we already have an index entry with an ID value of 2 (the second row). So Oracle can mark the first index entry as deleted (as ID=1 no longer exists) but, rather than insert a new index entry, simply update the rowid associated with the existing unique index entry for ID=2. Oracle then updates the rowid of the index entry with a value of 3 with the rowid of the row previously referenced by the ID=2 entry. And so on and so on for all the other index entries, except for index value 10001 which has to be inserted as it didn't previously exist. So Oracle nicely maintains the consistency of the index during the single update operation by effectively recycling the existing index entries.

The net result is that the index remains the same size as the index entries are not reinserted as they are for a non-unique index. The effective change that occurs during this update is that the first index entry is marked as deleted and one new index entry is added at the very end.

If we look at a partial block dump of the first leaf block before the rollback operation:

Leaf block dump
===============
header address 375991908=0x16692e64
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 1
kdxcosdc 0
kdxconro 520
kdxcofbo 1076=0x434
kdxcofeo 1900=0x76c
kdxcoavs 824
kdxlespl 0
kdxlende 1
kdxlenxt 25166205=0x180017d
kdxleprv 0=0x0
kdxledsz 6
kdxlebksz 8036
row#0[8025] flag: ---D---, lock: 2, len=11, data:(6):  01 80 01 57 00 00
col 0; len 2; (2):  c1 02
row#1[8014] flag: -------, lock: 2, len=11, data:(6):  01 80 01 57 00 00
col 0; len 2; (2):  c1 03
row#2[8003] flag: -------, lock: 2, len=11, data:(6):  01 80 01 57 00 01
col 0; len 2; (2):  c1 04
row#3[7992] flag: -------, lock: 2, len=11, data:(6):  01 80 01 57 00 02
col 0; len 2; (2):  c1 05
row#4[7981] flag: -------, lock: 2, len=11, data:(6):  01 80 01 57 00 03
col 0; len 2; (2):  c1 06
row#5[7970] flag: -------, lock: 2, len=11, data:(6):  01 80 01 57 00 04
col 0; len 2; (2):  c1 07

We notice that the first index entry is marked as deleted (as we now no longer have an ID=1) but all the other index entries have been "recycled" with their updated rowids. Note how the rowid of the deleted index entry (01 80 01 57 00 00, the first four bytes being the data block address and the last two the row slot within that block) is now associated with the second index entry (which is effectively the first index entry now).

If we look at a tree dump after the rollback was performed:

—– begin tree dump
branch: 0x180017b 25166203 (0: nrow: 20, level: 1)
leaf: 0x180017c 25166204 (-1: row:520.520 avs:824)
leaf: 0x180017d 25166205 (0: row:513.513 avs:823)
leaf: 0x180017e 25166206 (1: row:513.513 avs:823)
leaf: 0x180017f 25166207 (2: row:513.513 avs:823)
leaf: 0x18004c8 25167048 (3: row:513.513 avs:823)
leaf: 0x18004c9 25167049 (4: row:513.513 avs:823)
leaf: 0x18004ca 25167050 (5: row:513.513 avs:823)
leaf: 0x18004cb 25167051 (6: row:513.513 avs:824)
leaf: 0x18004cc 25167052 (7: row:513.513 avs:823)
leaf: 0x18004cd 25167053 (8: row:513.513 avs:823)
leaf: 0x18004ce 25167054 (9: row:513.513 avs:823)
leaf: 0x18004cf 25167055 (10: row:513.513 avs:823)
leaf: 0x18004d1 25167057 (11: row:513.513 avs:823)
leaf: 0x18004d2 25167058 (12: row:513.513 avs:823)
leaf: 0x18004d3 25167059 (13: row:513.513 avs:824)
leaf: 0x18004d4 25167060 (14: row:513.513 avs:823)
leaf: 0x18004d5 25167061 (15: row:513.513 avs:823)
leaf: 0x18004d6 25167062 (16: row:513.513 avs:823)
leaf: 0x18004d7 25167063 (17: row:513.513 avs:823)
leaf: 0x18004d8 25167064 (18: row:246.246 avs:4559)
—– end tree dump

We notice that it’s exactly the same size as before and we don’t have the same issues with a bloated index as we did in the previous non-unique index example.

However, if we perform the second update which effectively changes all the ID values to those which don’t currently exist within the table:

SQL> update ziggy set id=id+10000;

10000 rows updated.

SQL> rollback;

Rollback complete.

Now Oracle can't recycle the existing index entries as the new values don't currently exist within the index. So Oracle is indeed forced to mark all the existing index entries as deleted and insert new index entries into the index. These new index entries all go into the rightmost side of the index, resulting in 90-10 block splits with additional leaf blocks being added to the index. If we roll back this transaction, it will result in all the new index entries being removed, leaving behind these new empty leaf blocks just as with the non-unique index example.

A new tree dump will confirm this:

—– begin tree dump
branch: 0x180017b 25166203 (0: nrow: 47, level: 1)
leaf: 0x180017c 25166204 (-1: row:520.520 avs:824)
leaf: 0x180017d 25166205 (0: row:513.513 avs:823)
leaf: 0x180017e 25166206 (1: row:513.513 avs:823)
leaf: 0x180017f 25166207 (2: row:513.513 avs:823)
leaf: 0x18004c8 25167048 (3: row:513.513 avs:823)
leaf: 0x18004c9 25167049 (4: row:513.513 avs:823)
leaf: 0x18004ca 25167050 (5: row:513.513 avs:823)
leaf: 0x18004cb 25167051 (6: row:513.513 avs:824)
leaf: 0x18004cc 25167052 (7: row:513.513 avs:823)
leaf: 0x18004cd 25167053 (8: row:513.513 avs:823)
leaf: 0x18004ce 25167054 (9: row:513.513 avs:823)
leaf: 0x18004cf 25167055 (10: row:513.513 avs:823)
leaf: 0x18004d1 25167057 (11: row:513.513 avs:823)
leaf: 0x18004d2 25167058 (12: row:513.513 avs:823)
leaf: 0x18004d3 25167059 (13: row:513.513 avs:824)
leaf: 0x18004d4 25167060 (14: row:513.513 avs:823)
leaf: 0x18004d5 25167061 (15: row:513.513 avs:823)
leaf: 0x18004d6 25167062 (16: row:513.513 avs:823)
leaf: 0x18004d7 25167063 (17: row:513.513 avs:823)
leaf: 0x18004d8 25167064 (18: row:246.246 avs:4559)
leaf: 0x18004d9 25167065 (19: row:0.0 avs:8000)
leaf: 0x18004da 25167066 (20: row:0.0 avs:8000)
leaf: 0x18004df 25167071 (21: row:0.0 avs:8000)
leaf: 0x18004dd 25167069 (22: row:0.0 avs:8000)
leaf: 0x18004de 25167070 (23: row:0.0 avs:8000)
leaf: 0x18004db 25167067 (24: row:0.0 avs:8000)
leaf: 0x18004dc 25167068 (25: row:0.0 avs:8000)
leaf: 0x18004e5 25167077 (26: row:0.0 avs:8000)
leaf: 0x18004e6 25167078 (27: row:0.0 avs:8000)
leaf: 0x18004e7 25167079 (28: row:0.0 avs:8000)
leaf: 0x18004e4 25167076 (29: row:0.0 avs:8000)
leaf: 0x18004ed 25167085 (30: row:0.0 avs:8000)
leaf: 0x18004ee 25167086 (31: row:0.0 avs:8000)
leaf: 0x18004ef 25167087 (32: row:0.0 avs:8000)
leaf: 0x18004e1 25167073 (33: row:0.0 avs:8000)
leaf: 0x18004e2 25167074 (34: row:0.0 avs:8000)
leaf: 0x18004e3 25167075 (35: row:0.0 avs:8000)
leaf: 0x18004e9 25167081 (36: row:0.0 avs:8000)
leaf: 0x18004ea 25167082 (37: row:0.0 avs:8000)
leaf: 0x18004eb 25167083 (38: row:0.0 avs:8000)
leaf: 0x18004ec 25167084 (39: row:0.0 avs:8000)
leaf: 0x18004f5 25167093 (40: row:0.0 avs:8000)
leaf: 0x18004f6 25167094 (41: row:0.0 avs:8000)
leaf: 0x18004f7 25167095 (42: row:0.0 avs:8000)
leaf: 0x18004f1 25167089 (43: row:0.0 avs:8000)
leaf: 0x18004e8 25167080 (44: row:0.0 avs:8000)
leaf: 0x18004f2 25167090 (45: row:0.0 avs:8000)
—– end tree dump

The index has indeed bloated in size as a result of the update. Note that the index would be the same size had the transaction committed, except that the leaf blocks that currently contain data would effectively be empty and contain nothing but deleted index entries while the empty leaf blocks would all contain the new indexed values.

So depending on the update operation, a unique index can potentially reuse existing index entries if the new column values existed previously in other rows. If not, then the usual delete/insert mechanism applies.

Empty Leaf Blocks After Rollback Part I (Empty Spaces) June 23, 2015

Posted by Richard Foote in Index Coalesce, Oracle Indexes, Update Indexes.

There's been an interesting recent discussion on the OTN Database forum regarding "Index blank blocks after large update that was rolled back". Setting aside the odd scenario of updating a column that previously had 20 million distinct values to the same value on a 2 billion row table, the key questions raised were why there were blank index leaf blocks and why performance degraded after the update failed and rolled back.

This is a topic I've previously discussed in the blog, such as in Updates and Indexes and Differences Between Unique and Non-Unique Indexes. However, I thought it might be worthwhile discussing these again in the context of the rollback of a large update-based transaction.

The key point to make is that an Update is actually a delete/insert operation in the context of indexes. So if we perform a large update, all the previous indexed values are marked as deleted in the index and the new values are re-inserted elsewhere within the index structure, potentially filling up a whole bunch of new leaf blocks. If we then decide to roll back the transaction (or the transaction fails and automatically rolls back), all these newly inserted index entries are deleted, potentially leaving behind now empty leaf blocks in the expanded index structure. Here's the thing: Oracle will roll back changes to index entries but not changes to the index structure, such as block splits.

If an index scan is forced to navigate through these empty leaf blocks, this can indeed potentially have a detrimental impact on subsequent performance.

However, depending on whether the index is Unique or Non-Unique and the type of update being performed, the impact on the index could be quite different.

To illustrate all this, a simple demo.

Let’s start with a simple table and Non-Unique index on the (effectively) unique ID column:

SQL> create table bowie (id number, name varchar2(42));

Table created.

SQL> insert into bowie select rownum, 'DAVID BOWIE' from dual connect by level <=10000;

10000 rows created.

SQL> commit;

Commit complete.

SQL> create index bowie_id_i on bowie(id);

Index created.

If we look at an index tree dump of this index:

SQL> select object_id from dba_objects where object_name='BOWIE_ID_I';

OBJECT_ID
----------
98700

SQL> alter session set events 'immediate trace name treedump level 98700';

Session altered.

—– begin tree dump
branch: 0x1800523 25167139 (0: nrow: 21, level: 1)
leaf: 0x1800524 25167140 (-1: row:485.485 avs:828)
leaf: 0x1800525 25167141 (0: row:479.479 avs:820)
leaf: 0x1800526 25167142 (1: row:479.479 avs:820)
leaf: 0x1800527 25167143 (2: row:479.479 avs:820)
leaf: 0x1800528 25167144 (3: row:479.479 avs:820)
leaf: 0x1800529 25167145 (4: row:479.479 avs:819)
leaf: 0x180052a 25167146 (5: row:479.479 avs:820)
leaf: 0x180052b 25167147 (6: row:479.479 avs:820)
leaf: 0x180052c 25167148 (7: row:479.479 avs:820)
leaf: 0x180052d 25167149 (8: row:479.479 avs:819)
leaf: 0x180052e 25167150 (9: row:479.479 avs:820)
leaf: 0x180052f 25167151 (10: row:479.479 avs:820)
leaf: 0x1800531 25167153 (11: row:479.479 avs:820)
leaf: 0x1800532 25167154 (12: row:479.479 avs:820)
leaf: 0x1800533 25167155 (13: row:479.479 avs:819)
leaf: 0x1800534 25167156 (14: row:479.479 avs:820)
leaf: 0x1800535 25167157 (15: row:479.479 avs:820)
leaf: 0x1800536 25167158 (16: row:479.479 avs:820)
leaf: 0x1800537 25167159 (17: row:479.479 avs:820)
leaf: 0x1800538 25167160 (18: row:479.479 avs:819)
leaf: 0x1800539 25167161 (19: row:414.414 avs:1795)
—– end tree dump

We notice the index is nice and compact with just the default 10% of free space.

I’m going to start by performing an update of the ID column which simply increments the ID by 1 for all rows in the table. But then, rather than commit, I’ll roll back the transaction:

SQL> update bowie set id=id+1;

10000 rows updated.

SQL> rollback;

Rollback complete.

Now the interesting thing to note here is that for each ID value, we temporarily have the same value twice as we progress and update each ID value (for example, ID=1 becomes 2, which already exists; then the previous ID=2 becomes 3, which already exists, etc.). As the index is Non-Unique, this means when we update say ID=1 to 2, we need to mark as deleted the index entry with ID=1 and insert a new index entry with an ID=2. When we update the previous ID=2 to 3, we again mark as deleted the previous indexed value of 2 and insert a new index entry of 3. Etc. Etc.

As we only have 10% of free space available in the index before the update, by updating all rows in this fashion, it means we have to keep performing 50-50 block splits to fit in the new index entries in the corresponding leaf blocks. This effectively results in the index doubling in size as we now have twice the number of index entries (with the previous index entries now marked as deleted).
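
Had we captured our session statistics before and after the update, we could have observed these block splits directly. A sketch (statistic names can vary slightly between versions):

SQL> select n.name, s.value from v$mystat s, v$statname n where s.statistic# = n.statistic# and n.name like 'leaf node%splits';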

However, having now performed all these index block splits, if we now roll back the update transaction, it simply means that all the new index entries are deleted and the delete byte removed from the previously deleted entries, with the index structure retaining its newly bloated size. The resultant index block splits are not rolled back. If we look at a new index tree dump:

—– begin tree dump
branch: 0x1800523 25167139 (0: nrow: 43, level: 1)
leaf: 0x1800524 25167140 (-1: row:223.223 avs:4756)
leaf: 0x180053e 25167166 (0: row:262.262 avs:4072)
leaf: 0x1800525 25167141 (1: row:242.242 avs:4373)
leaf: 0x180053f 25167167 (2: row:237.237 avs:4447)
leaf: 0x1800526 25167142 (3: row:217.217 avs:4747)
leaf: 0x180053a 25167162 (4: row:262.262 avs:4073)
leaf: 0x1800527 25167143 (5: row:217.217 avs:4747)
leaf: 0x180053b 25167163 (6: row:262.262 avs:4073)
leaf: 0x1800528 25167144 (7: row:217.217 avs:4747)
leaf: 0x180053c 25167164 (8: row:262.262 avs:4073)
leaf: 0x1800529 25167145 (9: row:217.217 avs:4747)
leaf: 0x180053d 25167165 (10: row:262.262 avs:4072)
leaf: 0x180052a 25167146 (11: row:217.217 avs:4747)
leaf: 0x1800545 25167173 (12: row:262.262 avs:4073)
leaf: 0x180052b 25167147 (13: row:217.217 avs:4747)
leaf: 0x1800546 25167174 (14: row:262.262 avs:4073)
leaf: 0x180052c 25167148 (15: row:217.217 avs:4747)
leaf: 0x1800547 25167175 (16: row:262.262 avs:4073)
leaf: 0x180052d 25167149 (17: row:217.217 avs:4747)
leaf: 0x1800541 25167169 (18: row:262.262 avs:4072)
leaf: 0x180052e 25167150 (19: row:217.217 avs:4748)
leaf: 0x180054e 25167182 (20: row:262.262 avs:4072)
leaf: 0x180052f 25167151 (21: row:136.136 avs:5962)
leaf: 0x180054f 25167183 (22: row:133.133 avs:6006)
leaf: 0x1800542 25167170 (23: row:210.210 avs:4852)
leaf: 0x1800531 25167153 (24: row:217.217 avs:4747)
leaf: 0x1800543 25167171 (25: row:262.262 avs:4073)
leaf: 0x1800532 25167154 (26: row:217.217 avs:4747)
leaf: 0x1800544 25167172 (27: row:262.262 avs:4073)
leaf: 0x1800533 25167155 (28: row:217.217 avs:4747)
leaf: 0x180054d 25167181 (29: row:262.262 avs:4072)
leaf: 0x1800534 25167156 (30: row:217.217 avs:4748)
leaf: 0x1800549 25167177 (31: row:262.262 avs:4072)
leaf: 0x1800535 25167157 (32: row:217.217 avs:4747)
leaf: 0x180054a 25167178 (33: row:262.262 avs:4073)
leaf: 0x1800536 25167158 (34: row:217.217 avs:4747)
leaf: 0x180054b 25167179 (35: row:262.262 avs:4073)
leaf: 0x1800537 25167159 (36: row:217.217 avs:4747)
leaf: 0x180054c 25167180 (37: row:262.262 avs:4073)
leaf: 0x1800538 25167160 (38: row:256.256 avs:4162)
leaf: 0x1800548 25167176 (39: row:223.223 avs:4657)
leaf: 0x1800539 25167161 (40: row:152.152 avs:5722)
leaf: 0x1800555 25167189 (41: row:262.262 avs:4073)
—– end tree dump

The index is now approximately double the size with each leaf block now approximately 1/2 empty (or 1/2 full if you’re the positive type).
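
A quick sanity check of the leaf block count is also available from the data dictionary once the index statistics are refreshed; a sketch:

SQL> exec dbms_stats.gather_index_stats(user, 'BOWIE_ID_I')

PL/SQL procedure successfully completed.

SQL> select blevel, leaf_blocks from user_indexes where index_name = 'BOWIE_ID_I';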

If we now perform another update, but this time update all the IDs to values that don’t currently exist (by simply adding 10000 to the ID) and then again rollback:

SQL> update bowie set id=id+10000;

10000 rows updated.

SQL> rollback;

Rollback complete.

—– begin tree dump
branch: 0x1800523 25167139 (0: nrow: 73, level: 1)
leaf: 0x1800524 25167140 (-1: row:223.223 avs:4756)
leaf: 0x180053e 25167166 (0: row:262.262 avs:4072)
leaf: 0x1800525 25167141 (1: row:242.242 avs:4373)
leaf: 0x180053f 25167167 (2: row:237.237 avs:4447)
leaf: 0x1800526 25167142 (3: row:217.217 avs:4747)
leaf: 0x180053a 25167162 (4: row:262.262 avs:4073)
leaf: 0x1800527 25167143 (5: row:217.217 avs:4747)
leaf: 0x180053b 25167163 (6: row:262.262 avs:4073)
leaf: 0x1800528 25167144 (7: row:217.217 avs:4747)
leaf: 0x180053c 25167164 (8: row:262.262 avs:4073)
leaf: 0x1800529 25167145 (9: row:217.217 avs:4747)
leaf: 0x180053d 25167165 (10: row:262.262 avs:4072)
leaf: 0x180052a 25167146 (11: row:217.217 avs:4747)
leaf: 0x1800545 25167173 (12: row:262.262 avs:4073)
leaf: 0x180052b 25167147 (13: row:217.217 avs:4747)
leaf: 0x1800546 25167174 (14: row:262.262 avs:4073)
leaf: 0x180052c 25167148 (15: row:217.217 avs:4747)
leaf: 0x1800547 25167175 (16: row:262.262 avs:4073)
leaf: 0x180052d 25167149 (17: row:217.217 avs:4747)
leaf: 0x1800541 25167169 (18: row:262.262 avs:4072)
leaf: 0x180052e 25167150 (19: row:217.217 avs:4748)
leaf: 0x180054e 25167182 (20: row:262.262 avs:4072)
leaf: 0x180052f 25167151 (21: row:136.136 avs:5962)
leaf: 0x180054f 25167183 (22: row:133.133 avs:6006)
leaf: 0x1800542 25167170 (23: row:210.210 avs:4852)
leaf: 0x1800531 25167153 (24: row:217.217 avs:4747)
leaf: 0x1800543 25167171 (25: row:262.262 avs:4073)
leaf: 0x1800532 25167154 (26: row:217.217 avs:4747)
leaf: 0x1800544 25167172 (27: row:262.262 avs:4073)
leaf: 0x1800533 25167155 (28: row:217.217 avs:4747)
leaf: 0x180054d 25167181 (29: row:262.262 avs:4072)
leaf: 0x1800534 25167156 (30: row:217.217 avs:4748)
leaf: 0x1800549 25167177 (31: row:262.262 avs:4072)
leaf: 0x1800535 25167157 (32: row:217.217 avs:4747)
leaf: 0x180054a 25167178 (33: row:262.262 avs:4073)
leaf: 0x1800536 25167158 (34: row:217.217 avs:4747)
leaf: 0x180054b 25167179 (35: row:262.262 avs:4073)
leaf: 0x1800537 25167159 (36: row:217.217 avs:4747)
leaf: 0x180054c 25167180 (37: row:262.262 avs:4073)
leaf: 0x1800538 25167160 (38: row:256.256 avs:4162)
leaf: 0x1800548 25167176 (39: row:223.223 avs:4657)
leaf: 0x1800539 25167161 (40: row:152.152 avs:5722)
leaf: 0x1800555 25167189 (41: row:262.262 avs:4073)
leaf: 0x1800552 25167186 (42: row:0.0 avs:8000)
leaf: 0x1800553 25167187 (43: row:0.0 avs:8000)
leaf: 0x1800551 25167185 (44: row:0.0 avs:8000)
leaf: 0x1800556 25167190 (45: row:0.0 avs:8000)
leaf: 0x1800557 25167191 (46: row:0.0 avs:8000)
leaf: 0x1800554 25167188 (47: row:0.0 avs:8000)
leaf: 0x180055d 25167197 (48: row:0.0 avs:8000)
leaf: 0x180055e 25167198 (49: row:0.0 avs:8000)
leaf: 0x180055f 25167199 (50: row:0.0 avs:8000)
leaf: 0x1800558 25167192 (51: row:0.0 avs:8000)
leaf: 0x1800559 25167193 (52: row:0.0 avs:8000)
leaf: 0x1800565 25167205 (53: row:0.0 avs:8000)
leaf: 0x1800566 25167206 (54: row:0.0 avs:8000)
leaf: 0x1800567 25167207 (55: row:0.0 avs:8000)
leaf: 0x1800561 25167201 (56: row:0.0 avs:8000)
leaf: 0x180055c 25167196 (57: row:0.0 avs:8000)
leaf: 0x180055a 25167194 (58: row:0.0 avs:8000)
leaf: 0x180055b 25167195 (59: row:0.0 avs:8000)
leaf: 0x1800563 25167203 (60: row:0.0 avs:8000)
leaf: 0x1800564 25167204 (61: row:0.0 avs:8000)
leaf: 0x180056d 25167213 (62: row:0.0 avs:8000)
leaf: 0x180056e 25167214 (63: row:0.0 avs:8000)
leaf: 0x180056f 25167215 (64: row:0.0 avs:8000)
leaf: 0x1800568 25167208 (65: row:0.0 avs:8000)
leaf: 0x1800569 25167209 (66: row:0.0 avs:8000)
leaf: 0x180056a 25167210 (67: row:0.0 avs:8000)
leaf: 0x180056b 25167211 (68: row:0.0 avs:8000)
leaf: 0x180056c 25167212 (69: row:0.0 avs:8000)
leaf: 0x1800562 25167202 (70: row:0.0 avs:8000)
leaf: 0x1800575 25167221 (71: row:0.0 avs:8000)
—– end tree dump

As all the inserts now occurred in the rightmost side of the index, Oracle allocated a bunch of new index leaf blocks via 90-10 block splits to store all the new index entries. After the rollback however, all these new entries were removed, leaving behind nothing but these new empty leaf blocks, which are still part of the overall index structure.

Query performance now depends on what part of the index we need to access.

If we just want to select a single value, then no problem as the ID column is effectively unique and we just need to generally access down to the one leaf block:

SQL> select * from bowie where id=42;

        ID NAME
---------- ------------------------------------------
        42 DAVID BOWIE

Execution Plan
----------------------------------------------------------
Plan hash value: 1405654398

--------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name       | Rows  | Bytes | Cost(%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |            |     1 |    36 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE      |     1 |    36 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | BOWIE_ID_I |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ID"=42)

Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
4  consistent gets
0  physical reads
0  redo size
618  bytes sent via SQL*Net to client
552  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
1  rows processed

4 consistent gets is about as good as it gets for a non-unique Blevel 1 index.

Larger index range scans might need to access additional leaf blocks, as they now contain only 1/2 the number of index entries they did before, although the additional overhead of such scans would still likely be minimal as most of the work is associated with visiting the table blocks.

One of the worst case scenarios would be having to now plough through all these empty leaf blocks as with the following search for the max ID value:

SQL> select max(id) from bowie;

MAX(ID)
----------
10000
Execution Plan
----------------------------------------------------------
Plan hash value: 3831694747

-----------------------------------------------------------------------------------------
| Id  | Operation                  | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |            |     1 |    13 |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE            |            |     1 |    13 |            |          |
|   2 |   INDEX FULL SCAN (MIN/MAX)| BOWIE_ID_I |     1 |    13 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
32  consistent gets
0  physical reads
0  redo size
541  bytes sent via SQL*Net to client
552  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
1  rows processed

Oracle uses the Index Full (Min/Max) Scan by starting with the right-most leaf block but, as it's empty, Oracle is forced to make its way back through all the empty leaf blocks until it finally hits upon the first non-empty leaf block, which contains the max ID. The excessive 32 consistent gets are due to having to access all these new empty blocks.

If we now Coalesce the index and try again:

SQL> alter index bowie_id_i coalesce;

Index altered.

SQL> select max(id) from bowie;

MAX(ID)
----------
10000
Execution Plan
----------------------------------------------------------
Plan hash value: 3831694747

-----------------------------------------------------------------------------------------
| Id  | Operation                  | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |            |     1 |    13 |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE            |            |     1 |    13 |            |          |
|   2 |   INDEX FULL SCAN (MIN/MAX)| BOWIE_ID_I |     1 |    13 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
2  consistent gets
0  physical reads
0  redo size
541  bytes sent via SQL*Net to client
552  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
1  rows processed

We notice the Max(ID) query now returns the result with just the 2 expected consistent gets from a BLevel 1 index.

If we now look at the index tree dump:

—– begin tree dump
branch: 0x1800523 25167139 (0: nrow: 21, level: 1)
leaf: 0x1800524 25167140 (-1: row:485.485 avs:828)
leaf: 0x1800525 25167141 (0: row:479.479 avs:820)
leaf: 0x1800526 25167142 (1: row:479.479 avs:820)
leaf: 0x1800527 25167143 (2: row:479.479 avs:820)
leaf: 0x1800528 25167144 (3: row:479.479 avs:820)
leaf: 0x1800529 25167145 (4: row:479.479 avs:819)
leaf: 0x180052a 25167146 (5: row:479.479 avs:820)
leaf: 0x180052b 25167147 (6: row:479.479 avs:820)
leaf: 0x180052c 25167148 (7: row:479.479 avs:820)
leaf: 0x180052d 25167149 (8: row:479.479 avs:819)
leaf: 0x180052e 25167150 (9: row:479.479 avs:820)
leaf: 0x180052f 25167151 (10: row:479.479 avs:820)
leaf: 0x1800531 25167153 (11: row:479.479 avs:820)
leaf: 0x1800532 25167154 (12: row:479.479 avs:820)
leaf: 0x1800533 25167155 (13: row:479.479 avs:819)
leaf: 0x1800534 25167156 (14: row:479.479 avs:820)
leaf: 0x1800535 25167157 (15: row:479.479 avs:820)
leaf: 0x1800536 25167158 (16: row:479.479 avs:820)
leaf: 0x1800537 25167159 (17: row:479.479 avs:820)
leaf: 0x1800538 25167160 (18: row:479.479 avs:819)
leaf: 0x1800539 25167161 (19: row:414.414 avs:1795)
—– end tree dump

We notice the index is back to its original compact self again and we can determine the Max(ID) by just visiting the last leaf block.
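
Coalesce is of course just one option here. An online rebuild or an index shrink would also reclaim the empty leaf blocks, each with its own locking and resource trade-offs (a rebuild creates a fresh compact index, while SHRINK SPACE is broadly similar to a coalesce but can additionally deallocate the freed space). A sketch of the alternatives:

SQL> alter index bowie_id_i rebuild online;

SQL> alter index bowie_id_i shrink space;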

That’s enough of a read for now !! I’ll next look at how this scenario differs with a Unique index in Part II.

Index Tree Dumps in Oracle 12c Database (New Age) June 22, 2015

Posted by Richard Foote in 12c, TreeDumps.

I’ve previously discussed Index Tree Dumps but I’ve recently found a nice little improvement that’s been introduced in Oracle Database 12c.

Let’s begin by creating a little table and index:

SQL> create table bowie (id number, name varchar2(42));

Table created.

SQL> insert into bowie select rownum, 'DAVID BOWIE' from dual connect by level <=10000;

10000 rows created.

SQL> commit;

Commit complete.

SQL> create index bowie_id_i on bowie(id);

Index created.

To generate an Index Tree Dump, we first need the OBJECT_ID of the index:

SQL> select object_id from dba_objects where object_name='BOWIE_ID_I';

OBJECT_ID
----------
98829

And then use it to generate the Index Tree Dump:

SQL> alter session set events 'immediate trace name treedump level 98829';

Session altered.
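
The treedump is written to the session's trace file; from 11g onwards, an easy way to locate it is:

SQL> select value from v$diag_info where name = 'Default Trace File';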

Previously, an Index Tree Dump looked like the following:

—– begin tree dump
branch: 0x100023b 16777787 (0: nrow: 21, level: 1)
leaf: 0x100023c 16777788 (-1: nrow: 485 rrow: 485)
leaf: 0x100023d 16777789 (0: nrow: 479 rrow: 479)
leaf: 0x100023e 16777790 (1: nrow: 479 rrow: 479)
leaf: 0x100023f 16777791 (2: nrow: 479 rrow: 479)
leaf: 0x1000240 16777792 (3: nrow: 479 rrow: 479)
leaf: 0x1000241 16777793 (4: nrow: 479 rrow: 479)
leaf: 0x1000242 16777794 (5: nrow: 479 rrow: 479)
leaf: 0x1000243 16777795 (6: nrow: 479 rrow: 479)
leaf: 0x1000244 16777796 (7: nrow: 479 rrow: 479)
leaf: 0x1000245 16777797 (8: nrow: 479 rrow: 479)
leaf: 0x1000246 16777798 (9: nrow: 479 rrow: 479)
leaf: 0x1000247 16777799 (10: nrow: 479 rrow: 479)
leaf: 0x1000249 16777801 (11: nrow: 479 rrow: 479)
leaf: 0x100024a 16777802 (12: nrow: 479 rrow: 479)
leaf: 0x100024b 16777803 (13: nrow: 479 rrow: 479)
leaf: 0x100024c 16777804 (14: nrow: 479 rrow: 479)
leaf: 0x100024d 16777805 (15: nrow: 479 rrow: 479)
leaf: 0x100024e 16777806 (16: nrow: 479 rrow: 479)
leaf: 0x100024f 16777807 (17: nrow: 479 rrow: 479)
leaf: 0x1000250 16777808 (18: nrow: 479 rrow: 479)
leaf: 0x1000251 16777809 (19: nrow: 414 rrow: 414)
—– end tree dump

So this index is a Level 1 Index with a root block and 21 Leaf Blocks. The first entry always corresponds to the index root block and is followed by the 21 leaf blocks. Each leaf block entry details the relative block address, the sequence number, the number of index entries (nrow) and the number of non-deleted index entries (rrow).

If we look at the same Index Tree Dump in 12c (12.1.0.2):

—– begin tree dump
branch: 0x180017b 25166203 (0: nrow: 21, level: 1)
leaf: 0x180017c 25166204 (-1: row:485.485 avs:828)
leaf: 0x180017d 25166205 (0: row:479.479 avs:820)
leaf: 0x180017e 25166206 (1: row:479.479 avs:820)
leaf: 0x180017f 25166207 (2: row:479.479 avs:820)
leaf: 0x18004c8 25167048 (3: row:479.479 avs:820)
leaf: 0x18004c9 25167049 (4: row:479.479 avs:819)
leaf: 0x18004ca 25167050 (5: row:479.479 avs:820)
leaf: 0x18004cb 25167051 (6: row:479.479 avs:820)
leaf: 0x18004cc 25167052 (7: row:479.479 avs:820)
leaf: 0x18004cd 25167053 (8: row:479.479 avs:819)
leaf: 0x18004ce 25167054 (9: row:479.479 avs:820)
leaf: 0x18004cf 25167055 (10: row:479.479 avs:820)
leaf: 0x18004d1 25167057 (11: row:479.479 avs:820)
leaf: 0x18004d2 25167058 (12: row:479.479 avs:820)
leaf: 0x18004d3 25167059 (13: row:479.479 avs:819)
leaf: 0x18004d4 25167060 (14: row:479.479 avs:820)
leaf: 0x18004d5 25167061 (15: row:479.479 avs:820)
leaf: 0x18004d6 25167062 (16: row:479.479 avs:820)
leaf: 0x18004d7 25167063 (17: row:479.479 avs:820)
leaf: 0x18004d8 25167064 (18: row:479.479 avs:819)
leaf: 0x18004d9 25167065 (19: row:414.414 avs:1795)
—– end tree dump

We notice the format is a little different in that it now also includes the avs (free space) within the leaf block.

If we now delete a few rows (and hence index entries) and look at the updated tree dump:

SQL> delete bowie where id between 1 and 400;

400 rows deleted.

SQL> commit;

Commit complete.

SQL> alter session set events 'immediate trace name treedump level 98829';

Session altered.

—– begin tree dump
branch: 0x180017b 25166203 (0: nrow: 21, level: 1)
leaf: 0x180017c 25166204 (-1: row:485.85 avs:828)
leaf: 0x180017d 25166205 (0: row:479.479 avs:820)
leaf: 0x180017e 25166206 (1: row:479.479 avs:820)
leaf: 0x180017f 25166207 (2: row:479.479 avs:820)
leaf: 0x18004c8 25167048 (3: row:479.479 avs:820)
leaf: 0x18004c9 25167049 (4: row:479.479 avs:819)
leaf: 0x18004ca 25167050 (5: row:479.479 avs:820)
leaf: 0x18004cb 25167051 (6: row:479.479 avs:820)
leaf: 0x18004cc 25167052 (7: row:479.479 avs:820)
leaf: 0x18004cd 25167053 (8: row:479.479 avs:819)
leaf: 0x18004ce 25167054 (9: row:479.479 avs:820)
leaf: 0x18004cf 25167055 (10: row:479.479 avs:820)
leaf: 0x18004d1 25167057 (11: row:479.479 avs:820)
leaf: 0x18004d2 25167058 (12: row:479.479 avs:820)
leaf: 0x18004d3 25167059 (13: row:479.479 avs:819)
leaf: 0x18004d4 25167060 (14: row:479.479 avs:820)
leaf: 0x18004d5 25167061 (15: row:479.479 avs:820)
leaf: 0x18004d6 25167062 (16: row:479.479 avs:820)
leaf: 0x18004d7 25167063 (17: row:479.479 avs:820)
leaf: 0x18004d8 25167064 (18: row:479.479 avs:819)
leaf: 0x18004d9 25167065 (19: row:414.414 avs:1795)
—– end tree dump

We notice that it now correctly details how many non-deleted index entries remain in the first leaf block (85). Unfortunately, the free space remains the same and doesn't take into account the deleted index entries (still recorded as 828 bytes).

Of course, if we perform any additional DML that impacts this leaf block such as another delete:

SQL> delete bowie where id=401;

1 row deleted.

SQL> commit;

Commit complete.

SQL> alter session set events 'immediate trace name treedump level 98829';

Session altered.

—– begin tree dump
branch: 0x180017b 25166203 (0: nrow: 21, level: 1)
leaf: 0x180017c 25166204 (-1: row:85.84 avs:6725)
leaf: 0x180017d 25166205 (0: row:479.479 avs:820)
leaf: 0x180017e 25166206 (1: row:479.479 avs:820)
leaf: 0x180017f 25166207 (2: row:479.479 avs:820)
leaf: 0x18004c8 25167048 (3: row:479.479 avs:820)
leaf: 0x18004c9 25167049 (4: row:479.479 avs:819)
leaf: 0x18004ca 25167050 (5: row:479.479 avs:820)
leaf: 0x18004cb 25167051 (6: row:479.479 avs:820)
leaf: 0x18004cc 25167052 (7: row:479.479 avs:820)
leaf: 0x18004cd 25167053 (8: row:479.479 avs:819)
leaf: 0x18004ce 25167054 (9: row:479.479 avs:820)
leaf: 0x18004cf 25167055 (10: row:479.479 avs:820)
leaf: 0x18004d1 25167057 (11: row:479.479 avs:820)
leaf: 0x18004d2 25167058 (12: row:479.479 avs:820)
leaf: 0x18004d3 25167059 (13: row:479.479 avs:819)
leaf: 0x18004d4 25167060 (14: row:479.479 avs:820)
leaf: 0x18004d5 25167061 (15: row:479.479 avs:820)
leaf: 0x18004d6 25167062 (16: row:479.479 avs:820)
leaf: 0x18004d7 25167063 (17: row:479.479 avs:820)
leaf: 0x18004d8 25167064 (18: row:479.479 avs:819)
leaf: 0x18004d9 25167065 (19: row:414.414 avs:1795)
—– end tree dump

We notice the leaf block has now cleaned out the previously deleted index entries and the free space has been updated accordingly (now 6725 bytes).

Showing the amount of free space within a block is a nice little improvement to the format of the index tree dump.

Indexing and Transparent Data Encryption Part III (You Can’t Do That) June 16, 2015

Posted by Richard Foote in Oracle Indexes.

In Part II of this series, we looked at how we can create a B-Tree index on an encrypted column, providing we do not apply salt during encryption.
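
As a quick reminder, the column encryption in Part II was set up along these lines (a sketch, assuming the wallet/keystore is already open; SALT is the default and must be explicitly disabled for the column to be indexable):

SQL> alter table bowie modify (code encrypt using 'AES192' no salt);

Table altered.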

However, this is not the only restriction with regard to indexing an encrypted column using column-based encryption.

If we attempt to create an index that is not a straight B-Tree index, for example a Bitmap Index:

SQL> create bitmap index bowie_code_i on bowie(code);
create bitmap index bowie_code_i on bowie(code)
*
ERROR at line 1:
ORA-28337: the specified index may not be defined on an encrypted column

Or a Function-Based Index:

SQL> create index bowie_code_i on bowie(code+10);
create index bowie_code_i on bowie(code+10)
*
ERROR at line 1:
ORA-28337: the specified index may not be defined on an encrypted column

We note that such indexes can't be created on an encrypted column. Only standard B-Tree indexes are supported on columns encrypted with column-based encryption.

The B-Tree index can potentially be used with equality predicates:

SQL> set autotrace on
SQL> select * from bowie where id=42;

        ID       CODE     SALARY TEXT
---------- ---------- ---------- ------------------------------
        42         42         42 BOWIE

Execution Plan
----------------------------------------------------------
Plan hash value: 1405654398

--------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name       | Rows  | Bytes | Cost(%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |            |     1 |    28 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE      |     1 |    28 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | BOWIE_ID_I |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ID"=42)

Statistics
----------------------------------------------------------
1  recursive calls
0  db block gets
4  consistent gets
1  physical reads
0  redo size
748  bytes sent via SQL*Net to client
551  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
1  rows processed

As we can see, the CBO has successfully used the index. However, if we attempt to use SQL with a non-equality predicate, such as a bounded range scan:

SQL> select * from bowie where id between 42 and 43;

        ID       CODE     SALARY TEXT
---------- ---------- ---------- ------------------------------
        42         42         42 BOWIE
        43         42         43 BOWIE

Execution Plan
----------------------------------------------------------
Plan hash value: 1845943507

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |     2 |    34 |    15   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| BOWIE |     2 |    34 |    15   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(INTERNAL_FUNCTION("ID")>=42 AND
INTERNAL_FUNCTION("ID")<=43)

Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
54  consistent gets
0  physical reads
0  redo size
814  bytes sent via SQL*Net to client
551  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
2  rows processed

Or an unbounded range scan:

SQL> select * from bowie where id > 10000000000;

no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 1845943507

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |     1 |   124 |    15   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| BOWIE |     1 |   124 |    15   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(INTERNAL_FUNCTION("ID")>10000000000)

Statistics
----------------------------------------------------------
5  recursive calls
0  db block gets
108  consistent gets
0  physical reads
0  redo size
538  bytes sent via SQL*Net to client
540  bytes received via SQL*Net from client
1  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
0  rows processed

We notice the index is ignored and can’t be used by the CBO.

So on an encrypted column, only a standard B-Tree index with an equality-based predicate can be used. These restrictions don’t however apply to indexes based on tables within encrypted tablespaces.
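
By way of contrast, a quick sketch (reusing the encrypted SAFE tablespace from Part I; the table and index names here are hypothetical): if the same data instead lived in an encrypted tablespace with no column-based encryption, the Bitmap Index that failed above would create without complaint:

SQL> create table bowie_safe (id number, code number, salary number, text varchar2(30)) tablespace safe;

Table created.

SQL> create bitmap index bowie_safe_code_i on bowie_safe(code);

Index created.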

Indexing and Transparent Data Encryption Part II (Hide Away) May 20, 2015

Posted by Richard Foote in Block Dumps, Oracle Indexes, TDE.
add a comment

In Part I, I quickly ran through how to set up an encrypted tablespace using Transparent Data Encryption and why to take care when creating indexes outside of these tablespaces.

Another method of encrypting data in the Oracle database is to just encrypt selected columns. Although the advantage here is that we can just encrypt sensitive columns of interest (and that the data remains encrypted within the buffer cache), this method has a number of major restrictions, especially in relation to indexing.

To first set the scene, I’ll start by creating and populating an unencrypted table:

SQL> create table bowie (id number, code number, salary number, text varchar2(30)) tablespace unsafe;

Table created.

SQL> insert into bowie select rownum, 42, rownum, 'BOWIE' from dual connect by level <=10000;

10000 rows created.

SQL> commit;

Commit complete.

I’ll now create an index on the (effectively unique) ID column:

SQL> create index bowie_id_i on bowie(id);

Index created.

We can effectively use this index on some basic range-based predicates:

SQL> select * from bowie where id between 42 and 43;

        ID       CODE     SALARY TEXT
---------- ---------- ---------- ------------------------------
        42         42         42 BOWIE
        43         42         43 BOWIE
Execution Plan
----------------------------------------------------------
Plan hash value: 1405654398

--------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name       | Rows  | Bytes | Cost(%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |            |     2 |   112 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE      |     2 |   112 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | BOWIE_ID_I |     2 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

SQL> select * from bowie where id > 10000000000;

no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 1405654398

--------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name       | Rows  | Bytes |  Cost(%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |            |     1 |    56 |     1   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE      |     1 |    56 |     1   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | BOWIE_ID_I |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

So all is well with our index here and it is used as expected.

If we perform a block dump and have a look at the first table block containing rows (following are excerpts from the block dump):

Block header dump:  0x020001dc
Object id on Block? Y
seg/obj: 0x16cda  csc: 0x00.2b3abc  itc: 2  flg: E  typ: 1 - DATA
brn: 0  bdba: 0x20001d8 ver: 0x01 opc: 0
inc: 0  exflg: 0

Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000a.006.000005dd  0x014004e3.00bb.17  --U-  330  fsc 0x0000.002b3aca
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
bdba: 0x020001dc
data_block_dump,data header at 0x1c72e64
===============
tsiz: 0x1f98
hsiz: 0x2a6
pbl: 0x01c72e64
76543210
flag=--------
ntab=1
nrow=330
frre=-1
fsbo=0x2a6
fseo=0x5d6
avsp=0x330
tosp=0x330
0xe:pti[0] nrow=330 offs=0

We notice the block contains 330 rows. Note this number as we’ll check back on it later.

If we look further on in the dump to the section listing some of the rows within the block:

block_row_dump:
tab 0, row 0, @0x181a
tl: 20 fb: --H-FL-- lb: 0x1  cc: 4
col  0: [ 3]  c2 07 47
col  1: [ 2]  c1 2b
col  2: [ 3]  c2 07 47
col  3: [ 5]  42 4f 57 49 45
tab 0, row 1, @0x182e
tl: 20 fb: --H-FL-- lb: 0x1  cc: 4
col  0: [ 3]  c2 07 48
col  1: [ 2]  c1 2b
col  2: [ 3]  c2 07 48
col  3: [ 5]  42 4f 57 49 45
tab 0, row 2, @0x1842
tl: 20 fb: --H-FL-- lb: 0x1  cc: 4
col  0: [ 3]  c2 07 49
col  1: [ 2]  c1 2b
col  2: [ 3]  c2 07 49
col  3: [ 5]  42 4f 57 49 45
tab 0, row 3, @0x1856

…..

We can see the 4 columns of each row and note that the lengths of the ID and CODE columns are 3 and 2 bytes respectively. We can also see that the hex values of the CODE column (col 1) are all the same: c1 2b (as they all have a value of 42).
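
You can easily confirm this internal NUMBER encoding for yourself with the DUMP function; the decimal bytes 193,43 below are simply the hex c1 2b we see in the block dump:

SQL> select dump(42) from dual;

DUMP(42)
--------------------
Typ=2 Len=2: 193,43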

OK, time to encrypt some columns. I’ll re-create the table but this time encrypt both the ID and CODE columns using all the default settings:

SQL> drop table bowie;

Table dropped.

SQL> create table bowie (id number encrypt, code number encrypt, salary number, text varchar2(30)) tablespace unsafe;

Table created.

SQL> insert into bowie select rownum, 42, rownum, 'BOWIE' from dual connect by level <=10000;

10000 rows created.

SQL> commit;

Commit complete.

SQL> desc bowie
Name                                      Null?    Type
----------------------------------------- -------- ----------------------------
ID                                                 NUMBER ENCRYPT
CODE                                               NUMBER ENCRYPT
SALARY                                             NUMBER
TEXT                                               VARCHAR2(30)


If we look at a dump of this table we notice a number of key differences:


Block header dump:  0x02000464
Object id on Block? Y
seg/obj: 0x16cef  csc: 0x00.2e6e1f  itc: 2  flg: E  typ: 1 - DATA
brn: 0  bdba: 0x2000460 ver: 0x01 opc: 0
inc: 0  exflg: 0

Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0004.001.00000671  0x01409e62.00ec.19  --U-   60  fsc 0x0000.002e6e7b
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
bdba: 0x02000464
data_block_dump,data header at 0x1d82e64
===============
tsiz: 0x1f98
hsiz: 0x8a
pbl: 0x01d82e64
76543210
flag=--------
ntab=1
nrow=60
frre=-1
fsbo=0x8a
fseo=0x3dc
avsp=0x352
tosp=0x352
0xe:pti[0] nrow=60 offs=0

….


The first difference is that where previously we had 330 rows, the block now only contains 60 rows of data.

If we look further on to the rows themselves:

tab 0, row 0, @0x1b72
tl: 118 fb: --H-FL-- lb: 0x1  cc: 4
col  0: [52]
bd 33 16 9a aa 6a 68 cf f7 a2 99 6c 3d b0 0c 1f 11 3d 42 cb 76 20 52 cc 18
8a 96 fd 0e 59 41 f2 a0 0f da 96 37 40 9b 28 dd 93 82 29 74 32 f1 53 c5 27
e8 e1
col  1: [52]
57 0c 63 c8 4f a9 42 f7 61 c3 63 b2 1b 0e f9 bc 74 5b 74 46 87 08 26 23 6b
c9 ae 52 ca 0e 31 9e ac 54 79 f7 2d f9 64 41 30 e5 6e 11 00 a3 55 d4 81 42
a4 19
col  2: [ 2]  c1 3e
col  3: [ 5]  42 4f 57 49 45
tab 0, row 1, @0x1be8
tl: 118 fb: --H-FL-- lb: 0x1  cc: 4
col  0: [52]
c3 13 d5 f7 a8 1f 46 8b bd 2b a9 e0 25 8e b3 15 2d 52 77 f5 fa 8a 52 46 f6
1d 0d 0a 58 8f 68 51 ea 2f 8c 0e 56 9d 2b cf 6f 86 7c d4 a9 d9 2a 25 fd a1
6e 7d
col  1: [52]
49 19 8f 40 da 0e ad fb 7a 79 6d 50 61 c9 9e 33 90 3f 73 c9 64 0a f0 98 c6
2f 8c c1 3f 47 fd 78 55 be 5c e5 df 1f 94 ab c0 3b a2 e3 37 65 f4 8e 80 c0
f4 66
col  2: [ 2]  c1 3f
col  3: [ 5]  42 4f 57 49 45
tab 0, row 2, @0x1c5e
tl: 118 fb: --H-FL-- lb: 0x1  cc: 4
col  0: [52]
1e 9d 90 85 70 e8 50 39 c9 64 9f 85 6f c6 e6 03 44 8c 59 73 9a 0e a6 cb 3c
ff 41 91 42 2a 36 f1 98 b7 54 61 24 1a 62 87 3f b9 21 5e d7 43 f6 39 14 7a
d0 34
col  1: [52]
e5 a0 4a 82 06 12 88 08 fb df db b7 00 19 60 cc d1 da 98 34 1d 24 44 64 79
04 48 29 7a 2c 5d 26 06 0d f3 5a 42 1c 34 59 65 14 85 53 e7 07 ac ee 11 73
82 5f
col  2: [ 2]  c1 40
col  3: [ 5]  42 4f 57 49 45

….

We see the reason we now have fewer rows per block: the encrypted columns have significantly increased in size. Where previously they were just 3 and 2 bytes respectively, both the ID and CODE columns are now 52 bytes in length. The actual size depends in part on the encryption algorithm used (some algorithms round to the next 8 bytes); in this example I used the default AES192.

With AES192, the length of the column is rounded up to the next 16 bytes. However, if we simply encrypted a column as is, the same column value would always encrypt to the same result when using the same encryption key. A malicious person could then attempt to reverse engineer a value by inserting known column values and seeing if the generated encrypted values match those in the table. To prevent this, Oracle by default adds “salt”, basically a random string, to the column value being encrypted, making it impossible to reverse engineer the inserted value this way. This adds another 16 bytes to the length of the column value. If we look at the CODE column (col 1) in the block dump, we notice the entries all have different encrypted values even though they all have the same actual value of 42 within the table.

So that’s 32 bytes accounted for. The remaining 20 bytes are the result of TDE adding a Message Authentication Code (MAC) to each encrypted value for integrity checking purposes.
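
As an aside, if you’re ever unsure how a column has actually been encrypted, the DBA_ENCRYPTED_COLUMNS dictionary view spells out the algorithm and whether salt is in play (just a quick sketch of the query; for the BOWIE table above, both columns would show SALT = YES with the default AES192 algorithm):

SQL> select column_name, encryption_alg, salt
     from dba_encrypted_columns where table_name = 'BOWIE';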

Clearly, having columns that increase so significantly due to encryption will also have an impact on any associated indexes as they will likewise not be able to contain as many entries per index block and hence be significantly larger.

However, the more pressing issue is that by adding salt to the encryption process, there is no easy deterministic way Oracle can associate an actual indexed value with the encrypted value when accessing and modifying the index. As a result, Oracle simply doesn’t allow an index to be created on any column that has been encrypted with salt.

SQL> create index bowie_id_i on bowie(id);
create index bowie_id_i on bowie(id)
*
ERROR at line 1:
ORA-28338: Column(s) cannot be both indexed and encrypted with salt

SQL> alter table bowie add primary key(id);
alter table bowie add primary key(id)
*
ERROR at line 1:
ORA-28338: Column(s) cannot be both indexed and encrypted with salt

If we want to encrypt a column and have the column indexed, we must encrypt the column without salt. Additionally, if you want to make the index more efficient without the overheads associated with MAC integrity checks, you may also want to encrypt the columns with the NOMAC option.

Let’s re-create the table without these encryption defaults:

SQL> drop table bowie;

Table dropped.

SQL> create table bowie (id number encrypt 'nomac' no salt, code number encrypt 'nomac' no salt, salary number, text varchar2(30)) tablespace unsafe;

Table created.

SQL> insert into bowie select rownum, 42, rownum, 'BOWIE' from dual connect by level <=10000;

10000 rows created.

SQL> commit;

Commit complete.

If we look at a block dump of this table:

Block header dump:  0x020000e4
Object id on Block? Y
seg/obj: 0x16cf0  csc: 0x00.2e75a9  itc: 2  flg: E  typ: 1 - DATA
brn: 0  bdba: 0x20000e0 ver: 0x01 opc: 0
inc: 0  exflg: 0

Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0005.00a.00000872  0x0140b74e.00f0.09  --U-  148  fsc 0x0000.002e75bb
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
bdba: 0x020000e4
data_block_dump,data header at 0x1dc2e64
===============
tsiz: 0x1f98
hsiz: 0x13a
pbl: 0x01dc2e64
76543210
flag=--------
ntab=1
nrow=148
frre=-1
fsbo=0x13a
fseo=0x46d
avsp=0x333
tosp=0x333
0xe:pti[0] nrow=148 offs=0

We notice the number of rows in the block has bounced back up, to 148 rows.

If we look at some of the rows:

block_row_dump:
tab 0, row 0, @0x1b01
tl: 47 fb: --H-FL-- lb: 0x1  cc: 4
col  0: [16]  de 59 69 48 5e 65 d8 21 da 49 bd ba 19 d5 1f 80
col  1: [16]  9e d8 3b 95 65 60 43 df 2c e2 b0 85 ae 5e 87 61
col  2: [ 3]  c2 02 34
col  3: [ 5]  42 4f 57 49 45
tab 0, row 1, @0x1b30
tl: 47 fb: --H-FL-- lb: 0x1  cc: 4
col  0: [16]  9e 96 cc e1 57 07 74 61 64 19 99 4a ad 08 d9 93
col  1: [16]  9e d8 3b 95 65 60 43 df 2c e2 b0 85 ae 5e 87 61
col  2: [ 3]  c2 02 35
col  3: [ 5]  42 4f 57 49 45
tab 0, row 2, @0x1b5f
tl: 47 fb: --H-FL-- lb: 0x1  cc: 4
col  0: [16]  f8 cf 58 ba f2 1b f7 a6 2d 59 1c c7 2c e4 28 86
col  1: [16]  9e d8 3b 95 65 60 43 df 2c e2 b0 85 ae 5e 87 61
col  2: [ 3]  c2 02 36
col  3: [ 5]  42 4f 57 49 45


We see the length of the encrypted columns has dropped back down to 16 bytes, still more than the unencrypted columns but less than the 52 bytes required for the encrypted columns with both salt and MAC enabled.

Note though that the CODE column values (col 1), while encrypted, now all have the same hex value (9e d8 3b 95 65 60 43 df 2c e2 b0 85 ae 5e 87 61) as no salt has been applied. So the encrypted data is that little bit less secure, but we can now successfully create B-Tree indexes on these encrypted columns:

SQL> create index bowie_id_i on bowie(id);

Index created.


This however doesn’t end the various restrictions associated with indexing encrypted columns, as we’ll see in the next post.

Indexing and Transparent Data Encryption Part I (The Secret Life of Arabia) May 19, 2015

Posted by Richard Foote in Oracle Indexes.
add a comment

Database security has been a really hot topic recently so I thought I might write a few posts in relation to indexing and Transparent Data Encryption (TDE) which is available as part of the Oracle Advanced Security option.

To protect the database from unauthorized “back-door” accesses, the data within the actual database files can be encrypted. This helps prevent privileged users such as Database and System Administrators who have direct access to these files from being able to inappropriately access confidential data by simply viewing the data directly with OS editors and the like.

There are two basic ways one can use Oracle TDE to encrypt data within an Oracle database: via Column or Tablespace encryption. Column-based encryption has a number of restrictions, especially in relation to indexing, which I’ll cover in later posts. To start with, my focus will be on Tablespace encryption.

With Tablespace encryption, all objects within the tablespace are encrypted in the same manner using the same encryption key, and so it has far fewer restrictions and issues.

It’s relatively easy to set up tablespace encryption, although note you must have the Oracle Advanced Security option.

First, ensure you have an entry in your sqlnet.ora that identifies the location of your Oracle Security Wallet, such as:

ENCRYPTION_WALLET_LOCATION=
(SOURCE=
(METHOD=FILE)
(METHOD_DATA=
(DIRECTORY=D:\wallet)))

As a user with ADMINISTER KEY MANAGEMENT privileges, create a password-protected keystore wallet in the location referenced in the sqlnet.ora file, such as:

SQL> ADMINISTER KEY MANAGEMENT CREATE KEYSTORE 'd:\wallet' identified by bowie;

keystore altered.

This wallet contains the master key used in turn to encrypt the various column and tablespace encryption keys used within the database to encrypt the actual data. Do not lose this wallet as without it, you will not be able to access any of your encrypted data !!

Next, open the database keystore using the password specified above such as:

SQL> ADMINISTER KEY MANAGEMENT SET KEYSTORE OPEN IDENTIFIED BY bowie;

keystore altered.
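
One step worth calling out, as it’s easy to miss: if a TDE master encryption key has not yet been created in the keystore, you would typically also create one at this point, before creating any encrypted tablespaces, with something such as:

SQL> ADMINISTER KEY MANAGEMENT SET KEY IDENTIFIED BY bowie WITH BACKUP;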

Finally, create an encrypted tablespace to hold all your sensitive data, such as:

SQL> CREATE TABLESPACE safe DATAFILE 'd:\app\rfoote\oradata\orcl\safe01.dbf' size 10m ENCRYPTION USING '3DES168' DEFAULT STORAGE(ENCRYPT);

Tablespace created.
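
To confirm which tablespaces are actually encrypted, a quick look at DBA_TABLESPACES does the trick (a sketch only, with the other tablespaces omitted):

SQL> select tablespace_name, encrypted from dba_tablespaces;

TABLESPACE_NAME                ENC
------------------------------ ---
SAFE                           YES
UNSAFE                         NO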

OK, to set the scene, we’re going to first create and populate a table in an unencrypted tablespace (note I’ve created tiny tablespaces here just so I can quickly open the file with a basic Windows Notepad editor):

SQL> create table bowie (id number, salary number, text varchar2(30)) tablespace unsafe;

Table created.

SQL> insert into bowie select rownum, rownum, 'DAVID BOWIE' from dual connect by level <=100000;

100000 rows created.

SQL> commit;

Commit complete.

I now flush the buffer cache to ensure all the data is written to disk:

SQL> alter system flush buffer_cache;

System altered.

If I now simply open the tablespace data file with a text or hex editor, I can easily see the contents of the table. For example:

DAVID BOWIE,Ã]Ã]
DAVID BOWIE,Ã^Ã^
DAVID BOWIE,Ã_Ã_
DAVID BOWIE,Ã`Ã`
DAVID BOWIE,ÃaÃa
DAVID BOWIE,ÃbÃb
DAVID BOWIE,ÃcÃc
DAVID BOWIE,ÃdÃd
DAVID BOWIE,ÃÃ
DAVID BOWIE,ÃÃ
DAVID BOWIE,ÃÃ

If I however create the same data but store the table in the encrypted tablespace:

SQL> create table bowie2 (id number, salary number, text varchar2(30)) tablespace safe;

Table created.

SQL> insert into bowie2 select rownum, rownum, 'DAVID BOWIE' from dual connect by level <=100000;

100000 rows created.

SQL> commit;

Commit complete.

SQL> alter system flush buffer_cache;

System altered.

If I now try to view the contents of the encrypted data file, I find there’s nothing that’s in a readable format. For example:

4VûÇÓÓž|²÷hTîç:qz-XÒåPÏœkNþr#ßto¢€p$oÉ<÷¨q뉸øÊ^rìÔó­š< Ï›”ÕïÑꖸ轫P6″²Ý÷FE›Û!涷½:|WÞÍ)᪞!
W:ë¿®¡i':5g”=FE÷W’97ð˜g2Å1h›ö{ž—êûò×ø\ßÓð¶§
£õÉÔS‡DþÚ)?ÿ\‹L<ԏ#^@iIØ—Nkå#_}Í5BüŒèÚVµÄ?Àö•*¡”õtX
”osø

So good luck to any dodgy DBA or Sys Admin folk who try to circumvent database security by directly viewing the database files or backups or stolen copies of the files, etc.

If we now also create an index on this table within the encrypted tablespace:

SQL> create index bowie2_idx on bowie2(id, text) tablespace safe;

Index created.

SQL> alter system flush buffer_cache;

System altered.

Again, a hunt for meaningful data in the data file will be fruitless. And because the data is encrypted at the tablespace layer, the index can be used with no restrictions:

SQL> set autotrace on
SQL> select * from bowie2 where id between 42 and 46;

        ID     SALARY TEXT
---------- ---------- ------------------------------
        42         42 DAVID BOWIE
        43         43 DAVID BOWIE
        44         44 DAVID BOWIE
        45         45 DAVID BOWIE
        46         46 DAVID BOWIE

Execution Plan
----------------------------------------------------------
Plan hash value: 3132122149

--------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name       | Rows  | Bytes |  Cost(%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |            |     5 |   215 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE2     |     5 |   215 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | BOWIE2_IDX |     5 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=42 AND "ID"<=46)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


Statistics
----------------------------------------------------------
         18  recursive calls
          0  db block gets
         78  consistent gets
        311  physical reads
          0  redo size
        831  bytes sent via SQL*Net to client
        552  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          5  rows processed

Now comes the first trap.

If we were to replicate this data in an unencrypted tablespace, then of course the associated data would be visible within the database data files again. And it’s of course very easy to potentially replicate table data, either accidentally or maliciously, via the creation of database indexes. For example, the default tablespace of a user might be an unencrypted tablespace, or a privileged user from whom we are trying to protect ourselves might have the CREATE ANY INDEX privilege.

So if we create an index in an unencrypted tablespace (I’m also adding a new row to easily identify it from the previous values inserted with the initial BOWIE table):

SQL> insert into bowie2 values (100001, 42, 'ZIGGY STARDUST');

1 row created.

SQL> commit;

Commit complete.

SQL> create index bowie2_all_i on bowie2(id, salary, text) tablespace unsafe;

Index created.

SQL> alter system flush buffer_cache;

System altered.

We can search all we want in the encrypted tablespace for meaningful data with little success, but all the confidential data can now be seen via the index data, stored in the unencrypted tablespace:

ÃbVÃbVDAVID BOWIE@ <    ÃÁ+ZIGGY STARDUST@¡   ÃÃDAVID BOWIE@¡

Don’t let down your good TDE security intentions by being a little lazy in how you manage and police your database indexes.
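
One simple way to police this is to periodically check the dictionary for indexes sitting in unencrypted tablespaces whose underlying tables live in encrypted ones. A sketch only (it ignores partitioned objects for simplicity; adjust owners and filters to taste):

SQL> select i.owner, i.index_name, i.tablespace_name
     from dba_indexes i, dba_tables t, dba_tablespaces its, dba_tablespaces tts
     where i.table_owner = t.owner
     and i.table_name = t.table_name
     and i.tablespace_name = its.tablespace_name
     and t.tablespace_name = tts.tablespace_name
     and tts.encrypted = 'YES'
     and its.encrypted = 'NO';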

Singapore Maths Question Solution and Very Interesting Observation (The Trickster) April 23, 2015

Posted by Richard Foote in Oracle Indexes.
13 comments

OK, time to reveal the solution to the somewhat tricky Singapore maths exam question I blogged previously.

Remember, there were 10 dates:

May 13   May 15   May 19

June 13   June 14

July 16   July 18

August 14   August 15   August 16

Bowie only knew the month of my birthday, Ziggy only knew the day.

Bowie begins by saying “I don’t know when your birthday is, but I know Ziggy does not know too.”

Now the first part is obvious: Bowie only knows the month and there are at least 2 possible dates for every month, so there is no way Bowie could know my birthday based on just the month. However, the second part is new information: Bowie knows that Ziggy doesn’t know either. How can he claim this ?

Well, there are 2 special dates on the list, May 19 and July 18, as these are the only dates with a unique day. If Ziggy had been given one of these days (18 or 19), he would instantly know my birthday. However, Bowie, who only knows the month, knows that Ziggy doesn’t know my birthday. He can only claim this if the month is neither May nor July, because if the month he was given were one of these, Ziggy might then know my birthday.

So my birthday must be in either June or August and we can eliminate both May and July from further consideration.

Now Ziggy can likewise perform this logic and he now knows the month of my birthday must be either June or August. He now says “At first I don’t know when your birthday is, but now I know”.

The first part is again obvious as Bowie has already stated that Ziggy doesn’t at first know my birthday. But the second part is new information: he now knows. The only way he can now know is if my birthday is one of June 13, August 15 or August 16, as these are the only dates in June and August that have a unique day. If my birthday were on either June 14 or August 14, there is no way Ziggy could now know my birthday as they both have the same day.

So we’re now down to just these 3 possible dates.

Now Bowie can likewise perform this logic and deduce that my birthday must be one of June 13, August 15 or August 16. The final piece of information is Bowie saying “Then I also know when your birthday is.” The only way he can say this is if my birthday is on June 13, as this is the only remaining date with a unique month. If my birthday had been either August 15 or August 16, there would be no way for Bowie to now know.

Therefore my birthday must indeed be June 13.
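
For anyone who would rather let the database do the reasoning, here’s a brute-force sketch of the same elimination in SQL, with each WITH step mirroring one of the statements in the puzzle (the dates are simply hard-coded):

SQL> with dates (mon, dy) as (
       select 'May', 13 from dual union all
       select 'May', 15 from dual union all
       select 'May', 19 from dual union all
       select 'June', 13 from dual union all
       select 'June', 14 from dual union all
       select 'July', 16 from dual union all
       select 'July', 18 from dual union all
       select 'August', 14 from dual union all
       select 'August', 15 from dual union all
       select 'August', 16 from dual
     ),
     -- Statement 1: the month Bowie was told cannot contain a globally unique day
     s1 as (
       select * from dates d
       where not exists (
         select 1 from dates d2
         where d2.mon = d.mon
         and (select count(*) from dates d3 where d3.dy = d2.dy) = 1)
     ),
     -- Statement 2: Ziggy now knows, so his day must be unique among the survivors
     s2 as (
       select * from s1 d
       where (select count(*) from s1 d2 where d2.dy = d.dy) = 1
     )
     -- Statement 3: Bowie now knows, so his month must be unique among the dates left
     select * from s2 d
     where (select count(*) from s2 d2 where d2.mon = d.mon) = 1;

MON            DY
------ ----------
June           13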

So congratulations to all of you who managed to get the correct answer :)

An Interesting Observation

Now here’s an interesting thing. 8 people got the correct answer. But equal on top with 8 responses was July 16. Why was July 16 such a popular answer, when the month of July was actually among the earliest dates we could eliminate from consideration ? Nor was it one of the unique-day dates, which are also common answers. What’s so special about July 16 that so many people picked this answer ?

Well, the original Singapore question I referenced in my original blog piece in which the students had to work out Cheryl’s birthday had a correct answer of, you guessed it, July 16. You see, not only did I change the names from the original question, but I also changed the dates as well.  However, I made sure that July 16 was one of the possible 10 dates :)

Now I don’t know how everyone got to their answer but it strikes me as an interesting coincidence that so many people got the same answer as the original Singapore exam question. I wonder how many people looked up the answer from the Singapore question and just assumed the same answer fit here as well ? How many people saw enough information in the question for them to assume an answer, without being careful enough regarding the actual information laid out before them ? These are two similar yet different questions which have two different answers.

And isn’t this such a common trait, especially in the IT world when trying to diagnose a performance issue ? You see a problem with a symptom similar to one you’ve seen before and assume the root cause is the same as it was previously, even though the information at hand (if only you looked carefully enough) is entirely different and doesn’t actually lend itself to the same root cause.

Just because the reason for lots of log file sync waits before was slow write performance to the redo log files doesn’t necessarily mean it’s the same reason today. Just because you previously had slow RAC cluster waits due to interconnect issues doesn’t necessarily mean that’s now the root cause. Quickly jumping to the wrong conclusion is such an easy mistake to make when trying to diagnose a performance problem. One needs to look carefully at all the facts on hand, and with an Oracle database there is no excuse not to look at all the relevant, appropriate facts when diagnosing a performance problem.

No, just because someone previously had a brain tumour when they had a headache doesn’t necessarily mean your headache is the result of a brain tumour :)

Hope you had some fun, I certainly did !! :)

Good Singapore Maths Students Would Likely Make Good Oracle DBAs (Problems) April 15, 2015

Posted by Richard Foote in Oracle Indexes.
32 comments

An interesting mathematics-based question from a Singapore high school exam has been doing the internet rounds in the past few days. Considering it’s aimed at high school students, it’s a tricky one and obviously designed to filter out the better students, in a country with a very good reputation for churning out mathematically gifted students.

I enjoy solving problems and this one took me a few minutes to work out. However, at the end of the process, it occurred to me that I used a similar process to how I’ve often solved performance issues with Oracle databases. In fact, this question kinda reminded me of a performance issue that I had only just recently been asked by a customer to help resolve.

One needs to clearly understand the question being asked. One also needs to focus on and understand the data at hand. Then use a process of elimination to both rule out and, just as importantly, rule in possible answers (or root causes to performance issues). Eventually, one can then narrow things down and pinpoint a specific solution (or set of answers).

So for example, the database is running exceptionally slow globally at certain times, why ?  So it looks like it’s because there are lots of cluster related waits at these times, why ? So it looks like it’s because writing to the redo logs is really slow at these times, why ? And so on and so on.

If you can work out the solution to this problem in a reasonably timely manner, then in all likelihood you have good problem solving skills and the makings of a good Oracle DBA. You just need to also like donuts and good whiskies :)

I’ve reproduced the question here, changing the names to protect the innocent.

“Bowie and Ziggy just became friends with me, and they want to know when my birthday is. I give them 10 possible dates:

May 13   May 15   May 19

June 13   June 14

July 16   July 18

August 14   August 15   August 16

I then tell Bowie and Ziggy separately the month and the day of my birthday respectively.

Bowie: I don’t know when your birthday is, but I know Ziggy does not know too.

Ziggy: At first I don’t know when your birthday is, but now I know.

Bowie: Then I also know when your birthday is.

So when is my birthday ?”

Feel free to comment on what you think the answer is but please don’t give away how you might have worked it out. For those interested (or for those that don’t check out the solution on the web first :) ), I’ll explain how to get to the answer in a few days time.

Like I said, if you get it right, you should consider a career as an Oracle DBA !! And here’s a link to an excellent whisky: Sullivans Cove Whisky :)

Oracle Database In-Memory Test Drive Workshop: Canberra 28 April 2015 March 30, 2015

Posted by Richard Foote in 12c, In-Memory.
4 comments

I’ll be running a free Oracle Database In-Memory Test Drive Workshop locally here in Canberra on Tuesday, 28th April 2015.

Just bring a laptop with at least 8G of RAM and I’ll supply a VirtualBox image with the Oracle Database 12c In-Memory environment. Together we’ll go through a number of hands-on labs that cover:

  1. Configuring the Product Easily
  2. Understanding Fast Table Scans (with none of those pesky indexes)
  3. Understanding Query Optimisation
  4. Understanding Transactional Consistency

It’s sure to be a fun morning. It’s also sure to fill up really quickly so please register ASAP to avoid disappointment.

For all the necessary details including how to register click here.

Hope to see you then :)

UPDATE: This event is now officially FULL. Sorry to disappoint if you haven’t yet enrolled.

12.1.0.2 Introduction to Zone Maps Part III (Little By Little) November 24, 2014

Posted by Richard Foote in 12c, Attribute Clustering, Oracle Indexes, Zone Maps.
1 comment so far

I’ve previously discussed the new Zone Map database feature and how Zone Maps work in a similar manner to Exadata Storage Indexes.

Just like Storage Indexes (and conventional indexes for that matter), they work best when the data is well clustered in relation to the Zone Map or index. By having the data in the table ordered in the same manner as the Zone Map, the ranges of the min/max values for each 8M “zone” in the table can be as narrow as possible, making them more likely to eliminate zone accesses.

On the other hand, if the data in the table is not well clustered, then the min/max ranges within the Zone Map can be extremely wide, making their effectiveness limited.

In my previous example on the ALBUM_ID column in my first article on this subject, the data was extremely well clustered and so the associated Zone Map was very effective. But what if the data is poorly clustered ?

To illustrate, I’m going to create a Zone Map based on the poorly clustered ARTIST_ID column, which has its values randomly distributed throughout the whole table:

SQL> create materialized zonemap big_bowie_artist_id_zm on big_bowie(artist_id);
 create materialized zonemap big_bowie_artist_id_zm on big_bowie(artist_id)
 *
 ERROR at line 1:
 ORA-31958: fact table "BOWIE"."BIG_BOWIE" already has a zonemap
 "BOWIE"."BIG_BOWIE_ALBUM_ID_ZM" on it

Another difference between an index and a Zone Map is that there can only be the one Zone Map defined per table, but a Zone Map can include multiple columns. As I already have a Zone Map defined on just the ALBUM_ID column, I can’t just create another.

So I’ll drop the current Zone Map and create a new one based on both the ARTIST_ID and ALBUM_ID columns:

SQL> drop materialized zonemap big_bowie_album_id_zm;

Materialized zonemap dropped.

SQL> create materialized zonemap big_bowie_zm on big_bowie(album_id, artist_id);

Materialized zonemap created.
    
 SQL> select measure, position_in_select, agg_function, agg_column_name
 from dba_zonemap_measures where zonemap_name='BIG_BOWIE_ZM';

MEASURE              POSITION_IN_SELECT AGG_FUNCTION  AGG_COLUMN_NAME
 -------------------- ------------------ ------------- --------------------
 "BOWIE"."BIG_BOWIE".                  5 MAX           MAX_2_ARTIST_ID
 "ARTIST_ID"

"BOWIE"."BIG_BOWIE".                  4 MIN           MIN_2_ARTIST_ID
 "ARTIST_ID"

"BOWIE"."BIG_BOWIE".                  3 MAX           MAX_1_ALBUM_ID
 "ALBUM_ID"

"BOWIE"."BIG_BOWIE".                  2 MIN           MIN_1_ALBUM_ID
 "ALBUM_ID"

So this new Zone Map has min/max details on each zone in the table for both the ARTIST_ID and ALBUM_ID columns.

The min/max ranges of a Zone Map provide an excellent visual representation of the clustering of the data. If I select Zone Map details of the ALBUM_ID column (see partial listing below):

SQL> select zone_id$, min_1_album_id, max_1_album_id, zone_rows$ from big_bowie_zm;

ZONE_ID$ MIN_1_ALBUM_ID MAX_1_ALBUM_ID ZONE_ROWS$
 ---------- -------------- -------------- ----------
 3.8586E+11              1              2      66234
 3.8586E+11              5              6      56715
 3.8586E+11              7              7      76562
 3.8586E+11              7              8      76632
 3.8586E+11              8              9      76633
 3.8586E+11             21             22      75615
 3.8586E+11             29             29      75582
 3.8586E+11             31             32      75545
 3.8586E+11             35             36      75617
 3.8586E+11             43             44      75615
 ...

3.8586E+11             76             77      75615
 3.8586E+11             79             80      75615
 3.8586E+11             86             87      75616
 3.8586E+11             88             89      75618
 3.8586E+11             97             97      75771
 3.8586E+11            100            100      15871

134 rows selected.

As the data in the table is effectively ordered based on the ALBUM_ID column (and so is extremely well clustered in relation to this column), the min/max ranges for each zone are extremely narrow. Each zone basically only contains one or two different values of ALBUM_ID, so if I’m just after a specific ALBUM_ID value, the Zone Map is very effective in eliminating zones from having to be accessed. Just what we want.

However, if we look at the Zone Map details of the poorly clustered ARTIST_ID column (again just a partial listing):

SQL> select zone_id$, min_2_artist_id, max_2_artist_id, zone_rows$ from big_bowie_zm;

ZONE_ID$ MIN_2_ARTIST_ID MAX_2_ARTIST_ID ZONE_ROWS$
 ---------- --------------- --------------- ----------
 3.8586E+11            3661           98244      66234
 3.8586E+11               1          100000      56715
 3.8586E+11            5273           81834      76562
 3.8586E+11               1          100000      76632
 3.8586E+11               1          100000      76633
 3.8586E+11               1          100000      75615
 3.8586E+11            2383           77964      75582
 3.8586E+11               1          100000      75545
 3.8586E+11               1          100000      75617
 3.8586E+11               1          100000      75615
 ...

3.8586E+11               1          100000      75615
 3.8586E+11               1          100000      75615
 3.8586E+11               1          100000      75615
 3.8586E+11               1          100000      75615
 3.8586E+11               1          100000      75616
 3.8586E+11               1          100000      75618
 3.8586E+11            4848           80618      75771
 3.8586E+11           84130          100000      15871

134 rows selected.

We notice the ranges for most of the zones are extremely large, with many actually having a min value of 1 (the actual minimum) and a max of 100000 (the actual maximum). This is a worst-case scenario, as a specific required value could potentially reside in most of the zones, thereby forcing Oracle to visit most zones and making the Zone Map totally ineffective.

If we run a query searching for a specific ARTIST_ID:

SQL> select * from big_bowie where artist_id=42;

100 rows selected.

Elapsed: 00:00:00.69

Execution Plan
 ----------------------------------------------------------
 Plan hash value: 1980960934

----------------------------------------------------------------------------------------------------
 | Id  | Operation                              | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
 ----------------------------------------------------------------------------------------------------
 |   0 | SELECT STATEMENT                       |           |    99 |  9108 |  3291  (13)| 00:00:01 |
 |*  1 |  TABLE ACCESS STORAGE FULL WITH ZONEMAP| BIG_BOWIE |    99 |  9108 |  3291  (13)| 00:00:01 |
 ----------------------------------------------------------------------------------------------------
 Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("ARTIST_ID"=42)
 filter(SYS_ZMAP_FILTER('/* ZM_PRUNING */ SELECT "ZONE_ID$", CASE WHEN
 BITAND(zm."ZONE_STATE$",1)=1 THEN 1 ELSE CASE WHEN (zm."MIN_2_ARTIST_ID" > :1 OR
 zm."MAX_2_ARTIST_ID" < :2) THEN 3 ELSE 2 END END FROM "BOWIE"."BIG_BOWIE_ZM" zm WHERE
 zm."ZONE_LEVEL$"=0 ORDER BY zm."ZONE_ID$"',SYS_OP_ZONE_ID(ROWID),42,42)<3 AND
 "ARTIST_ID"=42)

      
 Statistics
 ----------------------------------------------------------
 141  recursive calls
 0  db block gets
 101614  consistent gets
 0  physical reads
 0  redo size
 5190  bytes sent via SQL*Net to client
 618  bytes received via SQL*Net from client
 8  SQL*Net roundtrips to/from client
 0  sorts (memory)
 0  sorts (disk)
 100  rows processed

We notice we are forced to perform a very high number of consistent gets (101,614) when returning just 100 rows, much higher than the 2,364 consistent gets required to return a full 100,000 rows for a specific ALBUM_ID and not far from the 135,085 consistent gets when performing a full table scan.

We need to improve the performance of these queries based on the ARTIST_ID column …

Let’s drop this zone map:

SQL> drop materialized zonemap big_bowie_zm;

Materialized zonemap dropped.

and change the physical clustering of the data in the table so that the data is primarily now clustered in ARTIST_ID order:


SQL> alter table big_bowie add clustering by linear order(artist_id, album_id) with materialized zonemap;

Table altered.

So we have added a clustering attribute to this table (previously discussed here) and based a new Zone Map on this clustering at the same time.

SQL> select zonemap_name from dba_zonemaps where fact_table='BIG_BOWIE';

ZONEMAP_NAME
---------------
ZMAP$_BIG_BOWIE

SQL> select zonemap_name, pruning, with_clustering, invalid, stale, unusable
from dba_zonemaps where zonemap_name = 'ZMAP$_BIG_BOWIE';

ZONEMAP_NAME    PRUNING  WITH_CLUSTERING INVALID STALE   UNUSABLE
--------------- -------- --------------- ------- ------- --------
ZMAP$_BIG_BOWIE ENABLED  YES             NO      NO      NO

However, as we haven’t actually reorganized the table, the rows in the table are still clustered the same as before:

SQL> select zone_id$, min_2_album_id, max_2_album_id, zone_rows$ from zmap$_big_bowie;

ZONE_ID$ MIN_2_ALBUM_ID MAX_2_ALBUM_ID ZONE_ROWS$
---------- -------------- -------------- ----------
3.8586E+11             43             44      75615
3.8586E+11              1              2      66234
3.8586E+11             81             82      75615
3.8586E+11             29             29      75582
3.8586E+11             50             50      75481
3.8586E+11             90             91      75484
3.8586E+11              5              6      56715
3.8586E+11              7              8      76632
3.8586E+11              8              9      76633
3.8586E+11             16             16      75481
...

3.8586E+11             44             44      75480
3.8586E+11             82             83      75616
3.8586E+11            100            100      15871
3.8586E+11             34             35      75576
3.8586E+11             14             15      75615
3.8586E+11             33             34      75616
3.8586E+11              3              5      75707

134 rows selected.

SQL> select zone_id$, min_1_artist_id, max_1_artist_id, zone_rows$ from zmap$_big_bowie;

ZONE_ID$ MIN_1_ARTIST_ID MAX_1_ARTIST_ID ZONE_ROWS$
---------- --------------- --------------- ----------
3.8586E+11               1          100000      75545
3.8586E+11               1          100000      75616
3.8586E+11               1          100000      75617
3.8586E+11               1          100000      75911
3.8586E+11               1          100000      75616
3.8586E+11               1          100000      75616
3.8586E+11               1          100000      75615
3.8586E+11               1          100000      75616
3.8586E+11             132           75743      75612
3.8586E+11               1          100000      75615
...

3.8586E+11               1          100000      66296
3.8586E+11               1          100000      75615
3.8586E+11            2360           96960      75701
3.8586E+11               1          100000      75615
3.8586E+11               1          100000      75616
3.8586E+11           23432           98911      75480
3.8586E+11               1          100000      75791
3.8586E+11           21104           96583      75480

134 rows selected.

But if we now reorganise the table so that the clustering attribute can take effect:

SQL> alter table big_bowie move;

Table altered.

We notice the characteristics of the Zone Map have changed dramatically. The previously well-clustered ALBUM_ID column now has a totally ineffective Zone Map, with all the ranges effectively consisting of the full min/max values:

SQL> select zone_id$, min_2_album_id, max_2_album_id, zone_rows$ from zmap$_big_bowie;

ZONE_ID$ MIN_2_ALBUM_ID MAX_2_ALBUM_ID ZONE_ROWS$
---------- -------------- -------------- ----------
3.9704E+11              1            142      21185
3.9704E+11              1            100       9452
3.9704E+11              1            100      76516
3.9704E+11              1            100      75501
3.9704E+11              1            100      75497
3.9704E+11              1            100      75501
3.9704E+11              1            100      75499
3.9704E+11              1            100      75504
3.9704E+11              1            100      75500
3.9704E+11              1            100      75501
...

3.9704E+11              1            100      75503
3.9704E+11              1            100      75498
3.9704E+11              1            100      75501
3.9704E+11              1            100      75501
3.9704E+11              1            100      75501
3.9704E+11              1            100      75501
3.9704E+11              1            100      75794

144 rows selected.

Meanwhile, the previously ineffective Zone Map on the ARTIST_ID column is now much more effective, with significantly smaller min/max ranges for each zone:

SQL> select zone_id$, min_1_artist_id, max_1_artist_id, zone_rows$ from zmap$_big_bowie;

ZONE_ID$ MIN_1_ARTIST_ID MAX_1_ARTIST_ID ZONE_ROWS$
---------- --------------- --------------- ----------
3.9704E+11              67            1036      21185
3.9704E+11            2359            2453       9452
3.9704E+11            8341            9106      76516
3.9704E+11           18933           19688      75501
3.9704E+11           22708           23463      75497
3.9704E+11           26483           27238      75501
3.9704E+11           27238           27993      75499
3.9704E+11           33278           34033      75504
3.9704E+11           36674           40449      75500
3.9704E+11           38563           39318      75501
...

3.9704E+11           49888           50643      75503
3.9704E+11           62723           63478      75498
3.9704E+11           77824           78579      75501
3.9704E+11           82354           83109      75501
3.9704E+11           88394           89149      75501
3.9704E+11           93679           94434      75501
3.9704E+11           98211           98969      75794

144 rows selected.

The same query now runs so much faster as the Zone Map can eliminate almost all zones from being accessed:

SQL> select * from big_bowie where artist_id=42;

100 rows selected.

Elapsed: 00:00:00.02

Execution Plan
----------------------------------------------------------
Plan hash value: 1980960934

----------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |           |    99 |  9108 |  3291  (13)| 00:00:01 |
|*  1 |  TABLE ACCESS STORAGE FULL WITH ZONEMAP| BIG_BOWIE |    99 |  9108 |  3291  (13)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("ARTIST_ID"=42)
filter(SYS_ZMAP_FILTER('/* ZM_PRUNING */ SELECT "ZONE_ID$", CASE WHEN
BITAND(zm."ZONE_STATE$",1)=1 THEN 1 ELSE CASE WHEN (zm."MIN_1_ARTIST_ID" > :1 OR
zm."MAX_1_ARTIST_ID" < :2) THEN 3 ELSE 2 END END FROM "BOWIE"."ZMAP$_BIG_BOWIE" zm WHERE
zm."ZONE_LEVEL$"=0 ORDER BY zm."ZONE_ID$"',SYS_OP_ZONE_ID(ROWID),42,42)<3 AND
"ARTIST_ID"=42)
Statistics
----------------------------------------------------------
187  recursive calls
0  db block gets
175  consistent gets
0  physical reads
0  redo size
5190  bytes sent via SQL*Net to client
618  bytes received via SQL*Net from client
8  SQL*Net roundtrips to/from client
9  sorts (memory)
0  sorts (disk)
100  rows processed

Consistent gets has reduced dramatically down to just 175 from the previously massive 101,614.

As is common with changing the clustering of data, what improves one thing makes something else significantly worse. The previously efficient accesses based on the ALBUM_ID column are now nowhere near as efficient as before:

SQL> select * from big_bowie where album_id = 42;

100000 rows selected.

Elapsed: 00:00:01.27

Execution Plan
----------------------------------------------------------
Plan hash value: 1980960934

----------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |           |   100K|  8984K|  3269  (12)| 00:00:01 |
|*  1 |  TABLE ACCESS STORAGE FULL WITH ZONEMAP| BIG_BOWIE |   100K|  8984K|  3269  (12)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("ALBUM_ID"=42)
filter(SYS_ZMAP_FILTER('/* ZM_PRUNING */ SELECT "ZONE_ID$", CASE WHEN
BITAND(zm."ZONE_STATE$",1)=1 THEN 1 ELSE CASE WHEN (zm."MIN_2_ALBUM_ID" > :1 OR
zm."MAX_2_ALBUM_ID" < :2) THEN 3 ELSE 2 END END FROM "BOWIE"."ZMAP$_BIG_BOWIE" zm WHERE
zm."ZONE_LEVEL$"=0 ORDER BY zm."ZONE_ID$"',SYS_OP_ZONE_ID(ROWID),42,42)<3 AND "ALBUM_ID"=42)

Statistics
----------------------------------------------------------
187  recursive calls
0  db block gets
141568  consistent gets
0  physical reads
0  redo size
4399566  bytes sent via SQL*Net to client
73878  bytes received via SQL*Net from client
6668  SQL*Net roundtrips to/from client
9  sorts (memory)
0  sorts (disk)
100000  rows processed

We now have to perform a whopping 141,568 consistent gets, up from the previous 2,364 consistent gets.

So Zone Maps, like database indexes and Exadata Storage Indexes, can be extremely beneficial in reducing I/O, but their effectiveness is very much dependent on the clustering of the underlying data.
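
If you’re unsure just how well clustered a candidate column actually is before settling on Zone Map keys, the trusty CLUSTERING_FACTOR of an ordinary index on that column is a quick gauge (a throwaway sketch; the index name is illustrative):

SQL> create index big_bowie_artist_id_i on big_bowie(artist_id);

SQL> select index_name, clustering_factor, num_rows from user_indexes
     where index_name = 'BIG_BOWIE_ARTIST_ID_I';

A clustering factor close to the number of blocks in the table points to the nice narrow min/max zone ranges we saw for ALBUM_ID; one close to the number of rows points to the wide, useless ranges we initially saw for ARTIST_ID.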

Index Advanced Compression vs. Bitmap Indexes (Candidate) October 31, 2014

Posted by Richard Foote in 12c, Advanced Index Compression, Bitmap Indexes, Oracle Indexes.
6 comments

A good question from Robert Thorneycroft I thought warranted its own post. He asked:

“I have a question regarding bitmapped indexes versus index compression. In your previous blog titled ‘So What Is A Good Cardinality Estimate For A Bitmap Index Column ? (Song 2)’ you came to the conclusion that ‘500,000 distinct values in a 1 million row table’ would still be a viable scenario for deploying bitmapped indexes over non-compressed b-tree indexes.

Now b-tree index compression is common, especially with the release of Advanced Index Compression how does this affect your conclusion? Are there still any rules of thumb which can be used to determine when to deploy bitmapped indexes instead of compressed b-tree indexes or has index compression made bitmapped indexes largely redundant?”


If you’re not familiar with Bitmap Indexes, it might be worth having a read of my previous posts on the subject.

Now Advanced Index Compression, introduced in 12.1.0.2, has certainly made compressing indexes a lot easier and, in many scenarios, more efficient than was previously possible. Does that indeed mean Bitmap Indexes, which are relatively small and automatically compressed, are now largely redundant ?

The answer is no; Bitmap Indexes are still highly relevant in Data Warehouse environments as they have a number of key advantages over B-Tree Indexes in the manner they get compressed.

Compression of a B-Tree index is performed within a leaf block where Oracle effectively de-duplicates the index entries (or parts thereof). This means that a highly repeated index value might need to be stored repeatedly in each leaf block. Bitmap index entries on the other hand can potentially span the entire table and only need to be split if the overall size of the index entries exceeds 1/2 a block. Therefore, the number of indexed values stored in a Bitmap Index can be far less than with a B-tree.

However, it’s in the area of storing the associated rowids where Bitmap Indexes can have the main advantage. With a B-tree index, even when highly compressed, each and every index entry must have an associated rowid stored in the index. If you have say 1 million index entries, that’s 1 million rowids that need to be stored, regardless of the compression ratio. With a Bitmap Index, an index entry has 2 rowids to specify the range of rows covered by the index entry, but this might be sufficient to cover the entire table. So depending on the number of distinct values being indexed in say a million row table, there may be dramatically fewer than 1 million rowids stored in the Bitmap Index.
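
To put some very rough numbers on that rowid overhead: in a non-partitioned B-Tree index, each leaf entry carries a 6-byte rowid (plus a few bytes of per-entry overhead), so 1 million index entries means roughly 6MB consumed by rowids alone, no matter how well the keys themselves compress. A Bitmap Index entry instead carries just 2 rowids delimiting the range of rows it covers plus a compressed bitmap, so a column with relatively few distinct values might need only a handful of entries and spend almost nothing on rowids. (Back-of-envelope figures only; the exact per-entry overheads vary.)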

To show how Bitmap Indexes are generally much smaller than corresponding compressed B-Tree indexes, here are a few simple examples.

In example 1, I’m going to create a B-Tree Index that is a perfect candidate for compression. This index has very large indexed values that are all duplicates and so will compress very effectively:

SQL> create table ziggy (id number, weird varchar2(100));

Table created.

SQL> insert into ziggy select rownum, 'THE RISE AND FALL OF ZIGGY STARDUST AND THE SPIDERS FROM MARS'
     from dual connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create index ziggy_weird_i on ziggy(weird) pctfree 0;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY_WEIRD_I';

INDEX_NAME        BLEVEL LEAF_BLOCKS   NUM_ROWS
------------- ---------- ----------- ----------
ZIGGY_WEIRD_I          2        9175    1000000

SQL> drop index ziggy_weird_i;

Index dropped.

SQL> create index ziggy_weird_i on ziggy(weird) pctfree 0 compress advanced low;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY_WEIRD_I';

INDEX_NAME        BLEVEL LEAF_BLOCKS   NUM_ROWS
------------- ---------- ----------- ----------
ZIGGY_WEIRD_I          2        1389    1000000


So this index has compressed down from 9175 leaf blocks to just 1389. That’s impressive.

However, this scenario is also the perfect case for a Bitmap Index with large, highly repeated index entries. If we compare the compressed B-Tree Index with a corresponding Bitmap index:

SQL> drop index ziggy_weird_i;

Index dropped.

SQL> create bitmap index ziggy_weird_i on ziggy(weird) pctfree 0;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY_WEIRD_I';

INDEX_NAME        BLEVEL LEAF_BLOCKS   NUM_ROWS
------------- ---------- ----------- ----------
ZIGGY_WEIRD_I          1          21         42


At just a tiny 21 leaf blocks, the Bitmap Index wins by a mile.

In example 2, I’m going to create an index that is still almost a perfect case for compressing a B-Tree Index, but far less so for a Bitmap Index. I’m going to create enough duplicate entries to just about fill a specific leaf block, so that each leaf block only has 1 or 2 distinct index values. However, as we’ll have many more distinct indexed values overall, this means we’ll need more index entries in the corresponding Bitmap Index.

SQL> create table ziggy2 (id number, weird varchar2(100));

Table created.

SQL> insert into ziggy2 select rownum, 'THE RISE AND FALL OF ZIGGY STARDUST AND THE SPIDERS FROM MARS'||mod(rownum,1385)
     from dual connect by level<=1000000;

1000000 rows created.

SQL> commit;

Commit complete.
SQL> create index ziggy2_weird_i on ziggy2(weird) pctfree 0;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY2_WEIRD_I';

INDEX_NAME         BLEVEL LEAF_BLOCKS   NUM_ROWS
-------------- ---------- ----------- ----------
ZIGGY2_WEIRD_I          2        9568    1000000

SQL> drop index ziggy2_weird_i;

Index dropped.

SQL> create index ziggy2_weird_i on ziggy2(weird) pctfree 0 compress advanced low;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY2_WEIRD_I';

INDEX_NAME         BLEVEL LEAF_BLOCKS   NUM_ROWS
-------------- ---------- ----------- ----------
ZIGGY2_WEIRD_I          2        1401    1000000

 

So we have a relatively large indexed column with some 1385 distinct values, where each value just about fills out a compressed leaf block. Compression has reduced the index from 9568 leaf blocks down to just 1401. Again, a very impressive compression ratio.

Unlike the previous example where we had just the one value, we now have some 1385 index entries that need to be created as a minimum for our Bitmap Index. So how does it compare now ?

SQL> drop index ziggy2_weird_i;

Index dropped.

SQL> create bitmap index ziggy2_weird_i on ziggy2(weird) pctfree 0;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY2_WEIRD_I';

INDEX_NAME         BLEVEL LEAF_BLOCKS   NUM_ROWS
-------------- ---------- ----------- ----------
ZIGGY2_WEIRD_I          2         462       1385

 

Although the Bitmap Index is much larger than it was in the previous example, at just 462 leaf blocks it's still significantly smaller than the corresponding compressed B-Tree index at 1401 leaf blocks.

OK, example 3, we're going into territory where no Bitmap Index should tread (or so many myths would suggest). We're going to index a column in which each value has just the one duplicate. So for our 1 million row table, the column will have some 500,000 distinct values.

With relatively few duplicate column values, the compression of our B-Tree Indexes is not going to be as impressive. However, because the indexed values are still relatively large, any reduction here would likely have some overall impact:

SQL> create table ziggy3 (id number, weird varchar2(100));

Table created.

SQL> insert into ziggy3 select rownum, 'THE RISE AND FALL OF ZIGGY STARDUST AND THE SPIDERS FROM MARS'||mod(rownum,500000)
     from dual connect by level<=1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create index ziggy3_weird_i on ziggy3(weird) pctfree 0;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY3_WEIRD_I';

INDEX_NAME         BLEVEL LEAF_BLOCKS   NUM_ROWS
-------------- ---------- ----------- ----------
ZIGGY3_WEIRD_I          2        9891    1000000

SQL> drop index ziggy3_weird_i;

Index dropped.

SQL> create index ziggy3_weird_i on ziggy3(weird) pctfree 0 compress advanced low;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY3_WEIRD_I';

INDEX_NAME         BLEVEL LEAF_BLOCKS   NUM_ROWS
-------------- ---------- ----------- ----------
ZIGGY3_WEIRD_I          2        6017    1000000

 

So the compression ratio is not as good now, coming down to 6017 leaf blocks from 9891. However, this will surely be better than a Bitmap Index with 500,000 distinct values …
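As a quick sanity check (a sketch, not part of the original output), we can confirm the cardinality and the average size of the indexed values:

SQL> select count(distinct weird) num_distinct, round(avg(vsize(weird))) avg_bytes
     from ziggy3;

This should report some 500,000 distinct values averaging around 66 bytes each (the 61 byte base string plus the appended digits).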

 

SQL> drop index ziggy3_weird_i;

Index dropped.

SQL> create bitmap index ziggy3_weird_i on ziggy3(weird) pctfree 0;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY3_WEIRD_I';

INDEX_NAME         BLEVEL LEAF_BLOCKS   NUM_ROWS
-------------- ---------- ----------- ----------
ZIGGY3_WEIRD_I          2        5740     500000

 

So even in this extreme example, the Bitmap Index at 5740 leaf blocks is still smaller than the corresponding compressed B-Tree Index at 6017 leaf blocks.

In this last example 4, the scenario is similar to the previous one, except the indexed values themselves are going to be much smaller (a number column of just a few bytes vs. the 60-odd byte varchar2). Therefore, the rowids will make up a much larger proportion of the overall index entry size. Reducing the storage of the indexed values via compression will be far less effective, considering the prefix table in a compressed index comes with some overhead.
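To put some rough numbers on this: a NUMBER generated by mod(rownum,500000) stores internally in at most 4 bytes, while each index entry also carries a 6 byte rowid, length bytes and a 2 byte row header, giving an entry of roughly 14 bytes. So only a few bytes per entry are even candidates for compression, with the rowid and overheads making up the bulk of the entry.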

SQL> create table ziggy4 (id number, weird number);

Table created.

SQL> insert into ziggy4 select rownum, mod(rownum,500000) from dual connect by level <=1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create index ziggy4_weird_i on ziggy4(weird) pctfree 0;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY4_WEIRD_I';

INDEX_NAME         BLEVEL LEAF_BLOCKS   NUM_ROWS
-------------- ---------- ----------- ----------
ZIGGY4_WEIRD_I          2        1998    1000000

SQL> drop index ziggy4_weird_i;

Index dropped.

SQL> create index ziggy4_weird_i on ziggy4(weird) pctfree 0 compress advanced low;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY4_WEIRD_I';

INDEX_NAME         BLEVEL LEAF_BLOCKS   NUM_ROWS
-------------- ---------- ----------- ----------
ZIGGY4_WEIRD_I          2        1998    1000000

 

So Index Advanced Compression has decided against compressing this index at all; it's just not worth the effort. If we force compression:

 

SQL> drop index ziggy4_weird_i;

Index dropped.

SQL> create index ziggy4_weird_i on ziggy4(weird) pctfree 0 compress;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY4_WEIRD_I';

INDEX_NAME         BLEVEL LEAF_BLOCKS   NUM_ROWS
-------------- ---------- ----------- ----------
ZIGGY4_WEIRD_I          2        2065    1000000

 

We notice the index has actually increased in size, up from 1998 to 2065 leaf blocks. The overheads of the prefix table outweigh the tiny savings from de-duplicating the small number values.
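As an aside, one way to sanity-check this before rebuilding (a sketch, not part of the original demo) is to validate the index and look at the optimal compression estimates Oracle records in INDEX_STATS:

SQL> analyze index ziggy4_weird_i validate structure;

SQL> select name, opt_cmpr_count, opt_cmpr_pctsave from index_stats;

An OPT_CMPR_PCTSAVE at or near 0 would confirm that basic compression isn't worth the effort for this index.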

Meanwhile the corresponding Bitmap Index:

SQL> drop index ziggy4_weird_i;

Index dropped.

SQL> create bitmap index ziggy4_weird_i on ziggy4(weird) pctfree 0;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY4_WEIRD_I';

INDEX_NAME         BLEVEL LEAF_BLOCKS   NUM_ROWS
-------------- ---------- ----------- ----------
ZIGGY4_WEIRD_I          2        1817     500000

 

At 1817 leaf blocks, it's still smaller than the best the B-Tree index has to offer.

So the answer is no, Bitmap Indexes are not redundant now that we have Index Advanced Compression. In Data Warehouse environments, as long as they don't reference column values approaching uniqueness, Bitmap Indexes are likely to be smaller than their corresponding compressed B-Tree indexes.

12.1.0.2 Introduction to Zone Maps Part II (Changes) October 30, 2014

Posted by Richard Foote in 12c, Exadata, Oracle Indexes, Zone Maps.
add a comment

In Part I, I discussed how Zone Maps are new index-like structures, similar to Exadata Storage Indexes, that enable the "pruning" of disk blocks during accesses of the table by storing the min and max values of selected columns for each "zone" of a table, a zone being a range of contiguous blocks (8MB by default).
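For reference, the basic DDL for such a Zone Map looks something like this (a sketch of the simple form used for the Part I demo, with the default zone size):

SQL> create materialized zonemap big_bowie_album_id_zm on big_bowie(album_id);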

I showed how a Zone Map was relatively tiny but very effective in reducing the number of consistent gets for a well clustered column (ALBUM_ID).

In this post, we’re going to continue with the demo and look at what happens when we update data in the table with a Zone Map in place.

So let's update the ALBUM_ID column (which currently has a Zone Map defined) for a few rows. The value of ALBUM_ID was previously 1 for all these rows (the full range of values is currently between 1 and 100) but we're going to update them to 142:

SQL> update big_bowie set album_id=142 where id between 1 and 100;

100 rows updated.

SQL> commit;

Commit complete.

 

So the maximum value of ALBUM_ID is now 142, not 100. If we look at the maximum value as currently listed in the Zone Map:

 

SQL> select max(max_1_album_id) from  big_bowie_album_id_zm;

MAX(MAX_1_ALBUM_ID)
-------------------
100

 

We notice the maximum is still recorded as 100, so the update on the table has not actually updated the contents of the Zone Map. This is a big difference between Zone Maps and conventional indexes: indexes are automatically maintained during DML operations, Zone Maps are not (unless the REFRESH ON COMMIT option is specified).

If we look at the state of Zone Map entries that have a minimum of 1 (the previous values of ALBUM_ID before the update):

SQL> select * from big_bowie_album_id_zm where min_1_album_id = 1;

ZONE_ID$ MIN_1_ALBUM_ID MAX_1_ALBUM_ID ZONE_LEVEL$ ZONE_STATE$ ZONE_ROWS$
---------- -------------- -------------- ----------- ----------- ----------
3.8586E+11              1              2           0           0      66234
3.8586E+11              1              2           0           1      65787
3.8586E+11              1              2           0           0      66223

 

We notice that one of the entries now has a state of 1, meaning that this specific zone has been marked as stale. However, all the other zones are still OK.

If we look at the status of the overall Zone Map:

SQL> select zonemap_name, pruning, refresh_mode, invalid, stale, unusable
from dba_zonemaps where zonemap_name='BIG_BOWIE_ALBUM_ID_ZM';

ZONEMAP_NAME              PRUNING  REFRESH_MODE      INVALID STALE   UNUSABLE
------------------------- -------- ----------------- ------- ------- --------
BIG_BOWIE_ALBUM_ID_ZM     ENABLED  LOAD DATAMOVEMENT NO      NO      NO

 

We notice that the overall Zone Map is still "hunky dory" after the update; the one stale zone is not enough to mark the whole Zone Map as stale.

If we now re-run the query we ran in Part I:

 

SQL> select * from big_bowie where album_id = 42;

100000 rows selected.

Elapsed: 00:00:00.29

Execution Plan
----------------------------------------------------------
Plan hash value: 1980960934

----------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |           |   100K|  8984K|  3269  (12)| 00:00:01 |
|*  1 |  TABLE ACCESS STORAGE FULL WITH ZONEMAP| BIG_BOWIE |   100K|  8984K|  3269  (12)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("ALBUM_ID"=42)
filter(SYS_ZMAP_FILTER('/* ZM_PRUNING */ SELECT "ZONE_ID$", CASE WHEN
BITAND(zm."ZONE_STATE$",1)=1 THEN 1 ELSE CASE WHEN (zm."MIN_1_ALBUM_ID" > :1 OR
zm."MAX_1_ALBUM_ID" < :2) THEN 3 ELSE 2 END END FROM "BOWIE"."BIG_BOWIE_ALBUM_ID_ZM" zm
WHERE zm."ZONE_LEVEL$"=0 ORDER BY zm."ZONE_ID$"',SYS_OP_ZONE_ID(ROWID),42,42)<3 AND "ALBUM_ID"=42)
Statistics
----------------------------------------------------------
141  recursive calls
0  db block gets
3238  consistent gets
0  physical reads
0  redo size
3130019  bytes sent via SQL*Net to client
761  bytes received via SQL*Net from client
21  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
100000  rows processed

 

We see the Zone Map was still used by the CBO. The number of consistent gets has increased (up from 2364 to 3238) as we now additionally have to access all the blocks associated with the stale zone, but it's still more efficient than reading all the blocks from the entire table.
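As a rough sanity check on these numbers (assuming the default 8K block size): the extra 3238 - 2364 = 874 consistent gets are in the ballpark of a single zone's worth of blocks (an 8MB zone holds up to 8MB / 8K = 1024 blocks), consistent with Oracle now having to read the entire stale zone.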

If we want to remove the stale zone entries, we can refresh the Zone Map or rebuild it (for ON DEMAND refresh):

 

SQL> alter materialized zonemap big_bowie_album_id_zm rebuild;

Materialized zonemap altered.

 

If we now look at the Zone Map entry:

 

SQL> select * from big_bowie_album_id_zm where min_1_album_id = 1;

ZONE_ID$ MIN_1_ALBUM_ID MAX_1_ALBUM_ID ZONE_LEVEL$ ZONE_STATE$ ZONE_ROWS$
---------- -------------- -------------- ----------- ----------- ----------
3.8586E+11              1              2           0           0      66234
3.8586E+11              1            142           0           0      65787
3.8586E+11              1              2           0           0      66223

 

We see that the entry is no longer stale and now correctly reflects the actual maximum value within the zone (142).

If we now re-run the query:

SQL> select * from big_bowie where album_id = 42;

100000 rows selected.

Elapsed: 00:00:00.30

Execution Plan
----------------------------------------------------------
Plan hash value: 1980960934

----------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |           |   100K|  8984K|  3269  (12)| 00:00:01 |
|*  1 |  TABLE ACCESS STORAGE FULL WITH ZONEMAP| BIG_BOWIE |   100K|  8984K|  3269  (12)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("ALBUM_ID"=42)
filter(SYS_ZMAP_FILTER('/* ZM_PRUNING */ SELECT "ZONE_ID$", CASE WHEN
BITAND(zm."ZONE_STATE$",1)=1 THEN 1 ELSE CASE WHEN (zm."MIN_1_ALBUM_ID" > :1 OR
zm."MAX_1_ALBUM_ID" < :2) THEN 3 ELSE 2 END END FROM "BOWIE"."BIG_BOWIE_ALBUM_ID_ZM" zm
WHERE zm."ZONE_LEVEL$"=0 ORDER BY zm."ZONE_ID$"',SYS_OP_ZONE_ID(ROWID),42,42)<3 AND "ALBUM_ID"=42)
Statistics
----------------------------------------------------------
141  recursive calls
0  db block gets
3238  consistent gets
0  physical reads
0  redo size
3130019  bytes sent via SQL*Net to client
761  bytes received via SQL*Net from client
21  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
100000  rows processed

 

We notice nothing has appreciably changed: the Zone Map is still being used, but the number of consistent gets remains the same as before. Why haven't we returned to our previous 2364 consistent gets ?

Well, as the range of possible values within the updated zone is now between 1 and 142, the required value of 42 could potentially be found within this zone and so still needs to be accessed just in case. We know that the value of 42 doesn’t exist within this zone, but Oracle has no way of knowing this based on the possible 1 to 142 range.

Hence Zone Maps work best when the data is well clustered and the Min/Max ranges of each zone can be used to limit which zones need to be accessed. If the data were not well clustered and the values within each zone mostly spanned the full range between the min and max values, then Oracle wouldn't be able to effectively prune many (or any) zones and the Zone Map would be useless.
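One way to eyeball how prunable a Zone Map actually is (a sketch, using the same zone map columns queried earlier):

SQL> select zone_id$, min_1_album_id, max_1_album_id,
            max_1_album_id - min_1_album_id + 1 range_width
     from big_bowie_album_id_zm
     order by range_width desc;

Zones whose range width approaches the full range of the column's values can rarely be pruned for any predicate.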

As we’ll see in Part III :)

Index Compression Part VI: 12c Index Advanced Compression Block Dumps (Tumble and Twirl) October 9, 2014

Posted by Richard Foote in 12c, Advanced Index Compression, Block Dumps, Index Compression, Oracle Indexes.
5 comments

Sometimes, a few pictures (or in this case, index block dumps) are better than a whole bunch of words :)

In my previous post, I introduced the new Advanced Index Compression feature, whereby Oracle automatically determines how to best compress an index. I showed a simple example of an indexed column that had sections of index entries that were basically unique (and so don’t benefit from compression) and other sections with index entries that had many duplicates (that do compress well). Advanced Index Compression enables Oracle to automatically just compress those index leaf blocks where compression is beneficial.
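For anyone wanting to reproduce such dumps (a sketch; the rdba, file and block numbers below are purely illustrative and need to be derived from your own index, for example via a treedump), a leaf block's relative dba can be converted and the block dumped to trace as follows:

SQL> select dbms_utility.data_block_address_file(25166045) file#,
            dbms_utility.data_block_address_block(25166045) block#
     from dual;

SQL> alter system dump datafile 6 block 222;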

If we look at a couple of partial block dumps from this index, first a dump from a leaf block that did have duplicate index entries:

Leaf block dump
===============
header address 216542820=0xce82e64
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0xa0: opcode=0: iot flags=-C- is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 651
kdxcofbo 1346=0x542
kdxcofeo 2172=0x87c
kdxcoavs 826
kdxlespl 0
kdxlende 0
kdxlenxt 25166046=0x18000de
kdxleprv 25166044=0x18000dc
kdxledsz 0
kdxlebksz 8036
kdxlepnro 1
kdxlepnco 1 (Adaptive)
prefix row#0[8031] flag: -P-----, lock: 0, len=5
col 0; len 2; (2):  c1 2b
prc 651
row#0[8022] flag: -------, lock: 0, len=9
col 0; len 6; (6):  01 80 1e 86 00 5c
psno 0
row#1[8013] flag: -------, lock: 0, len=9
col 0; len 6; (6):  01 80 1e 86 00 5d
psno 0
row#2[8004] flag: -------, lock: 0, len=9
col 0; len 6; (6):  01 80 1e 86 00 5e
psno 0
row#3[7995] flag: -------, lock: 0, len=9
col 0; len 6; (6):  01 80 1e 86 00 5f
psno 0
row#4[7986] flag: -------, lock: 0, len=9
col 0; len 6; (6):  01 80 1e 86 00 60
psno 0

row#650[2172] flag: -------, lock: 0, len=9
col 0; len 6; (6):  01 80 1e 8d 00 10
psno 0
----- end of leaf block Logical dump -----

The red section is a portion of the index header that determines the number of rows in the prefix table of the index (kdxlepnro 1). The prefix table basically lists all the distinct column values within the leaf block that are to be compressed. The value 1 denotes there is actually just the 1 distinct column value in this specific leaf block (i.e. all index entries have the same indexed value). This section also denotes how many of the indexed columns are to be compressed (kdxlepnco 1). As this index only has the one column, it also has a value of 1. Note this value can potentially be anything between 0 (no columns compressed) up to the number of columns in the index. The (Adaptive) reference tells us that Index Advanced Compression has been used and that the values here can change from leaf block to leaf block depending on the data characteristics of the index entries within each leaf block (a dump of a basic compressed index will not have the "Adaptive" reference).

The green section is the compression prefix table itself and details all the unique combinations of index column values to be compressed within the leaf block. As all indexed values are the same in this leaf block (value 42, internally represented as c1 2b in hex), the prefix table only has the one row. prc 651 denotes that all 651 index entries in this leaf block reference this specific indexed value.

Next follow all the actual index entries, which now only consist of the rowid (the 6 byte col 0 column), as they all reference psno 0, the unique row id of the only row within the prefix table (row#0).

So rather than storing the indexed value 651 times, we store the index value (42) just the once within the prefix table and simply reference it from within the actual index entries. This is why index compression can save us storage: storing something once within a leaf block rather than multiple times.
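Some rough arithmetic from this dump: each compressed index entry is len=9 (essentially just the rowid plus overhead), versus roughly len=12 had the 2 byte key and its length byte been stored in each entry. Across 651 entries that's about 651 x 3 = 1953 bytes saved, less the 5 byte prefix row itself, or close to a quarter of the 8036 byte usable block space (kdxlebksz).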

If we now look at a partial block dump of another index leaf block within the index, that consists of many differing (basically unique) index entries:

Leaf block dump
===============
header address 216542820=0xce82e64
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0xa0: opcode=0: iot flags=-C- is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 449
kdxcofbo 938=0x3aa
kdxcofeo 1754=0x6da
kdxcoavs 816
kdxlespl 0
kdxlende 0
kdxlenxt 25168667=0x1800b1b
kdxleprv 25168665=0x1800b19
kdxledsz 0
kdxlebksz 8036
kdxlepnro 0
kdxlepnco 0 (Adaptive)
row#0[8022] flag: -------, lock: 0, len=14
col 0; len 4; (4):  c3 58 3d 2c
col 1; len 6; (6):  01 80 12 e6 00 41
row#1[8008] flag: -------, lock: 0, len=14
col 0; len 4; (4):  c3 58 3d 2d
col 1; len 6; (6):  01 80 12 e6 00 42
row#2[7994] flag: -------, lock: 0, len=14
col 0; len 4; (4):  c3 58 3d 2e
col 1; len 6; (6):  01 80 12 e6 00 43


row#448[1754] flag: -------, lock: 0, len=14
col 0; len 4; (4):  c3 58 41 5c
col 1; len 6; (6):  01 80 12 ee 00 1d
----- end of leaf block Logical dump -----

We notice that in the red section, both kdxlepnro and kdxlepnco (Adaptive) have a value of 0, meaning we have no rows and no columns within the prefix table. As such, there is no prefix table at all here and this leaf block has simply not been compressed.

If we look at the actual index entries, they all now have an additional column shown in blue, that being the actual indexed value, as all the index values in this leaf block differ from each other. Without some form of index entry duplication, there would be no benefit from compression, and Index Advanced Compression has automatically determined this and not bothered to compress this leaf block. An attempt to compress this block would actually have increased the overall storage needed for these index entries, due to the additional overheads associated with the prefix table (note each row within the prefix table carries an additional 2 bytes of overhead).
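Rough arithmetic again: with all 449 keys distinct, a compressed version of this block would need 449 prefix rows, each carrying those 2 extra bytes of overhead, costing around 900 additional bytes while saving nothing, as no key value is ever repeated.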

I’ll next look at an example of a multi-column index and how Index Advanced Compression handles which columns in the index to compress.
