jump to navigation

Regular Index vs. Partitions (The Best Of Both Worlds) August 28, 2008

Posted by Richard Foote in Oracle Cost Based Optimizer, Oracle Indexes, Partitioning.
trackback

When asking for requests and suggestions on what I could cover in the blog, Brian Tkatch asked “When is a partition better than just a regular index ?”

As it’s a very good question, I thought I might spend a few moments attempting to provide some kind of meaningful answer.

The first point I would make however, is that indexes and partitions are not at all mutually exclusive. A table of course could very well benefit from being both partitioned and indexed. Most partitioned tables have indexes and indeed indexes can in turn be partitioned.

The second point I would make is that there are two key advantages or reasons why it might be beneficial to partition a segment. They are:

  1. Better Manageability
  2. Better Performance

This post can be viewed as being but an introduction to the potential performance benefits of partitioned segments. I will leave the topic of better manageability for another time.

The point of Brian’s question is when can the performance benefits available to partitioned segments out perform and be a better option than that of a possible associated index.

As I’ve discussed a number of times, there are no magic numbers or ratios as to when an index is the most appropriate and cost effective access path. If fact, the link shows a theoretical example of when a Full Table Scan (FTS) is by far the cheaper option when selecting just 1% of data. Generally speaking though, the higher the percentage of returned rows, the more likely a FTS is going to be the more efficient option.

However, a Full Table Scan is just that, a FULL Table Scan. Oracle needs to read the entire table, each and every block below the segment High Water Mark. So if a query needed to retrieve just 5% of the data, it’s quite likely it would perform a FTS because it’s possibly a cheaper and less costly alternative than using an associated index (the above link explains why) but it would need to access 100% of the table to do so. That means it would need to effectively access 95% of the data and simply just throw away the rows so it can access the 5% of data it actually requires in a more efficient manner than possible using an index access path.

That’s potentially (say) 95% of the work and resources for no real reason other than Oracle has no choice but to access all the data when performing a FTS. That’s potentially 95% of wasted resources, 95% of wasted effort. Yes, a TFS might be more efficient than using an index but accessing unnecessarily say 95% of data sounds rather costly nonetheless.

Wouldn’t it be nice if somehow we could get the benefits of performing a FTS (multiblock reads, reading a specific block just the once, etc.) but only read the say 5% of the data that we were actually interested in (or something significantly less than the 100%)? Somehow combine the benefits of a FTS with the benefits provided by an index, that being only needing to access data that is of direct interest.

Enter Partitioning.

Partitioning provides Oracle with another level of granularity (or levels of granularity with composite partitions) when accessing a table. It’s no longer an question of using an index or reading the entire table. Oracle can now either use an index, access the entire table or with partitioning access just those partitions (or bits) of a table that can only possibly contain data of interest.

By partitioning a table, Oracle knows that data belonging to the partitioning keys must belong in a specific partition. The logical table is effectively broken down into multiple physical segments with each partition segment containing only data associated with the partitioning keys or columns.

If the say 5% of data that’s of interest only logically resides in the one or whatever number of partitions, then there’s no need for Oracle to access the data in all the other partitions, containing the other say 95% of unwanted data. The Cost based Optimizer (CBO) can automatically take advantage of “partition pruning” and only needs to access the partition or those partitions that only can contain the data of interest.

Oracle can now potentially perform a so-called “FTS” of the table, but only actually accesses and reads the 5% (or significantly less than the 100%) of data that’s potentially of direct interest by accessing just those partitions it needs to.

At a simplistic level, indexes are best when accessing a relatively “small” amount of data. A FTS is best when accessing a relatively “large” amount of the data and partitions can be extremely useful and beneficial when accessing an amount of data somewhere between the two. As such, partitioning is more likely to be beneficial in a Data Warehouse or Reporting environment where accessing a relatively high percentage of data is common.

This demo of indexes vs. a FTS vs. partition pruning shows how a FTS outperforms an index when accessing approximately 1/8 of data in a table but an appropriately partitioned table out performs both options.

More on partitioning in the future.

Comments»

1. Mohamed Houri - August 28, 2008

Richard,

You wrote
partitions can be extremely useful and beneficial when accessing an amount of data somewhere between the two

I believe you should write

Tables appropriately partitioned can be extremely useful…..

Because partitioning not only can not change anything into the performance but when inappropriately implemented it can dramatically alter the performance. You wrote about partition pruning (or partition elimination) this I think happens only when the partition key is in the where clause of your select as it is the case in your demo.

Another thing where partition can be useful then index is when you intend to delete (each month ) the content of a table based on a date for example, it will be better to range partition by this date and to use alter table exchange partition instead of a delete using a b-tree index. Isn’t it?

Regards

Mohamed Houri

Like

2. Robert Klemme - August 28, 2008

Mohamed, absolutely agree! I was going to say the same: dropping a partition is hugely more efficient than deleting via an index with the added benefit that local indexes are gone as well and need not be updated individually.

With that in mind I find the automated partitioning of 11g one of the most practical features for date ranged data: 11g will just create new partitions when needed and so you can ensure that e.g. every month’s data goes into its own partition without having to resort to scheduler jobs to create those partitions ahead of time.

Tables with a lot of concurrent inserts could potentially also benefit from partitioning by ensuring that inserts of different sessions go to different partitions by carefully selecting partitioning scheme and keys.

Like

3. Tony - August 28, 2008

All killer no filler as usual.

It would be nice to have some clarification on the local versus global index on the partitioned table…why did you use local here? When would you not?

I love partitioning, I think it is a great design level performance tool. While Mohamed has a point that the wrong partitioning can be disastrous, I do think it is a little hard to do this wrong.

Like

4. Dan - August 28, 2008

Richard,

Excellent information. Thanks!

Regards,
Dan

Like

5. Nilo Segura - August 28, 2008

Concerning when a partition is better than an index…

We have an application where we replaced several FBIs created on
a table with an status column, for a list partition table using the
same status column as the partitioning key.

This table is heavy on DML operations, and the rows can change from one status to another (so enable row movement is needed).

The indexes created on the status columns could be removed, so less overhead when DMLin on the table. At the same time, we could
start shrinking the table (in 10gR2 it is not possible to shrink a table/partitions that has an FBI).

Now the optimizer knows it has to do an FTS on the partitions, that is why we also shrink them, to keep the number of blocks per partition controlled.

And so far so good performance-wise 🙂

regards.

Like

6. Mohamed Houri - August 29, 2008

Hi Tony,

Local versus global index, sometimes its up to you and sometimes it depends strongly on the choice of both the partition key and the primary key

For example, when the primary key is equal to the partition key (or is a part of it) the index enforcing the primary key uniqueness should be locally partitioned.

However, if you partition by date and you enforce your primary key by and ide, then be carrefull your primary index shoud be global.

I have a preference for a locally partitioned indexes

It is not so hard to partition wrong. As it has been mentioned by Richard partitioning can be used for

1.Better Manageability
2. Better Performance

Partition pruning, thanks to the information in the WHERE clause, is the elimination of some of the partitions from consideration during the select statement execution. In order to benefit from this partition elimination the WHERE clause must reference at least one column from the set of columns that comprise the partition key.

So, If your partition stategy is for Performance but you partition for better Manageability and you choose a technical partition key which is never used as a predicate in your business queries, then you have partitioned wrong as far as performance will certainly suffer from this doing a full partitions scans or a full partition index scan. This is what I did.

Mohamed

Like

7. Robert Klemme - August 29, 2008

Nilo, do rows change status frequently? You said there was a lot of DML but did not indicate the frequency of status changes. If they are frequent then I would be skeptical whether this setup is optimal.

Also, do you do shrinking on a regular basis? If yes, that might again be an indication that your strategy is not optimal.

Cheers

Like

8. Richard Foote - August 31, 2008

Hi Mohamed

You’re of course entitled to your opinion on how I should write however I’m happy with how it currently reads. I said “can be” which suggests it might not always be the case, else I would have written “is always” or some such.

Indexes “can be” useful in improving performance, where it’s kinda obvious this might not be the case if the “appropriate” columns are not indexed.

The other example you raised regarding deleting a whole bunch of rows each month falls directly into the area of better manageability which I specifically said I wasn’t going to touch in this entry.

Partitioning is a huge subject, just want to take it one bite at a time 😉

Like

9. Richard Foote - August 31, 2008

Hi Robert

I agree with the 11g features, very very useful.

Yes partitioning (especially hash) can be useful in spreading load, although how data is to be subsequently accessed needs to be carefully considered as well.

Like

10. Richard Foote - August 31, 2008

Hi Tony

Will cover local vs global indexes one day I promise.

No real reason for going with local in the demo (as the indexes aren’t actually used) except perhaps in this specific example of needing data from just the one partition, the local index could have been the slightly cheaper alternative if it were indeed used as it would have been the only local index structure that needed to be accessed.

Like

11. Richard Foote - August 31, 2008

Hi Nilo

What works, works.

However, note if the status column is frequently updated and required row movement, that’s effectively a delete followed by an insert operation, rather than just a simple update, so some of your index savings would be lost with the associated additional overheads.

Like

12. Nilo Segura - August 31, 2008

Hello,

Indeed, and that was the initial drawback for this approach. However, this drawback is mitigated by the fact that any row can change no more than a fixed number of times its status, and some of partitions contain more than 1 value.

So in the end the row movement was(is) not an issue for this particular app/table. We did manage to attain our goals: keep the size of the table controlled (shrink), quick access to the data, and less indexes to take care of.

So it was the best thing since the last best thing.. 🙂

regards.

Like

13. Nilo Segura - August 31, 2008

Robert

The need to shrink the table, came because this table is very dynamic, with a lot of data that get inserted and deleted (size of the table can increase very much in a short period of time).
The table becomes full of “holes” and this was damaging operations that “legally” had to do FTS (monitoring/reports).

The shrink could only be done during maintenance windows (difficult to schedule), and had to drop the FBI first and recreated them later.

The list partitions replacing the indexes came as a solution for our problem, now the shrink can be done online.

regards.

Like

14. Brian Tkatch - September 2, 2008

Richard, thanx for taking this up. Partitioning vs INDEX is something i’d like to get clear and the little details you point out are just the way to do that.

In the example, the TABLE is both partitioned and INDEXed. Must it have both? If the partition is by range it probably should be. But when if it is a single value?

Also, in the example, the query used was “between ’01-JAN-2003′ and ’31-DEC-2003′;” That would exlude all records from the 31st after 12am. Would the query be better written as “between ’01-JAN-2003′ and ’01-JAN-2004′ AND < ’01-JAN-2004′;”?

FYI, i first asked this question a bit earlier. https://richardfoote.wordpress.com/2007/12/23/merry-christmas-and-a-happy-index-rebuild-free-new-year/#comments. I am ecstatic that you picked it up.

Like

15. Richard Foote - September 3, 2008

Hi Brian

No worries.

No, it doesn’t have to be both indexed and partitioned on the same column(s). In the example you state where there may only be a few values per partition key (perhaps say via a list partition), then there may be no point in indexing the partitioning key as the selectivity may not warrant it.

Correct, it would exclude values after 12am on the 31st but who said I wanted such values 😉

I thought I might just post a few more examples of partition pruning for those who may not be familiar with how the CBO can eliminate partitions.

Like

16. Brian Tkatch - September 3, 2008

Thanx Richard. Examples are always good.

Like

17. Arnab - November 8, 2010

Thanks! Richard for your amazing insights

Like

18. Mustafa Qamar-ud-Din - May 17, 2016

Reblogged this on Qamar-ud-Din.

Like

Richard Foote - May 26, 2016

Hi Mustafa

Why ?

Like

Mustafa Qamar-ud-Din - May 28, 2016

I reblogged it to share it with my colleagues. It’s well explained. Thank you 😉

Like


Leave a comment