jump to navigation

Demonizing Foreign Keys (“Helden”) August 25, 2009

Posted by Richard Foote in Foreign Keys, Richard's Musings.
trackback

My mate Marcel Kratochvil, a fellow Canberran Oracle ACE Director and David Bowie fan, has recently published a paper finally called “Discussing Foreign Keys and their usage in the Database” in which (in his own words) he is “demonizing foreign keys”.

It’s a really good read, well worth the effort, in which covers various interesting topics but primarily questions the general usage of Foreign Keys in the relational (or object-relational) database, especially in the world of multimedia. He’s been arguing against the general use of Foreign Keys in the database for a long time and has finally put down his thoughts on the subject. I would highly recommend giving the paper a read and see if it might alter (or confirm) your own views on the subject.

Although I think Marcel makes many a good point and I agree with much of what he says, I however basically disagree with his general central notion and arguments against Foreign Keys. I guess I still consider myself a “Foreign Key Dinosaur” 🙂

These are just some of the comments and feedback I gave Marcel when I was given an early look at the paper. I suggest you read Marcel’s paper before my comments so they make more sense and can be put into context and perspective.

 

“Let me say that I agree with much of what you say and you make many valid points. Certainly the world of multimedia is one which we at work are slowing moving into and it certainly has many challenges. Now, with respect to the specific issue of FKs in the database, I suspect we’re actually in some ways in agreement. I agree with you that there isn’t necessarily one correct answer for all situations. One of your closing statements:

So by all means use Foreign Keys, but don’t blindly use them. Use them when it makes sense, when it solves a tuning issue or when maintaining a relationship between tables is crucial to the application running

is actually pretty well what I think as well. However, I suspect our interpretation of what this actually means differs somewhat with my opinion being that it makes sense to use FKs far more widely and comprehensively than perhaps you suggest in your paper. So our likely differences of opinion lies mainly with the occurrences of when it makes sense to use FKs. However, your paper doesn’t alter my fundamental view that it almost always makes sense to use FKs in the database where possible.

Firstly, many of your comments suggest DBAs generally find FKs problematic, developers like to push them. As a DBA (and ex developer of nearly 10 years), I find FKs to be essential in many of my activities and find they generally assist, rather than impede my tasks. More on this later but the importance of FKs on DBAs activities and responsibilities is paramount.

I don’t think anyone seriously suggests that FKs (or other constraints) ensure data integrity in the general sense. Yes, someone can enter incorrect details or pick the incorrect menu option. However, constraints and FKs specifically are an important method of reducing the occurrence of data integrity related issues that if not implemented in the database, needs likely to be implemented less effectively elsewhere. More on this to come but whenever I mention data integrity later, I mean it only in the sense of having consistent data relationships that follow key business rules.

Examples are good because the strength of the example can often be telling. Take your deleting an account but keep the purchase order example. Tricky scenario but one which simply requires a correct physical model with FKs to support it to start with. If person accounts must be deleted, then either it doesn’t indeed make sense to use a FK in the purchase order table to point to the person account or the FK needs an “on delete set null” type logic or the FK needs to point to a entity details table that contains data that doesn’t have to be deleted, etc.. Will the tax audit really be happy to have purchase orders associated with unknown entities, how can you prove the purchase was legitimate and not a fraudulent tax reduction if the purchase order can reference entities that don’t exist. A entity number that references nothing is meaningless therefore it either needs to reference something else, perhaps sufficient person details to keep both privacy and auditing requirements satisfied, or it should be removed. A FK would be
critical in such an application but it needs to point to the correct entity and/or be maintained appropriately to ensure the business rules are followed.

The issue of “not trusting” developers is an interesting one. I fundamentally don’t trust developers to perform and code FK functionality outside the database for 4 key reasons:

1) Developers come in sorts of shapes and sizes, have all sorts of experiences and come in varying degrees of quality. There are good developers and not so good developers and as I have no control on their hiring and firing, can’t assume they’re necessarily “trustworthy” when it comes to ensuring their code will maintain necessary data relationship integrity. I’ll trust FKs in the database over developers in this respect any day as FKs in the database are far simpler to implement and control.

2) Even the best of the best, the most brilliant of developers is limited by being human. And no matter how good, humans make mistakes. They occasionally write code that might leave orphaned data, they write code that might not take into consideration a specific business rule, they change the business rule in the application but forget to change it correctly in a specific module. A FK in a database doesn’t make a mistake. If it’s there, it’s centralised and it’s enforced. Period.

3) Note the database itself, with all its clever internal code, must still lock tables when a parent record is deleted/updated/merged. Now Oracle internally can get around this if an index exists on the FK because it can in effect check the index structure to ensure no new associated FKs are inserted. This “trick” is only available to the Oracle code, a developer can’t effectively lock a row in an index that doesn’t exist. Therefore, if Oracle itself is forced to lock a table to ensure the relationship between tables remains consistent, a developer must likewise lock a table during any similar operation. The very best of trustworthy developers only really has the choice between locking tables or potentially allowing “dodgy” data in the system. Give me an indexed FK constraint any day as it will fundamentally be more efficient to implement.

4) Even the best developer, writing the most perfect of error proof code, can’t necessarily control how data is accessed and changed in the database. In highly complex environments (note some of our applications have 2,500+ tables), with these users that make mistakes, it’s virtually impossible for the application to cater for every possible data “cleanup” scenario. Therefore, DBAs (in particular) and other power users are frequently required to go in the database via the “backend” and clean things up. I
don’t know of any of our more complex applications, be they in-built or 3rd part apps, where this hasn’t been necessary. Business rules in the application, with FKs effectively managed in the application are of no use when data is not access via the application. And we can’t wait the 6 months it might take to change the application to prevent these scenarios from occurring or to make the application able to deal with the data spill or data “integrity” issue.

Expanding on this point a little, we have many many Oracle applications. These are “real-world”, modern applications. I can tell you without exception, that the most problematic, worst performing, data integrity error filled atrocities of applications we administer have a common characteristic. They don’t have FKs. I don’t have to imagine an application that has no FKs, we have a number of 3rd party apps, running in production right now that have no FKs, ranging from case management systems to CAD systems that I need to deal with every day. Guess what. They have various performance issues AND all manner of data integrity issues, directly as a result of not implementing FKs in the database. I’m sure these software companies trusted their developers … As a humble DBA, we have little control over the adoption of these various apps, although lessons are being learnt and things are slowly changing.

Not only are data integrity issues much more prevalent in these apps, I also don’t have to imagine the nightmare that is trying to resolve data related issues without FKs in the database. If I have to remove (say) an over classified document after its been incorrectly inserted a few weeks ago into the system and subsequently linked and referenced all over the place, it’s extremely difficult to safely remove it. We have no way of knowing the necessary dependencies, if you delete a row here or there, how to ensure there’s no orphaned data left behind as a result and measure the impact in other parts of the apps. Interestingly, even the software vendors themselves struggle and forever provide scripts that don’t clean data out correctly. Application upgrades invariably result in corrupted data relationships.

Our applications with FKs have far less issues and when we do need to clean things up, the FKs both protect us and clearly document what needs to be done. One can’t “accidentally” delete a parent row, leaving orphaned data behind, the enabled FKs won’t let us. We know the impact of changing this column definition or dropping this column, the FKs tell us all the related dependencies. We don’t go in blind, we don’t have to guess, we know. We know that by removing (say) this document from the system, it will impact all these related cases, that this person will no longer have these details recorded against him. And no, having some data integrity issues due to FKs being left out is not acceptable. The implications of having say orphaned data might mean a vital document containing critical intelligence information is inaccessible by the application, resulting in perhaps a fugitive being able to leave the country or conversely, sensitive data still being accessible. The implications here can potentially be life threatening. It doesn’t get any more real world than that and FKs are a vital component in ensuring the correct data is visible when it should be.

When the help desk forwards an ORA-02292 – child record found error or if we get one during a data spill cleanup, unlike some who grumble and complain, I just thank goodness for the FK. Because someone is attempting something that a business rule doesn’t allow and hasn’t thought through the implications. Without the FK in the database, the operation would have succeeded and caused a data integrity issue. No, the child records need to be removed as well or no, the parent record needs to be logically deleted instead or no, you simply can’t now perform that operation due to the live data in the system, etc. The FKs help to protect us from ourselves, be it at the
application level or indeed via any accesses to the database.
 
Without FKs, one is blind to the relationships and dependencies between objects within the database. The application is not a pretty place to search and neither is the documentation, assuming it exists and is up to date.

Not only is the DBA and others “blind” to these relationships if the FKs are not enabled in the database, but so is the CBO. A point you’ve missed a little in the paper is the performance implications of the CBO not being able to make important assumptions on the data relationships if FKs are missing. It’s not a good idea to hide information from the CBO, it reduces its options and limits its decision making processes. It can’t for example determine and ignore redundant join conditions, it can’t make use of query rewrite opportunities, it can’t assume it can use an existing index or use an efficient join method, etc. And the CBO gets cleverer and can make use of database constraint information more often with each new release. Without enabled FKs, the CBO can’t use this information and might make sub-optimal decisions as a result.

At the end of the day, I agree with your statement that we should use FKs when they “make sense”. It’s just IMHO, they make sense far far more often than you appear to suggest. I work with applications without FKs in the database, it isn’t pretty !!”

 

Marcel has started a Foreign Key Discussion website where you can find his paper and make comments on the various subject matters discussed. I’m sure Marcel would welcome any such comments and feedback.

Comments»

1. CJ - August 26, 2009

Hi Richard,

I agree with what you have said. I think it been better for the completeness of discussion if you had also mentioned data warehousing environments in the discussion. I think D/W database is one place where it might be acceptable not to have any FK constraints (or to have them in disable novalidate mode)

Regards.

Like

Richard Foote - August 29, 2009

Hi CJ

Yes indeed, FK enforcement can be relaxed in DW environments where the FK relationships are guaranteed anyways because they’re enforced in the source databases. However, the consequences on the CBO need to be carefully considered as well in that by removing FKs you also remove potentially useful smarts in the CBO as well.

BTW, my comments weren’t meant to be a “complete discussion”, in fact I only included some of the overall feedback I gave to Marcel.

Like

2. dbdvl - August 26, 2009

Richard,

I agree 100% with your points as my experience as a DBA for over 20 yrs follows yours. I have had this argument more times than I can count with development teams and their managment. Invariably when FK’s are not implemented it leads to grief in the DBA team. And the most distressing part of this discussion is that I find myself having to have it more and more when setting up new databases. In our profession, sadly, it appears that developers ( some, not all 🙂 ) are not learning from the lessons of developers past and seem to want to reinvent lessons learned again and again.
Thanks again for an insightful and informative post.

Like

Richard Foote - August 29, 2009

Yes, I’m not entirely sure we as a profession learn from our mistakes as much as we should.

I know the horrors of a database without FKs because we have a number of 3rd party applications that we have to subsequently support that don’t have a FK in sight. Unfortunately without FKs to do the policing, the applications does a lousy job and data anarchy results !!

Like

3. Daniel Stolf - August 27, 2009

Richard,

I completly agree with you.

There are SOME cases where not having a foreign key may be good, but to infeer that the whole relational model is broken is a bit far fetched.

I have seen some DML intensive DW enviroments in wich it was better to put the constraint aside, because the performance gained on select statements by the FK wouldn’t overcome the issues there would be on inserts and updates.

In that case, the dimensions were (semi-)open and it was guaranteed that if we were to insert an entry into the fact table, we would first verify the dimension and, if it wasn’t there, we would insert into it as well. In the case of closed dimensions, there would be a “No Information” entry, or something like that.

Plus, as you said, I haven’t seen a single example in the fine article that proves the relational model to be fundamentaly broken.

The multimedia example:

“The first is the delayed delete and the
second concerns Internet replication. Lets setup a
multimedia application. In it I have a collection, and
a collection contains images and video. Lets say that
one collection contains hundreds of thousands of
images. There is a one to many relationship between
the collection and image table. A primary key in
collection maps to a foreign key in the image table.
Now lets say we need to delete the collection (in a
real life scenario the collection could represent an
area in a museum that has been made redundant).
To delete the collection is simple, but to then delete
all the hundreds of thousands of images isn’t (one
could state that with Securefiles in Oracle11G it is
faster, but lets assume its an expensive I/O). Should
the delete all be done in one transaction and done immediately?

A solution, and a fair solution (but not the
only solution) is to perform a delayed delete. In this
case the collection parent record is deleted, then a
batch job runs and deletes the records in the image
table with a delay in between. So for the duration of
the delete (which could be days depending on
performance), during this time the foreign key
integrity rule is broken. We have orphaned child
records.”

If you are going to flag the child rows for delayed delete, why not flag the parent row as well?

On top of that, if one is trying to convince me, one shouldn’t call me a a daltonic jurassic lizard (a dinosaur with a black and white point of view) 😛

Like

Marcel Kratochvil - October 12, 2009

Daniel, As I point out in the paper, I infer the relational model is “broken” because it cannot address multimedia and other binary data types. As I hope I have shown in the paper, multimedia is a superset of data and not a subset, which is how people currently treat it. As such relational is a data model which is only useful for dealing with subset of all the data. Once you set your world view to only be relational data then its hard to argue against Foreign Keys. Once you change your world view to include multimedia and most other data types then you will see how limiting the relational model is.

Like

4. Richard Foote - August 29, 2009

Hi Daniel

IMHO, the relational model certainly isn’t broken, it just gets misunderstood at times 🙂

I believe recent research suggests dinosaurs became extinct because they all took up smoking:

http://cgi.ebay.com.my/Far-Side-Gary-Larson-Postcard-Dinosaurs-Smoking-Extinct_W0QQcmdZViewItemQQitemZ350215807336

Like

Marcel Kratochvil - October 12, 2009

My apologies if you believe I implied you are a dinosaur. My exact quote is “Blind adherence to the wonderment of Foreign Keys to me says, you are blinkered, uncompromising and a
dinosaur.”. Based on the points raised on this blog, I don’t see anyone following Foreign Keys with Blind Adherence.

I do though see an inherent bias in the field when it comes to the Relational Model. Its a monumental task to even raise the idea that the world view regarding Relational is dated. This is why the paper is so long. I felt I had to raise discussion points from a variety of angles, hoping that one or more would bring home the point. The more allied you are to the relational model (and only the relational model – that’s important to stress), the greater the resistance to accepting any change.

I argue this with Richard and others, but I now see and am convinced that the most DBAs in the Oracle community are stuck in their comfort zone and are resistant to change. Without competition from other database vendors, no competing or major new functional changes are occurring. In the 1990’s with Sybase, Informix, DB2, SQLServer, vendors were leap frogging each other with new ideas and directions. The database was always changing. Now its very rare to see anything new. Even 11gR2 is a minor improvement over 11gR1, which was a minor improvement over 10 (which was a minor improvement over 9).

When you start to look at the database beyond Relational, and include Multimedia and other types in it, you see that even though the Oracle database is the most advanced, it is still primitive and lacking major core components for handling correctly these types of data. Much change is needed, and yet there is a fundamental focus on only relational.

For example, how many optimizer features are there in the database for multimedia? Basically none. Domain indexes are the closest, but the optimizer is based around dealing with relational data only, not any data. Its in its infancy. How smart is the optimizer in dealing with queries when it comes to the the public internet where the bottleneck is the pipe to the outside world? Its geared around the server (memory, I/O, CPU), not the network. There isn’t even a feature to enable the ability to throttle the delivery of multimedia over the pipe. The focus for the optimizer in these areas is missing, its a massive gaping hole (another example, how smart is the optimizer for dealing with one row that is 80Gb in size?)

So I focus on Foreign Keys, I focus on relational, and I am trying to say, wake up everyone and realise there is a bigger world out there, one with new issues, major issues and the database at the moment isn’t anywhere near powerful enough to deal with it. It should be, but its inadequate.

So relational is dated, and yes I might be accused of misrepresenting or not understanding it right, but so far no one has yet answered anything of the issues raised from a multimedia (non relational data) view. The comments raised on this blog are from a view point of people thinking only about relational data, and not the bigger picture. And yes, if you only focus on relational and come up with examples of only relational, then the relational model looks good. Change your viewpoint to multimedia and other data types, and relational looks very bad.

Case in point. Data warehousing. Who in the data warehousing community has every thought of storing multimedia in a warehouse, effectively creating an image warehouse? I do this for a living, and I will tell you that none of the standard relational warehouse rules make sense. Its a different playing field.

Like

5. Jonathan Lewis - August 30, 2009

I marked this article for a Burleson award by the time I was half way down page two. Unfortunately I only got to the middle of page three before I decided that there was no point in wasting any more time on it.

I checked the blog, and this comment from Marcel seems to sum things up nicely:
I have picked on foreign keys deliberately. By showing that components of them are potentially flawed, and by showing that they are not the perfect solution to solving integrity and performance, then I have been hoping to highlight that the relational model is dated and needs changing.

This looks like the type of statement usually made by someone who doesn’t understand the relational model and therefore says it must be wrong.

For starters:
a) the relational model says nothing about physical implementation, so can say nothing about performance. Any arguments based on physical implementation are clearly irrelevant.

b) the relational model includes the concept of “constraints”. The foreign key is just one type of constraint, so it ought to be obvious that foreign keys on their own are not a perfect solution to integrity (for which one should read “internal consistency”).

Like

Richard Foote - September 1, 2009

Hi Jonathan

A Burleson Award, I love the idea 🙂

Marcel I’m sure would be thrilled (not) !!

Like

Andy - September 2, 2009

Burleson Award???????????????????????????

LMAO!!!!!

I propose we make the word ‘Burleson’ a verb/adjective as well. 😉

Like

Marcel Kratochvil - October 12, 2009

Jonathon, am happy to have the rotten tomatoes thrown at me (and I knew they would when this controversial topic was raised), but its obvious you completely missed the point behind the paper with your response especially as the actual good stuff only started on page four.
I am battling an industry that is so focused on only relational data that they have completely missed the richness of multimedia. If you have done serious work with binary data (and other non relational data like XML), you would be in a better position to appreciate the points I have raised.
Richard raised very good points in his synopsis, and this whole paper started when I was debating Richard over lunch with this topic. I address his points in later pages in the paper. I would like to have a more open debate on this because its an important topic and am happy to debate/discuss this further.
If I can’t defend the arguments then I will have to eat humble pie and accept this ignoble award. I am sure that Richard would love to present it to me at this time.

Like

6. Richard Foote - September 6, 2009

Hi Andy

Oh believe me, it’s already used as a verb/adjective many a time 🙂

Like

7. Andy - September 7, 2009

Dang! I should have filed for a patent earlier…. :p

Anyway, hope you had enjoyed the humid weather in Kuala Lumpur.

Like

Richard Foote - September 7, 2009

Hi Andy

I had a great time in Kuala Lumpur thank you. It was certainly a little warmer than it is at home at the moment. See my latest post for some of the details 🙂

Like

8. Hans Forbrich - September 8, 2009

Agreed – I’ve often stated “use the database for what it is designed to do. It’s not just a data dump, it’s a data management system designed to protect the data” and integrity is one aspect that must be protected.

Unfortunately many people (usually developers) tend to misunderstand what a rich tool they have at their disposal in the RDBMS. One symptom is that they keep reinventing the wheel and adding layer upon layer of obscurity to circumvent problems they themselves create. For example, it seems TopLink was invented to avoid learning SQL. 😉

Back to FKs – one things that has often bothered me is that the FK concept seems to be overloaded. They are used 1) to define a parent-child relationship and 2) to enforce a domain (code lookup). I suspect that the optimizer could actually do different things if it was aware of the distinction.

Like

Richard Foote - September 10, 2009

Hi Hans

Nice to hear from you !!

And I in turn agree with your comments.

Oracle can do some clever things now with FKs defined. For example, if you perform an inner join to a parent table but you don’t actually reference anything from the parent table, then Oracle can simple ignore the join condition and not even access the parent table as it’s redundent as all FK values must exist in the parent table.

Are you going to Open World this year ?

Like

Hans Forbrich - September 11, 2009

Sure am. Looking forward to seeing you and all.

Like

Jonathan Lewis - September 14, 2009

Hans,
Your complaint about “overloading” is the wrong complaint. You should be complaining that Oracle does not allow you to define domains, particularly enumerated domains, forcing you to adopt some other approach. The “foreign key” approach is a simple-minded substitute, although check constraints and user-defined types may also work.

Don’t complain too much about the lack of domains though, or Oracle might implement them — by doing what most people already do but hiding the infrastructure so you can’t tune it.

Like

Richard Foote - September 16, 2009

Hi Jonathan

Very good point !!

Like

Marcel Kratochvil - October 12, 2009

Hans, am going to get pedantic here, but it highlights part of the view I am trying to combat. Oracle is not an RDBMS (R being Relational). Its more than that. Oracle doesn’t even describe themselves as an RDBMS anymore. Oracle handles Objects, Multimedia, XML and other data which are not part of the relational model. Oracle can conform and be limited to supporting only the relational data, but that is a subset of what the database can do.

Like

9. Richard Foote - September 16, 2009

Hi Hans

Great, hope to catch up. I have something on each and every evening this year so it’s going to all be a bit hectic but I’m really looking forward to it all 🙂

Like

10. Daniel Stolf - September 19, 2009

Funny thing, I thought just the same as Jonathan Lewis when reading the article… Something like “Hmmm It’s just too Burleson like”. But, you know, I’m new and just starting at this, I’ll not be the one to say that…

Like

Richard Foote - September 23, 2009

Hi Daniel

I’ll be catching up with Marcel at Open World. I just can’t wait to tell him his paper has been compared to the writings of Burleson. I want to take a photo to capture the expression on his face 😉

Like

Marcel Kratochvil - October 12, 2009

Richard did tell me, and he didn’t have a camera at the time, but the reaction made him happy.

I would say that this is dangerous territory that people are moving down. If you want to squash an argument you passionately disagree with, just by saying “its like a another persons paper which we don’t like” then you are preventing new ideas from emerging and limiting open discussions on it.
Richard knows how much effort I put into this paper whilst trying to avoid these comments (he knows my greatest fear was that it would be interpretted wrongly). This is why the first round of the paper went to peer review amongst ten people with experience in this field (Richard being one of them). Only when the comments from the first round that came back were that its not a career killer, did I make it public. How many people who write technical papers actually do this?

This paper wasn’t written overnight and has been in progress for over 10yrs and is based on active experience across a large range of industries.

Am happy to discuss all points raised on the blog, but I do insist on people reading the whole paper, because only then will I hope people get a good view of the actual points I am trying to raise.

Like


Leave a comment