Demonizing Foreign Keys (“Helden”) August 25, 2009Posted by Richard Foote in Foreign Keys, Richard's Musings.
My mate Marcel Kratochvil, a fellow Canberran Oracle ACE Director and David Bowie fan, has recently published a paper finally called “Discussing Foreign Keys and their usage in the Database” in which (in his own words) he is “demonizing foreign keys”.
It’s a really good read, well worth the effort, in which covers various interesting topics but primarily questions the general usage of Foreign Keys in the relational (or object-relational) database, especially in the world of multimedia. He’s been arguing against the general use of Foreign Keys in the database for a long time and has finally put down his thoughts on the subject. I would highly recommend giving the paper a read and see if it might alter (or confirm) your own views on the subject.
Although I think Marcel makes many a good point and I agree with much of what he says, I however basically disagree with his general central notion and arguments against Foreign Keys. I guess I still consider myself a “Foreign Key Dinosaur”
These are just some of the comments and feedback I gave Marcel when I was given an early look at the paper. I suggest you read Marcel’s paper before my comments so they make more sense and can be put into context and perspective.
“Let me say that I agree with much of what you say and you make many valid points. Certainly the world of multimedia is one which we at work are slowing moving into and it certainly has many challenges. Now, with respect to the specific issue of FKs in the database, I suspect we’re actually in some ways in agreement. I agree with you that there isn’t necessarily one correct answer for all situations. One of your closing statements:
“So by all means use Foreign Keys, but don’t blindly use them. Use them when it makes sense, when it solves a tuning issue or when maintaining a relationship between tables is crucial to the application running”
is actually pretty well what I think as well. However, I suspect our interpretation of what this actually means differs somewhat with my opinion being that it makes sense to use FKs far more widely and comprehensively than perhaps you suggest in your paper. So our likely differences of opinion lies mainly with the occurrences of when it makes sense to use FKs. However, your paper doesn’t alter my fundamental view that it almost always makes sense to use FKs in the database where possible.
Firstly, many of your comments suggest DBAs generally find FKs problematic, developers like to push them. As a DBA (and ex developer of nearly 10 years), I find FKs to be essential in many of my activities and find they generally assist, rather than impede my tasks. More on this later but the importance of FKs on DBAs activities and responsibilities is paramount.
I don’t think anyone seriously suggests that FKs (or other constraints) ensure data integrity in the general sense. Yes, someone can enter incorrect details or pick the incorrect menu option. However, constraints and FKs specifically are an important method of reducing the occurrence of data integrity related issues that if not implemented in the database, needs likely to be implemented less effectively elsewhere. More on this to come but whenever I mention data integrity later, I mean it only in the sense of having consistent data relationships that follow key business rules.
Examples are good because the strength of the example can often be telling. Take your deleting an account but keep the purchase order example. Tricky scenario but one which simply requires a correct physical model with FKs to support it to start with. If person accounts must be deleted, then either it doesn’t indeed make sense to use a FK in the purchase order table to point to the person account or the FK needs an “on delete set null” type logic or the FK needs to point to a entity details table that contains data that doesn’t have to be deleted, etc.. Will the tax audit really be happy to have purchase orders associated with unknown entities, how can you prove the purchase was legitimate and not a fraudulent tax reduction if the purchase order can reference entities that don’t exist. A entity number that references nothing is meaningless therefore it either needs to reference something else, perhaps sufficient person details to keep both privacy and auditing requirements satisfied, or it should be removed. A FK would be
critical in such an application but it needs to point to the correct entity and/or be maintained appropriately to ensure the business rules are followed.
The issue of “not trusting” developers is an interesting one. I fundamentally don’t trust developers to perform and code FK functionality outside the database for 4 key reasons:
1) Developers come in sorts of shapes and sizes, have all sorts of experiences and come in varying degrees of quality. There are good developers and not so good developers and as I have no control on their hiring and firing, can’t assume they’re necessarily “trustworthy” when it comes to ensuring their code will maintain necessary data relationship integrity. I’ll trust FKs in the database over developers in this respect any day as FKs in the database are far simpler to implement and control.
2) Even the best of the best, the most brilliant of developers is limited by being human. And no matter how good, humans make mistakes. They occasionally write code that might leave orphaned data, they write code that might not take into consideration a specific business rule, they change the business rule in the application but forget to change it correctly in a specific module. A FK in a database doesn’t make a mistake. If it’s there, it’s centralised and it’s enforced. Period.
3) Note the database itself, with all its clever internal code, must still lock tables when a parent record is deleted/updated/merged. Now Oracle internally can get around this if an index exists on the FK because it can in effect check the index structure to ensure no new associated FKs are inserted. This “trick” is only available to the Oracle code, a developer can’t effectively lock a row in an index that doesn’t exist. Therefore, if Oracle itself is forced to lock a table to ensure the relationship between tables remains consistent, a developer must likewise lock a table during any similar operation. The very best of trustworthy developers only really has the choice between locking tables or potentially allowing “dodgy” data in the system. Give me an indexed FK constraint any day as it will fundamentally be more efficient to implement.
4) Even the best developer, writing the most perfect of error proof code, can’t necessarily control how data is accessed and changed in the database. In highly complex environments (note some of our applications have 2,500+ tables), with these users that make mistakes, it’s virtually impossible for the application to cater for every possible data “cleanup” scenario. Therefore, DBAs (in particular) and other power users are frequently required to go in the database via the “backend” and clean things up. I
don’t know of any of our more complex applications, be they in-built or 3rd part apps, where this hasn’t been necessary. Business rules in the application, with FKs effectively managed in the application are of no use when data is not access via the application. And we can’t wait the 6 months it might take to change the application to prevent these scenarios from occurring or to make the application able to deal with the data spill or data “integrity” issue.
Expanding on this point a little, we have many many Oracle applications. These are “real-world”, modern applications. I can tell you without exception, that the most problematic, worst performing, data integrity error filled atrocities of applications we administer have a common characteristic. They don’t have FKs. I don’t have to imagine an application that has no FKs, we have a number of 3rd party apps, running in production right now that have no FKs, ranging from case management systems to CAD systems that I need to deal with every day. Guess what. They have various performance issues AND all manner of data integrity issues, directly as a result of not implementing FKs in the database. I’m sure these software companies trusted their developers … As a humble DBA, we have little control over the adoption of these various apps, although lessons are being learnt and things are slowly changing.
Not only are data integrity issues much more prevalent in these apps, I also don’t have to imagine the nightmare that is trying to resolve data related issues without FKs in the database. If I have to remove (say) an over classified document after its been incorrectly inserted a few weeks ago into the system and subsequently linked and referenced all over the place, it’s extremely difficult to safely remove it. We have no way of knowing the necessary dependencies, if you delete a row here or there, how to ensure there’s no orphaned data left behind as a result and measure the impact in other parts of the apps. Interestingly, even the software vendors themselves struggle and forever provide scripts that don’t clean data out correctly. Application upgrades invariably result in corrupted data relationships.
Our applications with FKs have far less issues and when we do need to clean things up, the FKs both protect us and clearly document what needs to be done. One can’t “accidentally” delete a parent row, leaving orphaned data behind, the enabled FKs won’t let us. We know the impact of changing this column definition or dropping this column, the FKs tell us all the related dependencies. We don’t go in blind, we don’t have to guess, we know. We know that by removing (say) this document from the system, it will impact all these related cases, that this person will no longer have these details recorded against him. And no, having some data integrity issues due to FKs being left out is not acceptable. The implications of having say orphaned data might mean a vital document containing critical intelligence information is inaccessible by the application, resulting in perhaps a fugitive being able to leave the country or conversely, sensitive data still being accessible. The implications here can potentially be life threatening. It doesn’t get any more real world than that and FKs are a vital component in ensuring the correct data is visible when it should be.
When the help desk forwards an ORA-02292 – child record found error or if we get one during a data spill cleanup, unlike some who grumble and complain, I just thank goodness for the FK. Because someone is attempting something that a business rule doesn’t allow and hasn’t thought through the implications. Without the FK in the database, the operation would have succeeded and caused a data integrity issue. No, the child records need to be removed as well or no, the parent record needs to be logically deleted instead or no, you simply can’t now perform that operation due to the live data in the system, etc. The FKs help to protect us from ourselves, be it at the
application level or indeed via any accesses to the database.
Without FKs, one is blind to the relationships and dependencies between objects within the database. The application is not a pretty place to search and neither is the documentation, assuming it exists and is up to date.
Not only is the DBA and others “blind” to these relationships if the FKs are not enabled in the database, but so is the CBO. A point you’ve missed a little in the paper is the performance implications of the CBO not being able to make important assumptions on the data relationships if FKs are missing. It’s not a good idea to hide information from the CBO, it reduces its options and limits its decision making processes. It can’t for example determine and ignore redundant join conditions, it can’t make use of query rewrite opportunities, it can’t assume it can use an existing index or use an efficient join method, etc. And the CBO gets cleverer and can make use of database constraint information more often with each new release. Without enabled FKs, the CBO can’t use this information and might make sub-optimal decisions as a result.
At the end of the day, I agree with your statement that we should use FKs when they “make sense”. It’s just IMHO, they make sense far far more often than you appear to suggest. I work with applications without FKs in the database, it isn’t pretty !!”
Marcel has started a Foreign Key Discussion website where you can find his paper and make comments on the various subject matters discussed. I’m sure Marcel would welcome any such comments and feedback.