Wrong-errors bugs: a new class of bug?

Position:Database Technique
 
FREE EXCERPT

Dan Tow

Above all else, we count on databases to reflect the truth consistently, or at least to reflect the table data perfectly. The database cannot be blamed when an application (or the end users of an application) place inaccurate data in its tables, but a database must accurately report the data it holds. Therefore, bugs are not all created equal; there are bugs, and there are wrong-rows bugs, bugs that silently misrepresent the data that the tables hold. Even the craziest, most obscure comer case that potentially misrepresents your data should rightly bring a loud chorus: "The emperor has no clothes!' We depend on the database, above all, not to lie.

Unnecessary errors are comparatively benign. If the database returns an unnecessary error, at least we haven't been lied to, and we know that we need to take corrective action. A wrong-rows bug might lead to months, even years, of expensive business mistakes before anyone notices. The database vendors rightly attach so much importance to fixing wrong-rows bugs that when a query returns rows other than those needed by the application, it almost always indicates a mistake in the application SQL, not in the RDBMS vendor code!

Setting aside the wrong-rows bugs, most ordinary RDBMS bugs consist of the database returning an unnecessary error; for example, disconnecting your session without good cause. Not all errors come from RDBMS bugs, however. For example, if you select an expression that calls for division by zero, the database is supposed to return an error.

A New Class of Bug

In this article, I propose the recognition of a new class of bug, a class that is not generally considered a bug at all. Specifically, I propose that errors such as attempted conversions of unconvertible values or division-by-zero should appear only when absolutely necessary, when any execution plan conceivable would encounter the error An SQL statement that returned an unnecessary error (an error that would not result from every conceivable path to the data) would be guilty of this new class of bug, a wrong-errors bug. A common means to demonstrate a wrong-rows bug is simply to show that the same query returns different rows depending on which execution plan it follows. Except for unusual, non-relational operators such as the Oracle ROWNUM operator, we recognize that however much developers might argue what rows a query should return, there must surely be just one correct result, not one result when the database follows one path, and another result when it follows another path! I propose that function errors, such as attempted conversions of unconvertible values or division- by-zero, ought to be equally consistent! Furthermore, I propose that these wrong-errors bugs are nearly as serious as wrong-rows bugs, even though the current standards do not treat them as bugs at all! I have two lines of argument for why these ought to be considered bugs, and why they are so serious:

  1. The idealistic argument: Errors are just as important a part of the result of a query as the rows returned. Whether a query returns an error should be just as well defined by the SQL standard as which rows the query returns, and just as independent of the path the optimser chooses to reach the data!

  2. The pragmatic argument: Execution plans change all the time, unpredictably and it is very costly to have a missioncritical process suddenly, mysteriously error out three hours before your quarter close! It's extraordinarily hard to test for a problem that might only occur long after release, when some customer encounters a data distribution that leads to an execution plan that reaches a row that triggers an error.

    Since the behavior I propose is new and subtly inconsistent with past accepted database behavior, instantly classifying the old behavior as buggy may be asking a lot. I concede that, at best we'll need a clean migration path, perhaps a SQL standard calling for the new behavior, and configuration parameters that would allow developers to choose to keep the old behavior if they felt it important. However, I believe that our past acceptance of the old behavior was more historical accident than anything else, and that if we'd begun with the database behavior I propose, we'd view failure to follow that behavior as a serious bug.

    An Example

    Here's a manufactured example of the problem, on Oracle, to make this concrete:

    SELECT O.Status, O.Order_Date, O.Order_ID, O.Customer_ID, E.Last_Name, E. First_Name, E. Phone_Extension FROM Orders O, Ext_Expediters E WHERE (O.Customer_ID=:1 OR :1 IS NULL) AND O.Char_Extension_Col01=E.Numerical_Expediter_ID AND (O.Expediter_ID=:2 OR :2 IS NULL) ORDER BY E.Last_Name, E.First_Name, E.Phone_Extension, O.Status, O.Order_Date This is a custom query joining a packaged Orders table to a custom table created by the IT department to add expediting functionality to the packaged application. The optimal order of the join would depend on which of the two bind variables, :1 or :2, was provided with a non-null value. Given different data...

To continue reading

REQUEST YOUR TRIAL