SQL's Three-Valued Logic and Certain Answers
The standard theoretical approach to evaluating queries on incomplete databases is to compute certain answers. Their main drawback is very high complexity of query evaluation. In practice, SQL relies on three-valued logic for efficiently evaluating queries on databases with nulls, but this cannot produce certain answers due to the
If query evaluation differs from certain answers, it can generate either false positives (answers which are not certain) or false negatives (i.e., miss some of the certain answers). The problem with SQL is that it produces both. Since we cannot eliminate both and have an efficient procedure at the same time, the best we can hope for is eliminate one kind of wrong answers entirely - then at least the user knows in which way not to trust SQL.
We show that it is surprisingly simple to fix SQL's three-valued logic to eliminate false positives (or false negatives by a dual procedure). In the process of doing so we also introduce a new notion of certain answers with nulls, which properly accounts for queries returning tuples containing null values.