NLP To SQL

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 1

Technical Perspective:

Natural Language to SQL Translation by Iteratively


Exploring a Middle Ground

Jeffrey F. Naughton
University of Wisconsin–Madison
Madison WI, U.S.A.

A fundamental question in data management is how re- iterative process?


lational database management systems (RDBMSs) should Answering this question is difficult. Unlike the case for
be queried. Ideally, the query interface should be powerful keyword search systems, the answer to the query may not
enough to express arbitrary queries, yet simple enough to help the user know if the executed query was what they re-
learn that users require virtually no training. Natural lan- ally wanted. For example, consider the simple query “find
guage is an obvious and appealing approach – presumably the difference between sales this year and last year.” In gen-
most users already know at least one natural language and eral the RDBMS will return a number – and it is very hard
use it to “query” other humans constantly. Unfortunately, to tell just from that number if the query was correct or not.
employing natural language to query RDBMSs is highly non- It would be far more precise for the system to respond to the
trivial, and for the most part, not used. However, with the user by presenting the generated SQL query itself. But this
growing power and ubiquity of Natural Language Processing would require the person posing the natural language query
(NLP) systems, it makes sense to redouble efforts in apply- to be able to read and understand SQL, which contradicts
ing NLP to database querying. a major motivation for the system in the first place.
At the most basic level, relational database systems are Now we come to what is perhaps the heart of this paper:
queried using SQL. (For that matter, most “NoSQL” sys- the decision to adopt an intermediate language the authors
tems are also queried using SQL.) SQL is very powerful and call “Query Tree,” a two-way domain-independent communi-
precise, and, for novices, very hard to write. So SQL can- cation model allowing the user and system to understand one
not be used as a user interface for anyone but power users. other. A query tree aids mapping a user query to its corre-
Nonetheless, as the most widely used RDBMS query lan- sponding semantically correct SQL and translating a query
guage, SQL is the most natural language into which to trans- plan to its corresponding natural language interpretation.
late natural language questions over relational data. This The authors harness the schema knowledge, schema-driven
translation is the focus of the following paper, “Understand- similarity metrics, query tree reformulation and ranking to
ing Natural Language Queries over Relational Databases”, make the problem tractable for the system and the user.
by Li and Jagadish. The authors close with a user study evaluating the ap-
The first important decision made by the authors of this proach. The user study itself is interesting, including the
paper is to reject a one-shot, one-way translation process aspect of using Chinese to convey the queries to the sub-
from a natural language query to a corresponding SQL query. jects instead of English to avoid bias through the phrasing
Instead, the authors advocate an iterative dialog between in the query description (presumably the subjects already
the person posing the query and the system building the re- spoke Chinese!) The experiments show that the approach is
lational query. This makes perfect sense – even in the much best for simple to medium complexity queries.
simpler world of keyword search systems, users iteratively This paper represent a significant improvement in the
refine their queries. Unfortunately, adopting this approach state of the art, and it is an ideal springboard for future
for RDBMS querying does not yield an easy problem – in advances. In an area as difficult and important as natu-
fact, it uncovers a highly interesting and difficult challenge: ral language querying of relational database systems, this is
how should the user and the system communicate in this indeed a major contribution.

SIGMOD Record, March 2016 (Vol. 45, No. 1) 5

You might also like