SQUARES : A SQL Synthesizer Using Query Reverse Engineering

Abstract

Nowadays, with the massive amount of data that data analysts have to deal with, they frequently find tables with interesting data and they do not know how these tables were generated from a database. Hence, there is an increasing need for systems capable of solving the problem of Query Reverse Engi- neering (QRE). Given a database D and an output table Q(D), these systems have to find a query Q, such that, when running Q on D, the result is equal to Q(D). QRE is a subfield of Program Synthesis. The goal of Program Synthesis is to automatically generate programs that satisfy a given high-level specification. Since the 60’s, Program Synthesis is a well-studied problem, and has been considered the Holy Grail of Computer Science. Until now, program synthesizers have been using a single tree representation to represent programs. We propose a novel enumeration-based SQL synthesizer SQUARES, that uses a new line representation where we represent each program line with its own subtree. Experimental results on the synthesis of SQL queries, show that the proposed line-based encoding allows a faster enumeration of programs when compared to the usual tree-based encoding. Moreover, while the tree-based encoding does not scale beyond a small number of operations, the new line-based encoding allows finding programs with a larger sequence of operations. Experimental results on the synthesis of SQL queries from OutSystems show that SQUARES outperforms Scythe, a state-of-the-art SQL synthesizer.

Publication
SQUARES : A SQL Synthesizer Using Query Reverse Engineering