Data Democratisation with Deep Learning: The Anatomy of a Natural Language Data Interface

Presented at the 2023 International Conference on Web Search and Data Mining (WSDM)

Abstract

In the age of the Digital Revolution, almost all human activities, from industrial and business operations to medical and academic research, rely on the constant integration and utilisation of ever-increasing volumes of data. However, the explosive growth in the volume and complexity of data makes querying and exploration challenging even for experts, and makes the need to democratise access to data, so that non-technical users can reach it too, all the more evident. It is time to lift the technical barriers by empowering users to access relational databases through conversation. We consider the three main research areas on which a natural language data interface is based: Text-to-SQL, SQL-to-Text, and Data-to-Text. The purpose of this tutorial is a deep dive into these areas, covering state-of-the-art techniques and models, and explaining how progress in deep learning has led to impressive advancements. We will present the benchmarks that sparked research and competition, and discuss open problems and research opportunities, one of the most important being the integration of these three research areas into a single conversational system.

Outline

  1. Text-to-SQL
    • The Text-to-SQL problem
    • Benchmarks
    • A Taxonomy for Deep Learning Text-to-SQL Systems
    • Key Systems
    • Research Challenges
  2. SQL-to-Text
    • The SQL-to-Text problem
    • Challenges
    • Key Systems
    • Research Challenges
  3. Data-to-Text
    • What is Data-to-Text?
    • Subfields of Data-to-Text
    • Table-to-Text
    • Graph-to-Text
    • Evaluation
    • Research Challenges
  4. Bringing it all together
    • What do we mean?
    • Why is it not trivial?
    • Challenges
    • Demo

Presenters

Material

The slides of the tutorial are available for download.