I have written a blog post summarizing my PhD thesis. In the thesis, I explored how to learn from feedback given to model outputs when collecting direct supervision signals is too costly. I also built a natural language interface to the geographical database OpenStreetMap.

Teaser

“We all need people who will give us feedback. That’s how we improve.” - Bill Gates, TED Talks Education, May 2013

Motivation

We all know that supervised data is expensive to obtain. So let’s ask the following question: What if we learn from feedback given to model outputs instead?

Besides reducing the need for supervised data, learning from feedback has several other advantages:

  • Even if supervised data is available, we also want to discover alternative good outputs.
  • With feedback given to model outputs, the model can keep improving over time.
  • It is possible to personalise a system to a specific use case or user.

For these reasons, my PhD thesis explores how to learn from feedback for sequence-to-sequence tasks in NLP.

The scenario I assume in my thesis can be summarized with the following picture:

[Figure: problem overview]

A pre-trained model receives an input, for which it produces one or several outputs. Each output is grounded in a given external world, which assigns some feedback to it. The feedback is then used to update the pre-trained model.
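To make this loop concrete, here is a minimal sketch of one way it could look in code. It is a toy illustration, not the method used in the thesis: the "model" is just a list of scores over a fixed set of candidate outputs, the update is a simple score-function (REINFORCE-style) step that I picked for brevity, and all names (`feedback_loop`, `world_feedback`, the candidate strings) are hypothetical.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def feedback_loop(candidates, world_feedback, steps=500, lr=0.5, seed=0):
    """Sample an output, obtain feedback from the world, update the model."""
    rng = random.Random(seed)
    # Assumption: a list of scores stands in for the pre-trained model.
    scores = [0.0] * len(candidates)
    for _ in range(steps):
        probs = softmax(scores)
        # The model produces an output by sampling from its distribution.
        i = rng.choices(range(len(candidates)), weights=probs)[0]
        # The external world assigns feedback to that output.
        reward = world_feedback(candidates[i])
        # Score-function update: the gradient of log p(i) with respect to
        # scores[j] is (1 if j == i else 0) - probs[j].
        for j in range(len(scores)):
            grad = (1.0 if j == i else 0.0) - probs[j]
            scores[j] += lr * reward * grad
    return softmax(scores)


# Hypothetical world: only one candidate output is judged correct.
candidates = ["parse_a", "parse_b", "parse_c"]
probs = feedback_loop(candidates, lambda out: 1.0 if out == "parse_b" else 0.0)
best = candidates[probs.index(max(probs))]
```

In this toy setting the model never sees a gold output. It only learns which of its own outputs the world rewards, so probability mass gradually moves toward `parse_b`.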

Full story.