This week, I tried to decide between using an existing dataset for question answering on financial documents and creating a new one.

I gathered some pros and cons and identified potential datasets. Building a new dataset from scratch takes effort, but it can also be the basis for a publication. Also, there does not seem to be an existing dataset that fits my research question.

My supervisor and I will discuss the results and decide in the upcoming week.

You can follow these updates: Substack, Blog, Telegram, WhatsApp, LinkedIn, Medium, Twitter, Calendly.

This large screen makes me more productive.

What Happened Since Last Week?

I finished the other half of the book “How to Write and Publish a Scientific Paper” by Robert Day. Good book.

My supervisor suggested a dataset that I might use for benchmarking my algorithms. If this dataset is suitable for my research, it could save me a lot of time, as I would not have to create a dataset on my own. I reflected on this dataset and am not sure whether it is suitable for me. I will discuss this with my supervisor on Tuesday.

My colleague Thomas Huber gave a presentation at the Chair of Data Science and NLP about the introspection of transformer-based language models [LM-Debugger - An Interactive Tool for Inspection and Intervention in Transformer-Based Language Models (Geva et al., 2022)]. Among other research contributions, the paper presents a method to precisely change the behavior of language models for specific prompts.

What Were the Biggest Obstacles?

No major obstacles. I left the phone at home again today, and it was great.

Which Goals Did I Meet?

  1. Write one section for the first paper. The first paper will be a literature review/state-of-the-art.
  2. Identify a conference for the state-of-the-art.

Which Goals Did I Miss?

  1. Align the outlet (conference) with my supervisor, i.e., ask him if he likes the conference and thinks that it fits my research question.

Was It a Good Week?

Yes. Everything starts falling into place, and I have a clearer view of the literature and what I want to write about.

Short-Term Tasks for the Coming Week

  1. Align the outlet (conference) with my supervisor, i.e., ask him if he likes the conference and thinks that it fits my research question.
  2. Decide on whether to prepare a dataset ourselves or take an existing dataset.

About “75-Step Journey Toward a Ph.D. in Natural Language Processing”

You will, from now on, witness my grind. Feel my blood, sweat, and tears.

With this series of articles, you become a real-life weekly witness of my dissertation progress, all in 75 steps. This has multiple purposes:

1) Forcing myself to keep moving through the power of public shame!

2) Helping other (prospective) Ph.D. students to stay motivated and to show that hard times are normal when going through this process.

3) Getting support from the community when I go through hard times.

Share this with your Ph.D. student friends: Substack, Blog, Telegram, WhatsApp, LinkedIn, Medium, Twitter, Calendly.

Read More From the 75 Steps Toward a Ph.D. in NLP Series

2022-08-20: Update 1/75 - Kicking Off the Journey Toward a Ph.D. in NLP

2022-08-28: Update 2/75 - Literature Review

2022-09-04: Update 3/75 - Back on Track and Back to Vallendar

2022-09-10: Update 4/75 - Long Test Runtime; Retriever Works

2022-09-18: Update 5/75 - Jour Fixe Joy

2022-09-26: Update 6/75 - Reading Group

2022-10-02: Update 7/75 - Leaving the Phone at Home

2022-10-09: Update 8/75 - Finding a Conference

2022-10-23: Update 10/75 - Still Unsure About the Dataset

2022-10-30: Update 11/75 - NVIDIA DGX-2 and Swiss Cheese

2022-11-10: Update 12/75 - Three Days of Conference via Zoom

2022-11-24: Update 13/75 - Vacation and Roadmap for 2023

2022-11-30: Update 14/75 - Supervising B.Sc. and M.Sc. Theses

2022-12-14: Update 15/75 - A Rather Uneventful Week

2022-12-24: Update 16/75 - Year-End Cleanup Sprint

2023-01-01: Update 17/75 - New Year’s Resolutions

2023-07-20: Update 18-28/75 - A Long Gap and Two Papers Handed In!

2023-12-12: Update 29-50/75 - First On-Site Conference Visit and Increased Focus

2023-12-25: Update 51-52/75 - Merry Christmas!