2022-10-16: Update 9/75 - Dataset - Make or Take
This week, I tried to decide between taking an existing dataset for question answering over financial documents and making a new one.
I gathered some pros and cons and identified potential datasets. Building a new dataset from scratch requires effort, but it can also be the basis for a publication. In addition, no existing dataset seems to fit my research question.
My supervisor and I will discuss the results and decide in the upcoming week.
You can follow these updates: Substack, Blog, Telegram, WhatsApp, LinkedIn, Medium, Twitter, Calendly.
This large screen makes me more productive.
What Happened Since Last Week?
I finished the other half of the book “How to Write and Publish a Scientific Paper” by Robert Day. Good book.
My supervisor suggested a dataset that I might use for benchmarking my algorithms. If this dataset is suitable for my research, it can save me a lot of time, as I would not have to create one myself. I reflected on this dataset and am not sure whether it is suitable. I will discuss this with my supervisor on Tuesday.
My colleague Thomas Huber gave a presentation at the Chair of Data Science and NLP about the introspection of transformer-based language models [LM-Debugger - An Interactive Tool for Inspection and Intervention in Transformer-Based Language Models (Geva et al., 2022)]. Among other research contributions, the paper presents a method to precisely change the behavior of language models for specific prompts.
What Were the Biggest Obstacles?
No major obstacles. I left the phone at home again today, and it was great.
Which Goals Did I Meet?
- Write one section for the first paper. The first paper will be a literature review/state-of-the-art.
- Identify a conference for the state-of-the-art.
Which Goals Did I Miss?
- Align the outlet (conference) with my supervisor. (I.e., ask him whether he likes the conference and thinks that it fits my research question.)
Was It a Good Week?
Yes. Everything starts falling into place, and I have a clearer view of the literature and what I want to write about.
Short-Term Tasks for the Coming Week
- Align the outlet (conference) with my supervisor. (I.e., ask him whether he likes the conference and thinks that it fits my research question.)
- Decide on whether to prepare a dataset ourselves or take an existing dataset.
About “75-Step Journey Toward a Ph.D. in Natural Language Processing”
You will, from now on, witness my grind. Feel my blood, sweat, and tears.
With this series of articles, you become a real-life weekly witness of my dissertation progress, all in 75 steps. This has multiple purposes:
1) Forcing myself to keep moving through the power of public shame!
2) Helping other (prospective) Ph.D. students to stay motivated and to show that hard times are normal when going through this process.
3) Getting support from the community when I go through hard times.
Share this with your Ph.D. student friends: Substack, Blog, Telegram, WhatsApp, LinkedIn, Medium, Twitter, Calendly.
Read More From the 75 Steps Toward a Ph.D. in NLP Series
2022-08-20: Update 1/75 - Kicking Off the Journey Toward a Ph.D. in NLP
2022-08-28: Update 2/75 - Literature Review
2022-09-04: Update 3/75 - Back on Track and Back to Vallendar
2022-09-10: Update 4/75 - Long Test Runtime; Retriever Works
2022-09-18: Update 5/75 - Jour Fixe Joy
2022-09-26: Update 6/75 - Reading Group
2022-10-02: Update 7/75 - Leaving the Phone at Home
2022-10-09: Update 8/75 - Finding a Conference
2022-10-23: Update 10/75 - Still Unsure About the Dataset
2022-10-30: Update 11/75 - NVIDIA DGX-2 and Swiss Cheese
2022-11-10: Update 12/75 - Three Days of Conference via Zoom
2022-11-24: Update 13/75 - Vacation and Roadmap for 2023
2022-11-30: Update 14/75 - Supervising B.Sc. and M.Sc. Theses
2022-12-14: Update 15/75 - A Rather Uneventful Week
2022-12-24: Update 16/75 - Year-End Cleanup Sprint
2023-01-01: Update 17/75 - New Year’s Resolutions
2023-07-20: Update 18-28/75 - A Long Gap and Two Papers Handed In!
2023-12-12: Update 29-50/75 - First On-Site Conference Visit and Increased Focus