Meeting notes with Word and ChatGPT
A small workflow to capture meeting notes with the help of Microsoft Word and ChatGPT.
I hate taking meeting notes. This doesn’t mean that I don’t write them when needed, but the mere act of having to stop listening and write something down breaks my whole mental flow. This is particularly distressing when the orator is saying something really important or interesting (or just speaks really fast). And then, after the meeting, you’re telling me that I have to construct a professional-looking document to share with the attendees? In other words, I have to make sense of my own incomprehensible doodles and scribblings and translate them for other people? (which adds even more time to the cost of the meeting itself.) Don’t get me wrong: being able to share a meet and interact with colleagues in real time is paramount to modern work (especially for a remote worker such as myself). But the overhead of taking and sharing meeting notes is something that I would gladly avoid if possible.
So, with that in mind, I thought: “Surely, in 2025, there’s a tool out there that can already perform this task for me?” And the answer is: yes, there are a lot of alternatives. So many, in fact, that when I started looking, I tried to compile a list of them for comparison, but I quickly got overwhelmed and gave up. My biggest issues were the different ways in how these tools operate: some of them required you to upload audio recordings, others needed you to install specific software, and then there were those that only worked with specific video conferencing platforms, and so on. After that, the pricing models: from subscription-based to pay-per-minute of transcription, none of them were really appealing to me. Even worse, when using their free tiers, the results were often subpar, with poor transcription accuracy and limited features. None of them really made me stop and think, “I can really see myself using this tool regularly”. So I stopped looking for a ready-made solution and thought: “Is there anything that I already have (meaning, something I already pay for) that can help me with this task?” Enter Microsoft Word and ChatGPT. Specifically, the version of Word that comes with a Microsoft 365 subscription.
The workflow
You will need three ingredients for this recipe:
- The transcript of a meeting.
- An LLM that can summarize text. In this case, ChatGPT.
- A prompt to guide the LLM on how to summarize the transcript.
“How does Word come into play here?”, you may ask. It turns out that Microsoft Word has a somewhat hidden transcription feature that can transcribe audio recordings directly within the application. In fact, not only Word, but OneNote as well (that’s where I first discovered this feature). Here’s the official documentation on how to use it, so I won’t go into too much detail here. What’s important to know is that:
- It has a limit of 300 minutes of transcription per month for Microsoft 365 subscribers (which is more than enough for my needs).
- It needs an audio file as input (unless you’re using the live transcription option, which I haven’t tried yet). Also, this file will be uploaded to your OneDrive account for processing, so it will count against your storage quota.
1. Getting the transcript
We’ve established that Word can transcribe audio recordings. Great! But how do we get the audio recording of the meeting in the first place? In my case, this is not a problem, since I have the habit of recording most of my meetings for later reference. Specifically, I use OBS Studio to record both video and audio of my meetings. With it, my workflow is platform-agnostic: I can record meetings from Zoom, Microsoft Teams, Google Meet, or any other video conferencing tool. And the program has a lot of customization options to aid in the recording process (like setting a gain filter to boost the volume of my microphone, for example).
Pro tip: If you have a multi-monitor setup, set OBS to record only one of your screens and make a habit of moving the meeting window to that screen when you start recording, while simultaneously moving any distracting windows to the other screen(s). This way, you avoid recording sensitive information that may be visible on your other screens (like incoming chat notifications, for example). This is especially useful when you have to share your screen during the meeting.
Remember to always inform the other attendees that you are recording the meeting, and get their consent if necessary.
But if you go this route, you’ll probably encounter a new issue: recording a meeting results in a video file which, depending on the length of the meeting and the recording settings, can be quite large (for example, a one-hour meeting recorded on my 1080p monitor at 30 FPS usually results in a file of around 300 to 500 MB). The transcription feature in Word has a file size limit, so most of the time it will be necessary to extract just the audio from the video file. For this, I use Shotcut, a free, lightweight, open-source video editor that has a lot of features, including the ability to export just the audio track from a video file. Granted, its interface is a bit confusing, but since I’m already familiar with it (from my time making gaming compilation videos), it only takes me a few seconds to extract the audio track from a recorded meeting video. Nevertheless, you could also use FFmpeg for this task, which can achieve the same result with a single command line. Whatever method you choose, I recommend exporting the audio track in WAV format, since it’s uncompressed and lossless, which should result in better transcription accuracy.
After you have the audio file, open Microsoft Word, create a new document, and go to Dictate > Transcribe on the toolbar. Select Upload audio, choose your audio file, and wait for the process to finish, which may take a while depending on your connection speed and the length of the audio.
This action will upload your audio file to Microsoft’s servers for processing, so consider this if the content of the meeting is sensitive or there are privacy concerns involved.
Once the process is done, you’ll see the transcript divided into segments, each with a timestamp and a speaker label. In this view, you can change the speaker labels if needed (which I recommend doing to avoid using the generic “Speaker 1”, “Speaker 2”, etc. labels) and even edit the transcript to correct any mistakes (which I don’t recommend doing; more on this later). Finally, click on Add all to document to insert the full transcript into your Word document. If you change any speaker labels, select the option to insert the transcript with them, but do not select the option to include timestamps.
Why not include timestamps?
In my experience, timestamps just add clutter to the transcript. Including them in the final document will only consume more tokens when sending the transcript to ChatGPT for summarization, so I recommend omitting them.
Why not correct mistakes in the transcript?
Depending on the audio quality, the noise in the environment and the clarity of the speakers, the transcription may contain errors. In fact, most of the transcriptions I get from Word have several mistakes. But correcting these manually is a time-consuming task that is usually resolved by the next step of the workflow, which is to summarize the transcript using ChatGPT. Therefore, I don’t recommend making any corrections directly in the transcription.
2. Summarizing the transcript with ChatGPT
Now that we have the transcript in our Word document, it’s time to summarize it using ChatGPT. You might be wondering: “Why not use Word’s built-in AI (Copilot) for this?”. I actually tried it several times, but the results were really disappointing. I don’t really know why, but also I didn’t put much effort into troubleshooting it. So what I do is simply CTRL+E (for Word in Spanish; CTRL+A for English) the whole document, CTRL+C it, and then CTRL+V it into a new ChatGPT conversation.
3. The prompt
To guide ChatGPT in summarizing the transcript, I have a project-specific prompt with the following instructions:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
You are an expert meeting summarizer. When I provide you with a meeting transcript, generate meeting notes with the following fixed structure:
1. Meeting Details:
- Project name, date and participants, with roles if applicable.
2. General Summary:
- Narrative summary of the discussion (context and objectives).
3. Topics Covered:
- Development of relevant points discussed in sections or bullet points.
4. Agreements:
- Numbered list of definitions and decisions made in the meeting. These should be finalized items that do not require further iteration.
5. Next Steps:
- Bulleted list of tasks or action items defined during the meeting. Do not use numbering to avoid any interpretation as sequential or prioritized.
6. Upcoming Meetings:
- Record any future sessions mentioned in the meeting (with a date if specified, otherwise marked as "to be coordinated").
In this project, I paste the transcript into a new conversation and then wait for ChatGPT to process it and generate the meeting notes. Since we are working in a text-only realm here, even with a long transcript full of mistakes, ChatGPT does an amazing job at producing a professional-looking result that is easy to read and understand. After that, I just copy the generated notes back into my Word document, do some light formatting if needed, and save it for sharing with the attendees.
Whatever you plan to do with the generated notes (share them via email, upload them to a project management tool, etc.), don’t forget to review them first to ensure that everything is correct and nothing important was left out. With LLMs, there’s always the risk of hallucinations or misinterpretations (especially if the transcript has mistakes), so checking the final output is a must. Trust me: you don’t want your meeting notes to say “the deadline for this feature is the 30th of next month” when it was actually “the 30th of this month”.
Additional tips
With the previous steps, you should be able to produce great meeting notes with minimal effort1. But here are some additional tips, based on my experience, that could help you improve the results even further:
- Try to execute this workflow shortly after the meeting ends, while the information of the meeting is still fresh in your mind, which will allow you to make any necessary adjustments to the final document.
- During the recording, try to capture a snapshot of the participants (for example, a screenshot of the video conference grid view) to include in the meeting notes as a visual reference of who attended.
- Try not to have really long meetings (over an hour), since you’ll then have to take additional steps to produce a concise summary (like splitting the transcript into smaller chunks and summarizing them separately).
- Make an effort to speak clearly during the meeting. Here in Chile, we tend to speak really fast with horrible pronunciation; since I started applying this workflow, I made a conscious effort to slow down and enunciate better, which helps everyone, including the transcription tool.
- If something important was said, repeat it. This will not only help to reinforce the point for the attendees, but will also increase the chances of it being highlighted by the LLM during summarization.
- Use a single conversation thread in ChatGPT for each project. This way, the model will have context from previous meetings and will be able to produce more coherent summaries. This has the added benefit of letting you ask follow-up questions about the project in the same thread, like “When did we discuss X?”.
Yes, this workflow has several manual steps that could certainly be automated (for example, a small script that automatically detects a new video file, extracts the audio, transcribes it with Whisper, and then sends it to ChatGPT for summarization). But I’m so familiar with each step that it only takes me a few minutes to complete the whole process. Perhaps, in the future, I’ll go down in that rabbit hole… ↩︎