I built a three-skill pipeline so I’d never have to click “Download transcript” again. The pipeline isn’t the point.
I record a meeting with myself every day.
It’s not a meeting. It’s me, at my desk (or in the car, or on a run), talking out loud to Teams for fifteen or twenty minutes about what I’m trying to do that day: the threads I’m pulling on, the half-finished decisions from yesterday, the things I’m worried about, the projects I need to start or finish, the things I’m excited about. I hit record, I talk, I hang up.
Then Teams transcribes it. And that transcript is gold. Because it isn’t notes I had to type. It’s me, thinking out loud at the speed of speech, in my own voice, with all the messy connective tissue that I never bother to write down. It is the single richest piece of context I can give to my agent for the rest of the day.
But to get it into the agent’s hands, I had to do this every time:
- Open Teams.
- Find the meeting in the chat.
- Click into the recap.
- Open the transcript pane.
- Click the three dots, click Download, pick .docx.
- Convert it to .txt so the agent can actually read it.
- Move the file from Downloads into the folder my agent watches.
Seven clicks. Every day. Forever.
That’s the kind of friction that kills a workflow before it has a chance to compound. I knew it. I felt it every morning. And every day I did those clicks anyway, because the payoff was worth it. Many days, more than once.
I’ve done this for months and couldn’t find a way around it. Until now.
The real point isn’t the meeting
I want to be honest about what this project actually was, because the surface description (“I automated Teams transcript downloads”) sells it short.
The point is not that downloading a Teams transcript is hard. It isn’t. It takes less than a minute.
The point is that a manual task done every single day, forever, is the single most expensive thing you can put in your workflow. Not because of the minute it takes. Because of the decision cost: the tiny moment every time where I have to remember to do it, choose to do it, and then mentally context-switch out of “I’m about to work” into “I’m doing data plumbing.” That decision cost is what kills the workflow. The transcript stops happening. The agent stops getting the context. The agent’s outputs get worse. I notice. I get frustrated.
The fix wasn’t a better agent. The fix was making the input arrive without me.
The constraint that shaped everything
The clean version of this is one Graph API call: GET /me/onlineMeetings/{meetingId}/transcripts/{transcriptId}/content returns the transcript content. It’s a one-liner.
It’s also blocked in my tenant. Admin consent on OnlineMeetingTranscript.Read.All is not happening. Nobody is going to grant that permission to one rando in a 200,000-person tenant. The answer was no, and the answer is going to stay no, and that’s fine; I’m not building a product, I’m building a workflow for me.
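For reference, the blocked call looks roughly like this. It is a minimal Python sketch using only the standard library. It builds the request without sending it and assumes an access token acquired elsewhere, which is precisely the admin-consented part that isn’t available. The function name and the percent-encoding of the IDs are illustrative choices, not part of the published skills.

```python
import urllib.parse
import urllib.request

GRAPH = "https://graph.microsoft.com/v1.0"


def transcript_content_request(meeting_id: str, transcript_id: str, token: str) -> urllib.request.Request:
    """Build (without sending) the GET for one transcript's content.

    Assumes `token` already carries OnlineMeetingTranscript.Read.All,
    i.e. the admin consent that isn't coming in this tenant.
    """
    # Percent-encode the IDs so characters like '=' or '/' can't break the path.
    path = "/me/onlineMeetings/{}/transcripts/{}/content".format(
        urllib.parse.quote(meeting_id, safe=""),
        urllib.parse.quote(transcript_id, safe=""),
    )
    return urllib.request.Request(
        GRAPH + path,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )
```

In a tenant where that consent exists, urllib.request.urlopen(req).read() would fetch the body. Here the permission never arrives, which is why everything else in this post exists.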
So the entire design had to assume that the only reliable way to get a transcript is to drive the Teams web UI like a human would. Playwright. Click the buttons. Wait for the download. Move the file.
Once you accept that constraint, the shape of the solution starts to come into focus.
Three skills, one pipeline
I broke the work into three pieces, each doing one thing, each independently useful.
Stage 1: /transcript-watcher polls my OneDrive Recordings folder every thirty minutes. Teams puts a new MP4 there every time a recorded meeting ends. When the watcher sees a new file whose name matches one of my watched series prefixes, it knows there’s a transcript to go get. It pings me in Teams (“I see a new recording for X, going to grab the transcript”) and invokes the next skill. The expensive thing (Playwright) only runs when there’s actual work.
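The watcher’s core decision (is there a new recording for a watched series?) fits in a small pure function. This is a minimal Python sketch that assumes the Recordings listing arrives as a list of filenames. The post doesn’t say whether that listing comes from a locally synced OneDrive folder or somewhere else. It also assumes a real watcher would persist the seen-set between polls. The function name, prefixes, and example filenames are hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import PurePath


@dataclass
class WatcherState:
    """Filenames already handled on earlier polls (a real watcher would persist this)."""
    seen: set[str] = field(default_factory=set)


def new_recordings(listing: list[str], prefixes: list[str], state: WatcherState) -> list[str]:
    """Return MP4s that appeared since the last poll and belong to a watched series.

    Every listed name is marked as seen, so files that don't match a prefix
    aren't re-examined on the next poll either.
    """
    fresh = []
    for name in sorted(listing):
        if name in state.seen:
            continue
        is_mp4 = PurePath(name).suffix.lower() == ".mp4"
        if is_mp4 and any(name.startswith(prefix) for prefix in prefixes):
            fresh.append(name)
    state.seen.update(listing)
    return fresh
```

On each thirty-minute tick, a non-empty result is what would trigger the Teams ping and the /meeting-transcript call. An empty result means Playwright never starts.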
Stage 2: /meeting-transcript is the Playwright driver. It opens Teams in a real browser, finds the meeting in the chat, opens the transcript pane, clicks Download, waits for the .docx, and saves it to a watched folder with a sensible filename. This is also the skill I can invoke directly when I want a transcript for a meeting that happened weeks ago.
Stage 3: /doc-watcher watches the folder. When a new .docx lands, it converts it to plain-text Markdown using Word COM (headless), pings me in Teams that it’s ready, and leaves the original .docx untouched.
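The folder-watching half of /doc-watcher can be sketched without Word at all. In this minimal Python sketch, the conversion is passed in as a function. That function is a hypothetical stand-in for the skill’s Word COM step, which isn’t reproduced here. The rule that a .docx counts as converted once a sibling .md exists is also an assumption.

```python
from pathlib import Path
from typing import Callable


def pending(folder: Path) -> list[tuple[Path, Path]]:
    """Pair each .docx with the .md it should produce, skipping ones already converted."""
    pairs = []
    for docx in sorted(folder.glob("*.docx")):
        if docx.name.startswith("~$"):  # Word's temporary owner/lock files
            continue
        md = docx.with_suffix(".md")
        if not md.exists():
            pairs.append((docx, md))
    return pairs


def convert_new(folder: Path, to_markdown: Callable[[Path], str]) -> list[Path]:
    """Write a .md next to every unconverted .docx; the .docx itself is only read."""
    written = []
    for docx, md in pending(folder):
        md.write_text(to_markdown(docx), encoding="utf-8")
        written.append(md)
    return written
```

Writing to a new .md and only ever reading the .docx keeps the original untouched. Skipping ~$ files avoids picking up Word’s lock files while a document is open.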
End to end: typically about thirty minutes, an hour in the worst case (the recording has to finish processing in OneDrive before stage 1 can see it). Zero clicks from me. If I want it right now (I usually do), I kick it off manually: soup to nuts in three minutes.
The Markdown file lands in the same folder my agent reads at the start of every conversation. The agent sees this as ambient context. It knows what I was worried about. It knows what threads I’m pulling on. It knows what I’m trying to do that day. I didn’t have to tell it. I just had to talk.
The gotcha that confused me
One detail almost broke the whole thing.
The Recap picker in Teams shows the scheduled meeting time, not the actual recording start time. My standing one-on-one with myself is scheduled at 18:45, but I actually hit record at 12:20, and the pipeline named the file with 18:45. Because I record more than once a day, multiple files collided: they were all “scheduled at 18:45.”
The fix was to stop trusting the Recap picker and read the recording start time out of the chat thread instead. Three regex cases (English UI, localized variants, edge formatting) cover everything I’ve seen. The skill warns loudly if it falls back to the picker’s value, so I’ll notice if a fourth case shows up.
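The fix can be sketched in Python as an ordered list of patterns with a loud fallback. The three patterns below are hypothetical stand-ins. The post doesn’t publish the skill’s real regexes, and the exact wording Teams writes into the chat thread is an assumption. The recording_start and transcript_filename helpers and the filename layout are illustrative too.

```python
import re
import warnings
from datetime import date, time

# Hypothetical patterns: illustrations of "English UI, localized variant,
# edge formatting", not the skill's actual regexes.
START_PATTERNS = [
    # English UI, 12-hour clock, e.g. "Recording started 12:20 PM" (assumed wording)
    re.compile(r"started\D{0,20}?(?P<h>1[0-2]|0?[1-9]):(?P<m>[0-5]\d)\s*(?P<ampm>[AP]M)", re.IGNORECASE),
    # Localized UI, 24-hour clock, e.g. "Aufzeichnung gestartet 14:07" (assumed)
    re.compile(r"\b(?P<h>[01]?\d|2[0-3]):(?P<m>[0-5]\d)\b"),
    # Edge formatting, e.g. French-style "12 h 20" (assumed)
    re.compile(r"\b(?P<h>[01]?\d|2[0-3])\s*h\s*(?P<m>[0-5]\d)\b"),
]


def recording_start(chat_text: str, picker_time: time) -> time:
    """Prefer the start time found in the chat thread; warn loudly on fallback."""
    for pattern in START_PATTERNS:
        match = pattern.search(chat_text)
        if match is None:
            continue
        hour, minute = int(match["h"]), int(match["m"])
        # Only the 12-hour pattern defines an "ampm" group; convert it to 24-hour.
        meridiem = (match.groupdict().get("ampm") or "").upper()
        if meridiem == "PM" and hour != 12:
            hour += 12
        elif meridiem == "AM" and hour == 12:
            hour = 0
        return time(hour, minute)
    warnings.warn(
        f"No recording start time in chat; falling back to scheduled {picker_time:%H:%M}",
        stacklevel=2,
    )
    return picker_time


def transcript_filename(series: str, day: date, start: time) -> str:
    """Build a per-recording name; no ':' because Windows forbids it in filenames."""
    return f"{series} {day:%Y-%m-%d} {start:%H%M}.docx"
```

The design choice is that the picker’s value is used only as a last resort, and using it always produces a warning. Two recordings from the same 18:45 series on the same day now get different names because the name uses the real start time.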
A workflow you can’t trust is a workflow you’ll abandon. Naming has to be right or none of the rest of it matters.
What this actually unlocked
The transcripts arrive. The agent reads them. I haven’t clicked “Download transcript” once.
What I didn’t expect was the second-order effect. Because the friction went to zero, I started recording more things: short voice memos between meetings, a five-minute postmortem after a hard call, a quick “here’s where I left this” before I close the laptop. All of it ends up as Markdown in the same folder. All of it becomes context.
The pipeline removed seven clicks. Removing those clicks made me record three times as often. The agent’s context window got an order of magnitude richer. Jevons’ paradox in action.
That’s the trade I want to keep making: find the daily friction, automate it down to zero, then watch the behavior on the other side of the friction explode.
The three skills are live
All three are published on SkillWorks if you want to lift them. They’re Clawpilot skills: install them with /install, configure your paths, and you’re running. They should work fine with GitHub Copilot, Copilot CoWork, or other systems with minor modifications. Ask your agent to adjust them.
- Teams Transcript Pipeline: the full write-up (the why, the design, the dead ends)
- /transcript-watcher: polls OneDrive for new recordings
- /meeting-transcript: drives Playwright to pull the .docx
- /doc-watcher: converts .docx to plain Markdown and pings you in Teams
Each one is independently useful. The doc-watcher in particular has nothing to do with Teams. Point it at any folder of Word documents and you’ve got a feed of plain-text copies that your agent can use.
Your turn
What’s your seven-clicks-a-day workflow? The one you’ve been doing forever because the payoff is worth it, but the decision cost is starting to wear you down?
Drop it in the comments. Half the time I see somebody else’s, I realize I have the same one and didn’t notice.
More of what I write lives at signalnotsentiment.com. Lessons from doing, not theorizing.