Cycling, Buses, Transcription, and IoT

back | 2025-03-20

Cycling, buses, transcription, and IoT

Those who know me will remember that back in November, I had a slightly terrifying cycling episode involving a bus. The accident put me in hospital, and while I was relieved to find out there was nothing too serious, I had sustained a broken wrist and elbow, plus a nasty shoulder sprain. In case you were wondering, the bus didn't get off lightly, either:

Bus with a shoulder-sized impact in the windshield

The Aftermath

I got out of the hospital that same evening, but I was slightly heavier than when I went in because I now had a ~2kg cast attached to my arm and hand. Unfortunately, I'd foolishly done all this damage to the right-hand side of my body, which is not ideal for a right-handed person—and so began the 6-8 week interim of I-can-do-some-things-but-not-very-much.

The main challenges of being a third-year student/DevOps/cyclist/swimmer person with only a functional left arm turn out to be:

So, all in all, the situation wasn't great.

Universities and Bureaucracy

Due to academic bureaucracy, a deferral in the final year causes delayed results and graduation. And, while the support from the university was earnest, it wasn't substantially all that good—they couldn't actually give me a timeframe (or even an idea of the timeframe) for the expected graduation. All I knew was that it would all be delayed by an unknown amount of time, and that my grades would be up in the air until then.

Email advice from the Student Hub explaining how it is difficult to discuss hypotheticals.

Now, for most students, delayed results would not be ideal, but have to suffice. However, if you're taking a Master's degree directly after your undergraduate studies, you have to graduate before a specific date to meet conditions for entry. With a year of my life hanging in the balance, deferral was a risk I wasn't willing to take, so I had to find a way to complete those assignments to a reasonable standard even with my reduced ability.

Transcription with Python

My assignments were due in very early January. In total, I think I had something like 15,000 words to type with one hand, plus three technical projects. Being command-line proofs-of-concept, the technical projects were, unfortunately, a gigantic time-sink. On the other hand, the write-ups could be transcribed. The only problem (and it's partially self-inflicted, I know) is that I primarily use LaTeX and VSCode to write. I couldn't even run Word if I wanted to because my laptop runs Linux.

I tried to find a desktop application that did native transcription, but I couldn't find anything that worked at all, let alone worked well. After a lot of searching, I gave up on that plan and had another idea: combining an existing speech-to-text library with a utility for Desktop output. Believe me when I say I tried everything. I tried OpenAI's Whisper, I tried Vosk, etc. None of it worked all that well, until I found RealtimeSTT. I wrote a very short Python script to pipe the output into wtype (keyboard simulation for Wayland), and voila, it worked!

The script shown below is also available on my GitHub (Linux), although the real credit goes to the maintainers of RealtimeSTT for the library.

import subprocess
from RealtimeSTT import AudioToTextRecorder

def process_text(text):
    global recording
    print(f"Detected: {text}")
    if text.lower() == "start typing.":
        recording = True
        print("START")
    elif text.lower() == "stop typing.":
        recording = False
        print("STOP")
    elif recording:
        try:
            subprocess.run(['wtype', text], check=True)
        except Exception as e:
            print(f"Error typing text: {e}")

if __name__ == '__main__':
    recording = False
    recorder = AudioToTextRecorder()
    while True:
        recorder.text(process_text)

The script itself is nothing special, but I did use it to write up the majority of my final-year assignments (worth 42% of my final grade) with relatively accurate punctuation and only minor tweaks needed.

IoT Transcription

One of those assignments was to design and implement an Internet of Things project. Since we had freedom of choice, I thought I'd put that transcription research to good use.

IoT devices are resource-constrained (and a ~25-line PoC doesn't cut it for an entire project), so it would make sense to involve edge or cloud computing—a topic I hadn't yet explicitly covered in my degree.

However, I have used Cloud computing a fair amount at work, and Azure is the epitome of everything I hate about Microsoft. So, I was eager to use AWS instead. Conveniently, Amazon has a strong position in Transcription technologies because of Alexa. They offer AWS Transcribe, a PaaS product aimed at call centers and other niche situations requiring a large (but not necessarily 100% accurate) amount of transcription.

In the business context of meeting transcription, the plan was to stream audio from a transcription device to AWS Transcribe and make use of its advanced features, such as speaker recognition (diarisation).

Due to bus-imposed time constraints, I used a Pi Zero 2W with an Adafruit MEMS microphone on the hardware side to make the process as simple as possible. Once I'd built the code, I made the filesystem read-only so a hard shutdown wouldn't corrupt it.

Assembled prototype device.

The final but slightly rushed software is essentially an adapted version of Amazon's example implementation of their event stream encoding protocol. It extracts the transcription results from the AWS server's response and displays them in the console or forwards them to another server. The codebase also contains a simple PoC display server that can visualise the transcription results as they come in.

Speech transcript

To really root my project in modernity, I also worked in a bit of unnecessary AI. Visiting the /summary web page uses an API call to OpenAI, retrieving a generated summary of the transcript. It actually worked quite well.

Generative AI summary of meeting notes

Results

Ultimately, I got two 83s (%) and an 88 on the affected assignments. Even with transcription, it all took a lot more effort than usual—but because I was worried about the impact on my grades, I overcompensated and did reasonably well (on paper at least; I was very stressed).

Basically, this whole thing was a massive (but interesting) tangent from its inciting incident. Don't get hit by a bus!