Building a Game Player with ChatGPT and Rails - Part 2
DougDoug is a Twitch streamer (and programmer!) who recently did a fun stream making ChatGPT play a children’s point-and-click adventure game. It’s a long but incredibly fun video, better than any movie currently in theaters.
I wrote a bit about the implementation and how I might approach it from a resiliency perspective, but I didn’t make it too far into the actual build due to the whole damn staying-alive-and-having-an-emotionally-draining-job thing. But as stress relief for the last month or so, I have also been streaming on Twitch (feel free to drop in and check it out), and I wanted to try something similar.
This week I’ve been streaming development of my own implementation, which has been a ton of fun. Four evenings of development in total, and I streamed three of them, plus the inaugural day in which I used the new app to play Hitman: World of Assassination - the VODs are linked below in case anyone wants to check them out:
The source code for both backend and frontend is in these repos:
It’s worth noting that DougDoug recently did another “ChatGPT Plays” stream, where the implementation was much more robust. One side-effect is that the AI became less “crazy” over time - in the first stream, the message context would roll off (including the initial prompt with the rules for the AI), and ChatGPT would increasingly be “trained” on its own responses. For the second stream, it appears that he built in a “system message” which would be present on all requests (OpenAI terminology - I referred to it as “Prompt Context” in my first post). This led to the AI responses being fairly consistent, and slightly less entertaining (DougDoug ended up doing some “brain surgery” during the stream to try to make it more entertaining).
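As a rough sketch of that difference: the message list sent on each request can pin the system message at the front and only let the conversation history roll off. The `Message` struct, the `build_messages` helper, and the 10-message window are hypothetical names and values chosen for illustration, not taken from either implementation. The `system`, `user`, and `assistant` roles are the ones OpenAI's chat API uses.

```ruby
# Hypothetical stand-in for a stored chat message (role is "user" or "assistant").
Message = Struct.new(:role, :content)

# Build the messages array for one chat request.
# The system message is always first, so the rules never roll off;
# only the most recent `window` history entries are included.
def build_messages(system_prompt, history, window: 10)
  recent = history.last(window)
  [{ role: "system", content: system_prompt }] +
    recent.map { |m| { role: m.role, content: m.content } }
end
```

With this shape, trimming history to stay under the token limit never drops the rules for the AI, which is roughly the behaviour described for the second stream.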
I also finally spent some time looking at OpenAI’s API docs (reading the manual - imagine that!) and reading more on LLMs, and learned a few important things:
temperature=1.8 or higher tends to crash the API; lower values are consistently faster
Thinking about an entertaining MVP from both a coding and a streaming perspective, here are the minimum requirements I came up with:
messages array
~For the MVP, I explicitly left out text-to-speech processing for the response - I can read it out myself for a while, and TTS API services are a tiny bit expensive.~ I subsequently decided, after I wrote this but 2 hours before I was going to stream Hitman, to add TTS as optional.
The data storage and OpenAI integration requirements are very easy to implement in a framework like Rails, so that’s where I started (with Postgres for storage).
Because I’m only planning to run this app locally, I’m not currently worried about Dockerizing, adding auth, monitoring, etc.
The most demanding requirement is speech recognition. There are several providers that we could call from Rails, but they require us to already have an audio file to process. Ruby and Rails are very lacking in media-related tools (especially compared to Python). I briefly went down a dark path of trying to pipe commands to ffmpeg in the shell, but quickly gave up.
One option could have been to implement via Python, but I am not as familiar with those frameworks and it probably would have tripled implementation time (not something I wanted in this case due to it being a hobby project).
After some research, I settled on building a basic UI in React JS. James Brill has built a fantastic React library that integrates with browser speech recognition tools, meaning that if I run the UI in Chrome I can get very easy out-of-box speech recognition. For styling I chose Joy UI, a variant of Material UI (I tend to prefer Material and its kind because I am bad at styling and these frameworks make it easy).
Because there would be a UI component, I opted to develop my Rails APIs in GraphQL, as Apollo’s JS client library is incredibly simple to use.
After deciding to add TTS, I had to make a few last-minute additions. ElevenLabs is my preferred TTS service, although it is rather expensive so I may explore other options (on a $22/month plan, I burned through more than half my quota in one stream - compared to $0.25 for OpenAI in the same stream).
TTS processing is also more time-consuming, and I didn’t want to risk a GQL request timing out in the middle, so I added background job processing via Sidekiq (using foreman to manage multiple Rails processes).
Making the TTS audio available in the UI was straightforward: I ended up using the built-in HTML5 <audio> wrapper, which provides nice controls with little fuss. It’s ugly and doesn’t match the Material UI feel of the rest of the app, but it was good for a quick implementation.
Getting the MP3 files to actually be available was very troublesome. I didn’t want to bother with a ton of remote file hosting because I was only running this app locally. Browsers can access local system files (I set up the Sidekiq job to write to a local shared directory), so I assumed it would be easy to hook up, but React only has context for files included in public. I ended up symlinking the directory. Heaven forgive me.
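For anyone wanting to reproduce the workaround, here is a minimal sketch of the symlink step using Ruby's standard FileUtils. The `link_audio_dir` helper, the `audio` link name, and the directory arguments are hypothetical; the post does not say how or where the real link was created.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: expose a shared audio directory inside the
# frontend's public/ folder by creating a symlink to it.
# Returns the path of the link. Does nothing if the link already exists.
def link_audio_dir(shared_dir, public_dir, name: "audio")
  link_path = File.join(public_dir, name)
  FileUtils.ln_s(shared_dir, link_path) unless File.symlink?(link_path)
  link_path
end
```

Called once as something like `link_audio_dir("/path/to/shared/audio", "/path/to/frontend/public")` (both paths are placeholders), this would let the frontend reference files under `/audio/` while the job keeps writing to the shared directory.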
Notifying the front-end of finished job processing briefly sent me down the rabbit hole of GraphQL subscriptions - the React UI could subscribe to a particular query and update in realtime as the backend publishes updates. Unfortunately, the only subscription-management plumbing for Ruby GraphQL that is in the non-pro plan uses ActionCable, which cannot handle publishing updates from background jobs (the two options that enable that use the DB and Cache, and are only available on Pro). I ended up going with the simple solution, setting pollInterval in the client query to regularly refetch the message data.
Barring a few small UI bugs, the maiden voyage of the app itself was fairly smooth with just a few complaints.
In the last-minute implementation rush I skipped adding Sidekiq UI. This made it more difficult to see/diagnose issues with the ElevenLabs integration. Remember kids, this is why proper log search systems and observability are important.
Not surprising to anyone with more ChatGPT experience than me, getting the right system message is tricky. It took several tries to get something with the tone that I wanted, but thankfully updating the system message is fairly easy via the UI, so I am glad I took time to build that out.
Variable temperature was my biggest hope for the tool - introducing more variability - but it ended up breaking in ways I did not expect. Usually the output was not random enough, but sometimes it was too random:
This massively confused me on stream until I figured out it was temperature-related. I ended up removing the variable temperature option from the frontend for now. I might add it back in the future with a more limited range.
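If the option does come back, one way to keep it in a safe range is to clamp whatever the frontend sends before it reaches the API. This is only a sketch: the `safe_temperature` helper is hypothetical, the lower bound of 1.0 is an assumption, and the 1.5 ceiling is a tentative guess rather than a tested value.

```ruby
# Assumed bounds for illustration only; neither value has been validated
# against the API.
MIN_TEMPERATURE = 1.0
MAX_TEMPERATURE = 1.5

# Coerce the requested value to a Float and force it into the allowed range.
# Raises ArgumentError or TypeError if the input is not numeric.
def safe_temperature(requested)
  Float(requested).clamp(MIN_TEMPERATURE, MAX_TEMPERATURE)
end
```

Here a request for 1.8 would be sent as 1.5, which stays under the range that tended to crash the API.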
I’m very happy with how it turned out - for a tool intended to be used by a single streamer, it does the job well and required minimal live debugging. I do still have a few features I might want to add:
1), maximum ???? (maybe around 1.5?)