Building a Game Player with ChatGPT and Rails - Part 2
Watching Twitch streamers and their highlight videos is a guilty pleasure of mine. DougDoug does a lot of streams where the main idea utilizes AI, typically to help play a video game. One of his recent streams involved using ChatGPT to beat a children's point-and-click adventure game (warning: it's a long video, but very entertaining!).
DougDoug scripted together multiple pieces with Python to make this work:
The whole thing was written as a very impressive Python script, but that left it prone to crashes and a few other issues. As I was watching, I couldn’t help but think through how to architect this into something more resilient and flexible.
Note: this is not a criticism of DougDoug's programming; many of the things I note here as "issues" led to the most entertaining parts of his stream. This is simply me exercising my problem-exploration and resiliency muscles to keep them sharp, and following my suggestions would generally result in a less funny outcome :)
Let’s start by looking at the basic pieces of information that move around:
The Input Prompt information is the most complex; ChatGPT has to be provided with the conversation history inside each request to be able to reference that history (at least when using the API, which is stateless between requests).
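To make that concrete, here's a minimal plain-Ruby sketch of what "replaying history" looks like when assembling a chat request. All of the method and field names here are hypothetical, not from an actual codebase:

```ruby
# Sketch: assembling the Input Prompt for a single API request.
# The chat API is stateless, so every request replays the Prompt
# Context plus the full conversation history before the new input.
def build_messages(prompt_context, history, new_input)
  messages = [{ role: "system", content: prompt_context }]
  history.each do |entry|
    # Input Text maps to the "user" role, Response Text to "assistant".
    role = entry[:kind] == :input ? "user" : "assistant"
    messages << { role: role, content: entry[:text] }
  end
  messages << { role: "user", content: new_input }
  messages
end
```

Each new exchange gets appended to `history`, which is why the request payload grows over time (and why DougDoug's diagram shows the prompt snowballing).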
To look at it visually over the course of a few requests (borrowing from how DougDoug illustrated the process):
A few considerations I have:
Stretch goals:
The speech/text translation pieces arguably make this project better suited for Python, but the remaining pieces are straightforward in Ruby on Rails. That's where I'm most comfortable, so we'll be building inside that framework. For resiliency, we should put some components in background jobs (basically, anywhere we call a third-party service).
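Background jobs buy us retries for free, which is most of the resiliency win. In Rails we'd lean on ActiveJob's `retry_on`, but the behavior we're after is simple enough to sketch in plain Ruby (names are mine, not from any codebase):

```ruby
# Sketch of retrying a flaky third-party call with exponential backoff.
# A background job framework gives us this (plus persistence and
# isolation from the web process) out of the box.
def with_retries(max_attempts: 3, base_delay: 0.5)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue StandardError
    raise if attempts >= max_attempts
    sleep(base_delay * (2**(attempts - 1))) # 0.5s, 1s, 2s, ...
    retry
  end
end
```

Usage would look something like `with_retries { chat_client.complete(prompt) }`, where `chat_client` is a hypothetical wrapper around the ChatGPT API. Crucially, a crash here takes down one job, not the whole script.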
Since we want to eventually track multiple characters, we should start with that as our top-level model, as everything else will branch out from that.
Prompt Context is a single large text field associated with a character; we can store this in the character model.
Input Text and Response Text are both text values. They should be stored relationally to a character. One question: should they have their own individual tables, or be stored in one table with a column to distinguish the types? Well, to construct Input Prompt, we’ll need to be able to sort in chronological order regardless of type. We also need to filter to one type or the other to support our stretch goal. This is easier to accomplish if we store them in a single table.
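Here's a plain-Ruby sketch of why the single table works for both access patterns (hypothetical names; in Rails this becomes a model with a `kind` column plus `order(:created_at)` and `where(kind: ...)` scopes):

```ruby
# One record type for both Input Text and Response Text, distinguished
# by a kind field, so the same data supports both access patterns.
TextEntry = Struct.new(:kind, :body, :created_at, keyword_init: true)

entries = [
  TextEntry.new(kind: :response, body: "I walk north",    created_at: 2),
  TextEntry.new(kind: :input,    body: "Walk north",      created_at: 1),
  TextEntry.new(kind: :input,    body: "Pick up the key", created_at: 3),
]

# Chronological interleaving, regardless of kind (for Input Prompt):
history = entries.sort_by(&:created_at)

# Filtering to a single kind (for the stretch goal):
inputs = entries.select { |e| e.kind == :input }
```

With two separate tables, the chronological interleaving would require merging two result sets by timestamp; the single table makes it one sorted query.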
Input Audio and Response Audio will only be temporary, so we can store them on the filesystem in a temp folder. If we were planning to run this as an external-facing web app, we'd probably want to store them on Amazon S3 or a similar service.
Input Prompt is the fun one because we have two viable options. We could always generate the value dynamically and let it be ephemeral, or we could store it after generation. There is no functional difference in terms of our interactions with ChatGPT, but if we ever change the generation rules then we'd lose visibility into what we had previously sent over. Just for the sake of debugging, we'll store Input Prompt so we can reference it in the future, but we won't prioritize building features to surface it.
Given that, we might consider keeping Input Text, Response Text, and Input Prompt in some sort of general “text events” table. Is this a good idea? Well, if we ever wanted to display a full history of these fields in chronological order, that might be helpful. But there may be additional context related to Input Prompt we might want to store, such as ChatGPT API version. So if we do store Input Prompt, we’d likely want to do that in a separate table.
Now that we’ve thought through these details, where do we get started?
Input Audio and Response Audio are very much add-on pieces to the rest of the app, so we can save those for later.
The character model - along with Prompt Context - should probably come first, given that everything else in the data model comes from those.
Next we can model out Input Text and Response Text, then we should be good to set up the Input Prompt logic and construct our ChatGPT calls.
Once the modeling is done we can create our APIs. I didn’t dig into the choice for this, but I’ll use GraphQL (partly because I want to experiment with using GraphQL Streaming). With the API in place, we can then craft a basic UI where we can:
As I build this out, I'm planning to write posts here as well as share the PRs used in the process!