January 3, 2023

AO3 Disco: The Road to v1.0

On January 1st, 2023, I decided to start building the AO3 Discovery Engine. The motivation for this project was threefold:

  • Personal usage. I wanted to build something like this for personal use, as opposed to digging through bookmarks/tags on AO3.
  • Learning modern mobile app development. I haven't touched mobile apps since high school, back when Windows Phone was a thing, and wanted to get back into it.
  • Practical applications for research. I spend a lot of time reading research papers, and this project would put many of those ideas into practice.

This post will summarize the path from the original idea to the release of AO3 Disco v1.0 on the Google Play store and where I plan to go from here.

Web → Mobile

At the very beginning, I planned to build a web application that would allow people to build collections of works, and then for each collection, they would get suggestions for additional works that fit the "theme" of each collection. We actually got fairly far along in this process before discovering two issues:

  1. According to my sample size of three - we conducted a very scientific and methodical survey - most people read fanfics on their mobile devices, not their laptops.
  2. Building collections of works was tedious - you had to repeatedly copy and paste URLs. The alternative was asking users to provide their AO3 username/password, which wasn't going to happen.

These discoveries - in addition to the fact that I was interested in learning mobile development - led me to pivot to building a mobile app. The key feature I was looking for? The ability to "share" a work from your browser to the app and get recommendations - no copying and pasting needed.

Then, upon discovering that the registration fee for submitting an app to Android's Play Store was only $25 while Apple's App Store was $99, I decided to start with an Android app.

And that's how AO3 Disco was born.

Android App

When building the mobile app, I encountered several false starts. I started by trying to build the app using Jetpack Compose, the latest framework for building mobile apps officially supported by Google. As it turns out, I have no clue how to use Kotlin.

After rapidly going through a bunch of frameworks - Java for Android, Flutter, Xamarin, etc. - and concluding that they were all really hard to use, I finally settled on Ionic/Capacitor, a framework that would allow me to implement most of the app using web technologies and use native plugins only for the tricky stuff.

An early sketch of what the app could have looked like, drafted in Figma.

Having finally decided upon a tech stack for the front-end of the app itself, I proceeded to blunder my way through implementing an MVP which provided two capabilities: sharing a work from a browser and allowing users to scroll through a deck of recommendations.

The MVP which was released to a small group of testers after posting in /r/TheCitadel.

I posted in a relatively small subreddit (r/TheCitadel) asking users to give our app a try and received lots of actionable feedback. This led us to v1.0 of the app, which added new features such as bookmarks, history, filters, snoozing, and more.

Our listing on the Google Play store after two weeks.

This was released publicly a week later and posted to several subreddits, including r/rational, where I received a lot of awesome technical questions, leading me to draft this post.

Discovery Engines

Currently, there are two discovery engines available in the app that the users can choose from. Eventually, I hope to be able to combine them into a single optimized engine by adding a second stage model, but as I have not figured out a way to balance the trade-offs yet, it's currently up to the user to choose which experience they prefer:

  • Classic. Given a specific work, this engine looks at other users who also enjoyed that work, then looks at all of the other works those users enjoyed, and uses that score to provide a set of recommendations.
  • Freeform. Given a specific work, this engine uses a neural network to transform it into a 200-dimensional embedding vector. Then, it looks for other works whose vector representations are closest to the specified vector.

Classic

The benefits of the classic model are clear:

  • Faster + lower cost to run. This can be implemented (mostly) as a SQL query which can be easily optimized with various indexes.
  • Consistent / reliable results. The recommendations only change if the underlying data changes a lot; furthermore, by definition, this will only give you "reasonable" recommendations.
  • Easy to debug. If you get some really bad recommendations, it's easy to dig into the data and figure out what went wrong. And usually, the answer to that is that the system simply didn't have enough data.
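To make the "mostly a SQL query" point concrete, here is a minimal sketch of a co-occurrence query using Python's built-in sqlite3 module. The `likes` table, its columns, the sample data, and the scoring rule (count the distinct users who liked both works) are all assumptions made for illustration, not the actual AO3 Disco schema or ranking.

```python
import sqlite3

# Hypothetical schema: one row per (user, work) pair that the user
# gave kudos to or bookmarked. Not the real AO3 Disco schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE likes (user_id TEXT, work_id TEXT)")
# Indexes on both columns keep the self-join below cheap.
conn.execute("CREATE INDEX idx_likes_work ON likes (work_id)")
conn.execute("CREATE INDEX idx_likes_user ON likes (user_id)")
conn.executemany(
    "INSERT INTO likes VALUES (?, ?)",
    [
        ("alice", "w1"), ("alice", "w2"), ("alice", "w3"),
        ("bob", "w1"), ("bob", "w2"),
        ("carol", "w1"), ("carol", "w4"),
        ("dave", "w5"),
    ],
)

# Find users who liked the seed work, then count how many of them
# liked each other work. That count is the (assumed) score.
RECOMMEND_SQL = """
SELECT other.work_id, COUNT(DISTINCT other.user_id) AS score
FROM likes AS seed
JOIN likes AS other ON other.user_id = seed.user_id
WHERE seed.work_id = ? AND other.work_id != ?
GROUP BY other.work_id
ORDER BY score DESC, other.work_id ASC
LIMIT ?
"""


def recommend_classic(conn, work_id, limit=10):
    """Return (work_id, score) pairs for works co-liked with work_id."""
    return conn.execute(RECOMMEND_SQL, (work_id, work_id, limit)).fetchall()


recommendations = recommend_classic(conn, "w1")
no_data = recommend_classic(conn, "w5")  # nobody else liked w5
```

In this toy data, a work liked by only one user (`w5`) produces an empty result, which is the "not enough data" failure mode described above.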

This classic model is also very similar to what many others have tried to do (e.g. when looking for similar systems, I found a desktop app and some Jupyter notebooks that do this exact thing). The drawbacks, however, are fairly significant:

  • Popularity bias. This approach would never recommend works that aren't already well-known. If someone writes an incredibly high quality work, I want to see it immediately, not after everyone else has already seen it.
  • Unknown works. If the user provides a work that we don't have much data on (e.g. it's a new work that no one has kudos'd or bookmarked), then we can't generate any recommendations.
  • Cross-site recommendations. This approach cannot be extended to work across multiple fanfiction sites (e.g. AO3 and FF.net). Although we currently only support AO3, being able to start from an AO3 work and recommend works on FF.net would be really awesome, though it poses both a modeling challenge and an infrastructure challenge.

To mask some of these drawbacks, in the current system, when the classic model is unable to come up with any recommendations, we fall back to the freeform model.

Freeform

The freeform model is designed to overcome these limitations. Instead of relying on the user-work connections, we use a neural network that analyzes each work independently and generates a vector embedding. Then, to get recommendations, we can efficiently find the nearby vectors using a library such as Spotify's annoy.
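The lookup step can be sketched as a brute-force nearest-neighbor search in plain Python. This is only an illustration: cosine similarity is an assumed distance measure, the three-dimensional vectors are made-up stand-ins for the real embeddings, and a library such as annoy would replace the exhaustive scan with an approximate index.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_works(
    query_id: str, embeddings: Dict[str, List[float]], k: int = 5
) -> List[Tuple[str, float]]:
    """Return the k works most similar to query_id, best first."""
    query = embeddings[query_id]
    scored = (
        (cosine_similarity(query, vector), work_id)
        for work_id, vector in embeddings.items()
        if work_id != query_id  # never recommend the seed work itself
    )
    return [(work_id, score) for score, work_id in heapq.nlargest(k, scored)]


# Toy embeddings (hypothetical values, far fewer dimensions than the real ones).
embeddings = {
    "w1": [1.0, 0.0, 0.0],
    "w2": [0.9, 0.1, 0.0],
    "w3": [0.0, 1.0, 0.0],
    "w4": [-1.0, 0.0, 0.0],
}
neighbors = nearest_works("w1", embeddings, k=2)
```

The exhaustive scan costs one similarity computation per work in the catalog, which is why an approximate index becomes attractive as the catalog grows.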

There are three classes of features which are passed to the model:

  • Dense features. The dense features include things such as the number of chapters, number of words, and whether the work is completed.
  • Sparse features. The sparse features include things such as the fandom tags, relationship tags, and "additional" tags. Several of the tag types have very high cardinality (e.g. relationships), so we make use of the hashing trick [1].
  • Embedding features. The embedding features include things such as a document embedding of the "summary" of the work, generated via fasttext. These models are pre-trained and not optimized as part of our system.
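For readers unfamiliar with the hashing trick mentioned for the sparse features, here is a minimal sketch. The bucket count, the choice of MD5 as a stable hash, and the example tags are assumptions for illustration; the post does not say which hash function or vector size AO3 Disco uses.

```python
import hashlib
from typing import Iterable, List

NUM_BUCKETS = 16  # hypothetical size; a real system would likely use far more


def bucket_for(tag: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a tag to a bucket index using a hash that is stable across runs.

    Python's built-in hash() is randomized per process for strings,
    so a digest from hashlib is used instead.
    """
    digest = hashlib.md5(tag.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def hash_tags(tags: Iterable[str], num_buckets: int = NUM_BUCKETS) -> List[float]:
    """Turn a set of tags into a fixed-length count vector.

    Unseen tags need no vocabulary update: they simply land in some bucket.
    Distinct tags may collide, which is the accepted trade-off.
    """
    vector = [0.0] * num_buckets
    for tag in tags:
        vector[bucket_for(tag, num_buckets)] += 1.0
    return vector


# Made-up example tags.
tags = ["Character A/Character B", "Slow Burn", "Alternate Universe"]
features = hash_tags(tags)
```

The output length depends only on `NUM_BUCKETS`, not on how many distinct tags exist, which is what makes this practical for very high-cardinality tag types.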

The architecture of the freeform model is designed to combine these three feature types together into a single embedding vector:

TODO: Replace this picture of a sticky note on my desk with a nicer sketch.

This architecture is quite similar to those proposed in works such as [2] [3]. We currently train this model using a modified triplet loss [4] which aims to pull works in the same collection closer together while pushing randomly sampled works that don't belong to the collection further away.
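As a reference point, the standard (unmodified) triplet loss can be sketched in a few lines. The post does not describe its modifications, so the Euclidean distance, the margin of 1.0, and the example vectors below are assumptions and not the loss actually used to train the freeform model.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,
) -> float:
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    anchor and positive would be works from the same collection;
    negative would be a randomly sampled work outside it.
    """
    return max(
        0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin
    )


anchor = [0.0, 0.0]
positive = [0.0, 1.0]       # distance 1.0 from the anchor
far_negative = [3.0, 4.0]   # distance 5.0: already pushed away, no penalty
near_negative = [0.0, 0.5]  # distance 0.5: closer than the positive, penalized

satisfied = triplet_loss(anchor, positive, far_negative)
violated = triplet_loss(anchor, positive, near_negative)
```

The loss is zero once the negative is at least `margin` farther from the anchor than the positive, so training effort concentrates on triplets that are still ordered incorrectly.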

Of course, this approach has drawbacks as well. On several occasions - prior to building a robust validation system - I published a model that would spit out random garbage, and debugging it required dumping the model parameters, inspecting the gradients, and manually checking the embedding vectors.

Furthermore, this is a lot more computationally expensive and greatly increases both the latency and the cost of the servers needed to run AO3 Disco.

Future Work

First, I have a long list of planned improvements to the existing Android app, ranging from allowing users to export their recommendations to making it possible to filter on custom tags. In addition, I plan to make upgrades to the "freeform" model and improve the quality of recommendations overall.

TODO: Replace this picture of sticky notes on my wall with an actual plan.

After all of that though, here are the big new directions that I would like to explore:

  • iOS. Since most of the code can be re-used on iOS, I plan to try building an iOS version. However, I'm not sure I want to spend $99 getting an Apple developer license unless the Android version already has a significant number of users.
  • Web. Not a huge fan of this, to be honest, since it will require integrating a bunch of services and dealing with various security issues (e.g. handling authentication, server-side storage, etc.) while also having to rewrite a bunch of stuff from scratch. But it might happen.
  • FanFiction.net. Super excited about this, but it poses significant technical and modeling challenges. Without tags, we'll have to get very good at natural language processing and use techniques such as domain adaptation [5] so it can play nice with AO3.
  • Open source. At some point, I plan to refactor the code and open source most of it. However, everything is currently dumped in one giant mono-repo, and I may or may not have git-committed some important access keys and certificates because I was too lazy to configure environment variables ☺, so it will require some time and effort to untangle.

References

[1] https://medium.com/value-stream-design/introducing-one-of-the-best-hacks-in-machine-learning-the-hashing-trick-bf6a9c8af18f

[2] https://quoraengineering.quora.com/Unifying-dense-and-sparse-features-for-neural-networks

[3] https://arxiv.org/pdf/1906.00091.pdf

[4] https://towardsdatascience.com/triplet-loss-advanced-intro-49a07b7d8905

[5] https://paperswithcode.com/task/domain-adaptation