How to Improve your Flashcard Knowledge Base

Even the best flashcard developers among us create bad cards on a regular basis (e.g. too long, ambiguous, useless information).

Given that we are all highly imperfect at writing flashcards, what should we do to improve these crummy cards as they age, so that we spend less time reviewing and remember the concepts better?

I call this process flashcard “refactoring” (a term borrowed from software development).

Why refactor flashcards?

Reviewing old flashcards requires time and effort. Here are a few reasons why it’s worth the price:

  • It improves your understanding of the material. The process of breaking learning material down into the smallest “chunks” possible that fit onto flashcards is an extremely valuable exercise. Reviewing troublesome cards clarifies what you don’t understand and forces you to restructure your knowledge in a way that makes sense.
  • Your worst flashcards take up a disproportionate amount of time and effort, while yielding the worst results in terms of retention and usefulness. Following the 80-20 rule, 20% of your cards lead to 80% of the effort in review. So it’s a high-value activity to hunt down this subset of your cards.
  • It provides knowledge construction training. Creating good flashcards is a nontrivial skill built over time. You can read Piotr Wozniak’s 20 rules for formulating knowledge, but actually observing your own performance on your cards and troubleshooting improvements takes your skills to the next level.

The process I use has two broad steps: selection and revision.

Selecting Problem Cards

I use two main methods to find cards needing review.

The first and most important method is finding cards I keep failing (“lapse” is the term in Anki). In the Anki browser, I can use the search prop:lapses>n to find the cards that have lapsed more than n times. For me, cards never lapse more than 8 times, because at that point Anki marks the card as a leech and automatically suspends it. Cards that have lapsed 5 or more times are great candidates for refactoring.

The other method is “marking” cards during review when I notice a card is poorly formed. I also try to make notes on marked cards describing what’s causing the problems (coming back to the cards at a later time, you can easily forget the specific issue that tripped you up).
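
The two selection methods can be combined into a single filter. Here is a minimal Python sketch of that logic, assuming the cards have been exported into plain dictionaries. The field names (`lapses`, `tags`) and the sample cards are hypothetical and are not Anki’s own data model.

```python
from typing import Dict, List

# Threshold taken from the text above: 5 or more lapses.
LAPSE_THRESHOLD = 5


def select_for_refactoring(cards: List[Dict], threshold: int = LAPSE_THRESHOLD) -> List[Dict]:
    """Return cards that lapsed at least `threshold` times or were marked during review."""
    picked = [
        card for card in cards
        if card["lapses"] >= threshold or "marked" in card["tags"]
    ]
    # Worst offenders first, so the most costly cards get attention first.
    return sorted(picked, key=lambda card: card["lapses"], reverse=True)


# Made-up sample data, for illustration only.
cards = [
    {"front": "Tail latency amplification", "lapses": 6, "tags": []},
    {"front": "Collection (Oracle SQL)", "lapses": 8, "tags": ["leech"]},
    {"front": "ARP", "lapses": 2, "tags": ["marked"]},
    {"front": "Easy card", "lapses": 0, "tags": []},
]

for card in select_for_refactoring(cards):
    print(card["lapses"], card["front"])
```

Inside the Anki browser itself, a search along the lines of `prop:lapses>=5 or tag:marked` should produce a similar list, assuming marked cards carry the `marked` tag.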

Reviewing and Revising Problem Cards

The first step in examining a difficult card is to ask whether I need this knowledge at all. If not, that’s the end of the process: I just delete the card and I’m done with it. I may also revisit the source material or do some Googling on the topic, which will sometimes reveal that the card is pointless or inaccurate.

If I decide that it’s important and relevant knowledge I want to keep, then I’ll examine the card for issues, using the principles from Piotr Wozniak’s Twenty Rules of Formulating Knowledge.

Example

Consider this data engineering card from my deck which was recently giving me problems:

  • Side 1: Tail latency amplification
  • Side 2: Even if small % backend calls slow, chance of getting a slow call increases if user request requires multiple backend calls, and so a higher proportion of end-user requests end up being slow.

First off, is this card relevant and worthwhile? For me, the answer is definitely yes: it’s relevant both to my job as a data scientist and to my software engineering side projects.

Next, diagnose the problem. On closer examination, there are a few things wrong with the card:

In cases where I add material I don’t fully understand, I find the best approach is to go back to the source (which in this case is the book Designing Data-Intensive Applications by Martin Kleppmann). I then refactored the card like this:

  • Side 1: Tail latency amplification (Kleppmann)
  • Side 2: Multiple back-end calls for a single user request increase the chance of encountering a tail latency. (Kleppmann)

As you can see, I added a source to clarify where the information came from (Rule 18: Provide source).

I was curious what other cards I have about tail latency, and it turns out there are none! It seems ridiculous to have a card about tail latency amplification but not a single one about tail latency, which is the more common term. Not having this in my deck probably contributed to interference, since I never tested myself on the distinction between the two concepts. So I added:

  • Side 1: Tail latency (Kleppmann)
  • Side 2: High percentile response time. (Kleppmann)

Note that the tail latency amplification card uses tail latency in its response. I’m hoping this will limit confusion between the two and emphasize the distinction (Rule 13: Refer to other memories). I also italicized amplification, to hopefully further avoid interference.
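
For readers who haven’t met the concept on these cards, the amplification effect is easy to check with a little arithmetic. The sketch below is my own illustration rather than something taken from the book, and it assumes that backend calls are slow independently of each other and that a user request is slow whenever at least one of its calls is slow. The function name and the 1% figure are made up for the example.

```python
def p_slow_request(p_slow_call: float, n_calls: int) -> float:
    """Probability that at least one of n independent backend calls is slow."""
    return 1.0 - (1.0 - p_slow_call) ** n_calls


# Assume 1% of individual backend calls are slow.
for n in (1, 10, 100):
    print(n, round(p_slow_request(0.01, n), 3))
# 1 0.01
# 10 0.096
# 100 0.634
```

Under these assumptions, a request that fans out to 100 backend calls is slow about 63% of the time, even though only 1% of individual calls are.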

Since I’ve made these changes, I haven’t had any problems with these cards and feel like I have a better grasp on the material. Consider doing the same for the important knowledge in your decks causing you trouble.

Tips From Anki Flashcard Refactoring: Add Enough Knowledge to your Deck and Review your Sources

My flashcard refactoring for today is a reminder of the classic knowledge construction advice: do not add what you do not understand. It is also a reminder of the importance of providing enough related cards in your deck for a piece of knowledge.

Here’s the card I came across that was giving me trouble, related to SQL programming (double-sided):

  • Side 1: Oracle SQL syntax for creating object table
  • Side 2: CREATE TABLE (table name) OF (object type)

When revisiting this card, I realized that I didn’t have a good concept of what “object tables” are, so this is definitely a case of not understanding the material before committing it to spaced repetition.

But the thing is, I wouldn’t have added the card if I hadn’t understood object tables at the time. The problem is that I later forgot the concept of “object tables”, and seeing the answer to this card was not enough to bring it back. I didn’t have any other cards in my deck about “object tables” and how they differ from related concepts in Oracle SQL such as nested tables.

In a situation like this, it helps to go back to the source, clarify any misunderstanding, and add new cards that solidify your knowledge.

So, in this case, I looked up the Oracle documentation and found a great article almost immediately that clarified the meaning. It also provided a bunch of useful nomenclature for closely related concepts, providing further scaffolding for the knowledge. This led me to add a bunch of cards:

  • Card 1 (Cloze): Objects can be stored in two types of tables: [object tables] and [relational tables].
  • Card 2 (Basic 1-sided Q&A):
    • Q: What’s the difference between object tables and relational tables? (Oracle SQL)
    • A: Object tables store only objects. Relational tables store objects with other table data.
  • Card 3 (Basic 1-sided Q&A):
    • Q: What does each row represent in an object table? (Oracle SQL)
    • A: An Object

So to recap, here are the main lessons from this refactoring:

  1. Don’t add stuff to spaced repetition that you don’t understand
  2. Make sure you add enough knowledge about the concept in your deck, so there is sufficient context for you to understand again when you forget
  3. When dealing with 1 or 2, the solution is to go back to the original source to understand the knowledge and add more relevant material.


Tips from Flashcard Refactoring

Include your Sources, Have a Single Answer, and Break-Down Your Cards

Here’s a flashcard related to Oracle SQL that was giving me trouble (lapsed 8 times and was automatically marked as a leech):

  • Side 1: Collection (Oracle SQL)
  • Side 2: Data types in Oracle SQL that lets you internalize parent-child relationships between tables in the parent table.

This was a double-sided card, so both Side 1 and 2 serve as the question. Let’s see if we can improve this one.

First things first: do I need this card at all? Yes: SQL is highly relevant to my career in Data Science, and the organization I work for relies heavily on Oracle Database. It’s important knowledge for me that I didn’t want to remove.

Next, figure out the issue with the card. Looking at the card statistics, it turns out I was always getting Side 2 wrong. After some consideration, I realized that this is actually a poor definition of a “Collection”. In fact, it’s not really the “definition” of a Collection, but a characteristic of a Collection. In other words, the flashcard doesn’t have a unique answer: it’s true that a Collection internalizes parent-child relationships, but it does a lot of other things too.

I consulted the original source of the material and there isn’t a clear definition of a Collection there. I did some Googling for other sources and apparently there isn’t really a great definition of an Oracle Collection. It turns out that Collection refers to a generic programming idea not specific to Oracle.

So, rather than trying to define Collection, I’ve opted to break the existing card down into multiple cards, following Rule Number 4 of Knowledge Construction: stick to the minimum information principle, which means if you can break a card into multiple simpler, easier-to-answer cards, do it.

Card 1 (one-sided):

  • Side 1: What Oracle SQL data type lets you internalize parent-child relationships in the parent table?
  • Side 2: Collection

Card 2 (one-sided):

  • Side 1: What kind of relationship does an Oracle SQL Collection help you represent?
  • Side 2: Parent-child (aka “one to many”)

Card 3 (one-sided):

  • Side 1: Does the Oracle SQL Collection data type internalize parent-child relationships in the parent table or child table?
  • Side 2: Parent table

I also tracked down a good definition of the generic “Collection” concept in Computer Science, and added it:

Card 4-5 (double-sided):

  • Side 1: Collection (Computer Science)
  • Side 2: Object that groups multiple items together as a single unit (Computer Science)
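
To make “internalizing a parent-child relationship in the parent table” more concrete, here is a rough Python analogy. It is not Oracle SQL, and the department and employee data is invented for illustration: the relational style keeps children in a separate table that points back to the parent, while the collection style stores the children inside the parent record.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Relational style: child rows live in a separate table and point back to the parent.
departments = [{"dept_id": 10, "name": "Analytics"}]
employees = [
    {"dept_id": 10, "name": "Ada"},
    {"dept_id": 10, "name": "Grace"},
]


@dataclass
class Department:
    dept_id: int
    name: str
    # Collection style: the children are stored inside the parent record itself.
    employees: List[str] = field(default_factory=list)


def nest_children(parents: List[Dict], children: List[Dict]) -> List[Department]:
    """Fold each parent's child rows into a list held by the parent."""
    nested = []
    for parent in parents:
        names = [c["name"] for c in children if c["dept_id"] == parent["dept_id"]]
        nested.append(Department(parent["dept_id"], parent["name"], names))
    return nested


print(nest_children(departments, employees))
```

The `employees` list on `Department` also fits the generic definition on the card above: one object that groups multiple items together as a single unit.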

I feel confident these cards will be easier to remember, cost less time and frustration, and help me remember the concept much better.

Lessons learned:

  • Flashcards should have a single answer. Having multiple correct answers for a card is a recipe for confusion and frustration. Interestingly, this isn’t included in Piotr Wozniak’s Twenty Rules for Formulating Knowledge, although you could interpret it as a form of interference (Rule #11).
  • Keep track of your source material when making cards. It makes it easy to look up more details when needed. 
  • Browse related sources through Google search if you’re unsure about what to do to an item. This will give you more context around the card to see whether the knowledge is even required at all. You may also come across a clarification or better formulation. In the example above, I discovered the generic concept of “Collection” in programming and realized that it was futile to try to include a definition specific to Oracle SQL.
  • Break cards down into a larger number of simpler cards. This is classic knowledge construction advice that is often not heeded, because it feels like more cards means more work. Counterintuitively, it is really a free lunch: you remember the concept better, you spend less time reviewing than you would have with the single complicated card, and reviews become much more enjoyable. 

Combating Knowledge Interference (Flashcard Refactoring)

I came across this computer networking Anki flashcard I’ve forgotten over 7 times:

(NW for Sysadmins: Ethernet) Address Resolution Protocol (ARP) – Maps Ethernet addresses to IPv4 addresses and back.

The card uses cloze deletions [] like this:

  • (NW for Sysadmins: Ethernet) [Address Resolution Protocol (ARP)] – [Maps Ethernet addresses to IPv4 addresses and back].
  • (NW for Sysadmins: Ethernet) Address Resolution Protocol (ARP) – Maps [Ethernet addresses] to [IPv4 addresses] and back.

Interestingly, I haven’t forgotten a single review for the right-hand side clozes. Turns out this was the one causing me trouble:

(NW for Sysadmins: Ethernet) […] – Maps Ethernet addresses to IPv4 addresses and back.

Why? The issue seems to be another card in my deck that is very similar, and I’m confusing the two. The other card quizzes a networking concept called “Neighbour Discovery (ND)”, which performs a similar function to ARP, except that it maps IPv6 addresses to Ethernet addresses and back rather than IPv4 addresses. This is a good example of interference, which refers to the fact that learning similar things can make you confuse them (see Rule 11 of Piotr Wozniak’s classic article on the 20 rules of formulating knowledge).
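
The distinction the two cards are testing boils down to the IP version. As a small sketch, using Python’s standard `ipaddress` module and a helper name I made up for the example, the rule looks like this:

```python
import ipaddress


def resolution_protocol(ip: str) -> str:
    """Name the protocol that maps this IP address to an Ethernet address."""
    version = ipaddress.ip_address(ip).version
    return "ARP" if version == 4 else "ND"


print(resolution_protocol("192.0.2.1"))    # IPv4 address
print(resolution_protocol("2001:db8::1"))  # IPv6 address
```

The two answers differ by a single attribute, which is exactly the kind of near-duplicate knowledge that invites interference.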

So the solution I’m opting for is pretty simple: just add a hint.

  • (NW for Sysadmins: Ethernet) ARP (hint: not ND) – Maps Ethernet addresses to IPv4 addresses and back.

One other small improvement is adding another card for the acronym alone:

  • Front: (NW for Sysadmins: Ethernet) ARP (Unpack Acronym)
  • Back: Address Resolution Protocol

These interference issues are tricky because you can’t really anticipate them in advance. You have to discover them as you review your cards.

Another annoyance is that I’m not 100% sure interference was actually the problem. Ideally, I would have noticed this troublesome card during review, so that I could know for sure why I was failing it.

So here are some lessons learned from this little exercise:

  • Use hints as an effective tool for reducing interference.
  • Keep an eye out for interference during review of your knowledge. As soon as you encounter it, note it. In the case of Anki, there is a “mark card” feature. I also recommend actually writing text within the card to remind yourself exactly how you failed the card when you fix it later. It would be nice to be able to see basic card statistics, like number of lapses, during review without having to go into card statistics. I asked on Reddit whether there was an add-on for this, and while there are some good options for desktop, there doesn’t seem to be anything that quite meets this need on mobile (where I do all of my reviews).
  • As you get better at knowledge building, interference will become your most common problem. As Piotr Wozniak says, interference is “probably the single greatest cause of forgetting in collections of an experienced user” of spaced repetition systems since it is hard (impossible?) to avoid even if you are really good. You typically discover it during knowledge review time, not knowledge construction time.