How to Improve your Flashcard Knowledge Base

Even the best flashcard developers among us regularly create bad cards (e.g. cards that are too long, ambiguous, or full of useless information).

Given that we are all far from perfect at writing flashcards, what should we do to improve these crummy cards as they age, so that we spend less time reviewing and remember the concepts better?

I call this process flashcard “refactoring” (a term borrowed from software development).

Why refactor flashcards?

Reviewing old flashcards requires time and effort. Here are a few reasons why it’s worth the price:

  • It improves your understanding of the material. The process of breaking learning material down into the smallest “chunks” possible that fit onto flashcards is an extremely valuable exercise. Reviewing troublesome cards clarifies what you don’t understand and forces you to restructure your knowledge in a way that makes sense.
  • Your worst flashcards take up a disproportionate amount of time and effort while yielding the worst results in terms of retention and usefulness. As a rough application of the 80-20 rule, something like 20% of your cards account for 80% of your review effort, so hunting down that subset is a high-value activity.
  • It provides knowledge construction training. Creating good flashcards is a nontrivial skill built over time. You can read Piotr Wozniak’s 20 rules of formulating knowledge, but actually observing your own performance on your cards and troubleshooting improvements takes your skills to the next level.

The process I use has two broad steps: selection and revision.

Selecting Problem Cards

I use two main methods to find cards needing review.

The first and most important method is finding cards I keep failing (“lapse” is the term in Anki). In the Anki browser, I can search for prop:lapses>n to find the cards that have lapsed more than n times. For me, cards never lapse more than 8 times, because at that point Anki marks the card as a leech and automatically suspends it. Cards that have lapsed 5 or more times are great candidates for refactoring.
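To make the selection rule concrete, here is a minimal Python sketch of what a lapse-based filter does. It is not Anki's code. The `Card` record and the `lapsed_more_than` helper are hypothetical stand-ins, and the lapse counts are made up. One detail worth noticing is that `prop:lapses>n` is a strict comparison. "5 or more lapses" therefore corresponds to `prop:lapses>4`, or equivalently `prop:lapses>=5`.

```python
from dataclasses import dataclass


@dataclass
class Card:
    """Hypothetical stand-in for an Anki card (not Anki's own Card class)."""
    front: str
    lapses: int  # times the card was failed after graduating from learning


def lapsed_more_than(cards, n):
    """Mimic the strict comparison in the browser search `prop:lapses>n`."""
    return [card for card in cards if card.lapses > n]


# Illustrative deck with made-up lapse counts.
deck = [Card("card A", 6), Card("card B", 8), Card("card C", 2), Card("card D", 4)]

# "Lapsed 5 or more times" means lapses > 4, i.e. prop:lapses>4 or prop:lapses>=5.
print([card.front for card in lapsed_more_than(deck, 4)])
```

Card D, with exactly 4 lapses, is left out. That is the strict `>` at work.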

The other method is “marking” cards during review when I notice a card is poorly formed. I also try to add a note to each marked card describing what’s causing the problem (when you come back to a card later, it’s easy to forget the specific issue that tripped you up).

Reviewing and Revising Problem Cards

The first step in examining a difficult card is to ask whether I need this knowledge at all. If not, that’s the end of the process: I just delete the card and I’m done with it. I may also revisit the source material or do some Googling on the topic, which sometimes reveals that the card is pointless or inaccurate.

If I decide that it’s important and relevant knowledge I want to keep, I examine the card for issues, using the principles from Piotr Wozniak’s Twenty Rules of Formulating Knowledge.

Example

Consider this data engineering card from my deck which was recently giving me problems:

  • Side 1: Tail latency amplification
  • Side 2: Even if small % backend calls slow, chance of getting a slow call increases if user request requires multiple backend calls, and so a higher proportion of end-user requests end up being slow.

First off, is this card relevant and worthwhile? For me, the answer is definitely yes: it’s relevant both to my job as a data scientist and to my software engineering side projects.

Next, diagnose the problem. On closer examination, there are a few things wrong with the card.

In cases where I add material I don’t fully understand, I find the best approach is to go back to the source (which in this case is the book Designing Data-Intensive Applications by Martin Kleppmann). I then refactored the card like this:

  • Side 1: Tail latency amplification (Kleppmann)
  • Side 2: Multiple back-end calls for a single user request increases chance of encountering a tail latency. (Kleppmann)

As you can see, I added a source to clarify where the information came from (Rule 18: Provide source).
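A quick back-of-the-envelope calculation shows why the refactored answer holds. Assume, as a simplification of my own that the card does not state, that each back-end call is slow independently with probability p. A request that needs k calls then hits at least one slow call with probability 1 − (1 − p)^k. The function name below is hypothetical.

```python
def prob_request_slow(p_slow_call: float, num_calls: int) -> float:
    """Chance that at least one of num_calls independent back-end calls is slow."""
    return 1 - (1 - p_slow_call) ** num_calls


# With a 1% chance of any single call being slow:
for k in (1, 10, 100):
    print(k, round(prob_request_slow(0.01, k), 3))
```

Under these assumptions, a request that fans out to 100 calls is slow about 63% of the time (0.634), even though only 1% of individual calls are slow.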

I was curious what other cards I had about tail latency, and it turns out there were none! It seems ridiculous to have a card about tail latency amplification but not a single one about tail latency, which is the more common term. Not having this in my deck probably contributed to interference, since I never tested myself on the distinction between the two concepts. So I added:

  • Side 1: Tail latency (Kleppmann)
  • Side 2: High percentile response time. (Kleppmann)

Note that the tail latency amplification card uses tail latency in its response. I’m hoping this will limit confusion between the two and emphasize the distinction (Rule 13: Refer to other memories). I also italicized amplification, to hopefully further avoid interference.
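"High percentile response time" means looking at values such as the p99 of the response-time distribution rather than the average. This is a minimal sketch using the nearest-rank definition of a percentile. That is one of several common definitions, and I chose it for simplicity. The response times are made-up numbers, and the helper name is hypothetical.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Smallest sample such that at least pct% of samples are <= it (nearest-rank)."""
    if not samples or not 0 < pct <= 100:
        raise ValueError("need samples and 0 < pct <= 100")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


# Illustrative response times in milliseconds (made-up numbers).
response_ms = [12, 15, 11, 14, 13, 250, 16, 12, 14, 900]
print(percentile_nearest_rank(response_ms, 50))  # median
print(percentile_nearest_rank(response_ms, 99))  # tail latency
```

Here the median is 14 ms but the p99 is 900 ms. A couple of slow outliers define the tail.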

Since I’ve made these changes, I haven’t had any problems with these cards and feel like I have a better grasp on the material. Consider doing the same for the important knowledge in your decks causing you trouble.

Tips From Anki Flashcard Refactoring: Add Enough Knowledge to your Deck and Review your Sources

My flashcard refactoring for today is a reminder of the classic knowledge construction advice: do not add what you do not understand. It is also a reminder of the importance of providing enough related cards in your deck for a piece of knowledge.

Here’s the card I came across that was giving me trouble, related to SQL programming (double-sided):

  • Side 1: Oracle SQL syntax for creating object table
  • Side 2: CREATE TABLE (table name) OF (object type)

When revisiting this card, I realized that I didn’t have a good concept of what “object tables” are, so this is definitely a case of not understanding the material before committing it to spaced repetition.

But the thing is, I wouldn’t have added it if I hadn’t understood object tables at the time I added it to my spaced repetition system. The problem is that I forgot the concept of “object tables”, and seeing the answer to this card was not enough to bring it back. I didn’t have any other cards in my deck about object tables or how they differ from related Oracle SQL concepts such as nested tables.

In a situation like this, it helps to go back to the source, clarify any misunderstanding, and add new cards that solidify your knowledge.

So, in this case, I looked up the Oracle documentation and almost immediately found a great article that clarified the meaning. It also provided a bunch of useful nomenclature for closely related concepts, giving further scaffolding for the knowledge. This led me to add a bunch of cards:

  • Card 1 (Cloze): Objects can be stored in two types of tables: [object tables] and [relational tables].
  • Card 2 (Basic 1-sided Q&A):
    • Q: What’s the difference between object tables and relational tables? (Oracle SQL)
    • A: Object tables store only objects. Relational tables store objects with other table data.
  • Card 3 (Basic 1-sided Q&A):
    • Q: What does each row represent in an object table? (Oracle SQL)
    • A: An Object

So to recap, here are the main lessons from this refactoring:

  1. Don’t add stuff to spaced repetition that you don’t understand
  2. Make sure you add enough knowledge about the concept to your deck, so there is sufficient context to relearn it when you forget
  3. When dealing with 1 or 2, the solution is to go back to the original source to understand the knowledge and add more relevant material.

For access to my shared Anki deck and Roam Research notes knowledge base as well as regular updates on tips and ideas about spaced repetition and improving your learning productivity, join “Download Mark’s Brain”.

Tips from Flashcard Refactoring

Include your Sources, Have a Single Answer, and Break Down Your Cards

Here’s a flashcard related to Oracle SQL that was giving me trouble (lapsed 8 times and was automatically marked as a leech):

  • Side 1: Collection (Oracle SQL)
  • Side 2: Data types in Oracle SQL that lets you internalize parent-child relationships between tables in the parent table.

This was a double-sided card, so both Side 1 and 2 serve as the question. Let’s see if we can improve this one.

First things first: do I need this card at all? Yes: SQL is highly relevant to my career in Data Science, and the organization I work for relies heavily on Oracle database. It’s important knowledge for me that I didn’t want to remove.

Next, figure out the issue with the card. Looking at the card statistics, it turns out I was always getting Side 2 wrong. After some consideration, I realized that this is actually a poor definition of a “Collection”. In fact, it’s not really the “definition” of a Collection, but a characteristic of a Collection. In other words, the flashcard doesn’t have a unique answer: it’s true that a Collection internalizes parent-child relationships, but it does a lot of other things too.

I consulted the original source of the material and there isn’t a clear definition of a Collection there. I did some Googling for other sources and apparently there isn’t really a great definition of an Oracle Collection. It turns out that Collection refers to a generic programming idea not specific to Oracle.

So, rather than trying to define Collection, I’ve opted to break the existing card down into multiple cards, following Rule Number 4 of Knowledge Construction: stick to the minimum information principle, which means if you can break a card into multiple simpler, easier-to-answer cards, do it.

Card 1 (one-sided):

  • Side 1: What Oracle SQL data type lets you internalize parent-child relationships in the parent table?
  • Side 2: Collection

Card 2 (one-sided):

  • Side 1: What kind of relationship does an Oracle SQL Collection help you represent?
  • Side 2: Parent-child (aka “one to many”)

Card 3 (one-sided):

  • Side 1: Does the Oracle SQL Collection data type internalize parent-child relationships in the parent table or child table?
  • Side 2: Parent table
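For readers less familiar with the idea, here is a rough Python analogy of the difference between a parent row that points to its children and one that holds them. It is not Oracle syntax or Oracle semantics. It shows two ways to store a one-to-many relationship. In the first, a separate child table refers back to the parent by key. In the second, a collection inside the parent row holds the children directly. All table, column, and function names are made up for illustration.

```python
# Relational style: child rows live in their own table and reference the parent by key.
orders = [{"order_id": 1, "customer": "Ada"}]  # parent table
order_lines = [
    {"order_id": 1, "item": "pen"},
    {"order_id": 1, "item": "ink"},
]

# Collection style: the children are stored inside the parent row itself.
orders_with_lines = [{"order_id": 1, "customer": "Ada", "lines": ["pen", "ink"]}]


def items_via_join(order_id):
    """Find children by matching the parent key (like a join)."""
    return [row["item"] for row in order_lines if row["order_id"] == order_id]


def items_via_collection(order_id):
    """Read the children directly from the parent's collection field."""
    for row in orders_with_lines:
        if row["order_id"] == order_id:
            return list(row["lines"])
    return []


print(items_via_join(1), items_via_collection(1))
```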

I also tracked down a good definition of the generic “Collection” concept in Computer Science, and added it:

Card 4-5 (double-sided):

  • Side 1: Collection (Computer Science)
  • Side 2: Object that groups multiple items together as a single unit (Computer Science)

I feel confident these cards will be easier to remember, cost less time and frustration, and help me remember the concept much better.

Lessons learned:

  • Flashcards should have a single answer. Having multiple correct answers for a card is a recipe for confusion and frustration. Interestingly, this isn’t included in Piotr Wozniak’s Twenty Rules of Formulating Knowledge, although you could interpret it as a form of interference (Rule #11).
  • Keep track of your source material when making cards. It makes it easy to look up more details when needed. 
  • Browse related sources through Google search if you’re unsure what to do with an item. This will give you more context around the card to see whether the knowledge is even required at all. You may also come across a clarification or better formulation. In the example above, I discovered the generic concept of “Collection” in programming and realized that it was futile to try to include a definition specific to Oracle SQL.
  • Break cards down into a larger number of simpler cards. This is classic knowledge construction advice that is often not heeded, because it feels like more cards means more work. Counterintuitively, it is really a free lunch: you remember the concept better, you spend less time reviewing than you would have with the single complicated card, and reviews become much more enjoyable. 

Combating Knowledge Interference (Flashcard Refactoring)

I came across this computer networking Anki flashcard I’ve forgotten over 7 times:

(NW for Sysadmins: Ethernet) Address Resolution Protocol (ARP) – Maps Ethernet addresses to IPv4 addresses and back.

The card uses cloze deletions (marked with square brackets) like this:

  • (NW for Sysadmins: Ethernet) [Address Resolution Protocol (ARP)] – [Maps Ethernet addresses to IPv4 addresses and back].
  • (NW for Sysadmins: Ethernet) Address Resolution Protocol (ARP) – Maps [Ethernet addresses] to [IPv4 addresses] and back.
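In Anki's own cloze syntax, deletions are written as `{{c1::...}}`, `{{c2::...}}`, and so on. Anki generates one card per cloze number, and each card hides every deletion that shares that number. The Python sketch below imitates that expansion. It is a simplified model that ignores hints and nesting, not Anki's implementation. The c1/c2 numbering I assign to the ARP sentence is an illustrative guess, not the author's actual note.

```python
import re

# Matches {{cN::text}}; the lazy group stops at the first closing braces.
CLOZE = re.compile(r"\{\{c(\d+)::(.*?)\}\}")


def expand_cloze(note_text):
    """Return {cloze_number: question_text}, hiding only that number's deletions."""
    numbers = sorted({int(n) for n, _ in CLOZE.findall(note_text)})
    cards = {}
    for num in numbers:
        def render(match, num=num):
            # Hide deletions belonging to this card; reveal all the others.
            return "[...]" if int(match.group(1)) == num else match.group(2)
        cards[num] = CLOZE.sub(render, note_text)
    return cards


note = ("{{c1::Address Resolution Protocol (ARP)}} - Maps "
        "{{c2::Ethernet addresses}} to {{c2::IPv4 addresses}} and back.")
for num, question in expand_cloze(note).items():
    print(num, question)
```

Because both address types share `c2`, they are hidden together on one card. This is why grouping deletions under the same number changes what a single review tests.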

Interestingly, I haven’t failed a single review of the right-hand side clozes. It turns out this was the one causing me trouble:

(NW for Sysadmins: Ethernet) […] – Maps Ethernet addresses to IPv4 addresses and back.

Why? The issue seems to be another, very similar card in my deck that I’m confusing with this one. The other card quizzes a networking concept called “Neighbour Discovery (ND)”, which performs a similar function to ARP, except that it maps IPv6 addresses, rather than IPv4 addresses, to Ethernet addresses and back. This is a good example of interference: learning similar things can make you confuse them (see Rule 11 of Piotr Wozniak’s classic article on the 20 rules of formulating knowledge).
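The distinction the hint is meant to trigger is compact enough to write down as code. This Python sketch is a study aid, not how a network stack is implemented. It uses the standard library's `ipaddress` module to decide which protocol resolves a given IP address to an Ethernet (MAC) address: ARP for IPv4, Neighbor Discovery for IPv6. The function name is hypothetical.

```python
import ipaddress


def link_layer_resolver(ip: str) -> str:
    """Name the protocol that maps this IP address to an Ethernet address."""
    version = ipaddress.ip_address(ip).version
    # ARP handles IPv4; IPv6 uses Neighbor Discovery (ND) instead.
    return "ARP" if version == 4 else "ND"


print(link_layer_resolver("192.168.1.10"))  # ARP
print(link_layer_resolver("fe80::1"))       # ND
```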

So the solution I’m opting for is simple. I just add a hint:

  • (NW for Sysadmins: Ethernet) ARP (hint: not ND) – Maps Ethernet addresses to IPv4 addresses and back.

One other small improvement is adding another card for the acronym alone:

  • Front: (NW for Sysadmins: Ethernet) ARP (Unpack Acronym)
  • Back: Address Resolution Protocol

These interference issues are tricky because you can’t really anticipate them in advance. You have to discover them as you review your cards.

Another annoyance is I’m not 100% sure that interference was actually the problem. Ideally, I would have discovered this troublesome card during review, so I could know for sure why I’m failing.

So here are some lessons learned from this little exercise:

  • Use hints as an effective tool for reducing interference.
  • Keep an eye out for interference during review of your knowledge. As soon as you encounter it, note it. In the case of Anki, there is a “mark card” feature. I also recommend actually writing text within the card to remind yourself exactly how you failed the card when you fix it later. It would be nice to be able to see basic card statistics, like number of lapses, during review without having to go into card statistics. I inquired on Reddit whether there was an add-on for this, and while there are some good options for desktop, there doesn’t seem to be anything that quite meets this need on mobile (where I do all of my reviews).
  • As you get better at knowledge building, interference will become your most common problem. As Piotr Wozniak says, interference is “probably the single greatest cause of forgetting in collections of an experienced user” of spaced repetition systems, since it is hard (impossible?) to avoid even if you are really good. You typically discover it at knowledge review time, not knowledge construction time.

Flashcard Refactoring

I’ve started a weekly habit of flashcard review. I want to share with you my thought process for modifying my cards, because I think this will be valuable to help you improve your own knowledge construction skills.

I also want my flashcard development out in the open so you can call me out when I make mistakes and provide suggestions for further improvements. Please do reach out! I am by no means the ultimate expert in knowledge construction.

So, I will be doing a regular series I call “Flashcard Refactoring” (Refactoring comes from the programming term which basically means revising and improving your code).

To sniff out poor flashcards, I ran prop:lapses>7 in the Anki browser to get all the cards I’ve forgotten more than 7 times. Here’s one I came across about a Linux command-line keyboard shortcut for suspending a job:

  • Side 1: ^Z (Linux Command Line)
  • Side 2: Suspend a job running in the foreground (Linux Command Line)

The card is reversible, so there are two cards in total: one with Side 1 as the question, and another with Side 2 as the question.
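Anki's “Basic (and reversed card)” note type is one way to get this behavior: a single note generates two cards from the same two fields, one per direction. Here is a tiny Python model of that idea. The helper name is hypothetical, and this is not Anki's template engine.

```python
def reversed_cards(side1: str, side2: str):
    """Return (question, answer) pairs for a note that produces both directions."""
    return [(side1, side2), (side2, side1)]


for question, answer in reversed_cards(
    "^Z (Linux Command Line)",
    "Suspend a job running in the foreground (Linux Command Line)",
):
    print("Q:", question, "| A:", answer)
```

Because each direction is a separate card, each has its own lapse count. That is how the statistics can show one side failing while the other is fine.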

At first glance, it doesn’t look too bad. It’s fairly concise. But one quick and easy change is to trim the wording of Side 2, in accordance with the 12th rule of Formulating Knowledge (“Optimize Wording”):

  • Side 2: Suspend foreground job (Linux Command Line)

This is a nice little improvement, but why am I really forgetting this card? I think it’s because ^Z doesn’t really have any meaning – it seems arbitrary and it has no clear connection to suspending foreground jobs.

So, I’ll create a fake connection, i.e. a mnemonic.

The mnemonic that immediately came to mind was the fact that the beginning of “Suspend” kind of sounds like a “Z”, e.g. “Zuspend”. I think this is all that’s required for this to stick in my memory (but only time will tell).

When you come up with a mnemonic, it’s a good idea to create a separate card for it, so I added the following to my deck: 

  • Q: Mnemonic for remembering ^Z suspends foreground job in Linux Command Line.
  • A: Zuspend
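As an aside on why ^Z can feel arbitrary: ^Z is caret notation for the Ctrl+Z key combination, which sends the control character with code 26. By default, a Linux terminal treats that character as the “suspend” key and sends the SIGTSTP signal to the foreground job. The short Python sketch below decodes the notation and names the signal. The helper name is my own, and the sketch does not interact with a real terminal.

```python
import signal


def caret_to_code(notation: str) -> int:
    """Decode caret notation such as '^Z' into the control character's code."""
    if len(notation) != 2 or notation[0] != "^":
        raise ValueError("expected something like '^Z'")
    # Caret notation toggles bit 0x40 of the character: 'Z' (90) -> 26.
    return ord(notation[1].upper()) ^ 0x40


print(caret_to_code("^Z"))   # 26
print(signal.SIGTSTP.name)   # SIGTSTP (only defined on Unix-like systems)
```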

Anki / Spaced Repetition Tip: Review your Weak Flashcards

I’ve been a long-time user of spaced repetition tools. I’ll never forget first hearing about SuperMemo from a close friend as I started my undergraduate degree in 2005. I was immediately sold on the value of spaced repetition, and I particularly liked the idea of computers automatically taking care of review scheduling for you. I started using SuperMemo as a central tool for studying, and saw my academic performance skyrocket.

Over the years, I’ve slowly improved my skill in designing flashcards. It is by no means a trivial skill: it took me years to get pretty good at it, and to this day I still often make flashcards that are complete failures.

I believe there will eventually be an open collaborative platform for flashcard development and sharing, where experts can contribute and refine perfectly crafted cards, and users contribute their deck statistics, revealing poorly formed cards and advancing our understanding of optimal flashcards.

But until that day, it pays to develop your flashcard creation skills.

Flashcard quality has been top of mind for me since I revisited the classic article by Piotr Wozniak (of SuperMemo fame), “Effective Learning: Twenty Rules of Formulating Knowledge”. It is a must-read for anyone who creates flashcards for learning (i.e. almost everyone at some point in their life). I’ve published my summary notes on this article (aside: my notetaking tool of choice is Roam, so my notes are easy to copy-paste into your own Roam database if you happen to use it as well).

One great way to improve your flashcard development skills, while simultaneously improving the quality of your deck, is to review your old cards regularly. Review your top 10-20 most problematic cards weekly, and for each one you encounter, do one of the following things:

  • Revise: With the Twenty Rules of Formulating Knowledge by your side, refine your card or break it down into a larger number of small, easy to digest cards.
  • Suspend: If you don’t think you need to have a card in spaced repetition anymore, but don’t want to delete it entirely, suspending is a good option.
  • Delete: If you know the knowledge is completely useless to you, trash the card entirely.

But what cards should you review? If you’re like me, you have a pretty big collection, and it’s just not feasible to review all your cards every week to find the weak ones.

Anki makes it quite easy to find these problematic cards. Two main search commands in the Anki Browser are useful here:

  • tag:leech – this finds all of the “leeches” in your Anki deck, which are cards that you keep forgetting. By default, Anki tags your card as “leech” when you fail a card 8 times.
  • prop:lapses>n – this reveals all of the cards you have failed (“lapsed”) over n times. You can set n to whatever number you like. Start with high-n cards and work your way down.
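The two searches can also be combined in the browser as `tag:leech or prop:lapses>5`. The Python sketch below models the weekly shortlist described here. It keeps cards that are leech-tagged or over the lapse threshold, sorts them from most to fewest lapses, and caps the list at 10-20 cards. The `Card` record, the `weekly_shortlist` helper, and the deck contents are hypothetical, with made-up values. This is not the Anki API.

```python
from dataclasses import dataclass, field


@dataclass
class Card:
    """Hypothetical card record for this sketch; not the Anki API."""
    front: str
    lapses: int
    tags: set = field(default_factory=set)


def weekly_shortlist(cards, n=5, limit=15):
    """Model of `tag:leech or prop:lapses>n`, worst cards first, capped at `limit`."""
    matches = [c for c in cards if "leech" in c.tags or c.lapses > n]
    # Start with the highest lapse counts and work down.
    return sorted(matches, key=lambda c: c.lapses, reverse=True)[:limit]


# Illustrative deck with made-up lapse counts and tags.
deck = [
    Card("card A", 9, {"leech"}),
    Card("card B", 3),
    Card("card C", 6),
    Card("card D", 4, {"leech"}),
]
print([c.front for c in weekly_shortlist(deck, n=5, limit=3)])
```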

In addition to using these search techniques, I try to make a habit of “marking” cards that are problematic or poorly formed in some way, during review. If it’s an easy correction (e.g. obvious suspension, or small text changes), I’ll make the change right away in the mobile app. Otherwise, I will simply mark the card and filter it out during weekly review to make improvements.

When you do revise your cards, I recommend “resetting” the card so it’s like a “do-over” – the card should be reviewed again as if you just created it. This serves two purposes: it ensures that the card will no longer show up in your “problem cards” lists when you do the above queries. It also provides you with more opportunities to review your new formulation of the knowledge.

Unfortunately, it seems the only way to do this in Anki is to create new card(s) with the information you want and delete the old one. There is an option for “rescheduling” the card, but this only restarts the review process and doesn’t delete your review history. As a result, the card will still appear as one of your problem cards if you do a query like prop:lapses>n. Luckily, it’s not much extra effort to recreate the card.

I have to admit that I do not entirely practice what I preach here. Weekly review of my cards is something I haven’t fully incorporated yet, but I’m resolving to start doing it today. In the next weeks, I’m going to experiment with a Flashcard Refactoring series to illustrate the card refinement process. Stay tuned!