MaiMemo Memory Algorithm Experiment 2025 - Preliminary Report
Abstract
This document presents the "MaiMemo Memory Algorithm Experiment 2025 - Preliminary Report," detailing the initial outcomes of a large-scale A/B test conducted by the MaiMemo team to evaluate the practical effectiveness of spaced repetition algorithms. The core experiment (Exp.01) compared the performance of MMX-5 (a variant of FSRS-3) and MMX-6 (a variant of FSRS-6) among over 8,900 newly registered users on the MaiMemo flashcard (墨墨记忆卡) platform.
The study revealed significant discrepancies between theoretical benchmarks and actual user behavior. Although MMX-6 demonstrated superior performance in offline machine learning metrics (achieving lower RMSE/LogLoss and higher AUC), it underperformed MMX-5 in key operational and learning metrics during real-world application. Specifically, the MMX-6 group exhibited lower user retention rates (including daily task completion rates and learning engagement) and failed to demonstrate improvements in learning efficiency.
The report highlights a critical "gap" between laboratory research and real-world application. These findings underscore the necessity of establishing online A/B testing infrastructure to balance memory prediction accuracy with user motivation and long-term learning sustainability. Subsequent experimental phases will explore personalized parameter training and other algorithmic variables.
Keywords: Spaced Repetition, FSRS Algorithm, A/B Testing, Memory Prediction, User Retention, Learning Efficiency, MaiMemo.
1 Introduction
1.1 Background
Summary: This is the "MaiMemo Memory Algorithm Experiment 2025 - Preliminary Report" by the MaiMemo team. The report documents the team's A/B user experiments on memory algorithms, covering topics such as: the impact of algorithm upgrades, the effects of different algorithm frameworks, the influence of default parameters versus periodically personalized training parameters, the impact of adding random variations, the effects of displaying the next scheduled interval, and disallowing review completion with vague feedback. The aim is to bridge a critical gap in memory algorithm research, facilitating its transition from the "laboratory stage" to achieving "real-world impact."
Purpose of this Publication: Primarily, we find this research inherently interesting and hope to discuss and exchange ideas with others who share this interest. MaiMemo has been dedicated to memory research for over a decade. During this time, we have not only gained many significant and even counter-intuitive insights but have also encountered our share of challenges. By sharing this journey, we hope to attract more attention to and foster a deeper understanding of memory research, thereby contributing to the long-term advancement of the field.
Related Background: The relevant research is based on our two core products: MaiMemo Vocabulary (墨墨背单词, launched in 2014) and MaiMemo Flashcard (墨墨记忆卡, launched in 2021). As of October 2025, the platform has accumulated 133.8 billion data points on user memory behavior from over 40 million registered users. Academically, we have successfully published two research papers in ACM SIGKDD 2022 and IEEE TKDE 2023. Furthermore, to promote the development of the field, we open-sourced a dataset of 220 million records in 2022.
For other types of inquiries or collaboration, please feel free to contact us via email at datascience@maimemo.com.
1.2 Experiment Overview
- Exp.01 (2025.09), Exp.04 (2025.10), Exp.05 (2025.10), and Exp.06 (2025.10) have been launched.
- Exp.02, Exp.03 are pending launch.
1.3 Reporting Schedule
- MaiMemo Memory Algorithm Experiment 2025 - Preliminary Report, 2025.11
- Explain the overall experimental plan and analyze the preliminary findings of Exp.01.
- MaiMemo Memory Algorithm Experiment 2025 - Mid-term Report, 2026.05
- Provide an update on the status of Exp.04, 05, and 06; complete the reports for Exp.01, 04, 05, and 06; analyze the preliminary findings of Exp.02.
- MaiMemo Memory Algorithm Experiment 2025 - Final Report, 2026.11
- Complete all experiments (01, 02, 03, 04, 05, 06) and finalize the report.
2 Exp.01 Status
2.1 Experimental Design
A portion of newly registered users will be divided into two groups for an initial AA test to confirm there are no differences between the groups.
Afterward, an A/B test will commence: Group A will use the MMX-5 algorithm (a variant of FSRS-3), while Group B will use the MMX-6 algorithm (a variant of FSRS-6).
Both groups will operate with default parameters, without personalized parameter optimization for users, to observe the impact on overall business performance and learning outcomes.
The platform for this experiment is MaiMemo Flashcard (墨墨记忆卡). Because the memorization materials in MaiMemo Vocabulary (墨墨背单词) are unique, its algorithm is more specialized and is therefore not discussed in this experiment.
2.2 Algorithm Description
2.2.1 MMX-5
MMX-5 is the internal version of FSRS-3.
- Default Parameters
[-0.6051, 1.2609, 1.0101, -0.9817, -2.181, 2.5985, -0.7287, -0.0232, 1.2021, 0.3485, 0.7679, 0.8443]
- Formulas
w_i represents the i-th parameter, user_params[i]. This version uses 12 parameters. The memory state is represented by Stability (S) and Difficulty (D).
Rating System: MMX-5 uses a unique rating mapping system. The user's choice (1-4) is mapped to an internal grade level (G) for calculations.
- 1: Familiar → G=3
- 2: Vague → G=2
- 3: Forgot → G=1
- 4: Well Familiar → G=4
- Initial stability after the first rating:
- where is the initial stability when the first rating is Forgot. When the first rating is Well Familiar, the initial stability is .
- Initial difficulty after the first rating:
- where is the initial difficulty when the first rating is Familiar.
- New difficulty after a review:
- The new difficulty is adjusted from the current difficulty D based on the grade G. When the rating is Familiar (G=3), the difficulty does not change. The value is clamped within the interval [1, 10]. Unlike FSRS, this formula does not include a "mean reversion" mechanism.
- Retrievability after t days since the last review: $R(t, S) = 0.9^{t/S}$
- where, when t=S, R(t,S)=0.9.
- The next interval can be calculated by solving for t in the above equation, where R is replaced by the Target Retention. MMX-5 uses a fixed Target Retention of 0.85: $I(S) = S \cdot \frac{\ln 0.85}{\ln 0.9}$
- The final calculated interval value is rounded to the nearest integer and clamped within the range of [1, 36500] days.
- New stability after a successful recall:
This formula is used for ratings of Familiar (G=3) or Well Familiar (G=4).
- Similar to FSRS, the growth in stability (SInc) is influenced by the following factors:
- Difficulty (D): The larger the value of D (with ), the smaller the value of SInc. This means that the memory stability of difficult material grows more slowly.
- Stability (S): The larger the value of S (with ), the smaller the value of SInc. This indicates that the more stable a memory is, the harder it is to make it even more stable.
- Retrievability (R): The smaller the value of R (i.e., the longer the review delay), the larger the value of SInc. This reflects the spacing effect.
- If the review is successful, the value of SInc is always greater than or equal to 1.
- Stability after forgetting (i.e., stability after an incorrect answer):
This formula is used for ratings of Forgot (G=1) or Vague (G=2).
- It gives the new stability after a user fails to recall an item. Unlike FSRS, this version's post-forgetting stability calculation does not depend on difficulty (D).
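The rating mapping and the interval rule described above can be sketched in Python. This is a minimal illustration rather than MaiMemo's implementation: it assumes the exponential forgetting curve R(t, S) = 0.9^(t/S) (the FSRS-3 form, which satisfies R(S, S) = 0.9), and every function and constant name is hypothetical.

```python
import math

# User choice (1-4) -> internal grade G, as listed in the rating system above.
CHOICE_TO_GRADE = {1: 3, 2: 2, 3: 1, 4: 4}

TARGET_RETENTION = 0.85                  # fixed target retention stated for MMX-5
MIN_INTERVAL, MAX_INTERVAL = 1, 36500    # clamp range in days


def retrievability(t: float, s: float) -> float:
    """Assumed exponential curve; equals 0.9 when t == s."""
    return 0.9 ** (t / s)


def next_interval(s: float, target: float = TARGET_RETENTION) -> int:
    """Solve retrievability(t, s) == target for t, then round and clamp."""
    t = s * math.log(target) / math.log(0.9)
    # Note: Python's round() sends exact halves to the even integer,
    # which may differ from the rounding used in production.
    return max(MIN_INTERVAL, min(MAX_INTERVAL, round(t)))
```

Under these assumptions, a stability of 10 days gives a raw solution of about 15.4 days, so `next_interval(10.0)` returns 15.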
2.2.2 Comparison between FSRS-3 and MMX-5
2.2.3 Comparison between FSRS-6 and MMX-6
- Symbol Definitions
- $R$: Retrievability (probability of recall)
- $S$: Stability (the interval in days when the probability of recall, $R$, is 90%)
- $S'_r$: New stability after a successful recall
- $S'_f$: New stability after forgetting
- $D$: Difficulty ($D \in [1, 10]$)
- $G$: Grade (the rating in Anki):
- $G = 1$: again corresponds to "Forgot" in MaiMemo.
- $G = 2$: hard corresponds to "Vague" in MaiMemo.
- $G = 3$: good corresponds to "Familiar" in MaiMemo.
- $G = 4$: easy corresponds to "Well Familiar" in MaiMemo.
- Default Parameters
[0.3265, 1.21955, 2.4329, 8.2956, 6.41275, 0.834, 3.0125, 0.0314, 1.89125, 0.2144, 0.8208, 1.56435, 0.0409, 0.3591, 1.74945, 0.69375, 1.8729, 0.5425, 0.0912, 0.0658, 0.1]
- Changes Relative to FSRS-6
- MMX-6 is based on FSRS-6, with specific optimizations for the memory model. This version uses the same 21 parameters as FSRS-6 but modifies the formulas and updates the default parameters.
- MMX-6 does not consider feedback during the short-term learning phase.
- Success Threshold Modification
- In MMX-6, the success threshold is changed from $G > 1$ to $G > 2$. This means:
- Success (Recall): $G > 2$ (good, easy)
- Failure (Forgetting): $G \le 2$ (again, hard)
- This change affects the branching logic in the stability calculation.
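- Removal of the "Hard" Penalty in Successful Reviews
- The stability after a successful review in FSRS-6 includes a "Hard" penalty: $S'_r(D, S, R, G) = S \cdot \left(e^{w_8} \cdot (11 - D) \cdot S^{-w_9} \cdot \left(e^{w_{10} \cdot (1 - R)} - 1\right) \cdot w_{15}\,(\text{if } G = 2) \cdot w_{16}\,(\text{if } G = 4) + 1\right)$
- In MMX-6, the "Hard" penalty ($w_{15}$ when $G = 2$) is removed, and the formula becomes: $S'_r(D, S, R, G) = S \cdot \left(e^{w_8} \cdot (11 - D) \cdot S^{-w_9} \cdot \left(e^{w_{10} \cdot (1 - R)} - 1\right) \cdot w_{16}\,(\text{if } G = 4) + 1\right)$
- This change means that "Vague" feedback ($G = 2$) is now treated as a failure and handled by the forgetting formula.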
- Addition of a "Forgot" Penalty in Failed Reviews
- The stability after forgetting in FSRS-6 is: $S'_f(D, S, R) = w_{11} \cdot D^{-w_{12}} \cdot \left((S + 1)^{w_{13}} - 1\right) \cdot e^{w_{14} \cdot (1 - R)}$
- In MMX-6, a "Forgot" penalty is added when $G = 1$.
- This means that "Forgot" feedback ($G = 1$) receives an additional penalty compared to "Vague" feedback ($G = 2$), resulting in a lower post-failure stability.
- Updated Default Parameters
- MMX-6 uses different optimized default parameters from FSRS-6, specifically:
- Lower decay parameter: $w_{20} = 0.1$ (compared to 0.1542 in FSRS-6)
- Adjusted penalty/reward parameters: ,
- Modified initial stability parameters: , ,
- Practical Implications
- "Vague" Feedback as Failure: "Vague" feedback is now incorporated into the forgetting model instead of being treated as a penalized successful recall.
- Enhanced Differentiation: The model better distinguishes between "Forgot" (complete failure) and "Vague" (partial failure) through the "Forgot" penalty.
- Improved Calibration: The updated parameters and modified penalty system provide better retention rate predictions for the specific use cases MMX-6 is optimized for.
- Consistent Forgetting Curve: All other formulas (initial difficulty, difficulty update, mean reversion, etc.) remain consistent with FSRS-6.
- Interval Preview
- Retention rate set to 0.85
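The branching changes listed above can be sketched as follows. This is an illustration under stated assumptions, not the MMX-6 source: the recall branch is the published FSRS-6 formula with the $w_{15}$ factor dropped, the forgetting branch is the published FSRS-6 formula, and the "Forgot" penalty is modeled as a hypothetical multiplier `forgot_penalty` whose form and value (0.5 here) are placeholders, since the exact penalty term is not reproduced in this report. Function names are hypothetical, and any caps or clamps on the result are omitted.

```python
import math

# MMX-6 default parameters as listed above (w[0] .. w[20]).
W = [0.3265, 1.21955, 2.4329, 8.2956, 6.41275, 0.834, 3.0125, 0.0314,
     1.89125, 0.2144, 0.8208, 1.56435, 0.0409, 0.3591, 1.74945, 0.69375,
     1.8729, 0.5425, 0.0912, 0.0658, 0.1]


def stability_after_recall(d, s, r, g, w=W):
    """FSRS-6 recall formula without the w[15] "Hard" factor."""
    easy_bonus = w[16] if g == 4 else 1.0
    s_inc = (math.exp(w[8]) * (11 - d) * s ** -w[9]
             * (math.exp(w[10] * (1 - r)) - 1) * easy_bonus)
    return s * (s_inc + 1)


def stability_after_forgetting(d, s, r, g, forgot_penalty, w=W):
    """FSRS-6 forgetting formula; the G=1 multiplier is an assumption."""
    s_f = (w[11] * d ** -w[12] * ((s + 1) ** w[13] - 1)
           * math.exp(w[14] * (1 - r)))
    return s_f * forgot_penalty if g == 1 else s_f


def next_stability(d, s, r, g, forgot_penalty=0.5):
    """Branch on the MMX-6 success threshold: G > 2 counts as recall."""
    if g > 2:
        return stability_after_recall(d, s, r, g)
    return stability_after_forgetting(d, s, r, g, forgot_penalty)
```

With this structure, "Vague" (G=2) never reaches the recall branch, and "Forgot" (G=1) yields a lower post-failure stability than "Vague" whenever the assumed multiplier is below 1.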
2.2.4 Comparison between MMX-5 and MMX-6
- Model Structure
- The number of parameters was increased from 12 to 21.
- The exponential forgetting curve in MMX-5 has been replaced with a power-law forgetting curve.
- Initialization and Difficulty Update
- Initial stability is now read directly from parameters instead of being an exponential mapping: MMX-5 used , whereas MMX-6 uses , while retaining item-wise customization for grades 1–4.
- Initial difficulty has changed from a linear decrease to an exponential decrease .
- Difficulty update is no longer a simple translation , but now incorporates linear damping and mean reversion: where is the initial difficulty for a grade of 1.
- Stability Update Formula
- The recall branch has changed from a multiplicative amplification dependent on to an inverse adjustment based on difficulty and stability:
- MMX-6 removes the previous "difficulty penalty" but retains a reward coefficient, $w_{16}$, for "Well Familiar" feedback; MMX-5 had no similar separate bonus for "Well Familiar."
- The forgetting branch has been changed from a power function solely dependent on stability to one that simultaneously considers difficulty, stability, and a "Forgot" penalty.
- The rating threshold remains G>2 to be considered a successful recall. However, because "Vague" feedback is now incorporated into the forgetting branch and "Forgot" feedback is penalized separately, the scope of influence between success and failure is more distinct than in MMX-5.
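To make the change of forgetting curve concrete, the sketch below compares the two curve shapes. It assumes the exponential form R = 0.9^(t/S) for MMX-5 (the FSRS-3 form) and the published FSRS-6 power-law curve with the MMX-6 default decay $w_{20} = 0.1$; the function names are hypothetical.

```python
import math

W20 = 0.1  # decay parameter from the MMX-6 default parameters


def r_exponential(t, s):
    """Assumed MMX-5 curve (FSRS-3 form); equals 0.9 at t == s."""
    return 0.9 ** (t / s)


def r_power(t, s, decay=W20):
    """FSRS-6 power-law curve; also equals 0.9 at t == s."""
    factor = 0.9 ** (-1.0 / decay) - 1.0
    return (1.0 + factor * t / s) ** -decay


def interval_exponential(s, target=0.85):
    """Days until the exponential curve reaches the target retention."""
    return s * math.log(target) / math.log(0.9)


def interval_power(s, target=0.85, decay=W20):
    """Days until the power-law curve reaches the target retention."""
    factor = 0.9 ** (-1.0 / decay) - 1.0
    return s / factor * (target ** (-1.0 / decay) - 1.0)
```

Under these assumptions, the same stability S maps to an interval of about 1.54 S on the exponential curve and about 2.18 S on the power-law curve at a target retention of 0.85. This compares curve shapes only: stability values estimated by the two models are not interchangeable, so the ratio does not by itself predict the intervals users actually received.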
2.3.1 Evaluation Results
- Evaluate the error between the model's predicted recall probability and the user's actual recall results.
- Smaller RMSE, Log Loss, and ICI values indicate lower prediction errors.
- A larger AUC value indicates more accurate ranking of recall probabilities.
Based on the evaluation from the SRS Benchmark, we can observe that mmx-6-default performs best in machine learning metrics, while mmx-5-default and fsrs-6-default also show strong performance.
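For readers unfamiliar with these metrics, the following sketch shows how three of them can be computed from binary recall outcomes and predicted recall probabilities. It is a simplified illustration with hypothetical function names; the SRS Benchmark's own implementation may differ in detail (for example, by computing RMSE over binned predictions), and ICI is omitted because it requires a smoothed calibration curve.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between outcomes (0/1) and predictions."""
    n = len(y_true)
    return math.sqrt(sum((p - y) ** 2 for y, p in zip(y_true, y_pred)) / n)


def log_loss(y_true, y_pred, eps=1e-15):
    """Mean negative log-likelihood; predictions are clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def auc(y_true, y_pred):
    """Probability that a recalled item is ranked above a forgotten one.

    Ties count as half. Requires at least one positive and one negative.
    """
    pos = [p for y, p in zip(y_true, y_pred) if y == 1]
    neg = [p for y, p in zip(y_true, y_pred) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

RMSE and Log Loss measure how close the predicted probabilities are to the outcomes, while AUC depends only on their ordering, which is why a model can rank well and still be poorly calibrated.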
2.4 User Experiment (A/B Testing)
We launched this experiment and are now conducting statistical analysis on the relevant data. The data can be found in the appendix.
Group A had 4,570 registered users, while Group B had 4,412 registered users.
The data covers business metrics (daily task completion, paid purchases) and learning metrics (learning feedback records) from Day 00 to Day 30 after user registration.
2.4.1 User Retention Rate - Completed
For business metrics in software, user retention rate is a very important evaluation indicator.
Here, we have calculated the daily user retention of two groups of users who completed their learning tasks.
- Relevant data visualization
- The above data visualization shows that B.MMX-6-default (FSRS-6) maintained decent retention before Day 06 after registration, but its user retention rate deteriorated after Day 06.
- By Day 30, the retention rate had already dropped by approximately 10% in relative terms.
- Relevant data details
- Currently, B.MMX-6-default (FSRS-6) appears to show a somewhat negative trend in User Retention Rate - Completed. However, the existing data is still insufficient.
- We conducted a statistical test on the Day 30 data, resulting in a P-value of 0.11. Since a P-value below 0.05 is generally required to indicate significance, the observed difference is not yet statistically significant, and it would be worthwhile to increase the number of participants in the experiment and extend the observation period.
- At the same time, there's a peculiar phenomenon that needs to be addressed: the user retention rate for the A.MMX-5-default (FSRS-3) group actually increased after Day 18 instead of declining, which is counterintuitive.
- Additional context: some of the analyzed users registered between October 1, 2025, and October 8, 2025—China's National Day holiday—which might have impacted their learning patterns.
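A comparison of Day 30 retention between the two groups could be tested along the following lines. The report does not state which test was used, so the two-proportion z-test here is an assumption. The group sizes come from this report, but the retained-user counts are hypothetical placeholders and not the experiment's data.

```python
import math


def two_proportion_z_test(x_a, n_a, x_b, n_b):
    """Two-sided z-test for the difference between two proportions."""
    p_a, p_b = x_a / n_a, x_b / n_b
    pooled = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


N_A, N_B = 4570, 4412               # registered users per group (from the report)
RETAINED_A, RETAINED_B = 400, 350   # hypothetical Day 30 counts, for illustration only

z, p_value = two_proportion_z_test(RETAINED_A, N_A, RETAINED_B, N_B)
print(f"z = {z:.3f}, p = {p_value:.3f}")
```

The same function can be applied to the AA phase, where a large P-value is the expected outcome if the two groups were split without bias.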