Design patterns distill the experience and knowledge of experts into actionable advice that all practitioners can apply to solve their problems using battle-tested approaches. Just as we have traditional software design patterns, microservices design patterns, API design patterns, and game development design patterns, among others, it makes sense to discover and document design patterns in machine learning.
Common challenges in machine learning relate to data quality, reproducibility, data drift, retraining, and scale, among others. These challenges are specific enough to this discipline to justify the emergence of machine learning patterns. In this article we’ll explore what a design pattern is and why machine learning design patterns are needed, and then walk through an example.
What are design patterns?
Christopher Alexander and his coauthors introduced the idea of patterns, and a catalog of them, in the field of architecture. Their book, A Pattern Language [1], documents more than 200 architecture patterns.
A pattern describes a problem that appears repeatedly in the architecture discipline, then describes a solution to that problem in such a way that it can be reused as a “recipe”.
Every pattern has a name, which saves architects from having to continually explain the problem and the accompanying solution details. Every solution is explained in a general and abstract way, so each architect can apply it in their own way, using their own skills, and adapt it to the context and environment they are facing.
What are software design patterns?
A software design pattern is a general reusable solution to a commonly occurring problem within a given context. A design pattern isn't a final design that can be implemented directly into code. It is a description or template for how to solve a problem that can be used in many different situations.
Patterns are based on software development best practices and principles. They are, in a sense, a shared vocabulary that we use to communicate our intentions to other developers. When we name a pattern we express more than the name itself: we are implicitly talking about characteristics, qualities, constraints, potential pitfalls, and even potential implementation details.
When we use a pattern in a description, other developers know precisely what design we have in mind. Design patterns allow us to discuss a problem and its solution at an abstract level without worrying about the implementation details yet. A team knowledgeable in design patterns reaches consensus quickly and with less misunderstanding.
Design patterns are a compact language that lets us express a lot with little. We therefore get experience reusability (design level) instead of code reusability (source code level).
Why do we need machine learning (ML) design patterns?
ML, like other computer science disciplines, started in an academic context where scalability, reliability, performance, and other software quality attributes were not the primary goal. Today, deploying machine learning models in production is considered an engineering discipline, so we should apply to ML the software and data engineering best practices that have already proven themselves on business problems.
It is important that ML practitioners take advantage of existing tried and proven software engineering methods to solve recurring problems and develop new ML specific design patterns.
Developing ML projects presents a set of unique challenges (data quality, concept drift, reproducibility, bias, explainability, etc.) that influence the solution design. Documenting these problems, their context, and their solutions is a great way to transfer knowledge, communicate, and democratize the machine learning discipline.
The book “Design Patterns: Elements of Reusable Object-Oriented Software” [2] focuses on explaining software design patterns and is considered a seminal book in our field. Most software design patterns are documented using the template explained in this book. Machine learning patterns are still a developing field, and there is no universally accepted standard for documenting them yet. We are likely to see some template proposals in the next few years.
An example ML pattern: Rebalancing
Sometimes it is easier to understand something through an example. Let’s delve into a common problem in machine learning: imbalanced data in classification or regression problems, which can hurt the performance of the trained model.
It’s common to face classification problems where the dataset classes are imbalanced, meaning the distribution of examples across the known classes is severely skewed. Imbalanced datasets present a challenge because most of the machine learning algorithms used for classification are designed on the assumption of a roughly equal number of examples for each class; the trained model will therefore probably perform poorly on the minority classes. To make matters worse, the minority classes are usually the more "important" ones.
Machine learning models tend to perform best when they are trained using a similar number of examples for each class in a dataset. However, real-world machine learning problems are rarely neatly balanced.
Consider, for example, credit card fraud detection, or a model that predicts the presence of melanoma from an image.
The problem with imbalanced classes lies in "blindly" trusting accuracy values. If we train a melanoma prediction model and only 3% of our dataset contains melanoma images, the model will probably reach around 97% accuracy regardless of the machine learning algorithm chosen (neural networks, support vector machines, decision trees, etc.) and without any modification to the dataset.
Although the 97% accuracy figure is mathematically sound, the model is probably just predicting the majority class (no melanoma) for every example. Even a model that always predicts the majority class scores well on accuracy, and that can be accomplished without using machine learning at all.
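The effect is easy to reproduce. The following is a minimal sketch, assuming a hypothetical dataset of 1000 labels with the 3% melanoma rate mentioned above; the "model" is simply a rule that always answers "no melanoma".

```python
# Hypothetical labels: 1 = melanoma, 0 = no melanoma (3% positive, as in the text).
labels = [1] * 30 + [0] * 970

# A "model" that ignores its input and always predicts the majority class.
predictions = [0] * len(labels)

# Accuracy: fraction of examples where the prediction matches the label.
correct = sum(1 for y, p in zip(labels, predictions) if y == p)
accuracy = correct / len(labels)

# Recall for the minority class: fraction of real melanoma cases that were found.
true_positives = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
recall = true_positives / sum(labels)

print(f"accuracy = {accuracy:.2f}")  # accuracy = 0.97
print(f"recall   = {recall:.2f}")    # recall   = 0.00
```

The accuracy looks excellent, yet not a single melanoma case is detected.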
Therefore, the model is not learning anything about how to predict the minority class (usually the "important" one). In regression modeling, imbalanced datasets arise when the data has outliers that are either much lower or much higher than the median of the dataset.
As we just discussed, accuracy is misleading when classes are imbalanced, so the first step is to choose the right metric to evaluate the model. Beyond that, we can apply techniques at the dataset or model level.
- At the data level we can apply downsampling or upsampling.
- At the model level we can convert our classification problem into a regression one.
We’ll leave the last one for the reader to investigate. The rest of this section covers how to select an appropriate metric, followed by downsampling, class weighting, and upsampling.
Choosing an evaluation metric
For imbalanced datasets, prefer metrics like precision, recall, or F-measure to evaluate the model. Let’s analyze how those metrics are calculated to understand why they are better than accuracy in this situation.
Condition positive (P): the number of real positive cases in the data.
Condition negative (N): the number of real negative cases in the data.
True positive (TP): a test result that correctly indicates the presence of a condition or characteristic.
True negative (TN): a test result that correctly indicates the absence of a condition or characteristic.
False positive (FP): a test result which wrongly indicates that a particular condition or attribute is present.
False negative (FN): a test result which wrongly indicates that a particular condition or attribute is absent.
Precision (or positive predictive value): $\text{Precision} = \frac{TP}{TP + FP}$
Recall (sensitivity or true positive rate): $\text{Recall} = \frac{TP}{TP + FN} = \frac{TP}{P}$
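The example later in this section also reports accuracy and the F-measure. Assuming the F-measure meant here is the balanced F1 score (the harmonic mean of precision and recall), both can be written with the same counts defined above:

```latex
\text{Accuracy} = \frac{TP + TN}{P + N}
\qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
    = \frac{2\,TP}{2\,TP + FP + FN}
```

Note that TN appears only in accuracy. Precision, recall, and the F-measure ignore the correctly classified negatives, which is why a large majority class cannot inflate them.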
To help our intuition, let’s assume we trained a fraud detection model with an imbalanced dataset. We used a test dataset of 1000 samples to evaluate our model. The dataset contains 50 fraudulent transactions and 950 valid transactions, an obviously imbalanced dataset. Our model produced the following confusion matrix (a way to easily visualize TP, TN, FP, and FN):
- TP (fraud predicted as fraud): 15
- FN (fraud predicted as valid): 35
- FP (valid predicted as fraud): 20
- TN (valid predicted as valid): 930
Let’s calculate some metrics:
- Precision = 15 / (15 + 20) ≈ 42.9%
- Recall = 15 / (15 + 35) = 30%
- F-measure = 2 × precision × recall / (precision + recall) ≈ 35.3%
- Accuracy = (15 + 930) / 1000 = 94.5%
As you can see, precision is 42.9%, recall is 30%, and F-measure is 35.3%. All of them are better than accuracy (94.5%) at capturing the fact that the model cannot correctly identify fraudulent transactions. Accuracy is too high, optimistic, and clearly a misleading metric in this scenario. Therefore, metrics other than accuracy are recommended for imbalanced datasets.
If you look at how accuracy is calculated, you’ll easily spot the problem. Accuracy is the fraction of all samples, true negatives (non-fraud) and true positives (fraud) together, that the model predicts correctly. Given that almost all samples in the dataset are negatives (non-fraud), it’s highly probable that any model trained on the imbalanced dataset will have high accuracy.
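The calculation can be sketched in a few lines of Python. The confusion-matrix counts used here are an assumption: they are the values implied by the 50/950 split together with the recall, precision, and accuracy quoted in this section. The function name `classification_metrics` is made up for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Assumed counts: 50 fraudulent and 950 valid transactions in the test set.
metrics = classification_metrics(tp=15, fp=20, fn=35, tn=930)

for name, value in metrics.items():
    print(f"{name:9s} {value:.1%}")
# accuracy  94.5%
# precision 42.9%
# recall    30.0%
# f1        35.3%
```

Only accuracy includes the 930 true negatives in its numerator, which is what pushes it so far above the other three metrics.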
Downsampling
Downsampling consists of decreasing the number of majority-class examples used during model training. This may sound counterintuitive because we tend to think a larger dataset is always better; however, a large imbalanced dataset just makes things worse, as discussed before. To apply downsampling, you combine all samples from the minority class with a small random sample taken from the majority class, reshuffle the data, and use this dataset for training. This creates a more balanced dataset than the original, which typically improves the model's performance on the minority class.
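The procedure can be sketched with the standard library alone. This is an illustrative sketch, not a reference implementation: the `downsample` helper, its `ratio` parameter (majority examples kept per minority example), and the toy data are all assumptions.

```python
import random


def downsample(examples, labels, majority_label, ratio=1.0, seed=0):
    """Keep every minority example plus a random subset of the majority class.

    `ratio` is the number of majority examples to keep per minority example.
    """
    rng = random.Random(seed)
    pairs = list(zip(examples, labels))
    minority = [(x, y) for x, y in pairs if y != majority_label]
    majority = [(x, y) for x, y in pairs if y == majority_label]

    # Never ask for more majority examples than actually exist.
    keep = min(len(majority), int(len(minority) * ratio))
    balanced = minority + rng.sample(majority, keep)

    # Reshuffle so the classes are not grouped together during training.
    rng.shuffle(balanced)
    return balanced


# Toy data: 50 "fraud" examples (label 1) and 950 "valid" ones (label 0).
examples = list(range(1000))
labels = [1 if i < 50 else 0 for i in examples]

balanced = downsample(examples, labels, majority_label=0, ratio=2.0)
fraud_count = sum(1 for _, y in balanced if y == 1)
print(len(balanced), fraud_count)  # 150 50
```

With `ratio=2.0` the minority class grows from 5% to a third of the training set, at the cost of discarding 850 majority examples.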
Weighted classes
Several machine learning frameworks and libraries allow us to explicitly tell our model that some specific labels are “more important” than others during training. With this approach you simply assign more weight to samples from the minority class. The weight value is a model hyperparameter, so it’s up to you to experiment to find the best values.
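A common starting point for the weights is to make them inversely proportional to class frequency. The sketch below assumes that heuristic, which is the same formula scikit-learn documents for `class_weight="balanced"`; the helper name `balanced_class_weights` is hypothetical, and the resulting values are only an initial guess to tune from.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Same split as the fraud example: 950 valid (0) and 50 fraudulent (1).
labels = [0] * 950 + [1] * 50
weights = balanced_class_weights(labels)
print(weights[1])            # 10.0
print(round(weights[0], 3))  # 0.526

# Per-sample weights, in the form many training APIs accept.
sample_weights = [weights[y] for y in labels]
```

With these weights each class contributes the same total weight to the loss (500 each here), so an error on a fraudulent transaction costs the model about 19 times more than an error on a valid one.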
Upsampling
The idea of upsampling is to over-represent the minority class by duplicating minority class examples and by generating additional “synthetic” samples (there are algorithms designed to create synthetic examples). For example, by analyzing the feature space of the minority class, it’s possible to generate similar examples within this feature space using a nearest neighbors approach.
With upsampling, we merge all samples from the minority class, all synthetically created examples, and a random subset taken from the majority class, and then reshuffle the newly created dataset for training.
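One well-known nearest-neighbors technique of this kind is SMOTE. The following is a simplified SMOTE-style sketch, not the full algorithm: each synthetic point is placed at a random position on the segment between a minority example and one of its nearest minority neighbors. The function name, the toy points, and the parameter `k` are assumptions for illustration.

```python
import math
import random


def synthesize_minority(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    example and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the chosen example (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy feature vectors: 4 minority points and 40 majority points.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority = [(5.0 + 0.1 * i, 5.0) for i in range(40)]

new_points = synthesize_minority(minority, n_new=6)

# Merge minority, synthetic, and a random majority subset, then reshuffle.
rng = random.Random(1)
training = [(x, 1) for x in minority + new_points]
training += [(x, 0) for x in rng.sample(majority, 20)]
rng.shuffle(training)
print(len(training))  # 30
```

Because every synthetic point is an interpolation between two real minority examples, it stays inside the region the minority class already occupies.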
Is this enough?
Is it enough to just document the problem and the solution as we did? The answer is no.
Design pattern documentation should include at least the context in which the pattern is applied, the forces within the context that the pattern wants to resolve, and the proposed solution.
There is no single standard for documenting design patterns, but pattern descriptions usually include the following:
- Pattern Name and Classification: A unique and descriptive name.
- Intent: The goal behind the pattern and the reason for using it.
- Motivation: A scenario explaining the problem and a context in which the pattern can be applied.
- Applicability: Situations in which it makes sense to use the pattern.
- Structure: A graphical representation of the pattern (diagram).
- Consequences: The results, side effects, and tradeoffs created by using the pattern.
- Sample Code: A code illustration of how the pattern can be used.
- Known Uses: Examples of real usages of the pattern.
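To make the template concrete, the sections above can be sketched as a small data structure. This is only an illustration: the `PatternDoc` class and the `missing_sections` helper are hypothetical, and the entry for Rebalancing is filled in with what this article has covered so far.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class PatternDoc:
    """Hypothetical container mirroring the documentation sections listed above."""
    name: str
    intent: str
    motivation: str
    applicability: str
    structure: str = ""  # e.g. a reference to a diagram
    consequences: List[str] = field(default_factory=list)
    sample_code: str = ""
    known_uses: List[str] = field(default_factory=list)


def missing_sections(doc):
    """Return the names of the sections that are still empty."""
    return [f.name for f in fields(doc) if not getattr(doc, f.name)]


rebalancing = PatternDoc(
    name="Rebalancing",
    intent="Train useful models on datasets whose classes are heavily imbalanced.",
    motivation="Fraud or melanoma detection, where the minority class is rare.",
    applicability="Classification problems where accuracy hides poor minority-class results.",
    consequences=[
        "Downsampling discards majority-class examples.",
        "Class weights are hyperparameters that need experimentation.",
    ],
)

print(missing_sections(rebalancing))  # ['structure', 'sample_code', 'known_uses']
```

Checking an entry this way shows what a description like the one in this article still lacks before it can be called a complete pattern.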
Probably in the next few years one standard for documenting machine learning patterns will emerge and be preferred over others.
Conclusion
Design patterns encode the experience and knowledge of experts into advice that all practitioners can follow. Machine learning design patterns capture best practices and solutions to commonly occurring problems in designing, building, training, and deploying machine learning systems. They are a natural complement to traditional software development design patterns: they extend the software engineering body of knowledge and help practitioners avoid common pitfalls by using proven solutions.
1. A Pattern Language: Towns, Buildings, Construction by Christopher Alexander, Sara Ishikawa, and Murray Silverstein (1977)
2. Design Patterns: Elements of Reusable Object-Oriented Software by Erich Gamma, Richard Helm, Ralph Johnson & John Vlissides (1995)