In many circumstances, such as user training, sales demonstrations, and software testing, a business needs to provide data to viewers outside of an authorized group. While the data needs to be functional, it cannot be sensitive, proprietary, or personally identifiable. So how do businesses alter data so that it can be shared and stored safely, and how can this be accomplished at scale? The answer lies in the practice of data masking. This guide explains the types, techniques, and best practices of data masking.
What is data masking?
Data masking is the practice of altering data so that it can be used and shared without exposing sensitive information. The aim is to create a version from which the original values cannot be deciphered while the data keeps the functionality its users require.
As the number of endpoints and data-sharing channels grows, data masking is a top priority for the following reasons:
- Data masking reduces data risks connected to cloud use.
- Data masking makes data useless to an attacker while also maintaining the inherent functional properties of the data.
- Data masking enables data sharing with authorized people like testers and developers without exposing proprietary information.
- Data masking is an effective way to sanitize data, an important alternative to deleting data. The standard process of deleting files still leaves data traces, but sanitization replaces old values with masked values so that the remaining data traces are unusable.
- Data masking helps organizations maintain their regulatory compliance and still use the data for purposes such as software development and testing.
Types of Data Masking
There are several different ways to alter data, including:
- Static Data Masking - Static data masking creates a sanitized copy of a data set by altering all the sensitive data so that the copy can be safely shared. The process often involves creating a backup copy of a database, loading it into a separate environment, eliminating unnecessary data, masking the data while it is at rest, and then pushing the masked database to the target destination.
- Deterministic Data Masking - Deterministic data masking replaces a value with the same substitute value throughout an entire data set. For example, "Peter Smith" is always replaced with "James McDonald." The process requires two sets of data with the same information type: the original values and the replacement values.
- On-the-Fly Data Masking - On-the-fly data masking alters data in small data subsets and transmits it as needed. The data is stored in the non-production system's development and testing environment. This type of data masking is helpful for companies that need to continuously stream data from production to multiple testing environments because the data is masked in manageable segments. However, on-the-fly data masking must be implemented at the beginning of a project to prevent compliance and security issues.
- Dynamic Data Masking - Dynamic data masking is similar to on-the-fly data masking except for one significant distinction - the data is not stored in a development and testing environment. Instead, it is streamed directly from the production system to another system for consumption.
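As a rough illustration of how these types differ, the sketch below applies one deterministic replacement function in two ways: once to build a sanitized copy (static masking) and once at read time without storing anything (dynamic masking). The replacement list, the `name` field, the `role` check, and all function names are assumptions made for this example, not part of any particular product. An unkeyed hash is used only to keep the sketch short; it could be reversed by guessing inputs, so a real deployment would need a protected secret.

```python
import hashlib
from copy import deepcopy

# Hypothetical replacement dictionary; a real one would be far larger.
REPLACEMENT_NAMES = ["James McDonald", "Ana Lopez", "Wei Chen", "Priya Patel"]


def deterministic_name(name):
    """Map the same input name to the same replacement every time."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(REPLACEMENT_NAMES)
    return REPLACEMENT_NAMES[index]


def static_mask(records):
    """Static masking: build a sanitized copy and leave the source untouched."""
    masked = deepcopy(records)
    for row in masked:
        row["name"] = deterministic_name(row["name"])
    return masked


def dynamic_read(records, role):
    """Dynamic masking: mask rows as they are read; nothing masked is stored."""
    for row in records:
        if role == "admin":
            yield dict(row)  # privileged readers see the original values
        else:
            yield {**row, "name": deterministic_name(row["name"])}


production = [
    {"id": 1, "name": "Peter Smith"},
    {"id": 2, "name": "Peter Smith"},
]
test_copy = static_mask(production)                  # stored in a test environment
tester_view = list(dynamic_read(production, "tester"))  # masked on the way out
```

Because the replacement is deterministic, both rows for "Peter Smith" receive the same substitute name, and the static copy and the dynamic view agree with each other.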
Data Masking Techniques
Many techniques are available for masking data, including:
- Data Encryption - Data encryption masks data with an encryption algorithm. To unmask data, the viewer must have the decryption key. Data encryption is generally considered the most secure form of data masking, and it is also the most complex to implement. It requires technology to encrypt data and secure mechanisms to continually store and share the encryption keys.
- Homomorphic Encryption - Homomorphic encryption converts data into ciphertext that can still be computed on. Mathematical operations can be performed directly on the encrypted values, and decrypting the result gives the same answer as performing those operations on the original data, without compromising integrity or security.
- Data Tokenization - In the data tokenization technique, data is replaced with fake but realistic alternative data values with no exploitable or inherent meaning.
- Data Shuffling - Data shuffling is similar to tokenization but with one significant difference. In data shuffling, the data values are switched within the data set, resulting in data that looks real but is not.
- Data Scrambling - The data scrambling technique involves randomly reorganizing characters in a word or number to mask the data. The process is relatively simple to implement, but it is not very secure. Additionally, it is only relevant for specific types of data. For example, scrambling an ID number from 568723 to 378526 is likely more functional than scrambling a price from $15,000 to $00,150.
- Nulling Out - A simple technique to mask data is redacting or nulling it. Removing the data will protect sensitive data but not provide a suitable replacement for practical use.
- Values Variance - In the values variance technique, each original value is replaced by the output of a function. For example, an exact value can be replaced with a range of data points, or shifted by a random amount within set bounds. This allows the data to remain useful and keeps private information secure.
- Pseudonymization - The General Data Protection Regulation (GDPR) defines pseudonymization as processing personal data so that it can no longer be attributed to a specific individual without the use of additional information. Personal identifiers like names must be removed or replaced, and multiple identifiers that together can point to an individual must be treated the same way. Furthermore, the additional information needed to re-identify individuals, such as encryption keys, must be stored separately from the data and kept secure.
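Several of the simpler techniques above can be sketched in a few lines each. The example below is a minimal illustration of shuffling, scrambling, nulling out, and one possible reading of values variance (a random shift within a percentage band). The function names, the `spread` parameter, and the sample fields are assumptions made for this sketch, and the fixed seed exists only to make the demonstration repeatable.

```python
import random


def shuffle_column(rows, column, rng):
    """Data shuffling: redistribute one column's existing values across rows."""
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]


def scramble(text, rng):
    """Data scrambling: randomly reorder the characters of a single value."""
    chars = list(text)
    rng.shuffle(chars)
    return "".join(chars)


def null_out(rows, column):
    """Nulling out: remove the value without providing a replacement."""
    return [{**row, column: None} for row in rows]


def vary(amount, rng, spread=0.10):
    """Values variance (one interpretation): shift by up to +/- spread."""
    return round(amount * (1 + rng.uniform(-spread, spread)), 2)


rng = random.Random(42)  # fixed seed only so the sketch is repeatable
employees = [
    {"id": "568723", "salary": 52000, "email": "a@example.com"},
    {"id": "114902", "salary": 61000, "email": "b@example.com"},
    {"id": "907315", "salary": 75000, "email": "c@example.com"},
]

shuffled = shuffle_column(employees, "salary", rng)  # real salaries, wrong rows
scrambled_id = scramble("568723", rng)               # same digits, new order
redacted = null_out(employees, "email")              # emails become None
varied_price = vary(15000, rng)                      # within 10% of 15000
```

Shuffling and scrambling keep the original values or characters in the data set, which is part of why they look realistic and also why they offer limited protection on small or distinctive data sets.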
Data Masking Best Practices
Best practices must be followed regardless of the data masking type and technique. Here are several best practices to understand:
- Identify What Data is Sensitive - Not all data needs to be masked. It is best practice to determine what data needs to be secured, who can view it, and how it will be used.
- Choose the Right Technique(s) - Large businesses are not likely to use just one data masking technique because no one approach applies to all data types. Plus, sometimes companies create their own masking techniques.
- Ensure Referential Integrity - Even in varying the data masking techniques, some important types of consistency must be maintained. Referential integrity means that the same type of data is always masked with the same technique, so that a given value produces the same masked value wherever it appears. This keeps relationships between data sets intact and ensures that masked data can be used across departments.
- Secure the Algorithms, Data Sets, or Dictionaries - For the data to stay secure, the data masking tools must also remain protected. The best practice is typically to separate duties. For example, the IT department can choose the general type of data masking algorithm, and the data owners choose the algorithm settings and data lists.
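The last two practices can be illustrated together. The sketch below masks a customer identifier with a keyed hash (HMAC-SHA256 from the Python standard library), so the same identifier yields the same masked value in every table, and the masked values cannot be recomputed without the key. The table layout, the `cust_` prefix, the 12-character truncation, and the hard-coded demo key are all assumptions made for this example; in practice the key would be held outside the code and the data, for instance in a secrets manager.

```python
import hashlib
import hmac

# Hypothetical placeholder; a real key must be stored separately and protected.
DEMO_KEY = b"demo-key-not-for-production"


def mask_id(value, key):
    """Keyed, deterministic mask: same input and key give the same output."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "cust_" + digest[:12]


def mask_table(rows, column, key):
    """Return a copy of the rows with one column masked."""
    return [{**row, column: mask_id(row[column], key)} for row in rows]


customers = [{"customer_id": "C-1001", "region": "West"}]
orders = [
    {"order_id": 1, "customer_id": "C-1001"},
    {"order_id": 2, "customer_id": "C-1001"},
]

masked_customers = mask_table(customers, "customer_id", DEMO_KEY)
masked_orders = mask_table(orders, "customer_id", DEMO_KEY)

# The join between the two masked tables still works.
known_ids = {row["customer_id"] for row in masked_customers}
joinable = all(row["customer_id"] in known_ids for row in masked_orders)
```

Because both tables are masked with the same function and key, orders still line up with their customers after masking, while anyone without the key cannot regenerate the mapping from a list of guessed identifiers.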
Partner with an Expert - Encora has a long history of delivering exceptional software engineering and product engineering services across a range of tech-enabled industries. Our Identity and Access Management (IAM) and Cybersecurity services enable sensitive data protection through multiple means, such as encryption, data classification techniques, tokenization, and masking. We have deep expertise in the various disciplines, tools, and technologies that power the emerging economy, and this is one of the primary reasons that clients choose Encora over the many strategic alternatives that they have. For help masking data properly and securely, contact Encora today.