Tokenization, when applied to data security, is the process of substituting a sensitive data element with a non-sensitive equivalent, referred to as a token, that has no intrinsic or exploitable meaning or value. The token is a reference (i.e. identifier) that maps back to the sensitive data through a tokenization system. The mapping from original data to a token uses methods that render tokens infeasible to reverse in the absence of the tokenization system, for example using tokens created from random numbers.[1] A one-way cryptographic function is used to convert the original data into tokens, making it difficult to recreate the original data without obtaining entry to the tokenization system's resources.[2] To deliver such services, the system maintains a vault database of tokens that are connected to the corresponding sensitive data. Protecting the system vault is vital to the system, and improved processes must be put in place to offer database integrity and physical security.[3]
The tokenization system must be secured and validated using security best practices[4] applicable to sensitive data protection, secure storage, audit, authentication and authorization. The tokenization system provides data processing applications with the authority and interfaces to request tokens, or detokenize back to sensitive data. The security and risk reduction benefits of tokenization require that the tokenization system is logically isolated and segmented from data processing systems and applications that previously processed or stored sensitive data replaced by tokens. Only the tokenization system can tokenize data to create tokens, or detokenize back to redeem sensitive data under strict security controls. The token generation method must be proven to have the property that there is no feasible means through direct attack, cryptanalysis, side channel analysis, token mapping table exposure or brute force techniques to reverse tokens back to live data.
Replacing live data with tokens in systems is intended to minimize exposure of sensitive data to those applications, stores, people and processes, reducing risk of compromise or accidental exposure and unauthorized access to sensitive data. Applications can operate using tokens instead of live data, with the exception of a small number of trusted applications explicitly permitted to detokenize when strictly necessary for an approved business purpose. Tokenization systems may be operated in-house within a secure isolated segment of the data center, or as a service from a secure service provider.
Tokenization may be used to safeguard sensitive data involving, for example, bank accounts, financial statements, medical records, criminal records, driver's licenses, loan applications, stock trades, voter registrations, and other types of personally identifiable information (PII). Tokenization is often used in credit card processing. The PCI Council defines tokenization as "a process by which the primary account number (PAN) is replaced with a surrogate value called a token. A PAN may be linked to a reference number through the tokenization process. In this case, the merchant simply has to retain the token and a reliable third party controls the relationship and holds the PAN. The token may be created independently of the PAN, or the PAN can be used as part of the data input to the tokenization technique. The communication between the merchant and the third-party supplier must be secure to prevent an attacker from intercepting to gain the PAN and the token.[5]
De-tokenization[6] is the reverse process of redeeming a token for its associated PAN value. The security of an individual token relies predominantly on the infeasibility of determining the original PAN knowing only the surrogate value".[7] The choice of tokenization as an alternative to other techniques such as encryption will depend on varying regulatory requirements, interpretation, and acceptance by respective auditing or assessment entities. This is in addition to any technical, architectural or operational constraint that tokenization imposes in practical use.
The concept of tokenization, as adopted by the industry today, has existed since the first currency systems emerged centuries ago as a means to reduce risk in handling high value financial instruments by replacing them with surrogate equivalents.[8] [9] [10] In the physical world, coin tokens have a long history of use replacing the financial instrument of minted coins and banknotes. In more recent history, subway tokens and casino chips found adoption for their respective systems to replace physical currency and cash handling risks such as theft. Exonumia, and scrip are terms synonymous with such tokens.
In the digital world, similar substitution techniques have been used since the 1970s as a means to isolate real data elements from exposure to other data systems. In databases for example, surrogate key values have been used since 1976 to isolate data associated with the internal mechanisms of databases and their external equivalents for a variety of uses in data processing.[11] [12] More recently, these concepts have been extended to consider this isolation tactic to provide a security mechanism for the purposes of data protection.
In the payment card industry, tokenization is one means of protecting sensitive cardholder data in order to comply with industry standards and government regulations.[13]
Tokenization was applied to payment card data by Shift4 Corporation[14] and released to the public during an industry Security Summit in Las Vegas, Nevada in 2005.[15] The technology is meant to prevent the theft of the credit card information in storage. Shift4 defines tokenization as: “The concept of using a non-decryptable piece of data to represent, by reference, sensitive or secret data. In payment card industry (PCI) context, tokens are used to reference cardholder data that is managed in a tokenization system, application or off-site secure facility.”[16]
To protect data over its full lifecycle, tokenization is often combined with end-to-end encryption to secure data in transit to the tokenization system or service, with a token replacing the original data on return. For example, to avoid the risks of malware stealing data from low-trust systems such as point of sale (POS) systems, as in the Target breach of 2013, cardholder data encryption must take place prior to card data entering the POS and not after. Encryption takes place within the confines of a security hardened and validated card reading device and data remains encrypted until received by the processing host, an approach pioneered by Heartland Payment Systems[17] as a means to secure payment data from advanced threats, now widely adopted by industry payment processing companies and technology companies.[18] The PCI Council has also specified end-to-end encryption (certified point-to-point encryption—P2PE) for various service implementations in various PCI Council Point-to-point Encryption documents.
The process of tokenization consists of the following steps:
Tokenization systems share several components according to established standards.
Tokenization and “classic” encryption effectively protect data if implemented properly, and a computer security system may use both. While similar in certain regards, tokenization and classic encryption differ in a few key aspects. Both are cryptographic data security methods and they essentially have the same function, however they do so with differing processes and have different effects on the data they are protecting.
Tokenization is a non-mathematical approach that replaces sensitive data with non-sensitive substitutes without altering the type or length of data. This is an important distinction from encryption because changes in data length and type can render information unreadable in intermediate systems such as databases. Tokenized data can still be processed by legacy systems which makes tokenization more flexible than classic encryption.
In many situations, the encryption process is a constant consumer of processing power, hence such a system needs significant expenditures in specialized hardware and software.
Another difference is that tokens require significantly less computational resources to process. With tokenization, specific data is kept fully or partially visible for processing and analytics while sensitive information is kept hidden. This allows tokenized data to be processed more quickly and reduces the strain on system resources. This can be a key advantage in systems that rely on high performance.
In comparison to encryption, tokenization technologies reduce time, expense, and administrative effort while enabling teamwork and communication.
There are many ways that tokens can be classified however there is currently no unified classification. Tokens can be: single or multi-use, cryptographic or non-cryptographic, reversible or irreversible, authenticable or non-authenticable, and various combinations thereof.
In the context of payments, the difference between high and low value tokens plays a significant role.
HVTs serve as surrogates for actual PANs in payment transactions and are used as an instrument for completing a payment transaction. In order to function, they must look like actual PANs. Multiple HVTs can map back to a single PAN and a single physical credit card without the owner being aware of it. Additionally, HVTs can be limited to certain networks and/or merchants whereas PANs cannot.
HVTs can also be bound to specific devices so that anomalies between token use, physical devices, and geographic locations can be flagged as potentially fraudulent. HVT blocking enhances efficiency by reducing computational costs while maintaining accuracy and reducing record linkage as it reduces the number of records that are compared.[21]
LVTs also act as surrogates for actual PANs in payment transactions, however they serve a different purpose. LVTs cannot be used by themselves to complete a payment transaction. In order for an LVT to function, it must be possible to match it back to the actual PAN it represents, albeit only in a tightly controlled fashion. Using tokens to protect PANs becomes ineffectual if a tokenization system is breached, therefore securing the tokenization system itself is extremely important.
First generation tokenization systems use a database to map from live data to surrogate substitute tokens and back. This requires the storage, management, and continuous backup for every new transaction added to the token database to avoid data loss. Another problem is ensuring consistency across data centers, requiring continuous synchronization of token databases. Significant consistency, availability and performance trade-offs, per the CAP theorem, are unavoidable with this approach. This overhead adds complexity to real-time transaction processing to avoid data loss and to assure data integrity across data centers, and also limits scale. Storing all sensitive data in one service creates an attractive target for attack and compromise, and introduces privacy and legal risk in the aggregation of data Internet privacy, particularly in the EU.
Another limitation of tokenization technologies is measuring the level of security for a given solution through independent validation. With the lack of standards, the latter is critical to establish the strength of tokenization offered when tokens are used for regulatory compliance. The PCI Council recommends independent vetting and validation of any claims of security and compliance: "Merchants considering the use of tokenization should perform a thorough evaluation and risk analysis to identify and document the unique characteristics of their particular implementation, including all interactions with payment card data and the particular tokenization systems and processes"[22]
The method of generating tokens may also have limitations from a security perspective. With concerns about security and attacks to random number generators, which are a common choice for the generation of tokens and token mapping tables, scrutiny must be applied to ensure proven and validated methods are used versus arbitrary design.[23] [24] Random-number generators have limitations in terms of speed, entropy, seeding and bias, and security properties must be carefully analysed and measured to avoid predictability and compromise.
With tokenization's increasing adoption, new tokenization technology approaches have emerged to remove such operational risks and complexities and to enable increased scale suited to emerging big data use cases and high performance transaction processing, especially in financial services and banking.[25] In addition to conventional tokenization methods, Protegrity provides additional security through its so-called "obfuscation layer." This creates a barrier that prevents not only regular users from accessing information they wouldn't see but also privileged users who has access, such as database administrators.[26]
Stateless tokenization enables random mapping of live data elements to surrogate values without needing a database while retaining the isolation properties of tokenization.
November 2014, American Express released its token service which meets the EMV tokenization standard.[27] Other notable examples of Tokenization-based payment systems, according to the EMVCo standard, include Google Wallet, Apple Pay,[28] Samsung Pay, Microsoft Wallet, Fitbit Pay and Garmin Pay. Visa uses tokenization techniques to provide a secure online and mobile shopping.[29]
Using blockchain, as opposed to relying on trusted third parties, it is possible to run highly accessible, tamper-resistant databases for transactions.[30] [31] With help of blockchain, tokenization is the process of converting the value of a tangible or intangible asset into a token that can be exchanged on the network.
This enables the tokenization of conventional financial assets, for instance, by transforming rights into a digital token backed by the asset itself using blockchain technology.[32] Besides that, tokenization enables the simple and efficient compartmentalization and management of data across multiple users. Individual tokens created through tokenization can be used to split ownership and partially resell an asset.[33] [34] Consequently, only entities with the appropriate token can access the data.
Numerous blockchain companies support asset tokenization. In 2019, eToro acquired Firmo and renamed as eToroX. Through its Token Management Suite, which is backed by USD-pegged stablecoins, eToroX enables asset tokenization.[35] [36]
The tokenization of equity is facilitated by STOKR, a platform that links investors with small and medium-sized businesses. Tokens issued through the STOKR platform are legally recognized as transferable securities under European Union capital market regulations.[37]
Breakers enable tokenization of intellectual property, allowing content creators to issue their own digital tokens. Tokens can be distributed to a variety of project participants. Without intermediaries or governing body, content creators can integrate reward-sharing features into the token.
Building an alternate payments system requires a number of entities working together in order to deliver near field-communication (NFC) or other technology based payment services to the end users. One of the issues is the interoperability between the players and to resolve this issue the role of trusted service manager (TSM) is proposed to establish a technical link between mobile network operators (MNO) and providers of services, so that these entities can work together. Tokenization can play a role in mediating such services.
Tokenization as a security strategy lies in the ability to replace a real card number with a surrogate (target removal) and the subsequent limitations placed on the surrogate card number (risk reduction). If the surrogate value can be used in an unlimited fashion or even in a broadly applicable manner, the token value gains as much value as the real credit card number. In these cases, the token may be secured by a second dynamic token that is unique for each transaction and also associated to a specific payment card. Example of dynamic, transaction-specific tokens include cryptograms used in the EMV specification.
The Payment Card Industry Data Security Standard, an industry-wide set of guidelines that must be met by any organization that stores, processes, or transmits cardholder data, mandates that credit card data must be protected when stored.[38] Tokenization, as applied to payment card data, is often implemented to meet this mandate, replacing credit card and ACH numbers in some systems with a random value or string of characters.[39] Tokens can be formatted in a variety of ways.[40] Some token service providers or tokenization products generate the surrogate values in such a way as to match the format of the original sensitive data. In the case of payment card data, a token might be the same length as a Primary Account Number (bank card number) and contain elements of the original data such as the last four digits of the card number. When a payment card authorization request is made to verify the legitimacy of a transaction, a token might be returned to the merchant instead of the card number, along with the authorization code for the transaction. The token is stored in the receiving system while the actual cardholder data is mapped to the token in a secure tokenization system. Storage of tokens and payment card data must comply with current PCI standards, including the use of strong cryptography.[41]
Tokenization is currently in standards definition in ANSI X9 as X9.119 Part 2. X9 is responsible for the industry standards for financial cryptography and data protection including payment card PIN management, credit and debit card encryption and related technologies and processes. The PCI Council has also stated support for tokenization in reducing risk in data breaches, when combined with other technologies such as Point-to-Point Encryption (P2PE) and assessments of compliance to PCI DSS guidelines.[42] Visa Inc. released Visa Tokenization Best Practices[43] for tokenization uses in credit and debit card handling applications and services. In March 2014, EMVCo LLC released its first payment tokenization specification for EMV.[44] PCI DSS is the most frequently utilized standard for Tokenization systems used by payment industry players.
Tokenization can render it more difficult for attackers to gain access to sensitive data outside of the tokenization system or service. Implementation of tokenization may simplify the requirements of the PCI DSS, as systems that no longer store or process sensitive data may have a reduction of applicable controls required by the PCI DSS guidelines.
As a security best practice,[45] independent assessment and validation of any technologies used for data protection, including tokenization, must be in place to establish the security and strength of the method and implementation before any claims of privacy compliance, regulatory compliance, and data security can be made. This validation is particularly important in tokenization, as the tokens are shared externally in general use and thus exposed in high risk, low trust environments. The infeasibility of reversing a token or set of tokens to a live sensitive data must be established using industry accepted measurements and proofs by appropriate experts independent of the service or solution provider.
Not all organizational data can be tokenized, and needs to be examined and filtered.
When databases are utilized on a large scale, they expand exponentially, causing the search process to take longer, restricting system performance, and increasing backup processes. A database that links sensitive information to tokens is called a vault. With the addition of new data, the vault's maintenance workload increases significantly.
For ensuring database consistency, token databases need to be continuously synchronized.
Apart from that, secure communication channels must be built between sensitive data and the vault so that data is not compromised on the way to or from storage.