Professional Documents
Culture Documents
A UID Numbering Scheme
A UID Numbering Scheme
sumer products such as groceries and cereal boxes. In the USA, bar code design follows a standard called the Universal Product Code (UPC), whereas in Europe bar codes follow another standard called the European Article Number (EAN). In this era of global retail chains and multinational manufacturers it was found rather inconvenient to have two separate systems designed to serve essentially the same function operating on different continents. There were also some structural shortcomings in how UPC codes were designed. These problems led retailers and manufacturers to change over to a new global standard for bar codes called the Global Trade Item Number (GTIN). The reasons that many of these numbering systems have fallen short also have some commonalities. We also note that as these two previous examples suggest, once a system is widely in use, changing it is difficult, costly and fraught with delays and confusion. Before outlining a specific proposal, we shall consider below various design tradeoffs we considered while designing a unique national ID system for India.
Privacy and security concerns are two other reasons, described later, for avoiding having a structure embedded in the UID string.
Longevity
Systems that are to be as widely used and for multiple different applications as UID tend to be very sticky in the sense that these systems would be in active use for centuries. Once a billion plus people have been assigned a UID, and applications using the UID to conduct their transactions are evolved, anything that requires modifications to existing software applications and databases will cost a lot. Y2K problem where one did not anticipate that software systems would last beyond the turn of the century is a good example of unwittingly designing systems that would become obsolete. For instance, if one were to use a structure in UID that encodes as a part the year of birth for the assigned individual, one needs to make sure that Y2K type of problem is avoided. One cannot always anticipate what future needs may evolve that requires format changes but one should assume that it is likely and thus making the format such that it remains compatible with any future format changes (embodied typically as version numbers).
Privacy Issues
There are a number of ID cards and identification numbers around the world. They range in lengths from 8 to 16 digits depending on the size of the country. Some encode gender. Many involve a physical card that has a significant amount of personal content. Some use machine-readable technologies that range from barcodes to smartcard/RFID. There has been a great deal written about the privacy implications of national ID cards in various countries. For example, consider the Electronic Privacy Information Centers (EPICs) analysis of the US Department of Homeland Security REAL ID project. 1 Many of the issues raised in the EPIC analysis pertain to public policy and to the uses of the ID, especially as relevant to specific US concerns. A policy analysis of the privacy implications of the UID is beyond the scope of this paper. However, the EPIC document raises technical issues as well. We address the relevant ones below, as well as point out the specifics of the UID that were designed to protect the residents privacy. 1. The UID program does not issue a smartcard or any type of card or mandate any machine-readable format such as a barcode or RFID. Even if a barcode were used on the UID letter delivered to the citizen, it would merely encode the UID itself, and not personal information pertaining to the citizen. For example, even the
1
http://epic.org/privacy/id-cards/
name of the card-holder would not be in the barcode. This would make it difficult for an unauthorized entity to glean any information about the resident. 2. The UID number has been carefully designed to not disclose personal information about the resident including - the regional, ethnic and age information. The US Social Security Number, for example, has enough of a pattern that an expert can guess a persons number from their birth-date and from the location at which it was issued. 2 It is also possible to guess the date and location at which the card was issued from the Social Security Number. The UID is a random number that makes guesswork virtually impossible. 3. The UID approach is designed on an on-line system data is stored centrally and authentication is done online. This is a forward-leaning approach that makes it possible to avoid the problems associated with many ID card schemes.
Alpha-Numeric Values
There have been national numbering systems that have used alpha-numeric values for UID string. That has a presumed advantage of being more secure because it will be harder to guess, and provides more available numbers in compact length. We did not conside this as being a viable option for us given the multi-lingual society that also has high levels of functional illiteracy.
Memorization of UID
This section is about how long the string length should be. In short, the string has to be as short as possible but that meets density requirement and does not include alphabet characters, just numbers. It is important to keep the UID simple and small to help residents to remember their number. Since it is likely that increasingly the UID will be used by several service providers (government agencies, private institutions and NGOs) it is important for a resident to be able to remembers it in the absence of a token such as a card. Firstly the use of the hindu-arabic numeral system(0,1,2,3,4,5,6,7,8,9) is suggested since these numerals are recognized/used by the largest subset of people in the country. Secondly we suggest the use of 12 digits (11 + 1 check sum) since 11 digits gives us a 100billion number space which in turn can provide a low density of used numbers. Thirdly the UID authority can send a letter to the resident with the assigned UID number and the details of the persons record as recorded in the UID database. A small perforated portion can be used as a tear-off portion that can be used at the time of authentication.
De-activation of UIDs
When a person dies, eventually one would see a need to de-activate the UID associated with the person. One simple way to deal with that is to flag UID record as inactive once one confirms the death. In a country of a billion people, updating UID records based on the death register is not easy, especially since a large number of cases of death are not reported and moreover the registration of births and deaths is maintained at local distributed levels across the country making it difficult to update them at a central UID system. One way to ensure that UIDs are not misused by others after a persons death is to inactivate the UID if it has not been used say in 10 years (timeout can be changed). Using the lack of activity as being an indicator of being deceased is also not without its pitfalls. In the case that a UID in inactivated of a person who has simply not authenticated himself/herself in a long time, s/he can simply activate their UID by a simple re-activation procedure that involves authentication.
count numbers etc. Thirdly there are identifiers meant for communication such as mobile numbers, landline phone, email addresses etc. While the above identifiers of an individual are relevant in specific sectors such as finance, health, communication etc, the UID is a pure identifier which is not tied to any particular sector or application and this abstract quality of the UID has distinct benefit in delivering cross sectoral benefits. There are distinct benefits in using a pure ID to stitch together transactions cutting across multiple sectors, some of the identifiers are also addressable in their local context and hence not global, the UID help make these identifiers globally addressable. The below example highlights the use of the UID for financial inclusion programs. Financial Inclusion Example: Lets take the example of NREGA and look at how direct benefits can be delivered to the poor, by directly depositing their NREGA earnings to a bank account. Currently several innovative pilots have been implemented which integrate the NREGA payments workflow to direct bank deposit and also a payment solution that include business correspondents as well POS devices and/or mobile technologies to deliver NREGA cash payments at the village level. These solutions require a custom integration of the banking, mobile and NREGA backend that can be expensive and time consuming. If there exists a backend financial inclusion infrastructure that is able to make direct deposit to a resident bank account at any bank using his/her UID through a national payments switch then it simplifies the way direct benefits is delivered by any service provider such as NREGA across the country. One of the key components of such a FI infrastructure is the ID mapper, that maps the key IDs of the resident such as UID, Bank Account and mobile number (optional) in order to make direct deposits easily. ID Mapper : The ID Mapper links the various identifiers of a given resident. For instance the UID, Bank Account and mobile number are 3 key identifiers that are useful for financial inclusion as shown in the diagram below. Now it is easy to target payments to a UID since the ID Mapper can resolve a UID to a specific Bank Account and infact a SMS can be sent to the persons mobile number which is also available in the ID Mapper. The key benefit of a central ID Mapper is the resident has full flexibility to change the bank where he/she banks and automatically the direct benefits of the govt. will target the latest bank account that the resident has updated on the ID Mapper, similarly the resident has full flexibility to choose the mobile provider and device. An advanced ID Mapper can also be policy driven for instance the resident can upload multiple bank accounts and set policies as to which bank account to use for what type of transaction. UID Numbering Scheme
The number should be extensible with backwards compatibility to have more decimal digits, for example, 20 digits, if a need should arise in future. This can be accomplished by reserving a value of 0 for first digit of the number There should be well known, reserved numbers for purposes outlined below. The number should have an error detecting (but not necessarily error correcting) check digit that detects as many data entry errors as possible. To make it harder to guess a correct valid ID assigned to anybody we propose a much larger space (80 billion IDs) than the maximum number of IDs needed at anytime (estimated to be about no more than 6 billion IDs). This notion is captured by the density, which we define as a ratio of total number of IDs assigned to the total number of IDs available. Our target is that by design we keep this ratio below 0.05 at all times. The sparseness of the numbers will make it difficult to simply guess numbers. The check digit is another form of sparseness but, being a publicly disclosed algorithm, will not prevent guessing. The check digit enables error detection at data entry. There may be a need to assign UIDs to entities such as schools, companies and other institutions to facilitate symmetry in transactions between residents and entities. For example, financial transactions for NREGA scheme may deposit NREGA wages direct from a Gram Panchayat to a particular resident. Such transactions could take the form: TransferMoney(GramPanchayat_UID, Resident_UID, Amount). While several other backend systems need to be in place for such transactions to take place, currently we are making a provision to assign up to ten billion entity UIDs. This requirement is met by reserving the value 1 for first digit (a1=1) to entity UIDs.
1. The Version Number: Some digits may be reserved for specific applications. This is an implicit form of a version number embedded into the numbering scheme. We recommend the following reservations: 0- numbers (a1 = 0) could be used as an escape for future extensions to the length of the number. For example, in future if we need 16 digit numbers, then we could say that 0 means that the number is 16-digits. As of now we can simply declare all 0numbers as TBD (to be decided). 1- numbers(a1 = 1) could be reserved for entities rather than individuals. Alternatively, 11- could be reserved for entities (or 111-) to match the size of the reserved space to the number of entities expected. We could use 2-9 numbers (a1 =2,39) right away to assign UIDs. That is 80 billion numbers -- plenty of space. 2. Number Generation: The numbers are generated in a random, non-repeating sequence. There are several approaches to doing this in the computer science literature. The algorithm and anyseed chosen to generate IDs should not be made public and should be considered a national secret. 3. Lifetime: Individual UID is assigned once, at inception, and remain the same for the lifetime of the person, and for a specified number of years beyond. At this point there is no consideration of reusing numbers. 4. Entity IDs: We expect that entity ID numbers (1- numbers) will have different rules for periods of validity and retirement. 5. The Checksum: There are several schemes possible. We recommend the Verhoeff scheme. More on this in the section titled Checksum.
The Checksum
The point of the Checksum is to eliminate data-entry errors. There have been a number of studies of data-entry errors.
For example, the tables above show errors as measured by Verhoeff, and reproduced in Kirtland. 3 The check digit must certainly guard against the top two errors first: single digit errors and adjacent digit transposition, while also detecting other errors with high probability. Many check digit approaches are in use in such identifiers as the EAN barcode, the ISBN number, the serial numbers on currency notes, check numbers and so on. Surprisingly, many of these schemes do not detect single-digit and transposition errors. Since we expect nearly a billion IDs to be issued by UID-AI, we must ensure that the check digit scheme we select will absolutely minimize data-entry errors; this will reduce server load, customer service load, and overall aggravation. There is one scheme that meets our requirements: the Verhoeff Scheme. This scheme is relatively complex, and in the days before ubiquitous computing, there was a tendency to avoid it in favor of simpler schemes. In this day and age however, and at the scale of the UID, precision must be the goal.
Kirtland, Joseph. Identification Numbers and Check Digit Schemes. The Mathematical Association of America, 2001.
The Verhoeff scheme catches all single errors and all adjacent transpositions. It also catches >95% of twin errors and >94% of jump transpositions. 4
Figure 1: The Dihedral Group Operator from (2). Red digits are not commutative.
We will not describe the Verhoeff scheme in detail here, but the basic idea is as follows. First, instead of using standard addition modulo 10 or 11, as is common in other check digit schemes, the Verhoeff Scheme uses the group operator shown in the table above. It can be seen that this operator, indicated by a #, is generally non-commutative. This helps capture transposition errors. The table derives from a group called the dihedral group D5, which are symmetries of the pentagon. Second, Verhoeff defines a permutation such as (0)(14)(23)(56789). Other permutations can also be used. The check digit polynomial works by applying the nth power of permutation to the nth element in the ID, and composing them together using the operator # defined above. The scheme can be implemented quite easily by an experienced programmer. A freeware version for PCs, cell phones and the web can easily be obtained. It is important to emphasize that the checksum scheme is not intended to be a secret. On the contrary, it needs to be published widely so that and users of the UID system are able to detect keying and transmission errors.
http://www.cs.utsa.edu/~wagner/laws/verhoeff.html
Other Issues
We have certain recommendations and comments: We recommend that UID-AI set up a number issuing system: when the registrar needs to issue a number, they submit the enrollment information and requests a number from the central server. The server generates the ID number returns it to the registrar. We suggest creating a dummy UID which is never issued to anyone. This number can be displayed in public but will never violate anyones privacy. We recommend the creation of legislation around the display and publication of individuals UIDs. We recommend penal provisions to deter individuals from committing identity fraud such as trying to obtain multiple UIDs