DRAFT DO NOT DISTRIBUTE
A Digital Library
Authentication and Authorization Architecture
November 2, 1999
Introduction
This document describes an architecture, protocol and operational model for using X.509 digital certificates for authentication and a directory service to serve user attributes to determine the level of authorized access to licensed online materials. The model was developed by the participants at a Digital Library Federation sponsored meeting January 19-20, 1999 in Oakland, CA. The meeting participants included:
Joan Gargano, University of California Office of the President
Ariel Glenn, Columbia University
Rebecca Graham, Digital Library Federation
Sal Gurnani, University of California Office of the President
Leah Houser, OCLC
David Millman, Columbia University
Spencer Thomas, JSTOR
Vance Vaughn, University of California Office of the President
This document specifically does not address any issues involved in generating and distributing certificates by the institution or the components of related policy.
The development of the architecture was driven by the following functional requirements.
Privacy considerations
The individual requesting a service must be able to choose whether to use a persistent identity or an anonymous identity for a given transaction.
The institution should not have to reveal information that can be used to identify a particular individual, in order to allow that individual access to a licensed resource.
The information contained in the certificate payload should be minimized so that it includes only the information strictly necessary to determine the institutional affiliation of an individual and to locate externally stored access control information.
Localization of information
The amount of institution-specific information that the publisher must keep should be minimized, as well as the amount of publisher-specific information that the institution must keep.
The content provider must be provided with enough information to deliver a baseline level of service in a degraded condition where access control information is not available.
Only the licensing institution (university, college, campus, etc.) can determine the eligibility of each of its members to use each licensed publication and the assignment of access control attributes based upon the license terms.
Each eligible member will be assigned to (at least) one "class of service," e.g. access level. The available classes are negotiated as part of the license agreement. For some publishers, there may be only a single class of service, for others there may be several.
Only the publisher can determine the precise set of access permissions corresponding to a particular class of service, as specified in the license agreement with a particular institution.
The system must allow for temporal change. E.g., a person's status at an institution may change before the expiration date of their digital certificate from that institution.
Assumptions include the following:
Each institution has its own certificate authority (CA). Thus, the information contained within the certificate is sufficient to identify the institution. Clearly, the architecture will need to handle the case in which the institution is not also the CA, possibly by requiring that the institution be identified in a designated field within the certificate.
The full authentication and authorization process is performed infrequently (e.g., once per "session") so that directory load can be minimized.
Key Design Decision: Separation of Authentication and Authorization
The criteria of localization of information, accommodation of temporal change, and privacy considerations led to the conclusion that authorization information cannot be explicitly included in the certificate payload. Thus, the institution must have a directory or attribute server which, given some information from the certificate, can determine eligibility for the service. To simplify the directory and access protocol discovery, we decided to place a URL encoding the query in the certificate. The service provider does not need to interpret the contents of this URL beyond interpreting it as a URL.
Architectural Overview
The architecture consists of the following components:
X.509 certificates issued according to a Certification Practice Statement defined by the issuing institution and accepted by the information content provider.
An extension field in the certificate, containing the directory query string.
Directory attributes that encode access rights for a user.
A description of the sequence of operations involved in the transaction that authenticates a user and authorizes access to a service for that user.
Discussion of the items that must be negotiated, exchanged, or emplaced prior to executing any authentication/authorization transactions.
Certificate contents
The exchange of attribute information is based upon the use of a single extension field with a registered, unique object identifier (OID) in the certificate. For purposes of this document, we will call it the query field. The query field contains a single URL that specifies a directory query. The protocol specified by the URL should be a secure protocol, either LDAPS or HTTPS. The contents of the query, other than as required to execute the query using the given protocol, should be treated as opaque by the publisher. The OID for this extension is given in the appendix.
In addition, the certificate should contain enough information to identify the issuing institution. If an institution has its own certificate authority, the Issuer field may be sufficient to identify the institution. If the certificate is issued by a third party (e.g., Verisign), the issuing institution should be identifiable from another field such as "Subject". The use of the "Subject" field is not described in this paper or tested in the pilot project.
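The requirement that the query field hold a URL using a secure protocol can be checked without interpreting the rest of the query. The following is a minimal sketch in Python, not part of the specification: the function name `is_acceptable_query_url` and the example URLs are assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Schemes permitted for the query field per this document
# (URL schemes are compared case-insensitively).
SECURE_SCHEMES = {"ldaps", "https"}


def is_acceptable_query_url(query_url: str) -> bool:
    """Return True if the URL names a secure protocol and a host.

    Everything after the scheme and host is deliberately left
    uninterpreted, since the publisher treats the query as opaque.
    """
    parts = urlsplit(query_url.strip())
    return parts.scheme.lower() in SECURE_SCHEMES and bool(parts.hostname)


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    print(is_acceptable_query_url("LDAPS://ldap.example.edu/uid=abc123,o=Example"))
    print(is_acceptable_query_url("http://directory.example.edu/query?id=abc123"))
```

A publisher would presumably run a check of this kind before opening any connection, and refuse certificates whose query field names an insecure protocol.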
Directory attributes
Institutions licensing content will be contracting with multiple publishers that may provide more than one type of service or level of access. As a result, the attributes available for authorization decisions must be flexible enough to accommodate a wide variety of product offerings. The required set of attributes defined for a transaction is an ordered triple called Service Class, consisting of the fields Vendor, Service Name, and Service Type. These three attributes are defined in the table below.
| Attribute | Value | Description |
| Vendor | String value defined by publisher | Usually the publisher domain name. {jstor.org, oclc.org} |
| Service Name | String value defined by publisher | The name of the licensed service. For many publishers, only a single service is available. {jstor, FirstSearch} |
| Service Type | String value defined by publisher | The access level and suite of online services available for the particular service. {berkeley.edu, 100053231} |
The attribute server must be able to return the value of these three attributes for each user, for each publisher. The value of these attributes will be determined by the license agreement or ancillary negotiation between the institution and the publisher. A couple of examples may clarify this.
JSTOR offers only one Service. All faculty, students, and staff at each participating institution currently have full access rights to the JSTOR Service. Thus, the value of Service Name is "jstor" and the Service Type has the value of the institution's domain name (e.g., "berkeley.edu") for all licensed members.
OCLC offers multiple services and multiple "authorization numbers" which may correspond to different sets of access rights. One of these may be used by general users, while a second (more expensive or more powerful) one might be used by librarians. The Service Type attribute would be set to the appropriate authorization number for each individual.
The Vendor value will typically contain the domain name ("jstor.org") or other unique identifier for the publisher. Currently this information could be deduced from the domain name address included in the exchange. However, including this field simplifies parsing in the case of information brokers that provide services for multiple content providers, and it offers redundancy for troubleshooting and problem resolution. The Service Name value will be one of a set of values specified by the publisher. The Service Type value will be assigned by the institution from a publisher-specified set of values.
The attribute exchange protocol is structured so that the publisher will be given the right to see only those values of the Service Class attributes that apply to it. Thus, Service Class values for OCLC will not be visible to JSTOR, and vice-versa.
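The visibility rule can be illustrated with a small sketch of how an attribute server might restrict its answer to the requesting publisher. This is an assumption-laden illustration, not the pilot implementation: the `ServiceClass` record, the `visible_to` function, and the in-memory list standing in for a directory are all hypothetical. The example values are the ones used in the table above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ServiceClass:
    """The ordered triple Vendor / Service Name / Service Type."""
    vendor: str
    service_name: str
    service_type: str


def visible_to(publisher: str, records: List[ServiceClass]) -> List[ServiceClass]:
    """Return only the Service Class values whose Vendor matches the publisher."""
    return [r for r in records if r.vendor.lower() == publisher.lower()]


# Hypothetical directory entries for a single user.
user_records = [
    ServiceClass("jstor.org", "jstor", "berkeley.edu"),
    ServiceClass("oclc.org", "FirstSearch", "100053231"),
]

if __name__ == "__main__":
    # JSTOR sees its own entry and nothing belonging to OCLC.
    for record in visible_to("jstor.org", user_records):
        print(record)
```

In practice the publisher's identity would come from the client certificate it presents when connecting, rather than from a string argument.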
Several optional fields are under consideration to provide additional service management information.
Class of Service Description
A fourth directory attribute related to multiple classes of service may be returned at the option of the Institution. The Class of Service Description applies only to the ordered quadruple (Vendor/Service Name/Service Type/Class of Service Description) in which it appears. The purpose of the Class of Service Description is to provide descriptive information to allow a user to choose between multiple Service Classes, should multiple values be returned by the LDAP server.
For multiple Service Classes, one Service Class may have the Class of Service Description set to "default" to designate a default setting to the content provider. For other Service Classes, the fourth field will be a description such as "student", "faculty", "All subscription", "Graduate-level Resources" -- whatever makes sense to the population using the resource. Upon receipt of multiple Service Classes, the vendor system will have two options: (1) use the default Service Class for the session, or (2) present the Class of Service Description strings to the user via an HTML page so that the user can specify which Service Class should be used for the session.
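The two options open to the vendor system might be sketched as follows. This is only an illustration: the tuple layout, the function names `pick_default` and `choices_for_user`, and the second authorization number are assumptions, and the comparison against "default" is made case-insensitive as a design choice of the sketch.

```python
from typing import List, Optional, Tuple

# (vendor, service name, service type, class of service description)
ServiceClass = Tuple[str, str, str, str]


def pick_default(classes: List[ServiceClass]) -> Optional[ServiceClass]:
    """Option 1: return the Service Class marked "default", if there is one."""
    for sc in classes:
        if sc[3].strip().lower() == "default":
            return sc
    return None


def choices_for_user(classes: List[ServiceClass]) -> List[str]:
    """Option 2: return the description strings to present to the user."""
    return [sc[3] for sc in classes]


if __name__ == "__main__":
    # Hypothetical values returned for one user; the second number is invented.
    returned = [
        ("oclc.org", "FirstSearch", "100053231", "default"),
        ("oclc.org", "FirstSearch", "100053232", "faculty"),
    ]
    print(pick_default(returned))
    print(choices_for_user(returned))
```

If no value is marked "default", `pick_default` returns `None`, and the vendor system would fall back to asking the user.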
Statistical Role
The International Coalition of Library Consortia (ICOLC) has defined a set of statistical measures for information access, "Guidelines for Statistical Measures of Usage of Web-based Resources," November 1998, HTTP://www.library.yale.edu/consortia/webstats.html. A Statistical Role field is proposed to meet the ICOLC special data element requirement, item 4. This field will include a printable string, which can be used by content providers to accumulate statistics on system use without compromising anonymous access. The Statistical Role field will be in Unicode format, as an ASN.1 defined record.
Persistent Identifier
A persistent identifier or ID allows the provider to maintain state for a user. The identifier must be an alphanumeric string that is unique within the institution, e.g., an MD5 hash of an actual ID.
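As an illustration of the MD5 example, an institution could derive the identifier along the following lines. This is a sketch under stated assumptions: the function name `persistent_id` and the optional `institution_secret` parameter are hypothetical and not part of the specification. The secret is included because, without it, anyone able to guess a local ID could recompute the hash.

```python
import hashlib


def persistent_id(local_user_id: str, institution_secret: str = "") -> str:
    """Derive an opaque alphanumeric identifier from an institution-internal ID.

    The same inputs always yield the same 32-character hexadecimal string,
    so the provider can recognize a returning user without learning who it is.
    """
    data = (institution_secret + local_user_id).encode("utf-8")
    return hashlib.md5(data).hexdigest()


if __name__ == "__main__":
    # "jdoe" is a made-up local user ID.
    print(persistent_id("jdoe", institution_secret="example-secret"))
```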
Access denied message
In cases where the content server may not be able to deliver an adequate error message, an optional "access denied" message string should be provided to deliver a message to the user including a reason for the failure and contact information.
The Protocol
Establishing the Service
At the time a license agreement is executed, or whenever any licenses change, the publisher should supply the institution with the following items:
An X.509 client certificate and its corresponding CA certificate chain, which will be used by the publisher to establish a secure authenticated query connection. This permits the institution to determine which publisher is executing the query.
A unique publisher identifier, preferably as a top-level domain name which will be used to populate the Vendor attribute.
A set of values for the Service Name attribute developed in consultation with the licensing institution.
For each Service Name, one or more values for Service Type. Constraints on who may use each Service Type depend upon the details of the license agreement.
The institution will provide to the publisher the following items:
Identification of the Certificate Authority for the institution and a copy of the CA signing certificate.
If a third party CA is used, the method for determining the identity of the institution from the contents of the certificate.
The Authentication and Authorization Protocol
The protocol for the exchange of X.509 certificates and attribute information is outlined below.
The client attempts access to a controlled resource, usually through a Web interface.
The publisher server requests that the client present a certificate.
The client presents a certificate and the publisher verifies that the certificate:
Is issued by a recognized certificate authority.
Asserts that the holder is a member of a licensed institution.
Has not been revoked, expired, or altered in any way.

The publisher extracts the query URL from the certificate and connects to the specified attribute server using the prescribed secure protocol, presenting its own client certificate to establish the secure connection.
The attribute server verifies that the publisher's certificate is valid and uses the publisher's identity to determine access permissions to the information in the directory.
The attribute server resolves the query. The result of the query is presumed to be a list of attribute name-value pairs. A given attribute name may be repeated in the results. The list of results is returned to the publisher.

The publisher looks at the value(s) of the ServiceClass attribute. If at least one value is valid for the publisher and service requested, the user is granted access. The precise access rights may depend on a number of things including the ServiceClass attribute value(s), the institution to which the individual belongs, and other factors (e.g., number of current users).
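The final step, in which the publisher inspects the returned ServiceClass values, might look like the following sketch. It assumes the query results arrive as a list of attribute name-value pairs (names may repeat) and that each serviceClass value has already been decoded into a tuple. The function name `granted_service_types` and the set of valid types are hypothetical.

```python
from typing import Any, List, Set, Tuple


def granted_service_types(
    results: List[Tuple[str, Any]],
    vendor: str,
    service_name: str,
    valid_types: Set[str],
) -> List[str]:
    """Return the Service Type values valid for this publisher and service.

    An empty list means that no ServiceClass value qualified, so access
    is denied.
    """
    granted = []
    for name, value in results:
        if name != "serviceClass":
            continue  # optional attributes are handled elsewhere
        v, s, t = value[:3]
        if v == vendor and s == service_name and t in valid_types:
            granted.append(t)
    return granted


if __name__ == "__main__":
    # Hypothetical query result for one user.
    results = [
        ("serviceClass", ("oclc.org", "FirstSearch", "100053231")),
        ("statisticalRole", "student"),
    ]
    types = granted_service_types(results, "oclc.org", "FirstSearch", {"100053231"})
    print("access granted" if types else "access denied", types)
```

The precise rights attached to each granted Service Type remain the publisher's decision and are outside the scope of this sketch.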

Optional Attribute Exchanges
If the "StatisticalRole" attribute is present, its value should be used by the publisher to aggregate access statistics.
If access is denied, for example if no Service Class attributes are returned, the attribute "AccessDeniedMessage" may be presented to the user. If so, it will contain a human-readable message detailing the reason for denial of access.
If the "PersistentID" attribute is present, its value can be used by the publisher to provide value-added services, such as recalling the state of a previous session, maintaining preferences, etc.
Operational Model
The current implementation relies on the following standards-based services:
An X.509 Certificate Authority
The secure version of the Hypertext Transfer Protocol (HTTPS)
The secure version of the Lightweight Directory Access Protocol v3 (LDAPS v3)
Conclusion
The Digital Library Authentication and Authorization Architecture and Protocol provide a standards-based method for the exchange of authentication and attribute information for access to restricted information in a way that ensures an acceptable degree of anonymity of the patron. This method is easy to implement technically and integrates into normal business practices since the information exchange flows from existing licensing processes.
The trustworthiness of authentication under this model will vary between licensing institutions, depending on their certificate practices. However, even the simplest methods of issuing certificates online using user IDs and passwords provide a higher level of access security than IP address authentication, currently the most common form. In addition, the exchange of attribute information provides the opportunity to regain system usage monitoring that was lost in the move to the current web-based environment.
Next Steps
The architecture will need to be extended to handle the case in which the institution is not also the CA, requiring that the institution be identified in a designated field within the certificate.
Expand the testbed to include three more educational institutions and three more publishers.
Test the protocol using the optional data elements.
Determine the ability of this model to support the delivery of use statistics specified by the International Coalition of Library Consortia in "Guidelines for Statistical Measures of Usage of Web-based Resources," November 1998, HTTP://www.library.yale.edu/consortia/webstats.html.
Specify the https interface for an institutional directory server.
Develop a strategy to deal with multiple statisticalRoles for a single individual.
Investigate stronger methods for maintaining user anonymity, including the use of short-term certificates, IP masking, and means of defeating traffic analysis.
Establish more secure verification of client certificates, including real-time CRL checks.
Address the issue of certificate maintenance and distribution for both institutions and publishers.
Provide a reference implementation, and establish an interoperability testbed to allow other institutions to participate in the next phase of design and development.
Appendix
OID Extensions and Attribute Definitions
User certificates must contain an X509v3 certificate extension, defined as follows:
id-clir OBJECT IDENTIFIER ::= { iso(1) member-body(2) us(840) 114006 }
id-clir-dla3 OBJECT IDENTIFIER ::= { id-clir 1000 }
id-clir-dla3-queryUrl OBJECT IDENTIFIER ::= { id-clir-dla3 1 }
queryUrl ::= OCTET STRING
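For readers more used to dotted-decimal notation, the arcs in the definitions above can be composed as in this short sketch. The constant names and the `dotted` helper are illustrative only.

```python
# Arcs taken from the ASN.1 definitions above.
ID_CLIR = (1, 2, 840, 114006)                 # { iso(1) member-body(2) us(840) 114006 }
ID_CLIR_DLA3 = ID_CLIR + (1000,)              # { id-clir 1000 }
ID_CLIR_DLA3_QUERY_URL = ID_CLIR_DLA3 + (1,)  # { id-clir-dla3 1 }


def dotted(oid: tuple) -> str:
    """Render a tuple of OID arcs in dotted-decimal form."""
    return ".".join(str(arc) for arc in oid)


if __name__ == "__main__":
    print(dotted(ID_CLIR_DLA3_QUERY_URL))  # prints 1.2.840.114006.1000.1
```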
The query must use either LDAP over SSL (LDAPS) or HTTP over SSL (HTTPS).
The query string must comply either with the syntax of RFC 2255 for LDAP URLs (except that the query shall start with the string 'LDAPS' instead of 'LDAP')
or with RFC 1630 for HTTP URLs (except that the query shall start with the string 'HTTPS' instead of 'HTTP').
In the case of an LDAPS query, the SASL EXTERNAL method of authentication shall be used as described in the internet draft "Authentication Methods for LDAP," HTTP://search.ietf.org/internet-drafts/draft-ietf-ldapext-authmeth-03.txt.
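To make the LDAPS form of the query concrete, the sketch below splits such a URL into the components defined by RFC 2255 (host, port, distinguished name, attributes, scope, and filter), applying that RFC's defaults for an omitted scope or filter. The function name `parse_ldaps_url`, the example URL, and the fallback to port 636 (the port conventionally used for LDAP over SSL) are assumptions of this sketch. Extensions are ignored for brevity.

```python
from urllib.parse import unquote, urlsplit


def parse_ldaps_url(url: str) -> dict:
    """Split an LDAPS query URL into its RFC 2255 components."""
    parts = urlsplit(url)
    if parts.scheme.lower() != "ldaps":
        raise ValueError("query URL must use the LDAPS scheme")
    # After the DN, the fields attributes, scope, filter and extensions
    # are separated by "?"; trailing fields may be omitted.
    fields = parts.query.split("?") if parts.query else []
    fields += [""] * (4 - len(fields))
    attributes, scope, ldap_filter, _extensions = fields[:4]
    return {
        "host": parts.hostname,
        "port": parts.port or 636,
        "dn": unquote(parts.path.lstrip("/")),
        "attributes": [unquote(a) for a in attributes.split(",") if a],
        "scope": scope or "base",
        "filter": unquote(ldap_filter) or "(objectClass=*)",
    }


if __name__ == "__main__":
    # Hypothetical query URL as it might appear in a certificate.
    example = ("LDAPS://ldap.example.edu/uid=abc123,ou=People,o=Example"
               "?serviceClass,statisticalRole?base")
    for key, value in parse_ldaps_url(example).items():
        print(key, "=", value)
```

Only the party that executes the query needs this decomposition. Beyond what is required to run the query, the publisher continues to treat the URL contents as opaque.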
When presented with such a query an institutional LDAP server must authenticate the query sender and return the following attributes about the user of the certificate:
serviceClass, where this is defined as follows:
serviceClass ::= SEQUENCE {
vendorName OCTET STRING,
-- human-readable string unique to the vendor and
-- intended to be representative of the vendor name
serviceName OCTET STRING,
-- human-readable string unique to the particular service
-- and intended to be representative of the service name
serviceType OCTET STRING,
-- contents to be defined by the vendor
serviceClassDescription OCTET STRING OPTIONAL
-- human-readable string identifying an individual serviceClass value
}
statisticalRole (if supplied), defined as
statisticalRole ::= OCTET STRING
-- used by the vendor for logging so that statistical
-- reports can be provided to the user's institution
persistentID (if supplied), defined as
persistentID ::= OCTET STRING
-- used to provide a persistent identity for the user
-- across sessions
accessDeniedMessage (if supplied), defined as
accessDeniedMessage ::= OCTET STRING
-- used to provide a human-readable error message when access is denied