00:00:07
>> Welcome to the second section, Security Controls on Applications, within the Security Controls part of our lectures. We've chosen to talk about some specific controls on applications. The first question we are going to ask is: what methods are used to identify a person to an application and give them specific privileges or rights? These fall under authentication and authorization, that is, signing on and having the right kind of access. Then we move to the third leg of the triple A, the audit logs. There the question is: what methods are used to know a user's activities and what he or she is up to? Part of that is the Security Information and Event Management system we discussed in the previous lecture. We then go to a separate topic, data loss prevention: what techniques protect against accidental data loss? As we constructed these lectures and looked at the various kinds of platform and application controls, we deliberately chose not to go through HIPAA line by line, saying "this is what HIPAA requires here, and this is the control or method that meets it." Earlier we talked a little about the value of having servers in a data center. But if you read HIPAA closely, it also says that if any maintenance work is done in the data center, you should keep a log of what the work was. We didn't think that needed to be covered in our certificate course. We thought that if we wanted to give you specific examples of generic controls, we would pick only the controls that make sense from a security perspective, not necessarily from a regulatory one.
So I would encourage any of you who want to know more about HIPAA to look it up on the internet, and if you have specific questions, definitely come and ask us. But our goal in these sections was to take the high-level concepts and work through them, because those apply in all cases. Authentication, or signing on, has always been very straightforward. There's a user ID and there's a password. The user ID represents an identity, most of the time the identity of a person. It could also be server-to-server authentication, in which case the user ID could be the name of the server, its MAC address, its IP address, or a combination of these. The password is the secret. In the authentication business, we always talk about who you are, what you know, and what you have: three components of establishing a full identity. The user ID is just a string that is well known to everybody. The password is a secret known only to me. What you have is sometimes a second authentication factor, such as a token card, a swipe card, a credit card, or, increasingly, a specific mobile device. All of those can act as a second factor. It is not very common yet, but it is going to become more common, because passwords have problems. So user IDs and passwords provide baseline protection. Typically there will be password requirements. It's not just "yes, you need a password," after which you walk away and choose the letter A or the numbers 1234. There are strength requirements, which typically specify how long the password must be, whether it must combine lowercase letters, uppercase letters, numbers, and special characters, and how many characters can be repeated. Changing the password periodically is typically required too, and you may not be allowed to reuse, say, your last three passwords, because you don't want people recycling them.
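To make the strength and history requirements concrete, here is a minimal Python sketch of a password-policy check. The thresholds (8 characters, 3 of 4 character classes, no run of 3 identical characters, no reuse of the last 3 passwords) and all names are assumptions for illustration, not any particular product's policy; old passwords are kept only as salted PBKDF2 hashes.

```python
import hashlib
import os
import re
from dataclasses import dataclass, field


@dataclass
class PasswordPolicy:
    # Hypothetical thresholds for illustration; every institution sets its own.
    min_length: int = 8
    min_char_classes: int = 3   # out of: lowercase, uppercase, digit, special
    max_repeat_run: int = 2     # "aa" is fine, "aaa" is rejected
    history_size: int = 3       # the last 3 passwords may not be reused (assumed >= 1)


def _hash(password: str, salt: bytes) -> bytes:
    # Store only salted, slow hashes of old passwords, never the passwords themselves.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


@dataclass
class PasswordHistory:
    entries: list = field(default_factory=list)  # list of (salt, hash) pairs

    def add(self, password: str, keep: int) -> None:
        salt = os.urandom(16)
        self.entries.append((salt, _hash(password, salt)))
        self.entries = self.entries[-keep:]  # keep only the most recent `keep`

    def contains(self, password: str) -> bool:
        return any(_hash(password, salt) == h for salt, h in self.entries)


CLASS_PATTERNS = (r"[a-z]", r"[A-Z]", r"[0-9]", r"[^A-Za-z0-9]")


def check_password(pw: str, policy: PasswordPolicy, history: PasswordHistory) -> list:
    """Return a list of policy violations; an empty list means the password is acceptable."""
    problems = []
    if len(pw) < policy.min_length:
        problems.append("too short")
    classes = sum(1 for pat in CLASS_PATTERNS if re.search(pat, pw))
    if classes < policy.min_char_classes:
        problems.append("too few character classes")
    # (.)\1{n,} matches a character followed by n or more copies of itself.
    if re.search(r"(.)\1{%d,}" % policy.max_repeat_run, pw):
        problems.append("too many repeated characters")
    if history.contains(pw):
        problems.append("reuses a recent password")
    return problems
```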
There is controversy about requiring password changes, and it definitely has a cost to users, which is where we are going next. But it is accepted practice in most of the world that passwords must be changed periodically, even though it's somewhat dubious how much security that adds. So what problems already exist in password systems? The problems are many. First, we have many applications, and therefore many pairs of user IDs and passwords. The knowledge load on users who have different user IDs and different passwords is heavy enough that they write them down and carry them around on pieces of paper, and if that paper is lost, we have definitely lost the security battle. The second problem is that, because we have password change requirements, each application enforces its own schedule, with different intervals after which passwords expire. Some may choose 90 days, some 180 days, some 360 days. Because you are changing them at different times in different applications, and you have many such applications, it feels as if you are changing passwords all the time, and you can't even remember what most of them are. Applications also have different password strength requirements. Some require special characters and others prohibit them, in which case you really cannot use the same password across systems. You have to add some twist to it, and that adds to the management problem. And finally, the real reason passwords are going away is that they are increasingly vulnerable to brute-force attacks.
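The "changing passwords all the time" effect is easy to see with a little date arithmetic. This hypothetical sketch takes a few applications, each with its own expiry interval and a different date of last change, and lists when each one forces a new password; the application names and numbers are made up.

```python
from datetime import date, timedelta


def forced_changes(policies: dict, start: date, horizon_days: int = 365) -> list:
    """List (due_date, application) for every forced password change in the horizon.

    policies maps a (hypothetical) application name to a tuple
    (expiry_interval_days, days_since_last_change).
    """
    end = start + timedelta(days=horizon_days)
    events = []
    for app, (interval, age) in policies.items():
        due = start + timedelta(days=interval - age)
        while due <= end:
            events.append((due, app))
            due += timedelta(days=interval)
    return sorted(events)


policies = {"EHR": (90, 30), "Lab": (180, 100), "Billing": (360, 200)}
for due, app in forced_changes(policies, date(2024, 1, 1)):
    print(due, app)
```

With these assumed inputs, three applications produce seven forced changes in a year, on seven different dates, which is the burden described above.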
Machines have become so much faster that a simple 8-character password that is easy to remember and type can be broken in hours, provided the attacker has the encrypted or hashed password to test guesses against. A brute-force attack simply means you enumerate all possible passwords and try them one at a time. This problem is becoming more acute as time passes and computing power increases. We believe user IDs and passwords will remain, but you'll have to add a token card or another second authentication factor to make them more secure. Nevertheless, the user ID and password is the common denominator that almost every application supports, it is inexpensive, and it has been very, very useful for [inaudible]. In healthcare, we find other issues with account management. We talked about user IDs and passwords, and we find that in large institutions with many systems, vendors' implementations of account management are still quite inflexible. They have improved and are much better than they used to be, but they are still inflexible. How so? Well, first of all, what we would like to do in a large enterprise is give each user a single user ID and a password held in a central directory service, and have every application authenticate against that directory service, so that the user has one user ID and password to sign on to all the applications he or she needs. But if vendors do not permit authentication anywhere other than internally, that works against an enterprise-wide common user ID and password. You can use a common user ID, and you can try to keep the password the same even though it is stored in different places, but the bottom line is that this is not as clean or manageable when the user has to remember to do it properly.
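The brute-force claim can be checked with back-of-the-envelope arithmetic: the attacker's work is the number of possible passwords, the alphabet size raised to the password length, divided by the guess rate. The rate of 10^11 guesses per second below is an assumption standing in for an offline attack on a fast, unsalted hash; real rates vary by many orders of magnitude with the hash algorithm and hardware.

```python
def keyspace(alphabet_size: int, length: int) -> int:
    """Number of possible passwords of exactly `length` characters."""
    return alphabet_size ** length


def worst_case_hours(alphabet_size: int, length: int, guesses_per_second: float) -> float:
    """Hours to try every candidate (on average an attacker needs about half of that)."""
    return keyspace(alphabet_size, length) / guesses_per_second / 3600


# Assumed offline guess rate against a fast, unsalted hash; slow, salted hashes
# reduce it by many orders of magnitude.
RATE = 1e11

print(f"{worst_case_hours(26, 8, RATE):.4f} h  lowercase only, 8 chars")
print(f"{worst_case_hours(95, 8, RATE):.1f} h  all printable ASCII, 8 chars")
print(f"{worst_case_hours(95, 12, RATE) / 24 / 365:.2e} years  all printable ASCII, 12 chars")
```

Under that assumption, an 8-character lowercase password (26^8, about 2.1 x 10^11 candidates) falls in about two seconds, and even the full printable-ASCII set (95^8, about 6.6 x 10^15) falls in roughly 18 hours, the "hours" mentioned above. Each extra character multiplies the work by the alphabet size.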
00:09:59
>> Vendors also have systems in which no automated account provisioning or deprovisioning exists. Provisioning is the function by which an account is actually created within an application and appropriate privileges are assigned to it. Automated account provisioning simply means that when we learn a physician is starting in 7 days, and we know which clinical systems the physician will need access to, we set up a completely automated workflow in which his or her user ID and password are assigned and accounts are created in the various applications automatically. Even more important than provisioning is deprovisioning: when the user leaves, have we terminated the accounts? Once again, if the applications offer no way to do these things automatically and insist that everything be done through a GUI, with a human being sitting in front of it, that is detrimental to enterprise computing. Other problems are vendors being unwilling to support external authentication, as in the user ID and password example I gave you, and unwilling to support external specification of role authorization as part of provisioning. We want to say not only that this is the account, but also, automatically, that this person is a doctor, is an employee of this component, and is an administrator who should have these accesses. If we can specify those roles up front, we need an automatic way of provisioning the corresponding entitlements or privileges in each application based on those roles, and the vendor world's implementation of such things in healthcare clinical systems is very, very patchy. The consequence of these problems is inefficiency and security risk, because we are all [inaudible] one-off things and relying on users and other people to do these things correctly.
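Here is a hedged sketch of the automated provisioning and deprovisioning workflow just described, driven by events from an HR feed. `AppConnector`, `ROLE_ACCESS`, and the event fields are hypothetical stand-ins; real identity-management products expose their own connector APIs.

```python
from dataclasses import dataclass, field


@dataclass
class AppConnector:
    """Hypothetical stand-in for a per-application provisioning connector."""
    name: str
    accounts: dict = field(default_factory=dict)  # user_id -> set of entitlements

    def create(self, user_id: str, entitlements: set) -> None:
        self.accounts[user_id] = set(entitlements)

    def disable(self, user_id: str) -> None:
        self.accounts.pop(user_id, None)


# Hypothetical mapping: which applications (and entitlements) each role needs.
ROLE_ACCESS = {
    "physician": {"EHR": {"view", "order"}, "PACS": {"view"}},
    "nurse": {"EHR": {"view", "document"}},
}


def handle_hr_event(event: dict, connectors: dict) -> None:
    """Apply one HR feed event: 'hire' provisions accounts, 'terminate' deprovisions all."""
    user_id = event["user_id"]
    if event["type"] == "hire":
        for app, ents in ROLE_ACCESS[event["role"]].items():
            connectors[app].create(user_id, ents)
    elif event["type"] == "terminate":
        for conn in connectors.values():  # every application, not just the role's
            conn.disable(user_id)
```

Termination deliberately sweeps every connector rather than only the applications the role maps to, because accounts granted by hand outside the role mapping are exactly the ones that tend to be forgotten.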
There's an encouraging trend that these things are getting better, but you should know that to have a common user ID and password, you will have to insist on it procedurally, because in many applications today you cannot automate that process. So what are the requirements for good authentication? Since authentication is about people more than anything else, proper recognition of the workforce, which we call robust recognition here, is a very important concept. Bringing together data from the various human resources systems and perhaps the credentialing department, knowing who the students are, and capturing all kinds of temporary people, such as nurses, contractors, and other temporary staff, and having them all in one place so that you can generate a common user ID for each of them, is a critical requirement for good authentication. Then one needs a manageable number of defined roles, such as doctors, nurses, and therapists; it has to be manageable, not counted in the hundreds. Next, one needs a clear mapping of roles to access privileges, so that if someone is a doctor, we know they need access to these files and these applications, and these should be their entitlements. If this is a research nurse, they need access to these clinical applications with these other entitlements. Having this clear mapping is a very important consideration. Not only is having a mapping important; one must also have a structure that keeps those mappings up to date. It is very common for human resources to add new titles every day, and if you want to automate the process, you must be able to take each new title, ask which roles it should get, understand its functions, and keep the mapping structure up to date accordingly. So when we do automated account provisioning based on roles, to the degree that it can be done with the current vendor world, you have to cover both account creation and termination.
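The two mappings described above, HR titles to roles and roles to entitlements, can be sketched as plain lookup tables. All titles, roles, and entitlement strings here are hypothetical. The `unmapped_titles` helper reflects the maintenance point: a new HR title with no role is something a person must decide on before provisioning can stay automated.

```python
# Hypothetical HR titles, roles, and entitlements, for illustration only.
TITLE_TO_ROLE = {
    "Attending Physician": "doctor",
    "Resident": "doctor",
    "Staff Nurse": "nurse",
    "Research Nurse": "research_nurse",
}

ROLE_TO_ENTITLEMENTS = {
    "doctor": {"EHR:view", "EHR:order", "PACS:view"},
    "nurse": {"EHR:view", "EHR:document"},
    "research_nurse": {"EHR:view", "ResearchDB:view"},
}


def entitlements_for(titles: list) -> set:
    """Union of entitlements for all of a person's HR titles."""
    result = set()
    for title in titles:
        role = TITLE_TO_ROLE.get(title)
        if role is not None:
            result |= ROLE_TO_ENTITLEMENTS[role]
    return result


def unmapped_titles(hr_titles: set) -> set:
    """HR titles with no role yet; each needs a mapping decision before automation works."""
    return {t for t in hr_titles if t not in TITLE_TO_ROLE}
```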
So when we learn from a specific human resources system that somebody has been terminated, we have to terminate their access. What happens if they leave one human resources organization and join another, and both organizations are part of your workforce? You have to be extra careful: treat the loss of one affiliation as sufficient cause to remove all access, and only when the new affiliation has been established add all the entitlements back again. This is never done very well. It is also somewhat messy, but identity management systems are emerging that help address issues like this with much more clarity and simplicity, so we believe these solutions will keep getting better. We talked about the problem the user has, which is multiple user IDs and passwords. So the question is, what techniques or controls exist to reduce that problem? One thing you can do, no matter what the application is and even if each has its own authentication, is insist that the same user ID is used in each of them. Do not give a user like me, whose name is Sumitra Sengupta [phonetic], the user ID SENGUPT in one system and SSEN in another, because there is no single place where that can be managed. Why not make sure that everybody gets the same user ID across all applications? The next question is, what about the password? One control that exists is password synchronization across applications. This sort of control helps users navigate the maze of user IDs and passwords, because even if they have to type their user ID and password again and again across multiple applications, at least if they are the same in all systems, there is no memory burden. So a control such as this provides a single place for user interaction.
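The revoke-all-then-regrant rule for people with several affiliations can be expressed in a few lines. This is a minimal sketch; `grant_for` is a hypothetical hook that returns the entitlements a single affiliation justifies.

```python
def on_affiliation_change(active_affiliations: set, ended: str, grant_for) -> tuple:
    """Revoke-all-then-regrant rule for a person with several HR affiliations.

    active_affiliations: affiliations the person currently holds (e.g. hospital, university).
    ended: the affiliation HR just reported as terminated.
    grant_for: function mapping one affiliation to the set of entitlements it justifies
               (a hypothetical hook into the role mapping).
    Returns (remaining_affiliations, entitlements_after_regrant).
    """
    remaining = active_affiliations - {ended}
    entitlements = set()       # step 1: everything is revoked
    for aff in remaining:      # step 2: re-grant only what surviving affiliations justify
        entitlements |= grant_for(aff)
    return remaining, entitlements
```

Recomputing from the surviving affiliations, rather than subtracting the ended affiliation's entitlements, means the result is exactly what the remaining affiliations justify: nothing left over from the old job, and nothing wrongly removed that the other affiliation also grants.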
It enforces one password standard and it allows self-service. That means when you change your password as required, you change it in one place, and the new password is propagated to all, or most, of the application systems. Self-service addresses another perennial problem, password reset. People forget their passwords and ask, "Okay, what am I going to do?" We offer them service desks; they call and wait 20 minutes before they get somebody. Instead, the trend is self-service: you ask them a set of questions, and if they can answer those questions, you allow them to reset their passwords. Password synchronization has a well-proven return on investment because it saves users time, and these products are typically not unreasonably priced, although you would have to dedicate a person to run such a system. So we establish the same user ID, either through automated provisioning or through a procedural control that, whenever an account is created, the same user ID is used. Companies that offer identity management solutions along with password synchronization include Oracle, IBM, and [inaudible], as well as P-Synch, which is what we actually use in our environment and is now owned by Hitachi. The next choice for reducing the number of sign-ons is to make the applications go to a single place to check the user ID and password. Instead of having user IDs and passwords replicated, even if it's the same user ID in multiple applications, you change the way the applications authenticate, if you can, so that they go to a central place for authentication. Typically that would be a standard directory with a secure authentication system. A great example is Active Directory, which does both of those things.
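A sketch of the core of password synchronization: one change, checked against every target application's rules, then propagated. `Target` and its rules are hypothetical; real synchronization products have their own configuration. The design choice shown is to validate against all targets before writing to any of them.

```python
import re
from dataclasses import dataclass


@dataclass
class Target:
    """A hypothetical application that receives synchronized passwords."""
    name: str
    allows_special: bool
    min_length: int
    password_hash: str = ""  # stand-in for the application's own credential store

    def accepts(self, pw: str) -> bool:
        if len(pw) < self.min_length:
            return False
        if not self.allows_special and re.search(r"[^A-Za-z0-9]", pw):
            return False
        return True


def synchronize(new_pw: str, targets: list, hash_fn) -> list:
    """Validate against every target first; only if all accept, push everywhere.

    Returns the names of targets that would reject the password (empty list = success).
    """
    rejecting = [t.name for t in targets if not t.accepts(new_pw)]
    if rejecting:
        return rejecting
    for t in targets:
        t.password_hash = hash_fn(new_pw)
    return []
```

Because one application may require special characters while another forbids them, the effective policy is the intersection of all targets' rules; rejecting up front keeps the user from ending up with different passwords in different systems.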
It's a great standard directory, and it also offers a very good secure authentication system based on a protocol called Kerberos, which originally came from MIT about 20 years ago and has proven to be a very secure protocol. The [inaudible] here is that these approaches require changes in the application, because the application has now outsourced its authentication to somewhere outside. What we found in earlier years, whenever we had a large application we wanted to do this with, is that the vendors knew this would be a desire and would actually charge you money to develop it. Nowadays most applications are getting better, and they offer a smorgasbord of authentication choices, one of which may happen to fit your institution.
00:20:00
>> If one of them fits your institution, great, there's no extra charge; otherwise, you'll have to get the integration written, and there may be charges for it. Simple directory authentication, especially against an LDAP directory such as OpenLDAP or Sun ONE LDAP (the Sun Java System Directory Server, now Oracle's), is based on user IDs and passwords. It is pretty good, but from a security perspective it is not as robust as Kerberos, so Active Directory actually has better authentication in that respect. Also, referral authentication across disparate directories does not work well. What that means is that if you have multiple directories, and each application uses a specific directory to authenticate, you need password synchronization to work across the directories. So even though we like directories, we do not like too many of them, because then synchronization has to work across all of them. So here are some examples of directories that provide a central place, with user IDs and passwords, for applications to authenticate against. It turns out that in healthcare the sign-on problem has been felt for a long time, and HL7 has actually come up with a standard for a clinical single sign-on function. This standard is called CCOW, after the Clinical Context Object Workgroup. The concept is wonderful; the implementation is somewhat difficult. The concept is that the user authenticates once at the desktop, and then, when they start other applications by double-clicking on them, they do not have to sign on again; they are taken in automatically. So the user authenticates only once, and all the other applications trust that authentication and let them in without any additional authentication.
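The CCOW idea of a shared context can be illustrated with a toy model: one context manager holds the signed-on user (and, as CCOW also allows, the current patient), and every participating application picks up changes instead of prompting again. This is a simplified sketch with hypothetical class and method names; it is not the actual HL7 CCOW interface, and it omits the security mechanisms the real standard specifies.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContextManager:
    """Toy model of a CCOW-style shared context; NOT the actual HL7 CCOW interfaces."""
    context: dict = field(default_factory=dict)       # e.g. {"user": ..., "patient": ...}
    participants: list = field(default_factory=list)

    def join(self, app) -> None:
        self.participants.append(app)
        app.on_context_change(dict(self.context))     # a newly started app picks up the current context

    def set_context(self, source, **changes) -> None:
        self.context.update(changes)
        for app in self.participants:
            if app is not source:
                app.on_context_change(dict(self.context))


@dataclass
class ClinicalApp:
    name: str
    current_user: Optional[str] = None
    current_patient: Optional[str] = None

    def on_context_change(self, ctx: dict) -> None:
        # Trust the shared sign-on instead of prompting for a password again.
        self.current_user = ctx.get("user", self.current_user)
        self.current_patient = ctx.get("patient", self.current_patient)
```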
A nice twist, because it's a healthcare standard, is that if a user has picked a particular patient in one clinical system and then clicks into another clinical system, the second system not only authenticates them through but also brings up the patient they were looking at in the previous application. In other words, there is a context, the patient being viewed, and it carries over to all the other applications. Fully CCOW-compliant clinical applications are becoming available slowly; the real problems are the complexity of supporting legacy systems and the many applications that are not CCOW-compliant. The one company that has been very successful selling products in this area is Sentillion, which was recently purchased by Microsoft. Besides Microsoft, a couple of other companies, such as Carefx, provide these kinds of functions, but they have not yet been very successful in the marketplace. That brings us to the next topic in this section, authorization. For clinical systems, HIPAA privacy typically specifies minimum necessary use and need to know for accessing clinical data; we understand that, and we have discussed it before. In practice, however, clinical access authorizations are broad. Why? Because deciding in advance which providers should look at which patients is pretty much impossible. It reflects how care organizations are organized and how caregivers practice. With fewer care providers it may be possible to separate some of those things, but in a large environment, where any physician may be called in for a consult by another physician, it is very difficult. Consequently, access authorizations are typically quite broad and not finely honed.
For the sake of information, there is a British Medical Association model. It is an interesting, novel model of how authorization ought to work. We are not sure whether it has actually been implemented, but it basically says that at any given time, the patient and the physician who takes care of the patient have access to the record, and anybody else can see the patient's record if and only if the primary physician has delegated some aspect of clinical care to them, so that they have a real need to look at the patient's data. It is an interesting delegation-based concept, but we haven't seen an implementation of it yet. Continuing with authorization, we find that even in clinical systems there are some access restrictions. Only full physicians, which includes residents and attendings, can order tests and medications. A medical student can enter orders, but those orders cannot go through unless countersigned by a resident or an attending, which reflects the role students play: they are learning how to write those orders. Other examples are sensitive data such as HIV status and mental health. Mental health data are typically limited to the psychiatric departments. If you have a VIP, a celebrity or some other important person, you may want to restrict access to their data to specific users; this is done in rare cases, but it is possible to restrict access that way. One interesting ordering restriction we have come across is chemotherapy orders. By regulation, chemo orders must be limited to a set of oncologists who have undergone the appropriate training, and other people should not be allowed to order them. These restrictions in applications, especially clinical applications, are role-based.
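The role-based restrictions above can be written as explicit rules. This sketch uses hypothetical role names; it encodes countersignature for medical students, chemotherapy orders limited to certified oncologists, and view restrictions for mental health records and VIP allowlists.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    user_id: str
    roles: set = field(default_factory=set)


# Hypothetical role names, for illustration only.
PHYSICIAN_ROLES = {"resident", "attending"}


def order_decision(user: User, order_type: str) -> str:
    """Return 'allow', 'pending_countersign', or 'deny' for a new order."""
    if order_type == "chemotherapy":
        return "allow" if "chemo_certified_oncologist" in user.roles else "deny"
    if user.roles & PHYSICIAN_ROLES:
        return "allow"
    if "medical_student" in user.roles:
        return "pending_countersign"
    return "deny"


def countersign(order_status: str, signer: User) -> str:
    """A resident or attending can release an order a student entered."""
    if order_status == "pending_countersign" and signer.roles & PHYSICIAN_ROLES:
        return "allow"
    return order_status


def can_view(user: User, record: dict) -> bool:
    """Apply VIP allowlists and sensitive-category restrictions to record viewing."""
    allowlist = record.get("vip_allowlist")
    if allowlist is not None and user.user_id not in allowlist:
        return False
    if record.get("category") == "mental_health" and "psychiatry" not in user.roles:
        return False
    return True
```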
A role is a function assigned to a particular individual, and an individual can have multiple roles: physician, administrator, researcher of a specific type. So roles are essentially monikers or attributes that you assign to a human being. On the other side, each application comes with a wide variety of entitlement flags: allowed to change this field, allowed to order, allowed to print this report, allowed to view this data. These are called entitlements within an application. The next step after you have the entitlement flags is to create roles and map them to sets of entitlements. Going forward, you assign anybody new a role, and you expect them to get the corresponding set of entitlements automatically once automated provisioning is in place. We also believe that having more than 20 roles in any large organization makes it pretty unmanageable to understand and remember the distinctions well enough to grant the right access privileges. One has to discover such roles. We actually have a project that looks into various applications and users, finds what access they have today and whether they have used it, and then, looking at that data across multiple applications, tries to discover what the roles should be. It's not just going to be a doctor role; it's going to be doctor role one, doctor role two, doctor role three, and we hope to find such distinctions because they will help us do the right kind of provisioning and deprovisioning. Finally, the whole concept of compliance management requires that when somebody is given a particular privilege, that person's supervisor is asked every year whether the person should continue to have it.
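The role-discovery project described above can start from something as simple as grouping users whose actually-used entitlements are identical. This is a minimal sketch under that assumption; practical role mining usually tolerates near-matches (for example with clustering), which exact grouping does not.

```python
from collections import defaultdict


def mine_roles(usage: dict, min_users: int = 2) -> list:
    """Propose candidate roles from observed usage.

    usage maps user_id -> set of entitlements the user actually exercised
    (granted-but-unused entitlements are assumed to be filtered out already).
    Users with identical entitlement sets form one candidate role; sets shared by
    fewer than min_users users are left for manual review.
    Returns (entitlement_set, sorted_users) pairs, most common first.
    """
    groups = defaultdict(list)
    for user, ents in usage.items():
        groups[frozenset(ents)].append(user)
    candidates = [(set(k), sorted(v)) for k, v in groups.items() if len(v) >= min_users]
    return sorted(candidates, key=lambda c: -len(c[1]))
```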
That is what compliance management means: we check with people and make sure users have just the right kind of access. So compliance management requires periodic recertification of people and their roles. The control here is part of compliance management, which falls under identity management. If such a system is installed and implemented, one can go to the supervisors of all employees, a group of people at a time, ask whether those people should continue to have specific access, and act according to the answers. We now move on to audit logs. Because authorization is broad in healthcare, the compensating, or mitigating, control is for somebody to look at the audit logs, examine the accesses actually made, and ensure that inappropriate accesses were not made even though they were permitted. Hence the emphasis on audit logs in healthcare applications continues to grow, for many different reasons, including security. In audit logging there are at least two kinds of components. One is the SIM itself. In the SIM, all the platform logs from Windows and Unix machines, network logs, firewall logs, and VPN logs go into an automated system called a Security Information Management system. Some examples of good companies in that arena are ArcSight, TriGeo, Splunk, et cetera. For audit logs there is also a standard that ASTM has worked on, called the Standard Specification for Audit and Disclosure Logs for Use in Health Information Systems. We're not sure how widely it has been adopted, or whether vendors have actually started to follow it.
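Annual recertification is essentially a campaign: gather each supervisor's people and their grants, collect keep or revoke decisions, and apply them. The sketch below assumes a fail-closed rule, that grants older than a year with no supervisor answer are revoked; that rule and all names are assumptions, not a prescribed standard.

```python
from collections import defaultdict
from datetime import date, timedelta


def build_campaign(grants: list, supervisor_of: dict) -> dict:
    """Group (user, entitlement, granted_on) grants into one review list per supervisor."""
    campaign = defaultdict(list)
    for user, entitlement, granted_on in grants:
        campaign[supervisor_of[user]].append((user, entitlement, granted_on))
    return dict(campaign)


def apply_decisions(grants: list, decisions: dict, today: date, max_age_days: int = 365) -> list:
    """Keep a grant if its supervisor re-approved it, or if it is younger than max_age_days.

    decisions maps (user, entitlement) -> True (keep) or False (revoke); grants older
    than max_age_days with no answer are revoked too (an assumed, fail-closed policy).
    """
    kept = []
    for user, entitlement, granted_on in grants:
        decision = decisions.get((user, entitlement))
        young = today - granted_on < timedelta(days=max_age_days)
        if decision is True or (decision is None and young):
            kept.append((user, entitlement, granted_on))
    return kept
```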
There was more success with another group, Integrating the Healthcare Enterprise (IHE), an initiative of RSNA and HIMSS, which proposed a specific way of producing these audit logs, with standards for the structure they would have. The IHE profile is called ATNA. The logs are typically delivered through Unix-style syslog and Windows logs, and they are formatted as XML content [inaudible]. Let's start with the SIM and see what it actually does. A SIM's job is to collect all platform-level logs in one database. Typically a SIM vendor creates connectors that make it easy to collect this data from various operating systems (Windows, Macintosh, Unix of all varieties), databases (Oracle, Sybase, IBM DB2), web servers (Apache, and obviously IIS), different kinds of firewalls, different kinds of system servers, different kinds of VPN gateways, RADIUS servers, and other applications. The company makes these connectors and sells them at some price, and you connect them so that data from all kinds of platforms goes into the SIM database. Then the SIM applies pattern matching and analysis techniques, with wonderful graphical user interfaces, to detect intrusions or other anomalous behavior. You can write plenty of rules on such systems yourself and get great value from the usage patterns these systems reveal. So these are examples of the commercial products we talked about: [inaudible], Splunk, and TriGeo have decent commercial products in this area. A typical clinical system audit log contains information such as this, with variations.
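Once connectors have normalized platform logs into one schema, a SIM rule is often a windowed count. This hypothetical rule flags a source IP with five or more failed logins within ten minutes across any mix of systems (VPN, Windows, Unix, firewall); the field names, threshold, and window are assumptions, and commercial products express such rules in their own languages.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def failed_login_bursts(events, threshold: int = 5, window: timedelta = timedelta(minutes=10)):
    """Flag source IPs with >= threshold failed logins inside a sliding time window.

    events: iterable of dicts already normalized by (hypothetical) connectors, e.g.
    {"time": datetime, "source": "vpn", "src_ip": "10.0.0.7", "outcome": "failure"}.
    Returns {src_ip: set of sources involved} for every IP that crossed the threshold.
    """
    recent = defaultdict(deque)  # src_ip -> deque of (time, source) failures in the window
    alerts = {}
    for ev in sorted(events, key=lambda e: e["time"]):
        if ev["outcome"] != "failure":
            continue
        q = recent[ev["src_ip"]]
        q.append((ev["time"], ev["source"]))
        while ev["time"] - q[0][0] > window:  # drop failures older than the window
            q.popleft()
        if len(q) >= threshold:
            alerts.setdefault(ev["src_ip"], set()).update(src for _, src in q)
    return alerts
```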
The standards still don't specify exactly what has to be in these logs, so we find that this kind of information is typical. Thinking about users' access to PHI applications, we want to capture in the audit log who the participants are, meaning who the user is and who the patient is, and what was done, meaning which data classes were involved and what actions were taken on them. An example would be viewing a radiology image: the action would be "view" and the data class would be a radiology image of type CT from date X, Y, Z. When was the access made, on what date and at what time? How was it made, through what applications and programs? Server and database names are added, because the request could go to multiple places. The next question is where the request is coming from: is it local or remote, and in either case, what is the IP address? That establishes the network location. And finally we ask why the person needs this access. In some specific areas and at some times we have asked that question; you can simply ask why and look at the answer that comes back to judge whether minimum necessary and need to know are being followed. Additional events typically captured are sign-on events, sign-off events, and timeouts while still signed on, so that one can count how often people time out instead of signing off. What are the issues with audit logs?
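The who, what, when, how, where, and why fields listed above map naturally onto a record type. This sketch's field names are assumptions, since, as noted, the standards do not fix the contents; the helper computes the per-user share of sessions that ended in a timeout rather than a sign-off.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class AuditEvent:
    # Who
    user_id: str
    patient_mrn: Optional[str]   # None for sign-on/sign-off/timeout events
    # What
    action: str                  # e.g. "view", "sign_on", "sign_off", "timeout"
    data_class: Optional[str]    # e.g. "radiology_image:CT"
    # When
    timestamp: datetime
    # How
    application: str
    server: str
    database: Optional[str]
    # Where
    remote: bool
    ip_address: str
    # Why (only captured where the user is prompted for a reason)
    reason: Optional[str] = None


def timeout_rate(events: list) -> dict:
    """Per user: fraction of sessions that ended in a timeout rather than a sign-off."""
    ends = Counter()
    timeouts = Counter()
    for e in events:
        if e.action in ("sign_off", "timeout"):
            ends[e.user_id] += 1
            if e.action == "timeout":
                timeouts[e.user_id] += 1
    return {u: timeouts[u] / n for u, n in ends.items()}
```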
Well, first of all, because there are multiple applications logging in different formats, we have an integration problem. To do an investigation, an investigator would have to be very smart and completely knowledgeable about going to five or six different applications, looking at the logs in their native GUIs, and deducing what happened when; at most they could print them on paper and try to rearrange them. A better solution is to bring all the logs to a central place and apply an analysis system, such as a SIM, on top of everything collected. Now, we are talking about clinical logs, which are different from platform logs; intelligence is required to make these logs useful, and procedures and investigations have to be in place. But here are some things we want to highlight. An audit log is not the same as a general network and system log, which operates at a lower level. It is not the same as a log of disclosures or ePHI transfers, meaning disclosures to the government or any other party outside the treatment, payment, and operations functions, and it is not a log of somebody's data moving from one system to another. This is a log of people looking up data. These are not performance logs. They do not measure how fast a particular piece of code executes, or whether a subroutine or program was entered and exited within a specific time, which helps find where performance is degrading. That is not what this log is. Next, it is not a debug log. You are not trying to figure out whether your code is working correctly or is flaky and has errors, where you might do a lot of logging to find exactly where it goes wrong. Again, audit logs for security purposes are not the same as debug logs.
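The integration problem is mostly normalization: each application's log lines have to be parsed into one schema before they can be merged into a single timeline. Both log formats below are invented for illustration, as are the field names of the common schema.

```python
from datetime import datetime


def from_app_a(line: str) -> dict:
    # Hypothetical App A format: "2024-01-05 14:02:11,jdoe,VIEW,0012345,labs"
    ts, user, action, mrn, data_class = line.strip().split(",")
    return {"time": datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), "user": user.lower(),
            "action": action.lower(), "mrn": mrn, "data_class": data_class, "app": "A"}


def from_app_b(line: str) -> dict:
    # Hypothetical App B format: "ts=2024-01-05T14:03:00|usr=JDOE|op=open|pt=0012345|mod=notes"
    f = dict(part.split("=", 1) for part in line.strip().split("|"))
    action = {"open": "view", "edit": "update"}.get(f["op"], f["op"])  # map to common verbs
    return {"time": datetime.fromisoformat(f["ts"]), "user": f["usr"].lower(),
            "action": action, "mrn": f["pt"], "data_class": f["mod"], "app": "B"}


def unified_timeline(a_lines, b_lines) -> list:
    """Normalize both sources into one schema and merge them in time order."""
    events = [from_app_a(ln) for ln in a_lines] + [from_app_b(ln) for ln in b_lines]
    return sorted(events, key=lambda e: e["time"])
```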
And then finally the journal logs which are used only on the writing side of things.  So that by definition they are not a complete audit log simply because audit logs ought to have all the read activities and general logs are typically the right activities.  Although it is important to have the journal logs to get, to create redundancy in other locations but it is by itself not same as the secured audit log.  So, what can you do with audit log and what kind of reports can you come up with?  Here are some very old examples of some of the audit logs we had created.  In some of our environments some of the medical record numbers that are given to the employees or to the celebrities are marked with a sign of being an employee or being a VIP.  And an interesting question that the audit log report can answer is for the VIPs who are in the hospital today, how many users went and looked at that account and if the number is greater than a particular threshold should we initiate an independent investigation?  So here's an example which shows that these are MRNs, they are special medical record numbers and they have been accessed by more than 5 users on that particular day.  Specific details about who those users are, users accessing more than 3 special MRNs you can get to look at the results here.  The user Nancy Smith happens to be a medical chart analyst and in order for doing billing for as a group of people they have come across doing multiple employees and therefore it's triggered.  Similarly users when they access a VIP patient remember the discussion we had that sometimes you just have to ask why do you want access to that.  So, in this particular case we actually do ask the question, when somebody is going to a special MRN that this person has a special relationship to the hospital and the university, "Can you please describe why you need to go into this record?  And there are various examples of responses that people have given on the right-hand side.  
You can see "self stop," which means somebody didn't proceed any further but went back. They were no longer interested in looking at that record, and that is good security at that moment. There are other examples, like "coding," and "myself," which is an interesting one: our institution has not determined that this is unacceptable behavior. We see other such examples, and they help us understand the reasons that drive people to look at these kinds of medical records. Here is an example of an audit log report where users who access Social Security numbers are asked why they want to see the Social Security number, which is PII and has an expense associated with the [inaudible]. Here are some examples, some of them reasonable. "VNS sheet" means the visiting nurse services form has a place for the Social Security number, so they are looking it up to fill that out. Although [inaudible] that data is unclear [inaudible]. "Tumor registry abstracting" means that when we send data to the registry, the registry expects Social Security numbers. Here is a good example: "check to see if patient has multiple MRN numbers." They use the Social Security number to search the enterprise master patient index, and all the MRNs show up when they search that way. And there are other such reasons. But sometimes we get answers that are inappropriate. One example here is "to submit a medication grant program application." It is unclear why a medication grant program application would require a Social Security number, so an answer like that requires us to visit the person or call them and ask, "Did you really need this, or did you need something else?" A response such as "I feel like it" is an extremely inappropriate response, and one may want to take sanctions against that.  
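The follow-up process above can be sketched as a simple triage step, assuming justifications arrive as free-text strings. The approved-reason list here is hypothetical, modeled loosely on the acceptable answers quoted in the lecture; anything not on it is routed to a person for follow-up rather than judged automatically.

```python
from __future__ import annotations

# Hypothetical approved reasons, modeled on acceptable answers quoted in the lecture.
APPROVED_SSN_REASONS = frozenset({
    "vns sheet",
    "tumor registry abstracting",
    "check to see if patient has multiple mrn numbers",
})

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variations compare equal."""
    return " ".join(text.lower().split())

def triage(response: str, approved: frozenset[str] = APPROVED_SSN_REASONS) -> str:
    """Route a user's justification to 'closed', 'accepted' or 'follow_up'."""
    text = normalize(response)
    if text == "self stop":
        return "closed"        # the user backed out without viewing the data
    if text in approved:
        return "accepted"      # matches a reason the institution has accepted
    return "follow_up"         # e.g. "I feel like it": call the user or supervisor
```

Exact matching on free text is deliberately conservative: unusual wording lands in the follow-up queue instead of being auto-accepted. A pick-list of reasons would likely be easier to triage in practice.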
We're trying to show the value of asking the question and following up on the answers people give, especially when they are being given really sensitive information such as Social Security numbers. There are a couple of other examples where it's not clear why the user needs to see the Social Security number. One is "this is my patient." That doesn't mean you have to see the Social Security number. Others are work-related excuses. We have also looked at different ways of doing log analysis and asked whether past breaches would have been catchable if we had had these kinds of log analysis tools. The answer is yes; we have learned from that, and talking about it helps explain how the log analysis can actually be done. One example: in a large organization it is very rare to see a particular user access consecutive MRNs within a day, that is, MRNs that are identical except for the last digit running 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, because such an access pattern rarely arises in a busy place. It can happen in small locations where maybe over the weekend there is only a single registrar registering patients, and clearly more than 10 patients can arrive over the weekend. But consecutive MRN access by one person in a large organization typically implies a systematic search for something. Whether they are looking for information for bad purposes or there is a legitimate purpose, either way it deserves a look: call and ask, "Tell us why you did that." The next 2 examples compare the number of medical record numbers, or patients, looked at in a particular day by a specific user with what they have been accessing so far. Say every day you've been accessing 5, 5, 5, 10, 7, 5 medical record numbers, and one day that number jumps to 75.  
That is a sufficient departure from normal behavior that a behavior-based alert will be generated and sent out, asking for clarification of why somebody accessed so many records on that particular day. Similarly, the same thing is done looking at excessive access of MRNs across the peer group of users; we have some graphs of that coming up next. Next is the number of hours with log activity: if a particular person has been active for more than 12 hours but we know they were on a single shift, one can surmise that his or her password is being shared. Since that's not an appropriate action, we would alert on it and try to address the issue. Here is an example of what happens when we recognize consecutive MRN access. One has to think about what the process is going to be once an alert is generated; without a process for handling the alert, just getting an alert is not sufficient. The trigger here is consecutive MRNs. There is an alert date, we know when a response was received, and we look at what the response was. We find that the person who did these consecutive accesses is a private employee in one of the hospitals. The alert was sent to their director with some evidence files, and the email essentially says that AXX9014 accessed 13 consecutive MRNs, or patient ING [phonetic] files, and you actually get the list for that date. Then, about 28 minutes later, they accessed 11 more consecutive MRNs. By sending this letter to the director we accomplish multiple tasks. First, it raises awareness that there is an audit function where people are looking at the logs and actually want an answer. Second, we investigate the incident itself with the help of the supervisors, who have every reason not to want a problem such as this.  
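Two of the triggers just described, consecutive MRNs and activity spread over more hours than one shift, can be sketched as follows, assuming MRNs are numeric strings and each user's log timestamps for the day are already collected. The run length of 10 (one full cycle of the last digit) and the 12-hour limit follow the lecture's examples; the function names are hypothetical, and "hours with logs" is interpreted here as distinct clock hours containing at least one access.

```python
from __future__ import annotations

from datetime import datetime
from typing import Iterable

def consecutive_mrn_runs(mrns: Iterable[str], min_run: int = 10) -> list[tuple[int, int]]:
    """Return (first, last) for each run of >= min_run numerically consecutive MRNs."""
    nums = sorted({int(m) for m in mrns})  # dedupe and compare numerically
    runs: list[tuple[int, int]] = []
    start = 0
    for i in range(1, len(nums) + 1):
        # A run ends at the end of the list or where the next MRN is not previous + 1.
        if i == len(nums) or nums[i] != nums[i - 1] + 1:
            if i - start >= min_run:
                runs.append((nums[start], nums[i - 1]))
            start = i
    return runs

def active_hours(timestamps: Iterable[datetime]) -> int:
    """Count distinct clock hours in which the user generated at least one log entry."""
    return len({ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps})

def possible_shared_password(timestamps: Iterable[datetime], shift_hours: int = 12) -> bool:
    """Flag users active in more distinct hours than a single shift allows."""
    return active_hours(timestamps) > shift_hours
```

As the lecture stresses, either function only produces the trigger; the alert still needs a defined route, such as a letter to the user's director asking for an explanation.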
Our experience has been very good. We collect all of this information along with all the responses and put it up on a web page, which gives us a nice security measure that works as a good metric. Here is an example of excessive access by a particular person who one day accessed far more records than their typical average. You can see in the top part of the picture, MRNs accessed per day by this person, that around October 22 the number of patients accessed shot up to 300, whereas on most days it is less than 50.
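A minimal sketch of the behavior-based spike alert, assuming a per-user history of patients accessed per day: it compares today's count against the median of that history. The factor of 5 and the minimum of 20 accesses are hypothetical tuning values, not figures from the lecture; with them, both the 5-to-75 example and the under-50-to-300 example would alert.

```python
from __future__ import annotations

import statistics
from typing import Sequence

def is_access_spike(history: Sequence[int], today: int,
                    factor: float = 5.0, min_count: int = 20) -> bool:
    """Flag a day whose patient count is far above the user's own typical day.

    history: patients accessed per day over a recent window (e.g. the past 60 days).
    factor, min_count: hypothetical tuning knobs; min_count keeps tiny absolute
    numbers (e.g. 1 -> 6) from alerting.
    """
    if not history:
        return False                       # no baseline yet for a new user
    baseline = statistics.median(history)  # the median resists a few busy days
    return today >= min_count and today > factor * max(baseline, 1)
```

Using the user's own median rather than the mean keeps one earlier busy day from raising the baseline enough to hide a later spike.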
00:50:06
>> Similarly, in the lower part we show the number of hours the person worked, and we find that the hours worked on that particular day were not nearly as much of an outlier as the number of patients they looked at. Seeing this, we would send a similar query to the boss, the supervisor of this person, and ask them to explain why this access spiked. Once we get the response, we document it on the website. Here is an example of a picture with lots of users who all have admitting representative as their title, as it says at the top. Each row is a user showing their access pattern, and the little black dot in the middle of each row is the median of their access over the past 60 days, in terms of number of patients looked at per day. The rows are sorted in increasing order of that median, which is why the dots move toward the right as you go up the graph. The top row itself is the median of the medians, a summary of all the rows below it. If you look at the person on the second row from the top and see their median access, that person stands out as an outlier from the normal behavior pattern we can expect for people in this picture. So the picture highlights whether a particular person's access, in terms of number of patients per day, is consistently too high compared with their peer group, people with the same title. If it is, why it is so is a query we would send out to their supervisor.  
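The peer-group picture can be reduced to a computation like this minimal sketch: take each user's median patients per day over the window, take the median of those medians within each job title, and flag users whose own median exceeds that peer value by some factor. The factor of 2 and the function name are hypothetical; the lecture does not say what cutoff is used.

```python
from __future__ import annotations

import statistics
from collections import defaultdict

def peer_outliers(daily_counts: dict[str, list[int]], titles: dict[str, str],
                  factor: float = 2.0) -> dict[str, list[tuple[str, float]]]:
    """Flag users whose median daily patient count is well above their peers'.

    daily_counts: user -> patients accessed per day over the window (e.g. 60 days)
    titles:       user -> job title, which defines the peer group
    Returns title -> [(user, user_median), ...], highest median first.
    """
    by_title: dict[str, dict[str, float]] = defaultdict(dict)
    for user, counts in daily_counts.items():
        if counts:
            by_title[titles[user]][user] = statistics.median(counts)
    result = {}
    for title, medians in by_title.items():
        peer = statistics.median(medians.values())  # the "median of medians" row
        flagged = [(u, m) for u, m in medians.items() if m > factor * peer]
        if flagged:
            result[title] = sorted(flagged, key=lambda pair: -pair[1])
    return result
```

Grouping by title matters: an admitting representative legitimately touches far more charts per day than, say, a researcher, so each user is only compared with people doing the same job.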
Among commercial audit log solutions for healthcare applications, there is essentially one company, FairWarning, which collects information the way our in-house system does, has connectors for the audit logs of various clinical systems, and can easily and quickly bring data into a central place and report from there. We recently talked with them and found that some deficiencies still exist, and that it has some interesting analyses that may or may not be very useful. One example is that it matches the patient name with the user name to guess whether they are related: whether their last names are the same, plus additional criteria such as their email addresses being the same. This checks whether an employee who has access to the system accesses their own data. Unless there is a policy that prohibits it, they can do that, and FairWarning would alert that they are looking at their own data, because in this case the patient name will exactly match the user name, and their credit card and their address may match as well. The address match is the next criterion: FairWarning checks the patient's address from the demographic systems and the user's address from the human resources systems, puts them into Google Maps, and identifies possible neighbors or household members. If they have the same name, maybe they are the same person; if they don't have the same name but come from the same household, that raises the question of whether the access is appropriate. [Inaudible] correlations as you go forward to look for inappropriate or fraudulent acts; this is definitely an area of potential research, and it will take some time to get good results from that kind of activity.
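A rough sketch of the name and address matching described above, for illustration only: this is not FairWarning's implementation. Instead of geocoding addresses through a mapping service, it compares normalized address strings, so "12 Elm St." and "12 elm st" match but "Street" versus "St" would not. The `Person` record and the category names are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    first: str
    last: str
    address: str

def _norm(s: str) -> str:
    """Lowercase, collapse whitespace, and drop punctuation."""
    return re.sub(r"[^a-z0-9 ]", "", " ".join(s.lower().split()))

def relationship(employee: Person, patient: Person) -> str | None:
    """Guess how an employee (from HR data) relates to a patient (from demographics)."""
    same_addr = _norm(employee.address) == _norm(patient.address)
    same_last = _norm(employee.last) == _norm(patient.last)
    same_first = _norm(employee.first) == _norm(patient.first)
    if same_addr and same_last and same_first:
        return "self"            # likely the employee viewing their own record
    if same_addr:
        return "household"       # different person at the same address
    if same_last:
        return "same_last_name"  # weak signal on its own; needs more criteria
    return None
```

Whether a "self" hit is a violation depends on policy, as the lecture notes; the function only labels the relationship.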
00:55:05
[ Pause ]
00:55:10
>> This brings us to the next topic we want to discuss: data loss prevention. It has become an interesting component and a control to look at because of the new HITECH breach notification rules. It turns out that a lot of data is lost accidentally, and if we can prevent that, it definitely helps the overall security posture. So the purpose of data loss prevention systems is to protect against accidental loss of data. How can that occur? There are 3 possibilities, with nice, catchy names: data in motion, data at rest and data in use. Data is either moving, sitting at rest, or actually being used, and for each of those categories we need an appropriate control to prevent data loss. Data loss prevention is a kind of search engine: a dynamic, big search engine that checks everything automatically, in real time. It looks at all outgoing traffic or data flow, where outgoing means going out to the internet, and searches for PHI and PII in the transferred packets or sessions. The searches themselves are pretty straightforward, even rudimentary. For example, to search for Social Security numbers, look for 9 consecutive digits, maybe with dashes in between; to search for PHI, look for medical terms from a dictionary along with the word MRN, and check whether there is a real MRN in there. If so, this is data associated with PHI with no encryption on it, so we need to protect it. That's the job of data loss prevention. Some solutions also make interesting, innovative approaches possible.  
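The two rudimentary searches just described can be sketched with regular expressions. This is a minimal illustration, not any DLP product's rule language: the SSN pattern accepts 9 digits either bare or as 3-2-4 with dashes, while the MRN format (6 to 10 digits after the word MRN) and the five-word medical dictionary are hypothetical stand-ins.

```python
import re

# 9 digits, either as 123-45-6789 or as a bare 123456789.
SSN_RE = re.compile(r"\b(?:\d{3}-\d{2}-\d{4}|\d{9})\b")
# Hypothetical MRN format: the word MRN, optional ':' or '#', then 6-10 digits.
MRN_RE = re.compile(r"\bMRN\s*[:#]?\s*(\d{6,10})\b", re.IGNORECASE)
# Tiny stand-in for a real medical dictionary.
MEDICAL_TERMS = {"diabetes", "chemotherapy", "hiv", "biopsy", "oncology"}

def classify(text: str) -> list:
    """Return the DLP categories an outgoing message appears to contain."""
    findings = []
    if SSN_RE.search(text):
        findings.append("PII:SSN")
    words = set(re.findall(r"[a-z]+", text.lower()))
    # PHI needs both an identifier (here an MRN) and medical content.
    if MRN_RE.search(text) and words & MEDICAL_TERMS:
        findings.append("PHI")
    return findings
```

A bare 9-digit pattern will also fire on other 9-digit numbers, so in practice the policy definition has to be tuned against false positives.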
An example is a system that takes the entire MPI, the master patient index database, and loads the demographic data of all patients into the data loss prevention system, so that if data is ever leaked with a specific name and other details that exactly match one of our patients, the alert has much better specificity. Now let's go through each of these cases, data in motion, data at rest and data in use, and see what controls are in place to check whether data is being leaked. Data in motion means data moving on the network, so one place to look is to monitor the network for outgoing PHI and PII. Another place to look is the outgoing email flow, where somebody may accidentally send PHI and PII by email to the outside world. An interesting point here is that the data leakage prevention system has options for what to do with that email: should it block it and send a message back to the sender, or should it encrypt the email and then continue to forward it? That depends on the policy. Such data in motion appliances or systems can also hook into the proxy server; recall that the proxy server is the server through which local users go to the internet. So if users are posting or uploading something that looks like institutional PHI and PII, the DLP system, through the proxy server, can recognize it and actually stop it. In each such case, data in motion systems can prevent data from moving out, but they need sufficient connectivity and authority on the system to do that.  
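One way to picture the MPI-based matching above, assuming a hypothetical export of (first name, last name, date of birth) rows from the master patient index: outgoing text is tokenized, adjacent word pairs are looked up as candidate patient names, and each hit records whether that patient's date of birth also appears. That extra demographic agreement is what raises specificity.

```python
import re

def build_index(mpi_rows):
    """mpi_rows: iterable of (first, last, dob) tuples from a hypothetical MPI export."""
    return {(first.lower(), last.lower()): dob for first, last, dob in mpi_rows}

def match_patients(text, index):
    """Return (first, last, dob_also_present) for each known patient name found in text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = []
    for first, last in zip(tokens, tokens[1:]):  # adjacent word pairs
        dob = index.get((first, last))
        if dob is not None:
            hits.append((first, last, dob in text))
    return hits
```

Real systems would also need to handle middle names, "Last, First" ordering, and common names shared by many patients, which is why a second matching field such as date of birth matters.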
Note that what counts as data in motion matching the criteria for PHI and PII depends on the policy definition, and that policy definition decides how many false positives you get. For example, for Social Security numbers you search for 9 digits. For PHI, you search for medical terms from a dictionary in a particular message, interaction or session, along with words such as PT, patient or MRN, because for something to be PHI there has to be an identifiable person together with medical information. So it is a big search, and it always gets tweaked and adjusted to do the right thing. Data at rest is data sitting on servers or in back-end systems, and the real question is: should it be sitting exactly where it is right now? Some data leakage prevention controls scan, say, web servers to see whether they have files open to the world, or even internally, that contain PHI and PII. Once we find such a file, we cannot intervene directly; we have to call the web server people and get those files removed, or we can remove them ourselves as long as we have the right authority and access. Another thing that can be done for data at rest is to scan desktops for PHI and PII, even though we don't want data sitting on desktops, because it isn't backed up there. And of course you can scan the servers for PHI and PII, and the answer is that there is going to be PHI and PII on the servers.  
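A data-at-rest scan of the kind described above, for example over a web server's document root, might look like the following minimal sketch. It walks a directory tree with the standard library, reads each regular file as bytes, and counts SSN-shaped strings; `scan_tree` and the 5 MB size cap are hypothetical choices, and a real scanner would also apply PHI rules and parse document formats such as PDF or Office files.

```python
import re
from pathlib import Path

# Same SSN shape as a data-in-motion rule, as a bytes pattern so any file can be read.
SSN_RE = re.compile(rb"\b(?:\d{3}-\d{2}-\d{4}|\d{9})\b")

def scan_tree(root, max_bytes=5_000_000):
    """Return (path, match_count) for files under root containing SSN-like strings."""
    findings = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.stat().st_size > max_bytes:
            continue  # skip directories and very large files in this sketch
        try:
            data = path.read_bytes()
        except OSError:
            continue  # unreadable file: a real scanner would log this
        count = len(SSN_RE.findall(data))
        if count:
            findings.append((str(path), count))
    return findings
```

The output is a worklist: each path still needs a person with the right authority to decide whether the file belongs there.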
So the question worth asking beyond searching the server is whether it is appropriate to have the data where it is, and whether it is in the right directory. If it is not, somebody has put it there, and that can cause a problem for the data. When we find, especially on servers, that a particular directory that is not supposed to hold clinical data does hold it, one strategy is to literally remove the file from that location and leave a smaller stub file that says, "This file has been removed because it failed the data leakage prevention standards." We are actively considering whether to do this; at the moment it's undecided. The stub could explain why the file was removed and how to get it back, as well as possible sanctions. So we'll see how that goes. Finally we get to data in use, which is typically addressed at the desktop level. Typically a DLP agent runs on the desktop, and its job is to watch whether the user is copying PHI and PII to a CD, a DVD or a USB drive; as they do so, we capture that information and may alert on it and ask why they are putting it on the CD or the USB drive. Data in use controls can also check whether a user on the desktop is using Gmail or Facebook, and if they are cutting and pasting from their workspace into either of these environments, the system may not allow it, because that is one way of accidentally losing our data. Think of WikiLeaks; that may be how the leaks to WikiLeaks happened. Since users definitely use social media such as Facebook a lot and cut and paste data, this is a good check that institutional PHI is not going out through Facebook, Gmail or any other program.  
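The stub strategy being considered above could be sketched as follows, assuming a hypothetical quarantine directory that only the security team can read. The file is moved rather than deleted, so it can be restored, and a small text stub is left beside the original name explaining the removal. The first stub line is the wording from the lecture; the rest of the stub and the `contact` value are placeholders.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path

STUB_TEMPLATE = (
    "This file has been removed because it failed the data leakage prevention standards.\n"
    "Removed (UTC): {when}\n"
    "Original name: {name}\n"
    "To request it back, contact: {contact}\n"
)

def quarantine_with_stub(path, quarantine_dir, contact="security office"):
    """Move a flagged file into quarantine and leave an explanatory stub behind."""
    src = Path(path)
    qdir = Path(quarantine_dir)
    qdir.mkdir(parents=True, exist_ok=True)
    when = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    dest = qdir / f"{when}_{src.name}"      # timestamp prefix keeps repeat names apart
    shutil.move(str(src), str(dest))        # move, not delete, so it can be restored
    stub = src.with_name(src.name + ".removed.txt")
    stub.write_text(STUB_TEMPLATE.format(when=when, name=src.name, contact=contact))
    return dest, stub
```

Writing the stub under a new name such as `export.csv.removed.txt` avoids leaving a file whose extension no longer matches its contents; replacing the file in place under its original name is the other obvious option.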
This brings us to the conclusion of this section, where we discussed authentication, authorization and audit logs as the main controls. We also presented a newer control that is in vogue pretty much everywhere: data leakage prevention. As we mentioned before, the HIPAA Security Rule treats risk assessment as an important concept in overall system management, so we think you should pay attention to that. And as risk assessment is done, we believe the government will come and test us against the NIST security standards, which are the same standards the government agencies themselves are evaluated against. That's probably a good thing, because it means the people who come to audit us would follow the NIST rules and not anything else. I thank you for the opportunity to talk to you about some of these interesting security issues and controls, and I wish you all good luck. Thank you.
01:06:17