Tuesday, August 16, 2011

The RSA SecurID debacle: Why it happened

The RSA SecurID saga was one of the more interesting security stories of 2011. Analyzing the background of this story can give some insight into how security decisions are made and why security systems fail.

The public story
SecurID is a security product from RSA which uses a hardware token to authenticate users connecting to a network.  According to RSA, “SecurID offers the strength of security, broad application support, variety of authentication methods and reliability required to protect the network assets of companies worldwide. Each RSA SecurID authenticator has a unique symmetric key that is combined with a proven algorithm to generate a new one-time password (OTP) every 60 seconds.”  SecurID is widely used in the American defense industry, probably because the American Department of Defense (DoD) approved it for securing defense contractors' networks.
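
To make the mechanism concrete, here is a rough Python sketch of how a time-based one-time-password scheme of this general kind works. To be clear: RSA's actual SecurID algorithm is proprietary, so this is a generic HOTP/TOTP-style construction (in the spirit of RFC 4226/6238) with illustrative names and parameters, not RSA's code.

    # A hypothetical sketch, NOT RSA's proprietary SecurID algorithm.
    # A per-token secret key plus the current 60-second time window
    # produce a short code that changes every minute.
    import hashlib
    import hmac
    import struct
    import time

    def one_time_password(secret_key: bytes, interval: int = 60, digits: int = 6) -> str:
        # Number of 60-second windows since the Unix epoch.
        counter = int(time.time()) // interval
        # Mix the shared secret with the counter (HOTP/TOTP-style).
        mac = hmac.new(secret_key, struct.pack(">Q", counter), hashlib.sha1).digest()
        # Dynamic truncation (as in RFC 4226) yields a short numeric code.
        offset = mac[-1] & 0x0F
        code = (struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF) % (10 ** digits)
        return str(code).zfill(digits)

    # Example: print(one_time_password(b"per-token secret"))

The point to notice is that the token and the authentication server can only agree on the code because both hold a copy of the same secret key, which is exactly why a stolen database of those keys undermines every token derived from it.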

In March 2011 RSA announced that hackers had broken into their network and stolen SecurID secrets, presumably keys. Over the following couple of months reports came from at least three large defense contractors (Lockheed Martin, Northrop Grumman and L-3 Communications) of attempts to hack their systems using the stolen SecurID secrets. In June RSA announced that it would replace all SecurID tokens (some 40 million). The damage to RSA came to at least $66 million in direct costs, not including the damage to its reputation (including a Pwnie award for “Lamest Vendor Response”).

This story, combined with the Stuxnet, Sony PlayStation and backdoored counterfeit Chinese chip stories (which I hope to get to later), makes 2011 the year in which large-scale targeted hacks reached the news.

An analysis
Note: the following is all conjecture and is not based on any specific knowledge of the background and facts beyond what was reported in public sources.

Security failures as large as the SecurID debacle tend to derive from multiple failures. Following is an attempt to list the specific failures in this case and to guess at their probable causes.

Failure #1: RSA holds all secrets for all SecurID customers
Each SecurID customer received from RSA the secret keys for the SecurID devices they purchased. In addition, RSA kept copies of these keys. It’s not clear from the reports why they did so. Perhaps they simply never took the trouble to delete the keys from their database, or perhaps they kept them as a backup in case a customer lost their copy.

Whatever the case may be, it’s likely that the decision to keep this database of keys at RSA was intended to reduce operational failures and costs. If the security risk of doing so was taken into account, RSA decided that ease of operations trumped that risk, a bet RSA clearly lost.

What would be worse is if the security risk wasn’t even considered. In many companies the security personnel have a limited scope of view that emphasizes the bits-and-bytes security aspects of the system but does not cover the wider scope of operations. Either the teams dealing with operations are not aware of the security implications of their decisions and don’t involve the security team, or the security personnel are focused on the classic security issues of protocols, algorithms and securing the implementation (e.g. the token) and don’t consider the less sexy operational security aspects.

Failure #2: RSA keeps the keys in a network which is accessible to clients that are also connected to the open internet
All security systems must make certain assumptions (law #1). Until recently, one commonly made assumption was that the most likely attack vector on an IT system involves physical access.  Common security mechanisms like Windows credentials and disk encryption are intended to prevent someone with such physical access from reaching IT resources. In fact, SecurID as a product is based on the same premise: that if a PC is in the hands of the legitimate user then it cannot be used for illicit purposes.  Once the authorized user enters the SecurID code and password, the client gains access to the relevant IT resources, which is safe as long as we can assume that the client is under the full control of the authorized user.
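
Continuing the sketch above (and reusing its one_time_password function and hmac import), this is roughly what the server-side check implied by that premise looks like. The names are made up; the point is that the check only verifies what was submitted, not who or what submitted it.

    # Hypothetical server-side check: whoever supplies a valid password and a
    # current token code is treated as the authorized user. The check cannot
    # distinguish a human at the keyboard from software running on that same machine.
    def authenticate(username, password, submitted_code, password_db, key_db):
        if password_db.get(username) != password:         # factor 1: something you know
            return False
        secret_key = key_db.get(username)
        if secret_key is None:
            return False
        expected = one_time_password(secret_key)          # factor 2: something you have
        return hmac.compare_digest(submitted_code, expected)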

It is likely that when RSA decided to keep the keys in a network accessible to clients connected to the open internet, they relied on the same assumption: that an attacker would need physical access to one of these clients. If these clients were kept on secure premises then the threat of an attacker gaining such access was low, and even after gaining physical access the attacker would still need to enter the correct user name and password to enter the system.

But security is dynamic (law #6). The advent of Remote Access Trojans (RATs) has changed this. RATs started appearing in the late 1990s with the rise of broadband (and permanently online computers) and became widespread in the mid-2000s. A RAT allows an attacker to access IT resources on a device through malicious software installed on that device, with no physical access required. Today this threat is clearly more significant than the physical-access threat, yet far more effort is still being invested in combating the old attack vector. And while wild, untargeted RATs can usually be identified by standard security software, targeted RATs developed against a specific victim are not.

Security comes at a cost (law #4). Setting up a dedicated standalone network to hold the keys has both a dollar cost (the network itself) and, more importantly, an operational cost: it makes delivering keys to customers more difficult, since it requires manually transferring keys from the dedicated network to the externally connected one. Since the assumption when the system was originally designed was that an attacker would need physical access to a network client to reach the database, there was no justification at the time to pay the price of a separate network.

Failure #3: SecurID users relied on SecurID alone, making it a single point of failure
When RSA revealed the breach they sent a letter to their customers with “immediate remediation steps for customers to take to strengthen their RSA SecurID implementations”. These steps are essentially a list of actions that reduce the damage a system would incur from a compromise of SecurID.

I don’t know if RSA gave such advice when they originally delivered their product to customers. I would be surprised if they did, because it’s not easy for a security vendor to admit that its product can be compromised and that it is important to put in place additional security mechanisms that would allow customers to recover from such a breach.

But even if RSA did give such advice, I would expect most companies to ignore it.  The reason you buy a product from a reputable security company like RSA is because you don’t want to worry about security – you want the security company to deal with that. You pay top dollar for all those hardware security tokens and in exchange you expect to get grade-A security and no more worries.

Failure #4: Many (if not all) DoD contractors use SecurID to secure access to their networks
It all depends on the attacker’s motivation (law #2). Because the DoD approved the use of the RSA SecurID solution by defense contractors, many of them (if not all) chose to use it. This greatly increased the motivation of strong potential attackers (e.g. the Chinese government) to attack the system. Combine this with a centralized database of keys and it’s no wonder the system was breached.

When the breach occurred I suspect many fingers were pointed at the RSA IT security team. I don't think they are to blame. Protecting a network which is connected to the internet from a powerful, highly motivated attacker is a Sisyphean task. The only real way to deal with this threat is simply not to put anything on such a network that such an attacker would be strongly motivated to attack.

Bad luck?
Had it not been for this combination of failures, the SecurID hack would have been prevented, or at least would have had less of an impact. One might feel that this was all just a lot of bad luck. In fact this story is a tragedy: it just had to happen. Each one of the failures was practically inevitable.

Security people designing a security system don’t tend to think of the operational aspects. You ask yourself: how secure is my device? How good is my crypto? You don’t ask: where are the keys kept in my back office?

Now one would think this is obvious. The magnitude of a security risk can be calculated like any other risk: by multiplying its probability by its impact. Even if one underestimates the probability that an attacker will gain access to this database, the potential damage is so large that the risk had to be mitigated.
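
To make the arithmetic concrete, here is a toy calculation with entirely made-up numbers (none of them come from the actual case). Even with a very low estimated probability, the key-database scenario dominates the risk once its impact is counted.

    # Illustrative figures only -- assumptions, not data from the RSA case.
    p_db_breach, impact_db_breach = 0.01, 100_000_000   # 1% per year, ~$100M impact
    p_token_lost, impact_token_lost = 0.20, 10_000      # 20% per year, ~$10K impact

    risk_db = p_db_breach * impact_db_breach             # $1,000,000 per year
    risk_token = p_token_lost * impact_token_lost        # $2,000 per year
    print(risk_db, risk_token)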

When you design a security system you make assumptions based on what is known at the time. Modifying a system after it's deployed in response to new threats comes at a high cost which is hard to justify economically - especially if your customers aren't aware of the risks and keep on buying the old system.

So failures 1 and 2 are almost inevitable - unless there is a high degree of awareness of the risks. One positive outcome of the SecurID debacle is that at least now such awareness will increase.

Failure 3 derives from the conflict of interest that exists between a security provider and its customer. The security provider wants to sell the system, but no system is secure unless it is used and configured correctly as part of an overall security solution. The provider may advise the customer to do so, but this comes at a cost to the customer. Even if the security provider takes full responsibility for the security of the system (and not just for a specific product like SecurID), the provider is not well positioned to make the customer take such additional steps; attempts to do so increase the overall cost of the security solution and can alienate decision makers from using the vendor's products.

Regarding failure 4 - this is the kind of behaviour you'd expect from a government agency. Government agencies make broad security decisions, which in turn create large targets.


In fact, when the “---- hit the fan”, RSA also did things right. They identified the failure, contacted their customers, advised them how to correct the problem, and replaced their customers' products.

Investing in security is a long term endeavor that requires vision and a high degree of forbearance. In today's market, with its short term financial incentives, not many companies have what it takes to keep a system secure in the long run.


If you happen to know more about what led to the SecurID breach or just have different opinions on some of these issues - please use the comments below.
