Let’s say you’re developing an application and you reach the conclusion that you need to secure communications between your client and your server. You decide to use SSL/TLS using a good cipher-suite and you implement this using some standard library.
To test your implementation you set up a server with an SSL certificate and see that the client application communicates correctly with the server. Great – so you've tested your SSL implementation, right?
Wrong! It’s possible that your client application isn't actually using SSL at all and is communicating with the server without encryption. So to make sure that the communications are actually encrypted, you set up a packet analyzer (e.g. Wireshark) and see that indeed everything is encrypted. Now you’re finished, right?
Wrong! Your client application might use SSL when it talks to a server that supports SSL, but will be willing to work with a non-supporting server without SSL. Since the only purpose of SSL is to secure communications against hackers, this would make SSL worthless—hackers can perform a Man-in-the-Middle attack by setting up a fake server that will communicate with the client in the clear and communicate with the server with SSL. So to test this, you remove the certificate from your server and check that the client stops communicating with the server. Great – now we’re done?
Not even close! The anchor of SSL security is the validation of the server’s certificate. Perhaps the client application is willing to accept a badly signed or expired certificate? So you set up a server with a badly signed certificate, and another with an expired certificate, and check that the client doesn't communicate with either server. Finished?
Not yet! Maybe your application is willing to accept someone else’s correctly signed certificate? Or maybe the application will accept a certificate issued by a non-trusted certificate authority?
To test the implementation of a security feature, such as SSL, it is not enough to test that it works when it should—it’s critical to test that it doesn't work when it shouldn't Otherwise, you haven’t actually tested the feature, because part of the feature’s functionality is that it should not to work in an insecure situation. Defining such “negative” tests—meaning checking that the system stops working when it’s not secured—is much more difficult than defining ordinary functional testing.
This may sound obvious but in reality most implementers of security features simply do not perform such security testing, or at least not enough of it. Researchers at the University of Texas recently published a study (PDF) analyzing dozens of SSL implementations and found that many fail to properly validate the SSL certificates and are thus entirely insecure. This included major applications such as Amazon, PayPal, and Chase Bank, and platforms such as Apache Axis. Another recent study (PDF) found that 8% of sampled Android apps that use SSL fail to properly validate certificates.
These failures would not have occurred if proper security testing had been done. Such security testing isn't pentesting - it’s part of the basic functional testing of security features.