Insecure databases

People love storing information in databases, because databases make it easy to store, sort, and search large amounts of data. Sometimes those databases are not as secure as they should be.

Traditional databases are great for storing structured data, like a list of books. Books are sort of uniform, in that you can describe a book pretty well with a small set of identifiers (like title, ISBN, author, year of publication, publisher, etc.), and those identifiers don’t change a lot over time or from book to book. A spreadsheet will often suffice for this kind of thing.

Describing people is harder, because people are weird. Consider medical records. Records for women would need lots of columns that records for men don’t need, and vice versa. A patient with diabetes would have lots of columns that aren’t relevant to a non-diabetic. Likewise for a cancer patient.

A relatively new class of database called NoSQL is good at storing records on people and other complicated subjects, because NoSQL databases can store (and sort and search) unstructured data. MongoDB is a popular open-source NoSQL database product.

The idea is that a company installs MongoDB on its server, pours data into it, and writes a web application (or some other kind of interface) to access the data. Earlier versions of MongoDB had some poorly chosen default settings that made the database itself directly reachable over network connections. More recent versions of the software have better defaults, but the damage is done: lots of people installed MongoDB with the network-available default, and they never changed it.

So even if they wrote a web application with good access controls, the database itself might be open to the internet. If the database’s network port wasn’t firewalled, anyone could completely bypass the web application’s access restrictions by connecting directly to the database (and they could download as much data as they wanted).
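Whether a database port is exposed comes down to one question: will it accept a TCP connection from outside? Here is a minimal Python sketch of that test. It assumes MongoDB’s default port (27017), and the helper name `port_is_open` is hypothetical. A successful connection shows only that the port answers, not that the database will hand over data without credentials.

```python
import socket

# MongoDB's default TCP port; any database port can be checked the same way.
MONGODB_DEFAULT_PORT = 27017

def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs a full TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out (often a firewall silently dropping packets),
        # unresolvable, or unreachable: all mean "not open from here".
        return False
```

To be meaningful, the check has to run from a machine outside the server’s network, e.g. `port_is_open("db.example.com", MONGODB_DEFAULT_PORT)`, where `db.example.com` is a placeholder. If that returns `True`, the firewall isn’t doing its job.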

It’s important to note that this problem is not specific to MongoDB; it could happen with any network-enabled database system. But because of some recent discoveries of internet-accessible MongoDB databases, MongoDB is in the spotlight. The Office of Inadequate Security has reported on several high-profile examples of open MongoDB databases, including a voter registration database with 191 million records. A security researcher named Chris Vickery used Shodan, a search engine that indexes internet-connected devices, to find these databases.

That bears repeating: an ordinary guy used a search engine to find a database with the voter registration data of 191 million Americans.

All too often people don’t take care of their data. The Office of Inadequate Security reports on data breaches large and small all the time. Sometimes it’s 191 million voter records over a network connection, and sometimes it’s patient records left on a sidewalk next to a trash can when a doctor’s office goes out of business. That site might be a good place to look for inspiration when you’re writing a character that needs to acquire data that wouldn’t (or shouldn’t) be widely available. Whether your character needs to do some port scanning or some dumpster diving, she might be able to get her hands on all kinds of data.

HTTPS is not infallible

When your browser connects to a web site whose address starts with https, you’re connecting to a “secure server.” It’s considered secure, because (at least some of) the traffic between your browser and the web server is encrypted.

This business has a formidable amount of jargon. Your browser connects via one of several types and versions of protocols, and it uses one of many possible ciphers. Newer protocols are more secure than older protocols, and ciphers with longer encryption keys are more secure than ciphers with shorter keys. If an attacker can exploit some protocol vulnerability, he may be able to capture enough information to decipher encrypted data.
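To put numbers on “longer keys are more secure”: every extra bit doubles the number of keys an attacker might have to try. The Python sketch below assumes a hypothetical attacker who can test a billion keys per second, and it applies to brute-forcing a symmetric cipher key. The public-key parts of a connection are attacked with different math, so their key lengths aren’t directly comparable.

```python
# Each extra bit doubles the number of possible keys: an n-bit key has 2**n values.
GUESSES_PER_SECOND = 1e9               # hypothetical attacker speed
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

def years_to_try_every_key(key_bits: int, rate: float = GUESSES_PER_SECOND) -> float:
    """Worst-case time, in years, to try every possible key of `key_bits` bits."""
    return 2 ** key_bits / rate / SECONDS_PER_YEAR

for bits in (40, 56, 128):
    print(f"{bits:>3}-bit key: {years_to_try_every_key(bits):.3g} years")
```

Under that assumption, a 40-bit key falls in about 18 minutes (2^40 / 10^9 is roughly 1,100 seconds), while a 128-bit key would take on the order of 10^22 years.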

When your browser and a web server negotiate a connection, they try to pick the most secure combination of protocol and cipher that they can both understand. If an older browser connects to an up-to-date server, one of two things will happen:

  1. If the server has been configured to support older protocols, the server will use one of those older protocols in order to be able to talk to the browser. This is the less secure choice.
  2. If the server has been configured not to support older protocols, the browser won’t be able to connect at all (the user will get an error message in their browser). This choice is more secure, but it causes problems for users with older browsers.
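The two outcomes above can be modeled as “pick the newest protocol both sides understand, or fail.” The Python sketch below is a toy under that assumption: the protocol list and the `negotiate` helper are illustrative, and real handshakes also settle on a cipher and key-exchange parameters.

```python
# Protocol versions ordered from oldest (weakest) to newest (strongest).
PROTOCOLS = ["SSLv3", "TLSv1.0", "TLSv1.1", "TLSv1.2", "TLSv1.3"]

def negotiate(browser_supports, server_supports):
    """Return the newest protocol both sides support, or None if they share none."""
    common = set(browser_supports) & set(server_supports)
    if not common:
        return None  # no connection; the user sees an error message
    return max(common, key=PROTOCOLS.index)

old_browser = ["SSLv3", "TLSv1.0"]
lenient_server = ["TLSv1.0", "TLSv1.1", "TLSv1.2", "TLSv1.3"]  # choice 1
strict_server = ["TLSv1.2", "TLSv1.3"]                          # choice 2

print(negotiate(old_browser, lenient_server))  # connects, using an old protocol
print(negotiate(old_browser, strict_server))   # no shared protocol at all
```

The lenient server talks to the old browser over TLS 1.0; the strict server refuses it.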

SSLLabs has a nifty web page that lets you test a web server. Type the address of your online banking site into the SSLLabs server test page and see how your bank’s site looks. My bank’s site got an F, because it supports older protocols and weak ciphers. The knuckle-dragging server pukes that work for my bank had to choose between requiring strong encryption and getting complaints from customers (and they clearly made the wrong choice).
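For a server built on Python’s standard `ssl` module (an assumption; a bank’s servers could run anything), choosing strong encryption over customer complaints comes down to settings like the ones below. `minimum_version` requires Python 3.7 or later, the cipher string uses OpenSSL’s syntax, and the certificate file names are hypothetical.

```python
import ssl

def strict_server_context() -> ssl.SSLContext:
    """Server-side TLS settings that refuse protocols older than TLS 1.2."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Browsers that only speak SSLv3, TLS 1.0, or TLS 1.1 will fail to connect.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Restrict TLS 1.2 to strong cipher suites (TLS 1.3 suites are configured separately).
    ctx.set_ciphers("HIGH:!aNULL:!eNULL:!MD5:!RC4")
    # A real server would also load its certificate and private key, e.g.:
    # ctx.load_cert_chain("server.crt", "server.key")  # hypothetical file names
    return ctx

ctx = strict_server_context()
print(ctx.minimum_version)
```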

(SSLLabs also has a page which lets you test your browser, and the browser I use for banking is not vulnerable to the things the page tests. That assuages some of the misgivings I have about using my bank’s web site.)

In the past year or so, several protocol vulnerabilities have been revealed (and corrected). These flaws often have catchy names, such as Heartbleed, FREAK, and Logjam.

This class of vulnerability is typically exploited by a man-in-the-middle (MITM) attack (see footnote). Imagine that Alice and Bob communicate with each other using written messages which they encrypt using some method that they both know how to decrypt. Alice writes a message, encrypts it, and then gives the encrypted message to a courier named Eve who takes it to Bob (these names are traditional: the courier is named Eve, because she likes to eavesdrop).

If Eve learns how to decrypt the messages, then she (the “man” in this MITM attack) can read what Alice and Bob are saying to each other. Eve could even alter the messages she delivers. In a real example, Alice would be your browser, Bob would be the web server, and Eve would be someone who is somehow able to capture the traffic between the two (like someone who has tapped into the network at Alice’s ISP).
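A toy version of the courier story, in Python, shows why a weak cipher lets Eve both read and rewrite messages. The assumptions are all hypothetical: Alice and Bob use a repeating-key XOR cipher with a one-byte key (nothing like real TLS), and Eve guesses that the message contains the word “Alice” (a known fragment, or “crib”). With only 256 possible keys, she can simply try them all.

```python
from itertools import cycle

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy cipher: XOR each byte with the repeating key (same operation both ways)."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

def eve_cracks(ciphertext: bytes, key_len: int, crib: bytes):
    """Try every key of `key_len` bytes; return the first whose output contains `crib`."""
    for n in range(256 ** key_len):
        key = n.to_bytes(key_len, "big")
        if crib in xor_cipher(ciphertext, key):
            return key
    return None

shared_key = bytes([0x5A])  # Alice and Bob's (tiny) secret key
in_transit = xor_cipher(b"Meet me at noon. -Alice", shared_key)  # what Eve carries

stolen_key = eve_cracks(in_transit, key_len=1, crib=b"Alice")
print(xor_cipher(in_transit, stolen_key))  # Eve reads the message...
forged = xor_cipher(b"Meet me at midnight. -Alice", stolen_key)
print(xor_cipher(forged, shared_key))      # ...and Bob decrypts her forgery
```

Bob has no way to tell that the midnight message came from Eve rather than Alice.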

A couple of those vulnerabilities (FREAK and Logjam) allow the attacker to force the server and browser to use an older protocol and/or a cipher with a shorter key length than the browser and server might otherwise elect to use. Eve then has an easier time decrypting the traffic that she’s able to capture.
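The downgrade trick can be modeled the same way. In this Python toy (the key lengths and helper names are illustrative), the server still accepts a short key alongside a long one, and Eve edits the browser’s list of offered options in transit. Real TLS handshakes include integrity checks meant to catch this kind of tampering; FREAK and Logjam each found a way around them, and the toy leaves that part out.

```python
def strongest_common(browser_offer, server_accepts):
    """Toy negotiation: pick the longest key length both sides accept (None if none)."""
    common = set(browser_offer) & set(server_accepts)
    return max(common) if common else None

def eve_downgrades(offer):
    """Eve rewrites the browser's offer in transit so only its weakest option remains."""
    return [min(offer)]

browser_offer = [2048, 1024, 512]  # key lengths in bits (illustrative values)
server_accepts = [2048, 512]       # the server still accepts an old, short option

honest = strongest_common(browser_offer, server_accepts)
tampered = strongest_common(eve_downgrades(browser_offer), server_accepts)
print(honest, tampered)
```

The honest negotiation settles on 2048 bits; after Eve’s edit it settles on 512. The downgrade only works because the server still accepts the short option, which is why leftover support for weak options matters.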

That’s easier, not easy. The traffic is still encrypted, and it takes time and computing resources to break the encryption. There are a couple of things to take away from all of this:

  1. It’s really important to keep your browser up-to-date so that it has the most modern set of protocols and ciphers.
  2. If you’re writing about a character who wants to eavesdrop on a target’s encrypted traffic, the attacker probably has to overcome two formidable obstacles: compromising the target’s network connection, and having the computing resources to break the encrypted traffic. It might be more believable to have your character try to get the target to fall for a phishing attack that installs a keylogger.

Footnote: Heartbleed is the exception here. That bug potentially gave the attacker the ability to read the contents of a web server’s memory (which might include the private keys that would decrypt the server’s connections).
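Heartbleed came from a server trusting a length field supplied by the client. In the TLS heartbeat feature, the client sends a small payload plus a number saying how long that payload is, and the server echoes that many bytes back. The Python toy below models memory as a byte string with hypothetical secrets stored next to the payload. The fixed version drops any request whose claimed length exceeds the real payload, which is how the heartbeat specification says such requests should be handled.

```python
# Toy model of the Heartbleed bug. "Memory" is one byte string in which the
# request's payload sits right next to unrelated data (hypothetical secrets).
NEIGHBORING_DATA = b"|session=abc123|private_key=HYPOTHETICAL"

def heartbeat_buggy(payload: bytes, claimed_length: int) -> bytes:
    """Echo back `claimed_length` bytes, trusting the client's number (the bug)."""
    memory = payload + NEIGHBORING_DATA
    return memory[:claimed_length]  # reads past the payload if the client lied

def heartbeat_fixed(payload: bytes, claimed_length: int):
    """Silently ignore requests that claim more bytes than they actually sent."""
    if claimed_length > len(payload):
        return None  # no response at all
    memory = payload + NEIGHBORING_DATA
    return memory[:claimed_length]

print(heartbeat_buggy(b"bird", 4))   # honest request: echoes the payload
print(heartbeat_buggy(b"bird", 40))  # lying request: leaks neighboring data
print(heartbeat_fixed(b"bird", 40))  # fixed server ignores the lie
```

In the real bug, the length field allowed up to about 64 KB per request, and an attacker could repeat the request to pull out different chunks of memory.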