Two-factor authentication

Many online services offer two-factor authentication (2fa) to protect users’ accounts. When you enable 2fa on an account, you still log in with a username and password, but then you have to enter a one-time code before you can access your account. How you get the code (typically a six-digit number) depends on the implementation: Twitter sends you the code in an SMS text message, while others (like Google and Facebook) have you look up the code in a mobile app. Either way, you generally get the code with your phone. The point of this is to make it harder for someone to break into your online account, because they’d have to know your password and have access to your phone.
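Many authenticator apps compute their codes with the time-based one-time password algorithm (TOTP, RFC 6238), which builds on HOTP (RFC 4226). The app and the server share a secret, and each side independently hashes that secret together with the current 30-second time window. Here is a minimal Python sketch. It assumes SHA-1, 30-second windows, and six digits, which are common defaults, though a given service may differ. It isn’t any particular service’s implementation.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226)."""
    # HMAC-SHA1 over the counter, packed as an 8-byte big-endian integer
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # "Dynamic truncation": the low 4 bits of the last byte choose where to read
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, now: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP where the counter is the time window."""
    if now is None:
        now = time.time()
    # Assumed defaults: SHA-1, 30-second windows, six digits
    return hotp(secret, int(now // step), digits)
```

Suppose the phone and the server call `totp(shared_secret)` during the same 30-second window (`shared_secret` is a hypothetical name here). Both get the same six digits, so the server never has to send you the code. SMS-based 2fa is the exception.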

So logging in with 2fa means taking an extra step, which at times can feel like a nuisance. The way I look at it, it’s a minor inconvenience for me, but it’s a significant inconvenience to someone who wants to steal my account.

2fa isn’t a new innovation. I knew someone in the late 1990s who worked for a government-funded research facility, and he carried around a little device in his pocket. When he needed to log in to one of the facility’s computers, he’d have to look at the code that appeared on the device’s screen and enter that code in order to complete his login process. It worked very much like modern 2fa implementations.

More and more online services are offering 2fa, and I encourage you to start using it wherever you can. The Two Factor Auth (2FA) web site provides a list of who does and who doesn’t offer 2fa login features. This can be a good place to see which of your accounts have 2fa available, and the 2fa site typically has a link to the documentation on how to set up 2fa for each service.

2fa makes it a lot harder for someone to take over an account, but it’s not perfect (and this is the part that might be useful to a writer who needs her main character to defeat a 2fa-protected account). Someone gained control of the Twitter account of political activist DeRay Mckesson, even though the account had 2fa enabled. The criminal contacted Verizon (Mckesson’s mobile provider) and convinced the billing department that Mckesson’s cell phone number had changed. So SMS messages that should have gone to Mckesson went to the criminal’s phone instead. The criminal then used Twitter’s “forgot my password” feature and received an SMS message with the code the criminal needed to complete the account theft.

This is a good reminder of how effective social engineering can be. Some people will do anything to end a phone conversation with an angry-sounding customer. Sometimes the best hacks exploit people, not computers.

By the way, that Naked Security post (near the end) has some tips on how to enable security features on the accounts of several mobile providers, including Verizon. That might or might not have made a difference in DeRay Mckesson’s incident, but it might have made it easier for him to regain control of his Verizon account.

Passwords: salts, hashes

In the previous post we saw that network and web site accounts with reasonable security use hash functions to protect passwords.

But even using a hash function isn’t enough, because the bad guys have rainbow tables. A rainbow table is a precomputed list of common passwords (like Password123) and their hash values. So if a site suffers a data breach exposing account data, a simple hash function won’t be much of a barrier. The criminal can look up the hash values from the breach data in a rainbow table and recover many of the weaker passwords.
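The idea can be sketched in a few lines of Python. Strictly speaking, a real rainbow table uses chains of hashes to save space. This sketch is the simpler precomputed lookup table that the idea boils down to. The word list is a tiny hypothetical stand-in for the huge lists of leaked passwords attackers actually use, and SHA-1 is used only for illustration.

```python
import hashlib
from typing import Dict, Iterable


def sha1_hex(password: str) -> str:
    """Unsalted SHA-1 hash, as a hex string."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


# Hypothetical word list; real attackers use millions of leaked passwords.
COMMON_PASSWORDS = ["123456", "password", "Password123", "letmein", "qwerty"]

# Computed once and reused against every breach: hash value -> password.
LOOKUP_TABLE: Dict[str, str] = {sha1_hex(pw): pw for pw in COMMON_PASSWORDS}


def recover(breached_hashes: Iterable[str]) -> Dict[str, str]:
    """Return the breached hash values found in the table, with their passwords."""
    return {h: LOOKUP_TABLE[h] for h in breached_hashes if h in LOOKUP_TABLE}
```

Consider `recover([sha1_hex("Password123"), sha1_hex("x7#Qp!v9wL")])`. The weak password falls right out of the table. The stronger one isn’t in the table, so it isn’t recovered.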

To counter rainbow tables, sites typically salt users’ passwords: when someone creates a password, the site generates some random characters (the salt), appends that to the user’s password, runs the salted password through the hash function, and then stores the hash value and the salt. When the user tries to log in, the site takes the password they typed, appends the salt stored with the user’s account, sends that through the hash function, and compares the hash values.

Salting sets the bar a lot higher: because each password has a different salt, the criminal would need to compute a new rainbow table for each password.
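Here is a minimal Python sketch of the salt-and-hash steps described above. The 16-byte random salt and SHA-256 are my own choices for illustration. Real systems usually go further and use a deliberately slow function such as PBKDF2, scrypt, or bcrypt, but the salting idea is the same.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Append a salt to the password, hash it, and return (salt, hash) to store."""
    if salt is None:
        salt = os.urandom(16)  # assumed size: a new random 16-byte salt per password
    # SHA-256 chosen for illustration; production code would use a slow KDF
    digest = hashlib.sha256(password.encode("utf-8") + salt).digest()
    return salt, digest


def verify_password(password: str, salt: bytes, stored_hash: bytes) -> bool:
    """Re-hash the typed password with the stored salt and compare hash values."""
    _, digest = hash_password(password, salt)
    # compare_digest avoids leaking, via timing, how many leading bytes matched
    return hmac.compare_digest(digest, stored_hash)
```

Two users who both pick Password123 end up with different stored hash values. That difference is exactly what makes a single precomputed table useless.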

This is why I get frustrated with fiction that makes it look easy to crack passwords. Any account worth hacking will likely be protected by some or all of the following safeguards:

  1. salted and hashed passwords
  2. a password policy enforcing complexity rules (e.g., your password has to be at least eight characters, has to include numbers and punctuation characters, and can’t look too much like a word)
  3. active response locking an account after too many failed attempts
  4. two-factor authentication (you log in with your username and password, but then the site won’t give you access until you enter a code it sends to your cell phone)

Active response in particular makes guessing passwords impractical. If your character is trying to break into a network or web site account, too many failed attempts are going to lock the target account. Your character is better off trying to steal the password through a phishing attack, social engineering, a keylogger, a flaw in the “forgot my password” feature, or even a security camera pointed at the keyboard.
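The lockout logic can be sketched like this in Python. The default limit of five consecutive failures is hypothetical. Real sites pick their own limits, and some unlock after a timeout instead of requiring a reset.

```python
from typing import Dict, Set


class LoginGuard:
    """Locks an account after too many consecutive failed logins."""

    def __init__(self, max_failures: int = 5) -> None:
        self.max_failures = max_failures  # hypothetical threshold
        self.failures: Dict[str, int] = {}
        self.locked: Set[str] = set()

    def attempt(self, username: str, password_ok: bool) -> str:
        if username in self.locked:
            return "locked"  # even the right password no longer works
        if password_ok:
            self.failures[username] = 0  # a successful login resets the count
            return "ok"
        self.failures[username] = self.failures.get(username, 0) + 1
        if self.failures[username] >= self.max_failures:
            self.locked.add(username)
            return "locked"
        return "invalid password"
```

With a limit of five, a character working through a list of a thousand likely passwords gets to try at most five of them before the account locks.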

And an account protected by two-factor authentication is nearly unassailable, because your character would need the target’s password and their cell phone. Social engineering might be best here.

Passwords: this isn’t a game show

Have you ever seen a movie where someone is running a computer program to crack a password (or a missile launch code), and it discovers one character at a time? It looks like a Wheel of Fortune contestant correctly guessing a letter or buying a vowel. This is a trope I think writers and screenwriters should avoid.

Passwords don’t work like Wheel of Fortune. If they did, it would mean that each individual character is stored separately, and it would take around the same (boringly brief) length of time to guess each one.

When you sign up for an account on a web site that has reasonable security, the site takes the password you provided and puts it through something called a one-way hash function. The hash function turns your password into a hash value: it transforms something like

Password123

into something like

b2e98ad6f6eb8508dd6a14cfa704bad7f05f6fb1

or

Password124

into

ae5b6bf3a00dabe4bcb06918044f3032c6e7c80c

Hash functions (there are many of them; I used sha1sum for these examples) have several important features:

  1. It works the same way every time (sha1sum always hashes Password123 to b2e98ad6f6eb8508dd6a14cfa704bad7f05f6fb1).
  2. It’s very difficult to find two passwords that have the same hash value, but it’s not impossible. (This means that a hash value does not uniquely determine a password.)
  3. You generally can’t work backwards from the hash value to recover the password (it’s different from encryption, which allows you to decrypt the encrypted value).
  4. A minor change in the password (changing 3 to 4 above) drastically affects the hash value.
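Python’s hashlib can demonstrate the first and fourth properties. This sketch uses SHA-1 to match the sha1sum examples. The printed values may not exactly match the ones above. If those were produced by piping `echo` into sha1sum, note that `echo` appends a newline (unless you use `echo -n`), and the newline gets hashed too.

```python
import hashlib


def sha1_hex(password: str) -> str:
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


a = sha1_hex("Password123")
b = sha1_hex("Password124")

# Property 1: the same input gives the same hash value, every time.
assert a == sha1_hex("Password123")

# Property 4: a one-character change scrambles the whole hash value.
changed = sum(x != y for x, y in zip(a, b))
print(a)
print(b)
print(f"{changed} of {len(a)} hex digits differ")
```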

So when you set your password, the site hashes your password and saves the hash value, not the original password. Next time you log in, it uses the same hash function to hash the password you just typed in and compares that hash value to the hash value stored next to your username. If the hash values match, then you typed the correct password, and the site gives you access. If they don’t match, you get the “invalid password” message.
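The sign-up and log-in steps described above might look like this in Python. The in-memory dictionary stands in for the site’s user database, and the function names are hypothetical. This is the plain, unsalted version this post describes.

```python
import hashlib
import hmac
from typing import Dict

# Stand-in for the site's user table: username -> hash value (never the password).
ACCOUNTS: Dict[str, str] = {}


def hash_value(password: str) -> str:
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


def sign_up(username: str, password: str) -> None:
    ACCOUNTS[username] = hash_value(password)


def log_in(username: str, password: str) -> bool:
    stored = ACCOUNTS.get(username)
    if stored is None:
        return False
    # The whole hash value matches or it doesn't; there is no partial credit.
    return hmac.compare_digest(hash_value(password), stored)
```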

That’s one of the many reasons that the Wheel of Fortune thing is so absurd. The site checking the password you just typed doesn’t even know the original password. It’s checking one hash value against another: the whole thing matches or the whole thing fails. So even if a criminal compromises the site through some security vulnerability and manages to download the username/password database, they get a bunch of stuff they can’t read.

In the next post we’ll see that hashing passwords is better than storing passwords in clear text, but that it’s still not sufficient.

Insecure databases

People love storing information in databases, because databases make it easy to store, sort, and search large amounts of data. Sometimes those databases are not as secure as they should be.

Traditional databases are great for storing structured data, like a list of books. Books are sort of uniform, in that you can describe books pretty well with a small set of attributes (like title, ISBN, author, year of publication, publisher, etc.), and those attributes don’t change a lot over time or from book to book. A spreadsheet will often suffice for this kind of thing.

Describing people is harder, because people are weird. Consider medical records. Women would need lots of columns men don’t need, and vice versa. A patient with diabetes would have lots of columns not relevant to a non-diabetic. Likewise for a cancer patient.

A relatively new class of database called NoSQL is good at storing records on people and other complicated subjects, because NoSQL databases can store (and sort and search) unstructured data. MongoDB is a popular open-source NoSQL database product.

The idea is that a company installs MongoDB on their server, pours data into it, and writes a web application (or some other kind of interface) to access the data. Earlier versions of MongoDB had some poorly-chosen default settings which would make the database itself directly available over network connections. More recent versions of the software have better defaults, but the damage is done: lots of people installed MongoDB with the network-available default, and they never changed it.

So even if they wrote a web application with good access controls, the database itself might be open to the internet. If the database’s network port wasn’t firewalled, anyone could completely bypass the web application’s access restrictions by connecting directly to the database (and they could download as much data as they wanted).
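A toy model makes the problem concrete. In this Python sketch, all names and records are hypothetical. The web application enforces an access rule, but anything that can reach the database directly never passes through that rule.

```python
from typing import Dict, List

# Hypothetical records standing in for the database's contents.
RECORDS: Dict[int, Dict[str, str]] = {
    1: {"owner": "alice", "data": "alice's record"},
    2: {"owner": "bob", "data": "bob's record"},
}


def web_app_get(logged_in_user: str, record_id: int) -> str:
    """The web application's access control: users may only read their own records."""
    record = RECORDS.get(record_id)
    if record is None or record["owner"] != logged_in_user:
        raise PermissionError("access denied")
    return record["data"]


def direct_database_dump() -> List[str]:
    """What someone connecting straight to an unfirewalled database port gets: everything."""
    return [record["data"] for record in RECORDS.values()]
```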

It’s important to note that this problem is not specific to MongoDB. This could happen with any network-enabled database system. But because of some recent discoveries of internet-accessible MongoDB databases, they’re in the spotlight. The Office of Inadequate Security has reported on several high-profile examples of open MongoDB databases, including a voter registration database with 191 million records. A security researcher named Chris Vickery used Shodan to find these databases.

That bears repeating: an ordinary guy used a search engine to find a database with the voter registration data of 191 million Americans.

All too often people don’t take care of their data. The Office of Inadequate Security reports on data breaches large and small all the time. Sometimes it’s 191 million voter records over a network connection, and sometimes it’s patient records left on a sidewalk next to a trash can when a doctor’s office goes out of business. That site might be a good place to look for inspiration when you’re writing a character that needs to acquire data that wouldn’t (or shouldn’t) be widely available. Whether your character needs to do some port scanning or some dumpster diving, she might be able to get her hands on all kinds of data.

HTTPS is not infallible

When your browser connects to a web site whose address starts with https, you’re connecting to a “secure server.” It’s considered secure, because (at least some of) the traffic between your browser and the web server is encrypted.

This business has a formidable amount of jargon. Your browser connects via one of several types and versions of protocols, and it uses one of many possible ciphers. Newer protocols are more secure than older protocols, and ciphers with longer encryption keys are more secure than ciphers with shorter keys. If an attacker can exploit some protocol vulnerability, he may be able to capture enough information to decipher encrypted data.

When your browser and a web server negotiate a connection, they try to pick the most secure combination of protocol and cipher that they can both understand. If an older browser connects to an up-to-date server, one of two things will happen:

  1. If the server has been configured to support older protocols, the server will use one of those older protocols in order to be able to talk to the browser. This is the less secure choice.
  2. If the server has been configured not to support older protocols, the browser won’t be able to connect at all (the user will get an error message in their browser). This choice is more secure, but it causes problems for users with older browsers.
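The negotiation can be modeled as “pick the newest version both sides support.” This Python sketch simplifies real TLS, where the browser advertises what it supports and the server chooses. The version sets are just examples.

```python
from typing import Optional, Set

# Oldest (weakest) to newest (strongest)
VERSIONS = ["SSLv3", "TLSv1.0", "TLSv1.1", "TLSv1.2", "TLSv1.3"]


def negotiate(browser: Set[str], server: Set[str]) -> Optional[str]:
    """Return the strongest version both sides support, or None if they share none."""
    for version in reversed(VERSIONS):
        if version in browser and version in server:
            return version
    return None  # no overlap: the browser shows a connection error


# Example configurations, for illustration only
old_browser = {"SSLv3", "TLSv1.0"}
lenient_server = {"TLSv1.0", "TLSv1.1", "TLSv1.2", "TLSv1.3"}
strict_server = {"TLSv1.2", "TLSv1.3"}

print(negotiate(old_browser, lenient_server))  # case 1: TLSv1.0
print(negotiate(old_browser, strict_server))   # case 2: None
```

In Python’s own ssl module, a server makes the stricter choice by setting `SSLContext.minimum_version`, for example to `ssl.TLSVersion.TLSv1_2`.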

SSLLabs has a nifty web page that lets you test a web server. Type the address of your online banking site into the SSLLabs server test page and see how your bank’s site looks. My bank’s site got an F, because it supports older protocols and weak ciphers. The knuckle-dragging server pukes that work for my bank had to choose between requiring strong encryption and getting complaints from customers (and they clearly made the wrong choice).

(SSLLabs also has a page which lets you test your browser, and the browser I use for banking is not vulnerable to the things the page tests. That assuages some of the misgivings I have about using my bank’s web site.)

In the past year or so, several protocol vulnerabilities have been revealed (and corrected). These flaws often have catchy names, like FREAK, logjam, and Heartbleed.

This class of vulnerability is typically exploited by a man-in-the-middle (MITM) attack (see footnote). Imagine that Alice and Bob communicate with each other using written messages which they encrypt using some method that they both know how to decrypt. Alice writes a message, encrypts it, and then gives the encrypted message to a courier named Eve who takes it to Bob (these names are traditional: the courier is named Eve, because she likes to eavesdrop).

If Eve learns how to decrypt the messages, then she (the “man” in this MITM attack) can read what Alice and Bob are saying to each other. Eve could even alter the messages she delivers. In a real example, Alice would be your browser, Bob would be the web server, and Eve would be someone who is somehow able to capture the traffic between the two (like someone who has tapped into the network at Alice’s ISP).

A couple of those vulnerabilities (FREAK and logjam) allow the attacker to force the server and browser to use an older protocol and/or a cipher with a shorter key length than the browser and server might otherwise elect to use. Eve then has an easier time decrypting the traffic that she’s able to capture.

That’s easier, not easy. The traffic is still encrypted, and it takes time and computing resources to break the encryption. There are a couple of things to take away from all of this:

  1. It’s really important to keep your browser up-to-date so that it has the most modern set of protocols and ciphers.
  2. If you’re writing about a character who wants to eavesdrop on a target’s encrypted traffic, the attacker probably has to overcome two formidable obstacles: compromising the target’s network connection and having the computing resources to break the encrypted traffic. It might be more believable to have your character try to get the target to fall for a phishing attack that installs a keylogger.

Footnote: Heartbleed is the exception here. That was something that potentially gave the attacker the ability to read the contents of a web server’s memory (which might include the private keys that would decrypt the server’s connections).

Default passwords

A network router is a device which forwards traffic between two networks. Your computer is on one segment of the internet, and your favorite web site is (likely) on a different segment. There’s at least one network router between you and your favorite web site moving the data packets back and forth.

Routers will typically more-or-less work right out of the box, but they generally need some configuration to do their jobs well (and securely). Routers frequently offer a web interface for this: you connect a computer to the router, go to a particular web address (specified by the product’s documentation), and then configure the device for its particular purpose. For example, if you’re setting up a router for an elementary school, you might configure the router to send all web traffic through some kind of content filter.

More and more devices are like this: you buy a shiny new gizmo, connect it to your network, and it offers some feature you can control with an app on your phone. This is the “Internet of Things” (IoT).

Network-enabled security cameras are another interesting example of this kind of thing. Imagine being able to log on to a camera hundreds of miles away, have it take pictures on demand, and view the images.

These devices typically ship with a default password. And that’s the big problem with these things: they don’t necessarily force you to change the password, and those default passwords are well documented and widely available. They’re in the product documentation, which the manufacturer probably puts on their web site for anyone to download.

(Sometimes the manufacturer will try to assign a unique default password to every unit they sell. This is great when they do it right, but sometimes they fail hilariously.)
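Doing it right mostly means generating each unit’s password from a cryptographically secure random source, rather than computing it from something an attacker can see. Here is a minimal sketch using Python’s secrets module. The alphabet and the 12-character length are assumptions.

```python
import secrets
import string

# Assumed alphabet (62 symbols) and length; adjust to the device's needs.
ALPHABET = string.ascii_letters + string.digits


def unit_default_password(length: int = 12) -> str:
    """Random per-unit password, e.g. to print on the device's label at the factory."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```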

Shodan and Censys are projects which portscan the internet and make the data available to anyone who wants to look at it. This data often reveals the manufacturer and model number of internet routers. Netgear devices often give the full model number in the remote administration password prompt. And there are web sites (like routerpasswords.com) devoted to making it easy to look up the default password for a particular network device model.

There are two important points to remember here:

  1. If you are writing about a character who wants to compromise a network target, and if she can determine the manufacturer and/or model number of the router protecting her target (either through Shodan or by portscanning it herself), she can look up the default password either through something like routerpasswords.com or by downloading product documentation from the manufacturer. If the network pukes at the target haven’t secured their router, your character could add routing table rules allowing her direct access to resources on the internal network.
  2. If you haven’t changed the password on the home router that may be sitting on your desk, now would be a good time to do so. (And unless you REALLY need it, you should disable the remote administration feature which was probably enabled by default.)

Target, Home Depot, Ashley Madison, and third-party vendors

If you are interested in writing about large-scale data and credit card theft, you could look to the Target, Home Depot, and Ashley Madison data breaches for inspiration. Much of what we know about these breaches comes from reporter Brian Krebs. His blog is fascinating, and I recommend it very highly. This post will refer heavily to his reporting.

(This post will refer to Target the retailer and targets of crime. Mind the capitalization to tell the difference.)

The retailer Target was the victim of a large data breach during the 2013 holiday shopping season. Criminals stole credit card information of 40 million customers and personal information (names, email and mailing addresses, phone numbers) of 70 million customers. The numbers here are so large that the thieves had trouble selling all the stolen credit card numbers before banks were able to cancel the credit cards, and some banks had trouble re-issuing cards, because the people who turn plastic into credit cards had a huge backlog of orders. (Target recently agreed to a $39.4 million settlement with banks and credit unions as a result of this breach.)

The picture that Krebs’ reporting paints about the Target breach is that it involved an external HVAC company that worked for Target. Someone at the HVAC company fell for a phishing attack, which probably installed a keylogger or some other malware on that employee’s PC, and this enabled the criminals to acquire login information for servers that Target’s vendors use to interact with Target (for work orders, billing, etc.). The criminals were able to use this access to install malware on the point-of-sale (POS) devices at Target stores. (Yes, there are probably several steps missing there, which I don’t understand, either, but they’re not the point of this post.) The POS malware was able to upload credit card data to another compromised server on Target’s internal network, and then that internal server exfiltrated the stolen data (gigabytes of it) to external FTP servers all over the world. (See Krebs’ coverage of the Target data breach for more details.)

Much the same thing happened to Home Depot in 2014. Criminals installed malware on thousands of self-checkout lanes at nearly every Home Depot location. The criminals got away with 56 million credit card numbers and 53 million customer email addresses. As happened with Target, the Home Depot network was initially breached using login credentials stolen from a third-party vendor. (Again, Krebs has more details about the Home Depot data breach.)

Although it didn’t involve credit card theft, the Ashley Madison story is similar. Ashley Madison is a social networking site created with the specific intention of enabling illicit (e.g., extra-marital) affairs. Someone managed to download and publish the account information of many or all of the AM users. Little is publicly known about how that information was acquired, but the CEO of AM’s parent company implied that it was the work of a non-employee who had previously had access to the AM information resources.

The takeaway here is something that might be useful for writing any kind of story about corporate hacking and espionage. In all three of these examples, a confirmed or suspected method of infiltration involved a vendor hired by the target company. Even if the vendor isn’t complicit, the vendor may be a softer target with lower standards of security (or with more access than they really needed). Breaching the vendor may give the attacker a foothold into the larger target.

The worst explanation of networking, ever

(This post is going to introduce a lot of jargon, and I’ll probably refer to it from future posts.)

Making a network connection to a computer involves an IP address (the computer’s address on the internet) and a port number. This is a flimsy analogy, but you could think of it like finding your way into a house: the IP address (the host) is like the street address, and the port number is like which entrance to use (the front door, or the window on the east side).

For example, most web sites are on port 80 or 443: 80 is the standard port for web servers, and 443 is the standard port for a secure web site (HTTPS). When you want your browser to display the CNN homepage, your browser figures out the IP address for www.cnn.com (which at the moment appears to be 23.235.44.73), connects to port 80 on that host, and begins an HTTP transaction to download the home page.
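Those three steps (look up the IP address, connect to a port, start an HTTP transaction) can be sketched with Python’s socket module. This is a bare-bones illustration: a real browser does far more. The function names are my own.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """A minimal HTTP/1.1 GET request for the given host and path."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def fetch_status_line(host: str, port: int = 80, path: str = "/") -> str:
    """Resolve the host, connect to the port, send the request, return the first response line."""
    ip = socket.gethostbyname(host)  # step 1: host name -> IP address (DNS)
    with socket.create_connection((ip, port), timeout=10) as sock:  # step 2: IP + port
        sock.sendall(build_request(host, path))  # step 3: begin the HTTP transaction
        response = b""
        while b"\r\n" not in response:  # read until the first line is complete
            chunk = sock.recv(4096)
            if not chunk:
                break
            response += chunk
    return response.split(b"\r\n", 1)[0].decode("latin-1")
```

Calling `fetch_status_line("www.cnn.com")` returns the first line of the server’s response. That line starts with the HTTP version and a status code.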

A firewall is network software that allows or rejects network traffic according to a set of rules. An organization which hosts its own web site might have a firewall which allows the Internet to connect to the web server on allowed ports like 80 and 443, but the firewall would reject other inbound traffic to that host.
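The rule-matching idea fits in a few lines. In this Python sketch, the rule format is made up for illustration; real firewalls such as iptables or pf have their own syntax. The first matching rule wins, and anything that matches no rule is rejected.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    action: str  # "allow" or "reject"
    port: int


# Hypothetical rules for an organization's web server: only web traffic gets in.
WEB_SERVER_RULES: List[Rule] = [Rule("allow", 80), Rule("allow", 443)]


def check_inbound(port: int, rules: List[Rule] = WEB_SERVER_RULES, default: str = "reject") -> str:
    """Return the action for inbound traffic to this port: first match wins."""
    for rule in rules:
        if rule.port == port:
            return rule.action
    return default  # default-deny: everything else is rejected
```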

A port scanner is a program used to interrogate a host’s ports, looking for services listening on those ports. If you point a port scanner at a web server, it’s likely to tell you that a web service is listening on port 80 (and maybe also on port 443). Port scanners are powerful programs which can be used in many ways. A company might hire a security analyst who would use a port scanner to identify weak points on the company’s network (a good analyst would also give the company some suggestions about how to address those shortcomings). A malicious person might use a port scanner to the same effect but for a different reason: the port scanner can tell the attacker the ports on which services are listening on the company’s hosts, identifying targets he or she might try to compromise.
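The simplest kind of port scan just tries to open a TCP connection to each port and records which ones accept. Here is a minimal Python sketch of such a “connect scan.” Tools like nmap are far more sophisticated. The usage below points it at your own machine (127.0.0.1).

```python
import socket
from typing import Iterable, List


def scan_ports(host: str, ports: Iterable[int], timeout: float = 0.5) -> List[int]:
    """Return the ports on host that accept a TCP connection."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 when the TCP handshake succeeds (a service is listening)
            if sock.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return open_ports
```

For example, `scan_ports("127.0.0.1", range(1, 1025))` lists which of the first 1024 ports are listening on your own computer.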

Sophisticated port scanners can identify specific software products listening on a port, and even the version of that software. If a cybercriminal used a port scanner to determine that a host was running version 2.2.15 of the Apache web server, he could use a search engine to look for vulnerabilities in that version of that product. He might find that that version has a race condition error resulting in a remote execution vulnerability and that someone has published an exploit which takes advantage of that software defect. He could download the exploit and run it against the web server for any number of malicious purposes (like stealing otherwise inaccessible information, sending spam and phishing attacks, etc.).
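Suppose a scanner has grabbed a banner, such as the `Server:` header a web server sends back. Pulling out the product and version is then simple string work. This Python sketch assumes the common `Product/Version` banner format. The “fixed in” versions used in its checks are hypothetical. In practice the character would look the version up, as described above.

```python
import re
from typing import Optional, Tuple

# Matches banners like "Apache/2.2.15 (CentOS)" -> ("Apache", "2.2.15")
BANNER = re.compile(r"(?P<product>[A-Za-z][\w.-]*)/(?P<version>\d+(?:\.\d+)*)")


def parse_banner(banner: str) -> Optional[Tuple[str, str]]:
    match = BANNER.match(banner.strip())
    if match is None:
        return None  # the banner hides or omits the version
    return match.group("product"), match.group("version")


def _version_tuple(version: str) -> Tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def older_than(version: str, fixed_in: str) -> bool:
    """Compare dotted versions numerically (so 2.2.15 is newer than 2.2.9)."""
    return _version_tuple(version) < _version_tuple(fixed_in)
```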

nmap has been a popular port scanner for a long time (it was first released in 1997). In The Matrix Reloaded, Trinity uses nmap to port scan a host she wants to compromise. nmap identifies a vulnerable version of a network service called ssh, and she then uses an exploit to elevate her privileges on that host. This is a good (and rare) example of credible hacking in popular media: the writer(s) included enough realistic detail to make the scene plausible.

nmap has starred in lots of films.

Statement of purpose

For years I’ve had a ridiculous fantasy of being a fiction writer. It seems that the best-selling novel I want to have written isn’t going to write itself. I’m having trouble getting motivated, so maybe what I need is another distraction: a blog.

I thought that technology in writing might be an interesting theme. Nothing ruins a story for me faster than a character hacking the FBI network after tapping on a keyboard for ten seconds. It probably works for many readers/viewers, but some of us see it as lazy writing.

In my day job I write lots of web applications for a public university. Many of my assignments are to convert paper processes into online forms. My job also involves a fair bit of Linux server administration. Most of this goes on the open Internet and is subject to daily cyber-attacks from all over the world (my server logs once revealed malicious traffic from Antarctica).

So this blog has a couple of goals. One is to get me in the habit of writing. The other is that I thought it might be useful to share some of what I’ve learned in a format that may be helpful to other prospective writers. I may also write about how technology can affect a writer. Here are some topics I have in mind:

  • credible hacking
    • port scanning
    • realistic exploitable security vulnerabilities
    • case studies of actual security breaches (like Target)
  • a writer’s technology
    • safe(r) Internet use (account security, security-related Firefox extensions, password managers)
    • affordable and effective backups
    • writing tools like Scrivener and WordPress (I know a fair bit about the latter and would like to learn more about the former)
  • the day-to-day life of a web programmer
    • server administration is not sexy
    • the importance (and challenge) of making web sites accessible
    • the horrors of working with vendors and ticketing systems

This blog may at times earn a PG-13 rating. I’ll mostly keep it clean, but there may be the occasional bit of salty language.

I’ll try to post every seven to fourteen days (historically I’ve really struggled with self-imposed routines like that), and I’ll try to keep individual posts fairly short (preferring to break up longer topics into multiple posts).