IRS Breach Upgraded — 3x More Severe; Data Fusion Risk Highlighted
In what is becoming a common pattern, the IRS customer data breach reported in May 2015 has been upgraded to an estimated 334,000 taxpayer accounts, about three times the previous estimate. The increased scale of the breach is bad news enough, but we’d like to draw attention to a serious categorical point about data breaches that we’ve raised before and that remains little appreciated: the additional risks of data fusion.
From the article:
To enter the Get Transcript system, the user must correctly answer multiple identity verification question[s]. The hackers took information about taxpayers acquired from other sources and used it to correctly answer the questions, allowing them to gain access to a plethora of data about individual taxpayers…
Hackers love authentication-based systems because it’s very difficult to distinguish between “the good guys and the bad guys” when someone is trying to get in, said Jeff Hill of STEALTHbits Technologies, a cyber security company.
We define data fusion as the synthesis of information about consumers (or other entities) from different sources, regardless of whether each source is the product of a breach or not (e.g., brokered or “mined” information from public sources). Once such records can be correlated across multiple sources, the power of the information can rise many-fold, both during a breach and afterward, when attackers make use of what they gleaned from it. Yet the risks of such data fusion in breaches have been little studied or, worse, rarely considered by the potential targets.
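As a rough illustration of the definition above, the sketch below merges records from three sources into one profile per person. Everything in it is hypothetical: the `fuse` helper, the `person-001` style keys, and the field names and values are invented for the example and do not describe any real dataset or attack tool. It also assumes the sources already share a common identifier, which in practice is the hard part of correlation.

```python
from collections import defaultdict


def fuse(*sources):
    """Merge per-person records from several sources into one profile each.

    Each source is a dict mapping a shared identifier to a dict of fields.
    (Hypothetical helper; assumes the identifier is already consistent.)
    """
    profiles = defaultdict(dict)
    for source in sources:
        for person_id, fields in source.items():
            profiles[person_id].update(fields)
    return dict(profiles)


# Entirely fictional records from three kinds of sources.
breached = {"person-001": {"prior_street": "Elm St"}}
brokered = {
    "person-001": {"auto_lender": "Example Bank"},
    "person-002": {"auto_lender": "Sample Credit Union"},
}
public = {"person-001": {"county": "Example County"}}

profiles = fuse(breached, brokered, public)
for person_id, fields in profiles.items():
    print(person_id, "->", len(fields), "known fields")
```

No single source here says much about `person-001`, but the fused profile carries three fields, which is the kind of coverage that credit history-style questions tend to draw on.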
In the case above, we see how a seemingly solid authentication system based on consumer credit history-style verification questions can be punched through like tissue paper once hackers have acquired some of this background information from other sources. And since this style of verification is probabilistic (with questions and answers selected randomly from a pool of valid financial history), the hackers cannot count on passing every attempt. A case like this, where hundreds of thousands of accounts were successfully breached, therefore implies that the hackers hold background information on many more people, possibly millions. This information could also be used elsewhere, in further breaches of systems holding sensitive consumer financial records.
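The arithmetic behind that inference can be sketched in a few lines. The 334,000 figure comes from the article; the pass rates are purely hypothetical assumptions, since the attackers' actual success rate per attempt is not given here, and `implied_attempts` is an illustrative helper rather than an established estimator.

```python
def implied_attempts(successes: int, pass_rate: float) -> int:
    """Estimate how many accounts were attempted, given how many were
    breached and an assumed chance that one attempt passes verification."""
    if not 0 < pass_rate <= 1:
        raise ValueError("pass_rate must be in (0, 1]")
    return round(successes / pass_rate)


BREACHED = 334_000  # accounts reported breached, per the article

# Hypothetical pass rates; the true rate is unknown.
for rate in (0.5, 0.25, 0.1):
    attempts = implied_attempts(BREACHED, rate)
    print(f"pass rate {rate:.0%}: roughly {attempts:,} accounts attempted")
```

The lower the assumed pass rate, the larger the pool of people whose background information the attackers must have held to reach the reported number of breached accounts.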
The lesson for businesses and other potential targets: realize that your own systems are not the only place from which customer information can be pilfered, and that data stolen elsewhere can create a breach risk for you. If possible, integrate direct communication-based verification methods with your customers. Otherwise, you may find your company subject to expensive breaches, even without a single flaw or compromise in your own system’s security.
This is a frustrating, recurring problem. It’s been a while since I considered authentication systems “based on consumer credit history-style verification questions” to be solid. Requiring, e.g., the name of a street, pet, or relative used to be a simple way to add a layer of security without having to memorize anything new. But this sort of authentication challenge provides an increasingly false sense of security, because the answers are now too easy to obtain by phishing or other means.
As a former government employee with an SSBI clearance, I assume (as a consequence of the recent OPM hack) that unsavory people have access to the names of virtually every street I’ve ever lived on, the full names of my relatives, etc. For most people today, the safest approach here is to lie, i.e., to provide false, altered, or imaginary answers at the outset. For example, give the name of the street your best friend lived on, or your fantasy first car (e.g., De|0rean). If you get creative, the answers are easy to remember.