11 March 2005—The U.S. database industry is under
a legal microscope following the pilfering of
information that could allow thieves to steal the
identity of hundreds of thousands of people. In a
hearing yesterday, senators threatened legislation
to regulate large brokers of financial and other data
such as LexisNexis, Bank of America, and
ChoicePoint, all of which have disclosed major thefts
in the last two months. It was the incident at
Alpharetta, Ga.-based ChoicePoint that kindled the
current concern in Washington, D.C. In mid-February the
company, whose data is used to check the legitimacy of
the potential customers of other companies, revealed
that it had been tricked into selling the records of
145 000 people to thieves posing as legitimate
ChoicePoint customers.
But why should an identity thief bother with an
expensive charade? Carnegie Mellon University associate
professor of computer science Latanya Sweeney has
found an even simpler way than paying a company in
the personal database industry, which critics say is
underregulated. She's found a way to extract all the
data she wants for free from the World Wide Web. For
over a decade, Sweeney has been exploring the
intersection of technology and privacy. Her latest work
builds on earlier Web-searching tools that create
software agents to extract names, addresses, birth
dates, and Social Security numbers from résumés posted
online—everything you need to apply for a new credit
card in someone else's name. Sweeney will report her
findings at a symposium devoted to national security
sponsored by the American Association for Artificial
Intelligence and held at Stanford University, in
California, 21-23 March.
With her software, Sweeney can gather the key data
with just a little Web surfing. She starts with a
filter that searches for documents likely to be résumés
and then extracts the key data values: name, Social
Security number, address, and date of birth. Résumés
are found in a two-part process: first, a program
Sweeney wrote last year finds long lists of names. Then
a specialized Google search filter looks for résumés
associated with those names that contain Social
Security numbers.
Social Security numbers and the other needed
fields, such as birth date, are isolated using a
combination of techniques. For example, dates can be
formatted in several different ways, but there are
now standard techniques for parsing them. If a résumé
has all the needed data except a birth date, the
software looks it up on one of the many sites that
offer birth dates, such as Anybirthday.com. Social Security
numbers have a distinctive format: nnn-nn-nnnn. Another
program of Sweeney's, SSN Watch, checks the numbers
that are found.
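As a rough illustration of the pattern matching described above, the
short Python sketch below finds strings laid out as nnn-nn-nnnn and
tries a few common date layouts. It is not Sweeney's software: the
function names, the list of date formats, and the sample text are all
assumptions made for illustration. A format match also says nothing
about whether a number is genuine, which is the kind of check the
article attributes to SSN Watch.

```python
import re
from datetime import datetime

# Matches the nnn-nn-nnnn layout; the lookarounds keep it from firing
# inside longer digit runs such as phone or account numbers.
SSN_PATTERN = re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")

# Hypothetical set of date layouts to try, in order. Real documents
# use many more; this list is only an example.
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%d %B %Y", "%B %d, %Y")


def find_ssn_like(text):
    """Return every substring of text laid out as nnn-nn-nnnn."""
    return SSN_PATTERN.findall(text)


def parse_date(value):
    """Try each known layout; return a date, or None if none fit."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
    return None


if __name__ == "__main__":
    sample = "ID 123-45-6789, phone 555-123-4567"  # made-up sample text
    print(find_ssn_like(sample))
    print(parse_date("11 March 2005"))
```

A scanner built along these lines would treat each match only as a
candidate to be checked and, as in the notification approach described
later in the article, flagged to the person it belongs to.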
How important are those Social Security numbers?
Last September, the commissioner of the U.S. Federal
Trade Commission told Congress that they play "a pivotal
role in identity theft. Identity thieves use the
Social Security number as a key to access the
financial benefits available to their victims."
Obviously, if people are posting their Social
Security numbers to the Web, and if doing so leaves
them highly vulnerable to identity theft, then they
ought to stop. Sweeney's work addresses that issue.
The Identity Angel project, which she launched
earlier this year, looks for e-mail addresses in those
résumés and sends individuals automated notices
that their identity information was found online.
She says a follow-up study showed that more than 90
percent of the people subsequently removed the
information from the Web.
Nonetheless, even with a digital Samaritan
patrolling the ether, U.S. identities remain at risk.
A November study by the U.S. Government Accountability
Office found that "Social Security numbers appear in
any number of records exposed to public view almost
everywhere in the nation, primarily at the state and
local levels of government."
The GAO reported that many states and hundreds of
the nation's 3141 counties put Social Security numbers
directly on the Internet and that "this could affect
millions of people." The agency concluded that the
risk of exposure for Social Security numbers in
public records "is highly variable and difficult for any
one individual to anticipate or prevent."
That risk is much lower across the Atlantic, where
a 1995 European Union directive on data privacy
ensures that personal data is kept secret by default.
According to Stephen J. Kobrin, a professor of
multinational management at the University of
Pennsylvania, in Philadelphia, this represents a
fundamental difference between the United States and
Europe. "In America privacy is seen as an alienable
commodity subject to the market," he wrote in a 2002
report. In contrast, he says, in Europe, privacy is
considered to be "a fundamental human right." Not
only do explicit privacy statutes exist there, but they
are also enforced by dedicated regulatory agencies.
In other words, the current U.S. crisis of
identity theft is a result of policy choices that
Americans have made, sometimes implicitly, sometimes
explicitly. They are choices that can be revisited at any time.