U.S. News
Privacy in age of data mining topic of workshop at CMU

Friday, March 28, 2003

By Byron Spice, Post-Gazette Science Editor

A Pentagon initiative to find terrorists by sifting through computer databases has caused an outcry among privacy advocates, but the problem of safeguarding personal information isn't restricted to the military's Total Information Awareness program.

Even when identifiers such as names and Social Security numbers are stripped from medical records or other computerized information, it can be all too easy to infer identities by combining the remaining information with other databases, said Latanya Sweeney, director of the Data Privacy Laboratory at Carnegie Mellon University.

That makes privacy a concern even when the analysis isn't intended to identify or track any individual, as is the case for the Real-time Outbreak and Disease Surveillance program being developed at the University of Pittsburgh and Carnegie Mellon as an early warning for bioterrorism.

Scientists from the Defense Advanced Research Projects Agency, which runs the controversial Total Information Awareness program, are among several dozen researchers gathered for a two-day workshop on data privacy that concludes today at Carnegie Mellon.

"What people are worried about is that information will be used against them to deprive them of their individual rights," said DARPA's Doug Dyer. But the system being designed by DARPA would not compile dossiers on every American; rather, it would look for behaviors and activities that terrorists must engage in to launch attacks.

"We just can't think of another way of preempting terrorist attacks without knowing about them in advance," Dyer said.

The U.S. Senate, citing the privacy concerns, last month voted unanimously to hold up funding for the program, which is directed by John Poindexter, former national security adviser to President Ronald Reagan and a figure in the Iran-Contra scandal.

"We're developing technology," not setting up a surveillance program, Dyer said. If the DARPA technology looks promising for identifying terrorists, it will then fall to Congress and others to decide whether it is deployed, who would use it and what data might be used. Research thus far has been restricted to databases already gathered by U.S. intelligence agencies and to "synthetic" databases generated to see whether the computer software could pick out terrorist-associated activities from the mass of data.

Ted Senator, another DARPA scientist, emphasized that these systems are not black boxes into which data is fed and which then spit out lists of suspects. Rather, the system would involve many layers at which potentially useful information is highlighted and passed on for further analysis and possible investigation. Each step of the process, he maintained, would be subject to safeguards and legal reviews.

Moreover, the system is being designed with a number of audit features to catch analysts who use the data inappropriately, or pass it on to unauthorized individuals or groups, Dyer said.

Data privacy is a growing concern, said Sweeney, whose lab is co-sponsoring the workshop along with Carnegie Mellon's Aladdin Center. The ability to generate and store computerized information on individuals is rising dramatically.

Her own research has shown that 87 percent of the U.S. population can be uniquely identified based just on gender, birth date and five-digit ZIP code. In one study, she found that by linking medical records -- stripped of names but including gender, birth dates and ZIP codes -- gathered by a governmental group, with voter registration records for Cambridge, Mass., she was able to identify the medical records of 97 percent of the 55,000 voters.
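
The linkage described above can be illustrated with a minimal sketch. It is not Sweeney's actual procedure or data: every record, name and field label below is hypothetical, and the only idea taken from the article is that records stripped of names can be joined to a named voter list on gender, birth date and ZIP code. A medical record is treated as identified only when exactly one voter shares its combination of those three fields.

```python
from collections import Counter

# Hypothetical "de-identified" medical records: names removed,
# but gender, birth date and ZIP code retained.
medical = [
    {"gender": "F", "birth_date": "1961-07-14", "zip": "02138", "diagnosis": "asthma"},
    {"gender": "M", "birth_date": "1975-02-03", "zip": "02139", "diagnosis": "diabetes"},
    {"gender": "M", "birth_date": "1980-11-30", "zip": "02139", "diagnosis": "fracture"},
]

# Hypothetical voter registration list: names included.
voters = [
    {"name": "Alice Example", "gender": "F", "birth_date": "1961-07-14", "zip": "02138"},
    {"name": "Bob Example", "gender": "M", "birth_date": "1975-02-03", "zip": "02139"},
    {"name": "Carl Example", "gender": "M", "birth_date": "1980-11-30", "zip": "02139"},
    {"name": "Dan Example", "gender": "M", "birth_date": "1980-11-30", "zip": "02139"},
]

# The three fields shared by both data sets.
QUASI_IDENTIFIERS = ("gender", "birth_date", "zip")


def key(record):
    """Return the (gender, birth date, ZIP) combination for a record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(medical_records, voter_records):
    """Attach a name to each medical record whose key matches exactly one voter."""
    counts = Counter(key(v) for v in voter_records)
    unique_names = {key(v): v["name"] for v in voter_records if counts[key(v)] == 1}
    return [
        (unique_names[key(m)], m["diagnosis"])
        for m in medical_records
        if key(m) in unique_names
    ]


matches = link(medical, voters)
for name, diagnosis in matches:
    print(name, diagnosis)
```

In this made-up data the third medical record stays anonymous because two voters share its gender, birth date and ZIP code. The fewer people who share a combination, the more records a join like this can name.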

Though the Total Information Awareness program has generated fears of government snooping, efforts to use emergency room data to provide an early warning for bioterrorism are also subject to data privacy concerns.

Andrew Moore, a Carnegie Mellon computer scientist who is one of the researchers working on the Pitt/CMU Real-time Outbreak and Disease Surveillance system, noted that the research is showing that just looking at aggregate numbers -- the total number of emergency room patients from Squirrel Hill, the total number of respiratory disease patients -- may not be enough to identify an attack with biological weapons.

More specific information -- the number of people with skin problems in Fox Chapel, or the number of children with flu-like symptoms from Squirrel Hill -- might be necessary to detect an outbreak of anthrax or other biological agent in time to prevent deaths.
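
A small sketch shows why aggregate totals can hide what finer-grained counts reveal. It is not the method used by the Real-time Outbreak and Disease Surveillance system: the visit counts are invented, and the rule of flagging any count at least double its baseline is an assumption chosen only for illustration.

```python
from collections import Counter

# Hypothetical typical daily emergency room visits by (neighborhood, symptom).
baseline = Counter({
    ("Squirrel Hill", "respiratory"): 20,
    ("Squirrel Hill", "skin"): 5,
    ("Fox Chapel", "respiratory"): 18,
    ("Fox Chapel", "skin"): 2,
})

# Hypothetical visits today.
today = Counter({
    ("Squirrel Hill", "respiratory"): 17,
    ("Squirrel Hill", "skin"): 4,
    ("Fox Chapel", "respiratory"): 16,
    ("Fox Chapel", "skin"): 10,
})


def flagged(baseline_counts, today_counts, ratio=2.0):
    """Return the groups whose count today is at least `ratio` times baseline."""
    return sorted(
        group
        for group, expected in baseline_counts.items()
        if today_counts[group] >= ratio * expected
    )


# Aggregate view: total visits today relative to the total baseline.
total_ratio = sum(today.values()) / sum(baseline.values())

# Specific view: each neighborhood and symptom checked separately.
alerts = flagged(baseline, today)

print(round(total_ratio, 2))
print(alerts)
```

With these numbers the overall total barely moves, while skin complaints from one neighborhood rise fivefold. The same narrowing that exposes the spike also leaves fewer patients in each group, which is where the privacy concern arises.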

The Real-time Outbreak and Disease Surveillance system and other bioterror monitoring systems are meant to identify outbreaks, not patients, said Senator, the DARPA scientist.


Byron Spice can be reached at bspice@post-gazette.com or 412-263-1578.
