CMU researcher adapts e-mail protocol for large-scale analysis of gene activity
Monday, December 12, 2005

Sending an e-mail is a simple enough task for most computer users, but it's not nearly so simple for the computer itself.

E-mail software, for instance, checks each message to see if errors have crept in during transmission. Any human would find this standard Internet protocol to be mind-numbingly repetitive and time-consuming, but the computer handles it with such aplomb that users don't even notice.

Now a researcher at Carnegie Mellon University has borrowed this common protocol and adapted it for use by biologists for analyzing the activity of thousands of genes.

Just as the original Internet protocol can tell a computer that an e-mail contains a mistake so that it can be re-transmitted, the method devised by Ziv Bar-Joseph can help biomedical researchers sift through information on thousands of genes gathered with a technology known as DNA microarrays. The method can help identify where data might be missing or red-flag data that should be discarded.

A report on the method by Dr. Bar-Joseph and his collaborators at Hebrew University in Israel was published online last week by the journal Nature Biotechnology.

"I think there's going to be a lot of interest in it," said Dr. Naftali Kaminski, director of the Simmons Center for Interstitial Lung Disease at the University of Pittsburgh School of Medicine. "His method is very interesting; I think it works," he added, though it still needs to be validated in a large-scale study.

Dr. Kaminski will be working with Dr. Bar-Joseph on a future project, applying the method to a study of pulmonary fibrosis, a disease that thickens and stiffens lung tissue. Using DNA microarrays, Dr. Kaminski will monitor the activity of 30,000 genes once every three months for two years.

Scientists hope that the study of gene expression -- which genes are turned on and which ones are turned off, and how that pattern changes over time -- will shed light on the causes and progression of diseases and provide insight into diagnosing and treating them.

But the amount of data generated -- Dr. Kaminski's new 60-patient study will generate no fewer than 14.4 million data points -- is so humongous that biologists increasingly are borrowing tricks from computer scientists.
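The 14.4 million figure can be reconstructed from the numbers given for the study. The short calculation below is only a sketch: it assumes that sampling once every three months for two years means eight samples per patient, a count the article does not state outright.

```python
genes = 30_000            # genes monitored per sample
patients = 60             # patients in the study
samples_per_patient = 8   # assumption: every three months for two years

data_points = genes * samples_per_patient * patients
print(f"{data_points:,}")
```

Under that assumption the product comes to 14,400,000, which matches the total quoted for the study.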

The Internet check-sum protocol used for e-mail is straightforward, since all digital data is a series of 1s and 0s. The protocol is designed to show if a 1 is inadvertently switched to a 0, or vice versa.

It works something like this: Each computer "word," or byte, includes eight digits, or bits -- such as 10011100 or 01011001. Seven bits of each word might be reserved for the message, with the remaining bit serving as a check. If the number of 1s in the first seven bits is odd, the final bit would be set to 1; if even, it would be set to 0.

If an error occurs in transmission, flipping the value of one of the bits, the count of 1s will no longer match the check bit. It's not a perfect system -- it's possible, for instance, that two bits might change and cancel each other out, leaving the error undetected -- but it usually works well.
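The check-bit idea described above can be sketched in a few lines of Python. This illustrates only the simplified seven-plus-one scheme from the description; the checksum actually used on the Internet is more elaborate. The function names are made up for the example.

```python
def add_check_bit(message_bits: str) -> str:
    """Append a check bit to a 7-bit message: 1 if the count of 1s is odd, else 0."""
    if len(message_bits) != 7 or set(message_bits) - {"0", "1"}:
        raise ValueError("expected exactly seven characters, each 0 or 1")
    ones = message_bits.count("1")
    return message_bits + ("1" if ones % 2 else "0")


def looks_intact(word: str) -> bool:
    """Return True if the check bit still matches the seven message bits."""
    return add_check_bit(word[:7]) == word


def flip(word: str, position: int) -> str:
    """Simulate a transmission error by flipping the bit at one position."""
    flipped = "1" if word[position] == "0" else "0"
    return word[:position] + flipped + word[position + 1:]


sent = add_check_bit("0101100")    # three 1s, so the check bit is 1
one_error = flip(sent, 2)          # a single flipped bit
two_errors = flip(one_error, 5)    # a second flip hides the first

for word in (sent, one_error, two_errors):
    print(word, "passes" if looks_intact(word) else "fails")
```

The word with one flipped bit fails the check, while the word with two flipped bits passes it even though it is wrong, which is the weakness noted above.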

The protocol isn't quite so straightforward for DNA microarrays; rather than a simple 1-or-0, yes-or-no, black-or-white value, the microarrays show a range of activation levels for each gene.

The method devised by Dr. Bar-Joseph for analyzing the microarray results involves creating a sum of the activation results for each gene; this sum is then compared with an average activation level for that gene. This average can be obtained, if necessary, by performing one more experiment -- profiling a mixture of samples.

If the sum of the results is less than the average, there's a good chance that some gene activation was missed, Dr. Bar-Joseph said. That can occur when the gene is activated for a shorter period than the time interval between samples. If the sum is greater than the average, then there's a good chance that something about the sampling method was causing activation unrelated to whatever disease or condition is being studied.
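The comparison can be illustrated with a rough sketch. This is not Dr. Bar-Joseph's published method, only a toy version of the idea as described here. The gene names, the readings and the 25 percent tolerance are all invented, and the code assumes the pooled reading can be scaled by the number of time points to make it comparable with the sum.

```python
def check_gene(series, pooled_level, tolerance=0.25):
    """Compare a gene's time-series readings with its pooled-sample reading.

    series: activation levels measured at each time point (made-up units).
    pooled_level: reading from one extra experiment on a mixture of the
        samples, standing in for the gene's average activation.
    tolerance: hypothetical relative margin before a gene is flagged.
    """
    total = sum(series)
    # Assumption: samples are mixed equally, so the pooled reading times
    # the number of time points approximates the expected sum.
    expected_total = pooled_level * len(series)
    if total < expected_total * (1 - tolerance):
        return "possible missed activation"
    if total > expected_total * (1 + tolerance):
        return "possible sampling artifact"
    return "consistent"


# Invented readings for three hypothetical genes, eight time points each,
# paired with an invented pooled-sample reading.
readings = {
    "gene_A": ([1.0, 1.1, 0.9, 1.0, 1.2, 1.0, 0.9, 0.9], 1.0),
    "gene_B": ([0.5, 0.5, 0.6, 0.5, 0.4, 0.5, 0.5, 0.5], 1.0),
    "gene_C": ([2.0, 1.8, 2.2, 2.1, 1.9, 2.0, 2.0, 2.0], 1.0),
}

for name, (series, pooled) in readings.items():
    print(name, check_gene(series, pooled))
```

In this toy data, gene_A agrees with its pooled reading, gene_B sums to well under it and would be flagged for possibly missed activation, and gene_C sums to well over it and would be flagged as a possible artifact of sampling.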

About 40 percent of DNA microarrays are used for this type of time-series study of gene activation, Dr. Bar-Joseph said, and DNA microarrays already are a multimillion-dollar industry, so his method potentially could be widely used.

"I'm really hopeful they will see it as a solution to this problem," he said.

First published on December 12, 2005 at 12:00 am
Post-Gazette science editor Byron Spice can be reached at bspice@post-gazette.com or 412-263-1578.