It's one huge mountain of numbers.
Evaluating death rates -- in three disease categories, in 18 age ranges each for men, women and total population, from 746 municipalities in 14 counties of southwestern Pennsylvania over a nine-year period, and compared with three national databases -- has produced enough numbers to make a mathematician cry uncle.
The mapping process got under way more than a year ago with a calculator, blank sheets of paper and photocopied maps in response to periodic reports that the Pittsburgh region still has high pollution rates.
While pollution mortality is a robust science, results for the region were lacking. So we decided to do our own calculations after discovering the state Department of Health's database of municipal mortality rates.
When interesting patterns emerged on makeshift maps, we used computer spreadsheets to tally mortality rates for all 746 municipalities, using the 2000 through 2007 data available from the state database.
The goal was to produce maps showing heart, respiratory and lung cancer death rates, as compared with state averages, in the 746 municipalities. We saw mortality patterns that seemed to have overlap with major sources of pollution. It now was time to get responses from experts.
Several respected epidemiologists who viewed the maps -- Cliff Davidson, then at Carnegie Mellon University and now at Syracuse University; Conrad Dan Volz and George Leikauf at the University of Pittsburgh Graduate School of Public Health; and Eugene Weinberg at the state Department of Health -- generally found results to be intriguing, and in some cases suggestive of a possible link between major stationary sources of pollution and higher mortality rates. But they agreed that the maps had one serious problem.
Southwestern Pennsylvania's population is one of the nation's oldest. An elderly population, by definition, dies at higher rates from these diseases. Pockets of excess death rates could reveal nothing substantial until the age factor was eliminated.
Dr. Weinberg referred us to Carol K. Redmond, another Pitt Graduate School of Public Health epidemiologist with expertise in pollution, who referred us to Pitt epidemiologists Evelyn Talbott and LuAnn Brink. They recommended we adjust the data to account for age and, while we were at it, calculate death rates by gender.
The two taught us how to do an "indirect age adjustment," a statistical exercise in which the population of each municipality is analyzed by each of the age groups provided in the 2000 Census. The expected number of deaths for each age group is calculated from how many people in that age group live in the municipality and the national death rate for that age group. The expected deaths for each group are tallied to produce total expected deaths for each municipality, which then can be compared to actual death totals.
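For readers who want to see the arithmetic, here is a minimal sketch of an indirect age adjustment in Python. It is an illustration only, not the spreadsheets or programs used for this study. The function name, the three age groups, the rates and the death count are all made up for the example; the study itself used 18 five-year age groups.

```python
def expected_deaths(population_by_age, reference_rate_per_100k):
    """Total expected deaths: local population in each age group
    multiplied by the reference (national) rate for that group."""
    return sum(
        population_by_age[group] * reference_rate_per_100k[group] / 100_000
        for group in population_by_age
    )


# Hypothetical municipality, collapsed to three age groups for brevity.
population = {"0-44": 6_000, "45-64": 3_000, "65+": 1_000}

# Hypothetical national rates, in deaths per 100,000 people per year.
national_rates = {"0-44": 10.0, "45-64": 200.0, "65+": 1_500.0}

expected = expected_deaths(population, national_rates)
observed = 27  # hypothetical actual deaths in one year

# A ratio above 1 means more deaths than the age mix alone would predict.
ratio = observed / expected
print(f"expected={expected:.1f} observed={observed} ratio={ratio:.2f}")
```

Because the expected total is built from the municipality's own age mix, an older town is compared against a higher expected figure, which is what removes the age factor from the comparison.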
National mortality rates for heart and respiratory diseases were acquired through the U.S. Centers for Disease Control and Prevention's National Center for Health Statistics or CDC/NCHS. From that database, we used National Vital Statistics System: Mortality.
We also used identical disease categories based on the International Statistical Classification of Diseases and Related Health Problems, or ICD-10 codes, to ensure apples-to-apples comparisons for diseases of the heart, chronic lower respiratory disease (including chronic bronchitis and chronic obstructive pulmonary disease) and lung and bronchus cancers. These ICD-10 categories are the same ones used by the state health department for municipal mortality rates.
For heart and respiratory disease, we used the average annual national rates from 2002 through 2006 and compared them with an annual average of local municipal death rates from 2000 to 2007.
For national lung and bronchus cancer rates, we used a National Cancer Institute database known as the Surveillance, Epidemiology, and End Results (SEER) Program (www.seer.cancer.gov) -- more specifically the SEER*STAT Database: Mortality. From SEER we used estimated 2004 rates of death for those cancers based on national rates from 2001 through 2003.
Age and gender adjustment required two more months and a study redesign. Tim Dunham, from the Post-Gazette technology staff, produced our spreadsheets, which required writing computer programs to retrieve needed data from databases.
Those maps were completed in March.
But the first design revealed an error.
We had combined categories and added death rates, thinking it would reduce the number of categories. That violated principles of statistical analysis: Rates and categories cannot be combined.
Once Dr. Brink figured out our error, and after a mild scolding and some professorial lessons, we corrected our data by using all 18 five-year age-group census categories as compared with national death rates for identical age categories. Mr. Dunham produced a new group of spreadsheets.
Each time spreadsheets were produced, hand calculations were done to test accuracy. Results were sent to Post-Gazette graphics artist James Hilston, who produced maps based on the new data.
With the new maps, patterns were difficult to discern mainly because pockets of high mortality rates were pervasive region-wide. That's when Dr. Davidson suggested we plot all major sources of pollution, not just coal-fired power plants, to fill out the picture of air pollution in the region.
To do that, we used state Department of Environmental Protection and U.S. Environmental Protection Agency databases of major stationary sources of pollution and plotted 166 factories, power plants and other sources of pollution on our maps.
Then in April, the state Health Department posted 2008 mortality numbers, which we incorporated into our spreadsheets to update our study. That was completed in June.
A final spreadsheet that Mr. Dunham completed totaled expected mortality rates for each disease category and compared results with actual deaths to produce a total-risk map.
Adjustments had been made to factor out any differences attributable to age. We had mortality numbers for men and women. Fewer females work in factories where pollution is produced, so their disease rates more readily reflect public exposure.
Actual vs. expected death rates were mapped to show whether the municipality was above, below or equal to the national average.
Only then could we begin analyzing what the maps showed. Only then could Laura Schneiderman of the Post-Gazette online staff begin making 126 interactive maps and a total-risk map for our website at www.post-gazette.com, based on maps produced by Mr. Hilston. Production of those maps was completed over five months.
One cautionary note: Health Department statisticians say 20 deaths are necessary before a municipal mortality rate generally can be considered statistically significant. If there are too few deaths, they recommend adding more years or expanding the geographic area until the number reaches 20.
Dr. Talbott said confidence intervals -- necessary to determine whether statistical conclusions are valid or more likely caused by chance -- should be done for each county if not for each municipality.
When consulting the maps, make sure the average death rate is at least 2.23, which, multiplied by nine years, comes to just over the minimum of 20 deaths. If the average rate is below 2.23, it's almost a guarantee that the municipality is too small for the number to be significant.
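The 2.23 figure follows from simple arithmetic: 20 deaths spread over nine years is about 2.22 a year, so 2.23 is the smallest two-decimal average that clears the bar. A short Python sketch of that check, with a hypothetical function name, might look like this:

```python
MIN_DEATHS = 20  # threshold cited by Health Department statisticians
YEARS = 9        # years covered by the study data


def enough_deaths(avg_annual_deaths, years=YEARS, minimum=MIN_DEATHS):
    """True if the average annual figure implies at least `minimum` deaths."""
    return avg_annual_deaths * years >= minimum


print(enough_deaths(2.23))  # 2.23 * 9 is about 20.07
print(enough_deaths(2.22))  # 2.22 * 9 is about 19.98
```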
In the total-risk spreadsheet, we incorporated results for smaller boroughs into larger surrounding townships to provide a stronger statistical basis for analysis.
Statistical significance is important, as the following example illustrates:
If a municipality has a population of only 100 and one person dies of respiratory disease, that would reflect an enormously high rate of death when compared with the standard national rate of 43 deaths per 100,000 people. In this case, one death would produce a mortality rate of 1,000 deaths per 100,000 population. What might sound like a major health problem actually is a statistical anomaly produced by a population too small to draw valid conclusions.
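The same example can be worked in a few lines of Python. This is only a sketch of the arithmetic; the function name is ours, and the large town in the second call is invented for contrast.

```python
def rate_per_100k(deaths, population):
    """Convert a raw death count into a rate per 100,000 people."""
    return deaths * 100_000 / population


NATIONAL_RATE = 43  # respiratory deaths per 100,000, as cited in the example

tiny_town = rate_per_100k(1, 100)        # one death among 100 residents
big_town = rate_per_100k(43, 100_000)    # hypothetical town matching the national rate

print(tiny_town, big_town)
print(tiny_town / NATIONAL_RATE)  # how many times the national rate
```

A single death either way swings the tiny town's rate by 1,000 per 100,000, which is why such small counts cannot support conclusions.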
We produced confidence intervals for the 14 counties whose results are stable and significant. We used the standardized mortality ratio method, or SMR, to calculate confidence intervals.
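As an illustration of what such an interval looks like, the sketch below uses one common textbook approximation: the ratio of actual to expected deaths, plus or minus 1.96 times the square root of actual deaths divided by expected deaths. That choice is an assumption made for this example; the study's exact formula is not spelled out here, and other methods exist. The county figures are invented.

```python
import math


def smr_confidence_interval(observed, expected, z=1.96):
    """Approximate 95% interval for observed/expected deaths, treating
    the observed count as Poisson (standard error = sqrt(observed)/expected).
    One of several possible approximations, not necessarily the study's."""
    smr = observed / expected
    half_width = z * math.sqrt(observed) / expected
    return smr, smr - half_width, smr + half_width


# Hypothetical county: 120 actual deaths against 100 expected.
smr, low, high = smr_confidence_interval(120, 100.0)
print(f"SMR={smr:.2f}, interval {low:.2f} to {high:.2f}")

# If the interval includes 1.0, the excess could plausibly be chance.
print("includes 1.0:", low <= 1.0 <= high)
```

The interval narrows as the number of deaths grows, which is why county totals give firmer results than those for small boroughs.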
Acknowledged weaknesses of the Post-Gazette ecological study include the inability to incorporate smoking rates of individual municipalities, along with other lifestyle and socioeconomic factors. Epidemiologists who reviewed our maps and methodologies point to those limitations. But most said the maps serve as an interesting first look at pollution and mortality patterns.
These maps cannot reveal what caused these deaths nor provide a direct link between mortality and sources of pollution. They also cannot say whether pollution was a factor in these deaths, which also could be caused or aggravated by smoking, socioeconomic factors, traffic patterns, lifestyle and genetics.
But the maps, along with available scientific research on pollution mortality, do raise issues about potential pollution impacts. Existing scientific models predict similar results based solely on the region's pollution levels.
In reviewing locations of sources of pollution and municipal mortality rates, epidemiologists suggest looking along river valleys where plants exist and pollution can collect. Also look generally in a northeastern direction from the plants, which would reflect the direction in which prevailing wind patterns most often carry pollution.
Prevailing winds also can originate from the west or northwest, depending on time of year, the National Weather Service states. Winds can blow in any direction, which means pollution can travel in any direction, too. Pollution levels locally and out of state combine to form a pollution stew, making it all the more difficult to point to any one source as the cause of any particular disease.
So the online Post-Gazette maps are the result of numerous prototypes that underwent corrections, revisions, updates and reviews.
We are aware that they, as with any study, are open to debate, criticism and analysis. We would suggest that each person review them with caution and draw his or her own conclusions based on the data provided, the known study limitations and other facts provided in the full Post-Gazette series.
The ultimate hope? That the study, the maps and stories will lead to more inquiry about our knowledge of pollution in the region.