SMALL AREA CANCER EPIDEMIOLOGY

FOR THE CITIZEN

AN INTRODUCTION

Presentation to the Citizen Epidemiology conference, Northwestern University, Illinois, 21/05/05

Chris Busby PhD

 

This paper is available as a .pdf here:  Small Area Epidemiology for the Citizen

 

Green Audit

Aberystwyth 2005

 

Introduction

I have argued elsewhere, in my book Wings of Death (1995), that cancer is a genetic disease and that people develop cancer because they have been exposed to some environmental agent or carcinogen that causes it. And there is no doubt that it has increased remarkably in the last 30 years. I calculate that in the UK the age-standardised rates for all cancers overall have increased by more than 30%. We cannot walk down any high street in the country now without noticing the huge expansion in the numbers of cancer charity shops. To use my friend Molly's black joke, this is a new growth industry. Everywhere people are sporting ribbons, usually flowers: pink, white, and yellow. In Wales, where I live, St David's Day (the day of the patron saint) has become transmuted into Marie Curie Cancer Care Day, an irony in the context of the real cause of cancer, since Marie Curie's discoveries are largely behind the increases. Here everyone sports artificial daffodils to display both their Welshness and the evidence of their decision to defeat cancer, a curious marriage of beliefs. This is because now few remain untouched by the disease. Aunts, husbands, mothers, children, pet dogs, canaries, all have been kissed by death and carried off. Everywhere these tragedies are the reality behind the pink balloons and shrill requests for money and yet more money to defeat the new plague. There has been some mistake about the Age of Aquarius, the water carrier: this is the age of The Crab.

The massive increase in the incidence of the disease in the last thirty years has been ascribed to increasing life span by the establishment epidemiologists working for the cancer research foundations and by the government epidemiologists, who are desperately trying to reassure. I have such an epidemiologist as Chairman of my Ministry of Defence Depleted Uranium committee. Dr David Coggon took over as head of the Medical Research Council Epidemiology Unit in Southampton University where the previous Director was Martin Gardner, the man who confirmed the Sellafield leukaemia cluster. Gardner died at the age of 50, of cancer, oddly and conveniently before the Sellafield Court Case came to trial. The trial failed because the chief witness for the prosecution was dead. Coggon, his successor, closed down the research into Sellafield. In 1994 Coggon and a colleague, Hazel Inskip, wrote a paper in the British Medical Journal denying the cancer epidemic, displaying graphs with a logarithmic axis so that increases in the disease are straightened out.

The philosopher Mary Midgley has remarked that Science is the new religion. This is especially true in the area of belief and truth with regard to evidence that the human race is slowly being polished off by toxins which it produces itself in order to get rich. This is the hole we have dug for ourselves and we are now falling into it. As we do so, these people, the epidemiologists, show us figures that prove that nothing is wrong. Suddenly your friend disappears. A neighbour's daughter vanishes. Look! There's a child with no hair. Never mind: this is all anecdotal evidence. It can be ignored. The numbers show that there is no problem. It is a random cluster. No one need weep.

It was the beginning of the 20th century when numbers and their analysis began to replace religion as the ultimate revelation of truth. The Victorians had been increasingly realising that their 'givens', their world structures of certainty, based on a combination of God and the Empire, the rich man in his castle and the poor man at his gate had no substance in rational analysis. Empirical measurement had done for belief. Everyone was descended from apes, even the heroes and queens. It was worse. Our actions were not even decided by us, but by the strange (and embarrassing) invisible antics of our subconscious. Remember Matthew Arnold's complaints in Dover Beach?

The sea of faith

Was once, too, at the full, and round earth's shore

Lay like the folds of a bright girdle furled

But now I only hear

Its melancholy, long, withdrawing roar

What was left?

Confused alarm of struggle and flight,

where ignorant armies clash by night?

Well actually no, all was not lost and not everyone's army was ignorant. Numbers could be used to fortify the establishment. Hurrah! Even if the poor could now justifiably argue that everyone was equal, the rich still had mathematics. They could continue to say, Ho Varlet! The data shows this and that objectively! We are objectively better than you: that is why we are rich! That is why we own the castles and you remain at your gate. It is because you can't do statistics. You don’t understand economics, the technique which objectively places us at the top and you at the bottom. The philosophical movement associated with this latest development in the science of keeping the mob in its place was that of the Vienna School: logical positivism. If it couldn’t be tested it was meaningless. If it could be tested (mathematically, statistically) it must be true. If everything was up for grabs, and everything was relative, then all that was left was the objectivism of mathematics. Numbers are neutral and the only truths were in numbers!

Nowadays, of course, all this has collapsed. There are two main reasons; the first has developed from the assaults on the concept of objectivity associated largely with the French philosophers, Foucault et al.; the second is more mundane. In the last ten years only, the tremendous advances in technology have enabled the poor (in which category I include myself) to get access to awesome computing power in the form of the PC. This enables us to undertake tasks, which, before 1990, would have needed a mainframe University Computer and a whole department of programmers. I can use free software available on (or ripped off from) the Internet, and cheap commercial software to analyse a database containing 20 million cells. I can present my results in a report that looks as good as (often better than) those produced by an entire government department. I can send this report by email to hundreds of people by pressing a button. Or I can put the report on the Internet as a free resource. Thank You, Bill Gates!

We are asking the following question. As the biosphere fills up with the poisonous byproducts of progress, to what extent are human beings suffering? Or put another way; is it worth being rich if our children die? Well, of course, the people who are getting rich, and the governments who are afraid of being poor don’t want any messages of doom from the public health sector. To pursue business-as-usual we have to believe that there are no cancer increases; hence David Coggon, a safe pair of hands, is put in charge. In this battlefield, the weapons are the guns of epidemiology and these need to be loaded with data and that is why we have now to look at both these items. We need to examine these tools and methods of cancer epidemiology, and mainly small area cancer epidemiology, to enable us to follow the evidence and counter arguments that exist. Because it is so important, I want to provide in this essay, a basic template for small area epidemiology, a kind of 'Peter and Jane carry out a Study' so that anyone can examine the data on health effects in a small area near some putative source of risk and blow the whistle if there is a problem; or at least understand what is involved. Too often, activists near some nuclear plant or landfill call attention to what seems to them a clear excess of illness, only to be fobbed off (and the newspapers also) by a statistician from the establishment, defending the status quo. A whole arsenal of bogus statistical methods has been developed for this purpose. It is time we drew attention to this development and fought back; and so I present some simple methods for analysing data and determining whether it shows important indicators of harm, or effects that are just a consequence of the natural play of chance.

 

 

 

 

 

 

Epidemiology

Was it bad luck that caused both my aunts to die of cancer, one of them in her 40s? How could I tell? This is the question of the probabilistic basis of causality. It can be seen as a long chain of gates through which various necessary preceding events have to pass for the final event we are interested in to happen. These gates are called 'binomial' gates since they define two possibilities: Yes and No, pass or remain behind. The Victorian statistician Francis Galton devised a simple machine, which he called the 'quincunx', to illustrate how such binomial decisions result in what is called a Normal Distribution, the well-known bell-shaped curve. I show this in Fig 1.

Two years ago my 3rd daughter Frances was knocked over on a zebra crossing and badly injured. This must be quite a rare event. For it to have happened there were many gates through which antecedent events had to pass. The van that hit her was driven by an old man with poor response and bad eyesight: two gates, not passed by a young man with good eyesight. It was dusk: another gate. The crossing was sited near a crossroads and a panel truck pulled out from the side street to move across the main road, shielding the accident van from my daughter, and she from it: more gates. If we were to devise a quincunx for such series of antecedents, her accident would be well out to the right of the centre of the distribution in Fig 1. Similar gates were no doubt involved in the cancer deaths of my aunts. There was some bad luck, and maybe some genetic damage. Maybe the genetic damage was a result of bad luck in my grandparents' lifetime. If we are looking to blame something, or someone, we have to try and dissect out the gates responsible and we have to be certain that the event we are trying to understand was not one that could have occurred as a consequence of the play of chance alone. And in this regard, it is more than just understanding the problem. Marx said: Philosophers have interpreted the world: the point is, to change it! I agree. You do too. It is about being able to change things so that the same thing does not happen to someone else.

One way we can do this is to study people and see how their levels of cancer relate to their exposures, their genetic makeup and their lifestyles. For this we use the science of epidemiology. Epidemiology is the study of the distribution and determinants of disease in human populations. A key aspect is that it is observational rather than experimental and therefore has to operate in an area where bias or confounding of the inferences drawn from the data may occur. In chemistry, a blue liquid may be mixed with a green liquid to give a red precipitate: this will always happen so long as the experiment is exactly repeated and the results can be used to draw inferences about the nature of the processes involved. But it is rare that an epidemiological study has the specificity of design and sufficient exclusion of uncontrolled variables between the study and control groups to enable unequivocal conclusions to be drawn. Therefore, this is an area where studies may be electively biased or directed to find either a result or no result. In addition, all studies may be subject to considerable criticism by groups who hold opposite views for reasons that may include culture, employment or political pressure. I will provide evidence of all three of these mechanisms of bias in published papers and review articles in my new book Wolves of Water from which this essay is taken. What we learn is that in drawing inferences from all the epidemiological studies of radiation and health, we have to consider very carefully the provenance of the study and in particular the likely directional bias of the studies' funding bodies and researchers.

Fig 1 Galton's Quincunx illustrating the binomial (yes/no) gates that have to precede filtration of events into categories of likelihood in the normal distribution.
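The behaviour of the quincunx in Fig 1 can be reproduced on a PC. The following is a minimal sketch only, not Galton's own construction: the number of balls, the number of rows of pins and the 50:50 chance at each pin are all assumptions chosen for illustration. Each pin is one yes/no gate, and the bin a ball lands in is simply the number of times it went right.

```python
import random
from collections import Counter

def quincunx(n_balls=10000, n_rows=12, p_right=0.5, seed=1):
    """Drop n_balls through n_rows of pins and count the balls in each bin.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    bins = Counter()
    for _ in range(n_balls):
        # Each pin is one binomial gate: the ball goes right (1) or left (0).
        landed = sum(rng.random() < p_right for _ in range(n_rows))
        bins[landed] += 1
    return bins

n_rows = 12
bins = quincunx(n_rows=n_rows)
for k in range(n_rows + 1):
    # Crude text histogram: one '#' per 50 balls.
    print(f"{k:2d} {'#' * (bins[k] // 50)}")
```

The printed histogram piles up in the middle bins and thins out towards both ends, which is the bell shape. An event 'well out to the right' corresponds to a ball that went the same way at nearly every gate.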

All epidemiological studies compare a study group or groups, in the cases I look at usually those exposed to a known quantity of radiation, or some surrogate for it, with a control group, who should be matched in every way except that they are not exposed. Before examining real studies that attempt to translate this ideal study into practice and quantify the risks I must introduce some aspects of the analytical procedure. The first thing we have to look at is inference. The most valuable list of procedures that should be followed in order to draw safe inferences from evidence in epidemiological studies was devised by Sir Austin Bradford Hill in the 1950s and is termed Bradford Hill’s Canons. They are sufficiently valuable in assessing the case of radiation and health to give a short account of them.

1 Bradford Hill’s Canons

1.1 Statistical significance

A secure foundation for argument in any comparison of an exposed study group with an unexposed control group is that the difference in health deficit (cancer mortality, for example) is statistically significant and is unlikely to have occurred by chance. Significance testing is an area of statistics, and a number of basic mathematical tests may be applied to see if a result is statistically significant.

The word ‘significant’ is one that within the scientific community has a specific, technical meaning, but can also be interpreted generally by those without a scientific background. When a research finding is said to be ‘significant’ this means that it may be considered to be meaningful, in the sense of not being a chance finding. Since statistics is a methodology based on probability, it accepts a certain level of error as inevitable, meaning that some scientific findings that have passed the ‘significance’ test are still bound to be wrong.

The level of ‘significance’, which, of course, is directly related to the level of error, is chosen by the researcher, and should be set higher if the findings have more potentially dangerous implications. The level of significance generally adopted in scientific research is 5 per cent. This means that researchers are accepting a 5 per cent level of error: when there is in fact no real effect, they will wrongly claim one about 1 time in 20.

The procedure of testing whether results are ‘significant’ is known as ‘hypothesis testing’. The scientist tests the ‘null’ hypothesis, which is the proposition that there is nothing unusual going on, or that the distribution of results found does not differ from what would be expected by chance.
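As an illustration of such a test, here is a minimal sketch of one common way of checking the null hypothesis for a count of cancer cases. It assumes that, if nothing unusual is going on, the number of cases follows a Poisson distribution whose mean is the expected number. The figures (8 cases observed where 3.2 were expected) are hypothetical and are not taken from any study.

```python
import math

def poisson_upper_tail(observed, expected):
    """Probability of seeing `observed` or more cases by chance alone.

    Assumes the case count is Poisson distributed with mean `expected`.
    """
    # P(X >= observed) = 1 - P(X <= observed - 1)
    below = sum(
        math.exp(-expected) * expected**k / math.factorial(k)
        for k in range(observed)
    )
    return 1.0 - below

# Hypothetical figures: 8 cases found where 3.2 were expected.
p_value = poisson_upper_tail(observed=8, expected=3.2)
print(f"p = {p_value:.4f}")
if p_value < 0.05:
    print("Null hypothesis rejected at the 5 per cent level")
else:
    print("Null hypothesis cannot be rejected at the 5 per cent level")
```

The p-value is the probability that chance alone would produce at least as many cases as were found. If it falls below the chosen level of significance, the null hypothesis is rejected.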

Statistics defines two types of error that can be made when undertaking research. The first, known as a Type I error, is the one of most concern to scientists. It involves making a claim to have a research finding when in fact the results were generated by chance. An example might be a medical trial that indicated that a certain drug was effective in slowing the progress of AIDS; follow-up research might fail to find a similar result, suggesting that the original findings fell into the 5 per cent error area. For professional and credibility reasons, this is the kind of error most feared by a researcher: the error of claiming a significant result when in fact the finding resulted from chance.

But there is another type of error, which is equally important, particularly in terms of potentially harmful consequences of radiation exposure. This is the Type II error, defined as the failure to find a significant result when the hypothesis is in fact correct. It represents the risk of carrying out a study and, for reasons that may relate to technical issues such as the size of the sample, failing to find a statistically significant result. It may not necessarily mean that the hypothesis is wrong, only that significance was not found this time. However, it may allow conclusions to be drawn, either to justify use of a technology or because of extreme caution, that processes are not causing any ill effects when in fact they are.

Radiation risk studies in the low-level radiation range very often involve small numbers of people in the exposed study group, those living near a point source such as a nuclear power station for example. Studies with large populations may have small numbers of cancer cases due to very low natural rates from the disease in question. An example is childhood leukaemia. In each of these types of situation, statistical methods have been developed to deal with the mathematical problem, yet finally there may not be sufficient evidence in each study to draw an inference from measured excess risk from the radiation exposure because chance could not be ruled out i.e. the result was not significant at the 5% level. This is usually a consequence of the numbers involved. When a material difference is apparent between two groups, but, with the numbers involved, is insufficient to pass the significance test Bradford Hill argues that it is better to take ‘statistically not significant’ as the ‘non-proven’ of Scottish law rather than the ‘not guilty’ of English law. It is nevertheless true that policy decisions in the area of radiation and health have fallen into the trap of assuming that 'there is no evidence that low level radiation exposure is hazardous' means 'low level radiation exposure is not hazardous.' This is an especially important problem where lawyers are concerned, and many politicians are trained as lawyers.
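The effect of small numbers on the Type II error can be made concrete. The sketch below assumes Poisson-distributed case counts and a one-sided test at the 5% level, and supposes, purely for illustration, that the true risk near the source is double the national risk. It asks how often a study would actually detect that doubling when 2 cases are expected and when 20 are expected. The function names and all the figures are hypothetical.

```python
import math

def poisson_upper_tail(k, mean):
    """P(X >= k) for a Poisson count with the given mean."""
    below = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - below

def power(expected, true_rr, alpha=0.05):
    """Return (critical count, chance of a significant result).

    expected: cases expected if there is no effect
    true_rr:  assumed true relative risk (hypothetical)
    """
    # Smallest count that would be declared significant at level alpha.
    critical = 0
    while poisson_upper_tail(critical, expected) > alpha:
        critical += 1
    # Probability of reaching that count when the risk really is raised.
    return critical, poisson_upper_tail(critical, expected * true_rr)

for expected in (2, 20):
    critical, detect = power(expected, true_rr=2.0)
    print(f"expected {expected:2d}: need {critical} cases, "
          f"power {detect:.2f}, Type II error {1 - detect:.2f}")
```

With only 2 expected cases, a genuine doubling of risk is missed far more often than it is found; with 20 expected cases it is almost always detected. A 'not significant' result from the small study therefore says very little about whether the hazard is real.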

In giving weight to such evidence, we need to make decisions about the possible outcome of accepting or not accepting a certain hypothesis and its supporting evidence. First we should take a precautionary approach and avoid making a Type II error in areas of low probability high impact risk, for if the evidence showing excess risk from the exposure were in fact a chance finding, the mistaken inclusion of it as evidence of radiation-induced effects would not harm the human race. If, on the other hand, we were to take the opposite view and exclude it as evidence when it was, in fact, a true measure of a real effect but merely formally non-significant, then much harm would follow its dismissal. In addition, there is the matter of what we do with non-significant results from different studies. How should we deal with these?

It may be that several different studies each suggest that radiation is a cause of cancer but in each study the formal statistical significance falls short of the magic 5% needed to state that the finding was not due to chance. It turns out that we can combine these studies in such a way as to obtain a very high degree of certainty using a statistical theorem due to the eighteenth-century mathematician the Rev. Thomas Bayes. Bayesian statistics is in vogue at the moment as it can be used to teach robots (and computer programs) to learn. Microsoft 'Word' learns as a result of Bayesian analysis of previous inputs made by the person using the program.

On this basis we might use a Bayesian approach to the refinement of belief in the area of risk assessment and allow each non-significant observation (including unpublished results) to weight and modify the overall probability of belief in the area of radiation risk according to their degree of significance. Thus the discovery of a child leukaemia cluster in the 1980s near the nuclear reprocessing plant at Sellafield in Cumbria, UK has been criticised on the basis that the statistical significance of the result for the ward (p = .002) enabled no inference to be drawn since there are more wards in the UK than the 500 wards needed for such a result to be a chance occurrence. However, since this discovery, child leukaemia excesses have been discovered near two other reprocessing plants and a number of nuclear installations in Europe. The Bayesian modification of the probability of the causal relation by each new example gives us a firmer basis of belief in the association and enables more robust conclusions to be drawn about the levels of risk from exposure under these circumstances.
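Two pieces of arithmetic lie behind this paragraph, and both can be sketched briefly. The first is the objection itself: with p = .002, one ward in 500 would show such a result by chance. The second is the Bayesian reply: each further observation multiplies the odds in favour of a causal link by a likelihood ratio (how much more probable the observation is if the link is real than if it is not). The prior probability and the likelihood ratios used below are hypothetical values chosen only to show the mechanics, and the studies are assumed to be independent.

```python
def expected_chance_findings(n_areas, p_value):
    """Number of areas expected to reach this p-value by chance alone."""
    return n_areas * p_value

def bayes_update(prior_prob, likelihood_ratios):
    """Update belief in a causal link using one likelihood ratio per study.

    Assumes the studies are independent of one another.
    """
    odds = prior_prob / (1.0 - prior_prob)   # probability -> odds
    for ratio in likelihood_ratios:
        odds *= ratio                        # each study shifts the odds
    return odds / (1.0 + odds)               # odds -> probability

# The objection: among 500 wards, one result at p = .002 is expected by chance.
print(expected_chance_findings(500, 0.002))

# The reply (hypothetical numbers): a sceptical prior of 0.1 and three
# further studies, each only modestly favouring a real effect (ratio 3).
print(bayes_update(0.1, [3, 3, 3]))
```

In this made-up example, three individually unimpressive studies move the probability of a real effect from 0.1 to about 0.75. No single one of them would have done so.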

2 Strength of association

There should be evidence of a strong association between the risk factor and the disease: in other words, it is necessary to consider the relative incidence of the condition under study in the populations contrasted.

3 Consistency

The association should have been repeatedly observed by different persons in different places, different circumstances and times. With much research work in progress many environmental associations may be thrown up. On the customary tests of statistical significance, some of them will appear unlikely to be due to chance. Nevertheless, whether chance is an explanation or whether a true hazard has been revealed may sometimes only be answered by a repetition of the circumstances and the observations. Broadly the same answer should be given by studies using a wide variety of techniques and in different situations.

4 Specificity and reversibility

The association should be specific. The disease association should be limited, ideally, to exposure to the putative cause and those exposed should not suffer an excess risk from other kinds of illness or modes of dying. In the area of radiation risk, where the plausible biological model involves genetic and somatic damage, disease specificity may be hard to define. One condition that has come to be considered a specific consequence of radiation exposure is leukaemia, particularly in children. However, the specificity should be defined accurately in terms both of cause and effect. In the case of low-level radiation exposure, the lack of distinction between external and internal exposure has led to conclusions being drawn that are incorrect. Associated with specificity is reversibility. Thus removal of the cause should ideally reduce the incidence of the disease, although this is a consideration that is difficult to apply in the case of cancer, where genetic damage is not removed by removing the cause of the damage. It is of interest in this context that the rates of child leukaemia in the UK are flattening off now that radiation exposure is reducing (Chernobyl being the last exposure).

5 Relationship in time

There should clearly be evidence that the risk factor preceded the onset of the disease.

6 Biological gradient

There should clearly be evidence of a dose-response effect. This is usually taken to mean that as the dose increases, the illness rate should also increase in some proportion. However, some thought will reveal that this may not be true for certain end-points. Take, for example, birth malformations due to an exposure: increasing the stress from zero will cause increasing damage to embryos, which may eventually present as increasing risk of malformation. At some point, the weight of damage will prove too great and the embryos will die: at this dose, there will be no further congenital malformation, merely a reduction in the birth rate. Since there are many possible reasons for reduction in the birth rate, including social ones, the fact that exposure to a large dose of some putative mutagen has not caused any increase in birth defects ought not to be taken as evidence of no effect unless lower doses are also considered and the dose-response relation adequately examined. This exact misunderstanding appears to have led to the belief that exposure to radiation from Chernobyl caused no harmful effect on birth defect, stillbirth and infant mortality rates in European populations. A number of papers asserted this on the basis of the data without drawing attention to the sharp fall in the birth rate that occurred some nine to twelve months after the exposure. A similar type of error also applies to ecological studies where some groups of individuals may have greater susceptibility to radiation. The existence of a dual sensitivity to radiation as a consequence of normal cell division also results in a dose-response relation that is biphasic, i.e. has two areas where increased effect follows increased dose, with an intervening area where increased dose results in reduced effect. The existence of inducible cell-damage repair results in a similar biphasic relationship between cause and effect.
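The malformation example can be illustrated with a toy model. This is a sketch under invented assumptions, not a fitted biological model: the chance that an embryo is damaged is taken to rise steadily with dose, the chance that it survives to birth is taken to fall off steeply at higher doses, and the dose units and both curve shapes are arbitrary. What an observer counts is malformed live births, which requires both damage and survival.

```python
import math

def birth_outcomes(dose, damage_rate=0.5, death_rate=0.3):
    """Return (fraction born alive, fraction born alive AND malformed).

    Both curves and both rate constants are hypothetical.
    """
    damaged = 1.0 - math.exp(-damage_rate * dose)   # rises with dose
    born = math.exp(-death_rate * dose * dose)      # heavy damage kills the embryo
    return born, damaged * born

for dose in (0, 0.5, 1, 2, 4):
    born, malformed = birth_outcomes(dose)
    print(f"dose {dose:>3}: born {born:.2f}, malformed births {malformed:.3f}")
```

In this toy model the count of malformed births rises at low doses and then falls away at high doses, while the birth rate falls throughout. A study that looked only at the highest dose, and only at birth defects, would see almost nothing.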

7 Biological plausibility: mechanism

A key question in the area of radiation risk is: what is the mechanism of radiation-induced cancer? Mechanisms advanced in the early years of radiation research, based on direct hits to genes and induction of specific mutations, underpinned the use of a linear relationship between dose and final cancer expression. But we now see that this was too simplistic a reduction of what happens, and that the response to exposure is extremely complex. Bradford Hill was aware of this problem of lack of knowledge. He stated, ‘It will be helpful if the causation we suspect is biologically plausible, though this is a feature we cannot demand. What is biologically plausible depends upon the biological knowledge of the day. It was lack of biological knowledge in the 19th century that led a prize essayist writing on the value and fallacy of statistics to conclude that among other 'absurd associations . . . it could be no more ridiculous for the stranger who passed the night in the steerage of an emigrant ship to ascribe the typhus, which he there contracted, to the vermin with which the bodies of the sick might be infected'.’ For this reason we should be anxious to avoid dismissing evidence of health detriment following low level radiation exposure on the grounds of lack of a plausible biological mechanism. In particular, the ICRP's assumptions about cell dose at low level exposures provide a good example of how mechanistic arguments have been used to argue for a linear relation between dose and response, a thesis which is only valid for external random irradiation of large tissue volumes and which, in any case, is being overtaken by recent research on genomic instability and bystander effects.

8 Alternative explanation

There should be no convincing alternative explanation or confounding for the observed association.

2 Types of study and general problem

There are two questions we need to ask. The first is: how much ill health, specifically how much cancer, will be caused as a result of some unit dose of radiation exposure of a specific kind? The second is: Have the discharges from nuclear site X caused increases in cancer or ill-health in people living nearby? In order to answer these questions we must devise epidemiological studies of various sorts. I am going to outline some simple methods that we have used and which you can also use. I provide, at the end of this essay, the Burnham on Sea questionnaire study as a template to allow you to do this. For those who want to look further into the interesting field of epidemiological methods I suggest the excellent book by Woodward (1999).

As far as the nuclear industries and governments of the world are concerned, the question of how much ill health is caused by radiation has been answered by the study of the survivors of Hiroshima and Nagasaki. I describe these studies in some detail in Wings of Death. A group of Japanese people who were out in the open at the time of the A-bombs, subgroups of whom were situated at different distances from the explosion in 1945, have been followed up from about 1952 to the present day. They have been compared on the basis of their various distances (and presumed doses) and have also been compared with groups who were not in the city at the time of the bomb or who came into the city much later. The study recorded the cancer rates over their whole lifespan (the study is still going on), and this enabled the calculation of the relative risk of cancer following various external exposure doses from gamma rays. In this simple formulation, relative risk RR is simply the age-standardised rate in the irradiated population divided by the age-standardised rate in the un-irradiated or control population.

Why do we need to examine age-standardised rates and what does this mean? Cancer incidence increases exponentially with age, for reasons I have outlined in Wings of Death. Therefore cancer mostly occurs in older people. So if we are comparing two groups of people there has to be a way of allowing for this since, even if there were identical environments and stresses, in two equal sized groups, the one with the oldest people in it would have the highest numbers of cancer cases. Age standardisation is fairly straightforward and I explain how to do this later on in this article.
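A small worked example shows why the age structure matters. The two towns below are invented: they have the same total population and exactly the same cancer risk at every age, and differ only in how many old people live in them. The three broad age bands and the rates are hypothetical and far coarser than the 5-year groups a real study would use.

```python
# Hypothetical yearly cancer rates per person, identical in both towns.
rates = {"0-39": 0.0002, "40-64": 0.003, "65+": 0.015}

# Hypothetical populations: 10,000 people in each town.
young_town = {"0-39": 7000, "40-64": 2500, "65+": 500}
old_town = {"0-39": 4000, "40-64": 3000, "65+": 3000}

def cases_per_year(population, rates):
    """Cases arising in one year: population x rate, summed over age bands."""
    return sum(population[band] * rates[band] for band in population)

for name, town in (("young town", young_town), ("old town", old_town)):
    cases = cases_per_year(town, rates)
    crude = cases / sum(town.values())
    print(f"{name}: {cases:.1f} cases a year, crude rate {crude * 1000:.2f} per 1000")
```

The older town produces more than three times as many cases a year, although nobody in it is at any greater risk than a person of the same age in the other town. Comparing crude rates would wrongly suggest a hazard; comparing age-standardised rates would not.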

The Hiroshima studies are the main basis for radiation risk factors used to predict or explain the cancer yield from any exposure, but there have been some other studies of people who were irradiated for various medical conditions (at the time when this was believed to be harmless), and the results of these have been generally supportive of the Hiroshima risk factors. Our argument is, of course, that the Hiroshima risk factors (and those of the other supportive studies) were all of external acute radiation exposures and can't be used to assess internal exposure. The Hiroshima studies are an example of a cohort study, one where a group of people who have been identified as experiencing a particular risk are followed over a period of time to see what diseases they suffer compared with a similar group who have not been exposed to the risk. The other main type of epidemiological study is the case-control study where cases who are identified as having some disease are compared with controls who are matched with the cases in as many ways as possible so that differences in aspects of the environment or behaviour between the two groups might point to a cause for the disease in the cases. But I don’t have space here to deal with all the problems of epidemiology. In the last ten years, as the pollution effects begin to bite and human health begins to suffer, the science has become infested with philosophical complication and mathematical fog. The reason for the obfuscation is partly psychological denial by the public health and cancer epidemiologists in the face of the evidence that we are all being poisoned. For our purposes in this brief essay we can bypass all this. We just need sufficient arithmetical machinery to ask some simple questions about risk near sources of pollution and to see how we can obtain answers to the questions we need to examine. This approach might perhaps be called barefoot epidemiology.
I shall now describe how we can learn, fairly simply, what is going on in the world.

2.1 The necessary data

Before we find out if the risk of cancer near the nuclear plant (or any other source) is high, we need to know three things:

The numbers of cases of cancer of interest in some defined area near the plant.

The numbers of men and women living in this area in each 5-year age group.

The rates for the cancer of interest in each 5-year age and sex group in the national population, which we use as a control or basis for estimation.

In addition, if we wish to look at the trends in cancer near the plant, or further afield, we need the numbers of cases of cancer and the populations for defined areas, which are more distant from the plant. There is one other refinement. Some cancers are naturally more prevalent (incidence and deaths) in poor people or disadvantaged people. The effect is large for lung cancer, the most common cancer in men, and is believed to be due to cigarette smoking behaviour. Some people think it is because poverty and disadvantagement may have effects through immune system suppression. For some other cancers, notably breast cancer and leukaemia, there is a weaker effect in the opposite direction. Thus there is more breast cancer in less disadvantaged or richer women. This could be because of differences in nutrition (more dairy) or behaviour (having fewer children later in age), or a combination. But these effects should be controlled for, and this can be accomplished in various ways, using indices of disadvantagement. The easiest way to control is to adjust the expected numbers by multiplying by a weighting factor calculated from the relationship between Social Class and cancer rates in the national population. Social Class makeup of an area is obtained directly from the census. Many other indices, Carstairs or Townsend, which include unemployment, multiple occupancy of houses, car ownership etc. can be calculated from data in the census.
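Putting the three items of data and the social class adjustment together, the calculation can be sketched as follows. This is an outline under stated assumptions rather than a complete method: only four age and sex groups are shown where a real study would use every 5-year group, and the populations, the national rates, the observed count, the number of years and the weighting factor of 1.1 are all hypothetical.

```python
def expected_cases(ward_pop, national_rates, years=1, deprivation_weight=1.0):
    """Cases expected in the ward if it experienced national rates.

    ward_pop:           people in each (sex, age group)
    national_rates:     yearly rate per person in each (sex, age group)
    deprivation_weight: hypothetical social class adjustment factor
    """
    per_year = sum(ward_pop[group] * national_rates[group] for group in ward_pop)
    return per_year * years * deprivation_weight

# Hypothetical ward population and national rates.
ward_pop = {("M", "60-64"): 150, ("M", "65-69"): 120,
            ("F", "60-64"): 160, ("F", "65-69"): 140}
national_rates = {("M", "60-64"): 0.006, ("M", "65-69"): 0.009,
                  ("F", "60-64"): 0.005, ("F", "65-69"): 0.007}

observed = 30  # hypothetical count of cases found in the ward over 5 years
expected = expected_cases(ward_pop, national_rates, years=5, deprivation_weight=1.1)
print(f"expected {expected:.2f}, observed {observed}, "
      f"relative risk {observed / expected:.2f}")
```

The relative risk is the observed number divided by the expected number. A value above 1 means more cases than the national rates would predict, and whether the excess could be due to chance is then a matter for a significance test.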

There are basically two sources for these kinds of data: national databases and questionnaires. The problem with national databases, as far as our examination of point sources of risk is concerned, is that generally speaking the establishment will not give out the data, so no one can find out anything. This seems to be true for the USA and Europe as well as the UK, and it is getting worse. I will discuss the problem further in my new book, but there is one tremendously useful exception. Although the rules for releasing such data are held to refer to both incidence (developing the disease) and mortality (dying of it) [M Quinn, 1992], since about 1999 annual ward level cancer mortality (deaths) for all malignancy, lung cancer, breast, stomach and prostate cancer in England and Wales from 1995 has been sold by the UK Office for National Statistics. This oversight, which I am sure is regretted because of the use to which we have put the data, is probably because when ONS was split off from government by Mrs Thatcher, they were told they had to be self-sufficient, and so they began to sell off the data to make ends meet. Anyone could buy this data. So for about £500 we do now have seven years of cancer mortality data for the main cancer types by ward, 1995 to 2001. The oversight was noted belatedly after some studies I carried out in 2000 and 2001 near nuclear sites in the UK received a great deal of Press coverage (since we found cancer increases near nuclear sites), and by 2002 the gate had been shut. No more data was to be released. I have an application in under the Freedom of Information Act at the moment, but don’t hold your breath.

In the UK, wards are the smallest sizes of area for which the population makeup can be obtained. The UK census in 1981, 1991 and 2001 gives numbers of men and women in each 5-year age group for all the wards in the UK, and these data are also available from the ONS for a fee. Before I turn to an explanation of how to proceed from here, I will briefly examine two other sources of data, questionnaires and local doctors' records.

If we wish to examine cancer risk in a smaller area than a ward, or if locals can take the matter into their own hands, they can bypass the cancer registries and knock on doors. Green Audit developed this local area cancer questionnaire technique originally because it seemed that we were going to be refused small area data for Ireland. In the event we did get some numbers from the new National Cancer Registry in Cork, but by then we had gone ahead in the small area around Carlingford and Greenore in County Louth. Later, we used the method in Burnham on Sea, near the nuclear power station at Hinkley Point in Somerset. I will give an account of the questionnaire method. Briefly, the method consists of defining an area and visiting all the houses in the area with a questionnaire. This asks the head of the household, or some responsible person to record the ages and sex of each person living in the house, together with any case of cancer diagnosed in the last ten years, the type of cancer and the age at diagnosis. This information allows the calculation of risk on the basis of the population living in all the houses.

The other possibility is to approach local GP surgeries and ask if anonymised totals of cancer incidence can be made available for research. For a meaningful result, we also need the total age breakdown of the surgery population. It may seem unlikely that these data will be forthcoming, but we were able to use this approach in Ireland and the results were very valuable. I will now turn to the basic method used to convert the data into meaningful patterns of risk.

2.2 Calculating Risk from the data

1. Ward level data

The method is essentially the same whatever the source of data. We have to calculate the expected numbers of cancer deaths or diagnosed cases in the area of interest for the period we are studying. We then must compare these with the observed or recorded numbers. It is best to look at as long a period as possible in order to get the largest number of cases into the study. This is because the larger the number of cases, the less likelihood there is of any effect we find being a consequence of chance. The procedure is simple but quite tedious if many wards are involved. We start with the population of each ward making up the area close to our proposed source of risk. The source of risk may be a nuclear site, or it may be a coastal strip where radioactive materials are known, by measurement, to accumulate. To give an example, I will show how the sums are done for an area we have been interested in, the ward of Burnham on Sea North in Somerset. The 1991 census population of females in this ward is shown in Table 2.1 below together with the England and Wales death rates from breast cancer and calculated annual expected numbers of deaths. These populations can be obtained from ONS as computer text files; and this is the best way, since they can be imported directly into a spreadsheet program and the calculations done from within it. There are, therefore, no transcription errors.

The rates for the cancer in question are calculated from national figures published by ONS. Table 2 of Series DH2, Mortality by Cause, gives the numbers of deaths in each 5-year age group, and also the population. To make this easy to follow, I calculate the rates for one year in Table 2.2.

These calculations, and the others that follow, are very tedious by hand, but they are straightforward on a computer and can be done in blocks of as many wards as you like. The most widely used software that will deal with these calculations quickly and efficiently is Microsoft Excel, but other spreadsheet programs, e.g. Lotus 1-2-3, will do just as well. Whole columns containing hundreds of populations of wards can be multiplied by rates to give new columns of expected cases, and these can be added for all the age groups and the result put into a new column with a few clicks of a mouse. For those who are able to do these procedures by batch programming, the whole process can be automated, and indeed we are now developing a system here that will enable us to obtain cancer mortality risks in any ward in Britain.

In Table 2.1, the 1991 female population of the ward of Burnham on Sea North is given by 5-year age group. In this table we are calculating the expected annual numbers of breast cancer deaths, based on the rates in England and Wales. Each 5-year age group population is thus multiplied by the appropriate rate, given in column (C) to give the expected numbers in column (D). All ages are then added up to give the total expected number of deaths, 2.178. What this tells us is that if Burnham on Sea North has exactly the same risk as the population of England and Wales, and the population had not changed from 1991, there would be 2.178 deaths from breast cancer every year. What we found, in the first of these studies we did in 2000 was that there were 17 recorded deaths from breast cancer in the four years 1995 to 1998. The expected number would be 2.178 x 4 years = 8.7 deaths. Thus we can calculate the age standardised mortality risk or Standardised Mortality Ratio as:

SMR = 17/8.7 = 1.95

This is a valuable discovery. We have found that the probability of dying of breast cancer here is almost twice the mean for England and Wales. And yet Burnham on Sea is not in the middle of some industrial slum area surrounded by chemical factories: it is a pretty little seaside town, where people go on holiday and children play in the sand. But it is also directly opposite and downwind of the Hinkley Point nuclear power station complex just across the bay.

The same calculation can be done for any cancer type and for men and women combined. All that we need is the appropriate rates, the ward population by sex and 5-year groups and the observed numbers.
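For readers who would rather use a short program than a spreadsheet, the arithmetic can be sketched in Python as follows. This is only a minimal sketch: the figures are transcribed from Table 2.1, while the names `TABLE_2_1`, `expected_per_year` and `smr` are illustrative assumptions and not part of any existing software.

```python
# Figures transcribed from Table 2.1 (Burnham on Sea North, females, 1991 census).
# Each entry: (age group, ward population, annual death rate per person).
TABLE_2_1 = [
    ("0-4", 101, 0.0), ("5-9", 113, 0.0), ("10-14", 131, 0.0),
    ("15-19", 127, 0.0), ("20-24", 83, 1.27e-6), ("25-29", 106, 1.26e-5),
    ("30-34", 117, 5.0e-5), ("35-39", 122, 0.000123), ("40-44", 163, 0.000234),
    ("45-49", 136, 0.000395), ("50-54", 122, 0.000603), ("55-59", 145, 0.000719),
    ("60-64", 175, 0.000872), ("65-69", 237, 0.00103), ("70-74", 237, 0.00126),
    ("75-79", 176, 0.00150), ("80-84", 181, 0.00188), ("85-89", 137, 0.00246),
    ("90+", 81, 0.0031),
]


def expected_per_year(rows):
    """Sum population x national rate over all age groups (column D total)."""
    return sum(population * rate for _, population, rate in rows)


def smr(observed, expected):
    """Standardised Mortality Ratio: observed deaths divided by expected deaths."""
    return observed / expected


annual = expected_per_year(TABLE_2_1)   # expected deaths in one year
expected_4y = annual * 4                # expected deaths over 1995 to 1998
print(f"expected per year: {annual:.3f}")
print(f"expected in four years: {expected_4y:.1f}")
print(f"SMR for 17 observed deaths: {smr(17, expected_4y):.2f}")
```

Because the products are summed here without the rounding used in the printed table, the annual total comes out at about 2.18 rather than exactly 2.178; the four-year expectation of 8.7 and the SMR of 1.95 are unaffected.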

Table 2.1 1991 female census population of Burnham North ward with expected numbers of deaths from breast cancer (D) calculated by multiplying each age group population (B) by the average England and Wales mortality rate (C).

(A) Age group (females tabulated) (B) Ward population at census (C)* Annual cancer rate in England and Wales for 1995-2000 (per person) (D) Expected numbers of cases per year, (D) = (B) x (C)

0-4 101 0 0
5-9 113 0 0
10-14 131 0 0
15-19 127 0 0
20-24 83 1.27E-6 0.0001
25-29 106 1.26E-5 0.0013
30-34 117 5.0E-5 0.0058
35-39 122 0.000123 0.0149
40-44 163 0.000234 0.038
45-49 136 0.000395 0.054
50-54 122 0.000603 0.073
55-59 145 0.000719 0.104
60-64 175 0.000872 0.152
65-69 237 0.00103 0.244
70-74 237 0.00126 0.299
75-79 176 0.00150 0.264
80-84 181 0.00188 0.34
85-89 137 0.00246 0.337
90+ 81 0.0031 0.251
All ages 2.178

* calculated from annual figures of deaths tabulated by cause published by ONS (series DH2) for England and Wales

Note: the E-notation means 'multiply by 10 to the power of', so E-6 is $1 \times 10^{-6}$

 

Table 2.2 Calculating the England and Wales breast cancer mortality rates for 1995

(A) Age group (B) 1995 Female population (thousands) (C) 1995 Breast cancer deaths (D) 1995 Rate (per person), D = C/(B x 1000)

0-4 1651.9 0 0
5-9 1656.4 0 0
10-14 1563 0 0
15-19 1469 0 0
20-24 1703 2 1.17E-6
25-29 2002.9 28 1.4E-5
30-34 2074.6 114 5.5E-5
35-39 1810.6 218 0.00012
40-44 1669.1 404 0.000242
45-49 1828.2 795 0.000435
50-54 1478.8 979 0.000662
55-59 1339 1042 0.000778
60-64 1254 1132 0.000903
65-69 1245.8 1397 0.00112
70-74 1231 1591 0.00129
75-79 933.5 1412 0.00151
80-84 768.8 1517 0.00197
85-89 468 1151 0.00246
90+ 240 761 0.0032
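The rate calculation in Table 2.2 can be sketched in the same way. The sketch below assumes only what the table shows: populations in thousands and deaths as whole numbers, so the rate per person is the deaths divided by the population multiplied by 1000. The names `TABLE_2_2` and `death_rate` are hypothetical.

```python
# Figures transcribed from Table 2.2 (England and Wales, females, 1995).
# Each entry: (age group, population in thousands, breast cancer deaths).
TABLE_2_2 = [
    ("0-4", 1651.9, 0), ("5-9", 1656.4, 0), ("10-14", 1563, 0),
    ("15-19", 1469, 0), ("20-24", 1703, 2), ("25-29", 2002.9, 28),
    ("30-34", 2074.6, 114), ("35-39", 1810.6, 218), ("40-44", 1669.1, 404),
    ("45-49", 1828.2, 795), ("50-54", 1478.8, 979), ("55-59", 1339, 1042),
    ("60-64", 1254, 1132), ("65-69", 1245.8, 1397), ("70-74", 1231, 1591),
    ("75-79", 933.5, 1412), ("80-84", 768.8, 1517), ("85-89", 468, 1151),
    ("90+", 240, 761),
]


def death_rate(deaths, population_thousands):
    """Annual death rate per person in one age group."""
    return deaths / (population_thousands * 1000)


# Age-specific rates, keyed by age group (column D of the table).
rates = {age: death_rate(deaths, pop) for age, pop, deaths in TABLE_2_2}

for age, rate in rates.items():
    print(f"{age:>6}  {rate:.3g}")
```

Averaging such rates over several years, as was done for column (C) of Table 2.1, smooths out the year-to-year fluctuation in the youngest age groups, where deaths are very few.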

 

2. Questionnaire and GP studies.

The approach in both of these types of study is exactly the same. The questionnaire responses give the numbers of cancers diagnosed in each household, the type of cancer, the sex, age and year at diagnosis and the sex, age and number of everyone living in the house. When all these data are added together we have a sample population, drawn randomly from the total population of the area being canvassed. We know the sex and age breakdown of this population, at the time of the questionnaire and can calculate the annual expected numbers of cases of all cancers or any specific cancer from the national database. We then compare this expected number with the reported numbers over any period we choose to see if there is an excess. There are some particular problems with this approach for questionnaires. The main one is that as we go back in time from the date of the questionnaire, there will be leakage of people with cancer from the sample due to deaths. Thus we always find that the number of annual cases reported falls off for earlier years. This is shown in Table 2.3. below, for the 2001 Burnham on Sea North questionnaire undertaken by 'Parents Concerned about Hinkley' (PCAH). This survey was a response to denials by the Somerset Health Authority that there was any increase in breast cancer in the ward following the first mortality study we carried out in 2000 [Busby, Dorfman, Rowe 2000]. The survey obtained answers from addresses that added up to about one third of the census population of the ward.

Table 2.3 All cancer cases by year of diagnosis reported in 2002 in Burnham on Sea North according to PCAH questionnaire. Numbers fall off in earlier years because many have died and their families have moved away.

Year diagnosed Number of cases
2001 17
2000 12
1999 8
1998 8
1997 10
1996 9
1995 6
1994 4
1993 2

Now, on the basis of the population defined by the response, the expected number of all types of cancer was 11 per year, so if we choose to look at 2000 and 2001 together, this will define a risk of 29/22 or 1.32. For 2001 alone, the risk is 17/11 = 1.55. The risk we calculate for all cancers falls off rapidly as we go back in time. This is because the numbers will be dominated by lung cancer, which is usually fatal; for all cancers combined, and for lung and certain other cancers which have poor survival, the questionnaire method is of little use. But for looking at types of cancer that are treatable, or are less immediately fatal, like breast cancer and leukaemia, the method is valuable. It also has some great advantages. The first is that you can believe the results. If there is a significant excess of cancer shown by a survey like this, then you have immediate connection with the people who are reporting this. You know where they live and can relate this to the source of pollution by dividing your area up into bands of distance from the source, as we were able to do in Carlingford in Ireland. Best of all, you know that no one in the establishment has altered the data or re-evaluated the numbers and retrospectively removed cases that might indicate a problem. I show in the new book Wolves of Water that all these things happen.
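The fall-off with time is easy to see if the risk is worked out for each year in turn. The following is a minimal sketch using the counts in Table 2.3 and the expected figure of 11 cases per year quoted above; the names `CASES_BY_YEAR`, `EXPECTED_PER_YEAR` and `relative_risk` are illustrative assumptions.

```python
# Cases by year of diagnosis, transcribed from Table 2.3.
CASES_BY_YEAR = {
    2001: 17, 2000: 12, 1999: 8, 1998: 8, 1997: 10,
    1996: 9, 1995: 6, 1994: 4, 1993: 2,
}

# Expected number of cases of all cancers per year in the responding population.
EXPECTED_PER_YEAR = 11


def relative_risk(years):
    """Observed cases in the chosen years divided by the expected cases."""
    observed = sum(CASES_BY_YEAR[year] for year in years)
    expected = EXPECTED_PER_YEAR * len(years)
    return observed / expected


print(f"2000 and 2001 together: {relative_risk([2000, 2001]):.2f}")
for year in sorted(CASES_BY_YEAR, reverse=True):
    print(f"{year}: {relative_risk([year]):.2f}")
```

The apparent risk drops below 1 for the earlier years, which reflects the leakage of cases from the sample and not a real absence of cancer.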

The PCAH questionnaire showed that the Somerset Health Authority were covering up a real effect. It showed a doubling of the breast cancer incidence rate in the ward, confirming what we found in the earlier ward level mortality study. In reality the true risk will always be higher than the calculated risk because of this leakage of population, so if the questionnaire shows a problem, there is one. In addition to locating sufferers on a map so that their distance from the source of pollution can be assessed, we can also ask questions about lifestyle, e.g. smoking, eating habits etc., which may be valuable indicators of exposure routes or other stresses. Some 18 months after our publication of the results of these studies, the local Cancer Registry were forced by the local Health Authority and by media-driven concern to conduct an official examination of the data to see if our results were correct. They found that the increases in breast cancer and leukaemia we found were real. However, they argued that these were chance findings and not related to the nuclear site. Nevertheless, I make the point that the small study carried out by concerned citizens gave approximately the right answer and was validated by the cancer registry, who had refused earlier to release data to us or the citizens of Burnham. The media had a field day.

GP studies involve exactly the same calculations as we did for the mortality studies. The only difference is that the age breakdown of the study population is that of the surgery patient list. You have to be able to persuade the doctor to get the data out; often a very difficult exercise.

Part of my purpose here is to show that anyone can do such calculations and that people who are concerned about illness near any source of pollution (nuclear plants, landfill sites, incinerators) can check out what is going on for themselves. But we can't get too carried away at this stage; there is one more question we must address before we are secure in our belief that there is a problem at Burnham on Sea North, or anywhere where we find an apparent increased risk. We have to establish, as Bradford Hill's Canon asks: could this doubling of the risk of death have occurred by chance? Here we have to deal with some statistical methods that enable us to answer this question.

3 The play of chance, and its evaluation.

I think I am going to upset the epidemiologists here. As I mentioned, in the last fifteen years, the discipline has collapsed under the intellectual weight of statistical methods developed to answer the simple question about whether an event might be a real indicator of an underlying causal relation (e.g. radiation and leukaemia) or merely an example of the random play of chance. Part of the problem we have to deal with is that the levels of risk we will find for adult cancers are usually modest. In Burnham on Sea, downwind from the Hinkley Point nuclear power station, we found in 2001 that there was a 2-fold excess of breast cancer based on 17 deaths over the four years 1995-98. In public health epidemiology the diseases that are being tracked are usually more immediate and graphic than the development of cancer some ten to twenty years after the initial causal exposure. Public Health departments deal with more mundane questions. Some people eat at a restaurant and suffer food poisoning: why? Twenty out of thirty who attended a picnic are in hospital vomiting with a high fever: what was the cause? It is unlikely to be chance, in these cases. But with the cancer risk situation, it may be chance, and we have to establish whether it is. I am going to avoid the complexity of modern statistical methods and offer a few simple tests that will let us know whether our discoveries are statistically significant or not. When we discover some increased risk level and report it, the health authorities will usually respond immediately with the cry: this is a random cluster of cases! In order to see if this is so, for small numbers of cases up to 40, the most important weapon in our armoury is the Cumulative Poisson Probability Table. This is a table which is published in most compilations of statistical tables and which enables us to see at a glance what the probability of observing some number of events is, given that the expectation is some other number of events.
The tables are constructed from the Poisson distribution, which describes the number of times a rare event occurs in a large population (it is the limiting form of the Binomial distribution when events are rare). If this means nothing, don't worry, just use the tables and assume they tell you what you wish to know. I refer you to Statistical Tables by J Murdoch and JA Barnes (Macmillan 1998), but all these tables are the same. In the case of Burnham on Sea and breast cancer deaths from 1995-1998 we expect 8.7 and find 17. In the Cumulative Poisson Tables we look along the columns of expected numbers to 8.7 and then look down the rows to observed numbers 17. The Table gives the value 0.008. This value means that for an expectation of 8.7 deaths, you could expect to find 17 or more deaths with a probability of 0.008. That is 1 chance in 1/0.008 or 125. It tells us that if we looked at about 125 wards of the same size as Burnham on Sea, we would expect to find one ward with this apparent increase in breast cancer by chance alone.

The level conventionally accepted by statisticians as showing that a finding is 'significant' is a probability below 0.05 (1 chance in 20), so in this case we certainly achieve this level. The number of cases involved in the calculation is critical to this question of significance. Between 1995 and 1998 there were 17 breast cancer deaths. By 2001, another three years, the number of deaths was 14 more, taking the total to 31. In the seven years the expectation is (assuming the same population) increased to 2.178 x 7 = 15.3. We now have a new risk 31/15.3 = 2.03. In the Poisson Tables this gives us a probability of 0.0002 or one chance in 5000 of these events being due to chance. There is not much difference between SMRs of 1.95 and 2.03, but the significance has hugely increased because the numbers are greater. This shows the importance of getting as many people into a study as possible.
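If no book of tables is to hand, the same probability can be computed directly. The sketch below assumes that the table entry is the upper-tail Poisson probability, that is, the chance of seeing the observed number or more when the expected number is given; the function name `poisson_upper_tail` is hypothetical.

```python
import math


def poisson_upper_tail(observed, expected):
    """Probability of 'observed' or more events when 'expected' are expected."""
    term = math.exp(-expected)      # probability of exactly 0 events
    below = 0.0                     # running total for 0 .. observed-1 events
    for k in range(observed):
        below += term
        term *= expected / (k + 1)  # step from P(k) to P(k+1)
    return 1.0 - below


p_four_years = poisson_upper_tail(17, 8.7)    # 1995 to 1998
p_seven_years = poisson_upper_tail(31, 15.3)  # 1995 to 2001
print(f"17 observed, 8.7 expected: p = {p_four_years:.4f}")
print(f"31 observed, 15.3 expected: p = {p_seven_years:.5f}")
```

The first figure agrees with the table value of 0.008. The second comes out at roughly 0.0003, the same order of magnitude as the table reading quoted above; either way it is far below the conventional 0.05 level.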

For larger numbers than 40, we use a different method, which is much easier and quicker. It is called Chi-squared and is written $\chi^2$. All we have to do is calculate the number from the following equation:

$$\chi^2 = \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

Thus for the previous case, we would have:

$$\chi^2 = \frac{(31 - 15.3)^2}{15.3} = \frac{246.49}{15.3} = 16.1$$

There are values for $\chi^2$ in all compilations of statistical tables. For these simple cases we have to use the value for 1 degree of freedom, and the critical points for various levels of significance (0.05, 0.01, 0.005 and 0.001) are all we need to know. These are given in Table 3.1.

Table 3.1 Critical values for the Chi-squared ($\chi^2$) distribution on 1 degree of freedom.*

p-value less than: $\chi^2$ more than: One chance in:
0.05 3.84 20
0.01 6.64 100
0.005 7.88 200
0.001 10.8 1000

*Note: these values are for 2-tailed hypothesis tests where we are asking if the result is significantly greater than or smaller than the expected result. This is now the conventional test, although usually we are asking if the cancer is greater than expected, and in these circumstances a one-tailed test is correct, and this lets in more results as being significant
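The Chi-squared value and its p-value can also be obtained without tables. The sketch below relies on a standard identity that holds for 1 degree of freedom only: the two-tailed p-value equals erfc(sqrt(chi-squared / 2)), where erfc is the complementary error function in Python's standard `math` module. The function names are illustrative assumptions.

```python
import math


def chi_squared(observed, expected):
    """Chi-squared statistic for one observed count against its expectation."""
    return (observed - expected) ** 2 / expected


def p_value_1df(chi2):
    """Two-tailed p-value for Chi-squared on 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


chi2 = chi_squared(31, 15.3)
print(f"Chi-squared = {chi2:.1f}, p = {p_value_1df(chi2):.6f}")

# The critical values in Table 3.1 can be checked the same way.
for critical in (3.84, 6.64, 7.88, 10.8):
    print(f"Chi-squared {critical}: p = {p_value_1df(critical):.4f}")
```

For other degrees of freedom this shortcut does not apply, and a statistical package or the printed tables should be used instead.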

Our calculation gives 16.1, and since this is greater than 10.8, the p-value is lower than 0.001, as we also found from the Poisson Tables. Some researchers demand that we also give the 95% confidence limits, and there is some reason for this. These are rather harder to work out than the p-values we have just calculated but can be obtained easily enough using a computer program. There are a great many excellent statistical computer programs which can be used to deal with these issues, and I will say a bit more about these below. Before I do this I need to briefly address the small numbers problem, since this is usually referred to by health officials trying to dismiss the significance of a finding.
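As an illustration of what such a program does, here is a minimal sketch of one common approach, the 'exact' Poisson confidence limits for an SMR, found by simple bisection. This particular method is an assumption chosen for the example, not necessarily the one used in our studies, and all the function names are hypothetical.

```python
import math


def poisson_cdf(k, mean):
    """Probability of k or fewer events when 'mean' are expected."""
    if k < 0:
        return 0.0
    term = math.exp(-mean)
    total = term
    for i in range(1, k + 1):
        term *= mean / i
        total += term
    return total


def solve_mean(k, target, upper_bound):
    """Bisection: find the mean at which poisson_cdf(k, mean) equals target."""
    lo, hi = 0.0, upper_bound
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if poisson_cdf(k, mid) > target:
            lo = mid   # cdf falls as the mean rises, so move upwards
        else:
            hi = mid
    return (lo + hi) / 2.0


def smr_confidence_limits(observed, expected, alpha=0.05):
    """Exact Poisson limits for the observed count, divided by the expected."""
    bound = observed + 10 * math.sqrt(observed + 1) + 10  # generous search range
    if observed == 0:
        lower = 0.0
    else:
        lower = solve_mean(observed - 1, 1 - alpha / 2, bound)
    upper = solve_mean(observed, alpha / 2, bound)
    return lower / expected, upper / expected


low, high = smr_confidence_limits(17, 8.7)
print(f"SMR 1.95, 95% confidence limits {low:.2f} to {high:.2f}")
```

If the lower limit is above 1.0, the excess is significant at the corresponding level, which gives the same message as the p-value but also shows how wide the uncertainty is.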

We saw that the p-value from breast cancer mortality in Burnham North over the 4-year period was 0.008 and the RR was 1.95. This means that if we are looking for some effect near a nuclear site or other putative source of risk where we are studying a number of wards, we should be aware that for every 125 wards of this size we would expect one to show an excess of 1.95 or more by chance alone. In our first study we looked at about 109 wards, so finding the effect in one of them was not unexpected. On the other hand we already had a hypothesis suggesting the pattern of risk, close to the contaminated mud where Burnham on Sea was. This suggested the problem was a real one. So we have to look at more than just the value in one particular ward: we have to look for a pattern in several wards which share some attribute that can act as a surrogate for the exposure. In our case, it was proximity to the intertidal sediment. As it happened, increasing the period to 2001 increased the risk slightly but hugely increased the statistical significance to 0.0002. For this to have been a chance finding we would have to study about 5000 wards.
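The small numbers argument can be put into figures. The sketch below makes two simplifying assumptions that are not in the data: that the wards are independent of one another and that each has the same chance of throwing up such an excess. The function names are illustrative.

```python
def expected_chance_wards(n_wards, p_value):
    """Number of wards expected to show the excess by chance alone."""
    return n_wards * p_value


def prob_at_least_one(n_wards, p_value):
    """Chance that at least one ward shows the excess by chance alone."""
    return 1.0 - (1.0 - p_value) ** n_wards


for label, p in (("1995-98, p = 0.008", 0.008), ("1995-2001, p = 0.0002", 0.0002)):
    print(label)
    print(f"  expected chance wards in 109: {expected_chance_wards(109, p):.2f}")
    print(f"  chance of at least one: {prob_at_least_one(109, p):.3f}")
```

With the four-year figure, a chance excess somewhere among 109 wards is more likely than not; with the seven-year figure it is only a few chances in a hundred, which is why the longer period matters so much.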

But there is yet another pitfall for the amateur epidemiologist, referred to (by the epidemiological community) as the 'Texas Sharpshooter'. In this, the argument goes, the student examines all the different cancers in all the different wards near a point source of pollution. This gives a large number of results (for different cancers) and, by chance, one of these will show a high risk (merely because there are a lot of numbers). At this point, the student decides that this particular cancer is being caused by the releases. This apparently is not allowed. It is like the 'Texas Sharpshooter' firing his pistol at the wall of a barn and then, after he has hit the wall, drawing a bullseye around his bullet hole. Whilst I have sympathy with this argument, I get extremely irritated at the way it is mindlessly trotted out by establishment or government to deny real problems. I discuss a specific example of this in Wolves, but at this stage I want to point out that such an argument would make it impossible to discover anything on first notice. What if we lived near a factory that produced a seriously carcinogenic substance that caused throat cancer? We do not know this, but we may have some (plausible) general belief that pollution is a bad thing and causes disease. So we do a cancer study of the area using 100 wards (to get a control) and look at all cancer types and find that the ward near the plant has a 4-fold excess of throat cancer. We are not allowed to conclude anything from this because of the Texas Sharpshooter.

4 Adjusting for Disadvantagement.

It is quite commonly stated, when attention is drawn to high levels of cancer incidence in an area where there is a source of some environmental pollution, that the high rates are a consequence of socioeconomic factors. The argument generally follows the reasoning that people who are poor or unemployed smoke or drink too much alcohol and eat too many chips, not enough fresh fruit and vegetables. These behaviour patterns, it is argued, are associated with elevated cancer risk. It is also said that the genetic constitution of people in the lowest socio-economic groups is, in some un-elaborated way, 'poor' and this also adds to the cancer risk. Furthermore, these people get ill more often owing to their damp houses and overcrowded conditions with higher levels of infective illness or virus infection. Possibly also, the toffs might say, their lower class behaviour, too much boozing and sex.

The study of deprivation and health is associated with a large body of literature. Deprivation indices using data easily extracted from the Census (such as that first suggested by Townsend) include such indicators as multiple dwelling occupancy, car ownership, home ownership, unemployment rates and so forth. The report by Carstairs and Morris (1991) discussed the development and use of the now widely utilised 'Carstairs Index' of deprivation. However, even if it is subsequently shown that the area in question has a higher index of socioeconomic deprivation, great care must be exercised in attributing the cancer to the deprivation. A common method to examine trend in cancer incidence near a putative source of pollution is to use least-squares multiple regression to examine trend with distance. This is a method that has become much in vogue since the development of computers. It can be used for looking at data where there are multiple possible causal factors for some measurement of interest and in principle enables us to see what are the most important contributors. I will discuss it further below. In its approach to effects near a point source, distance may be entered as one of the independent variables as a surrogate for pollution level and we can also include some measure of socio-economic deprivation. Implication of causation by the pollutant is flagged up by high statistical significance for the slope coefficient of the regression line or curve. Implication of socioeconomic deprivation may also be confirmed in this way. This, however, may be entirely spurious since it is only generally poor areas where sources of pollution exist. Either they are built in poor areas because there is insufficient political power in such areas to oppose the project or else, following the identification of such a source, people rich enough to move away, do so. 
Furthermore, it is an extraordinary fact that in the UK there is in force a statutory planning instrument encouraging the siting of environmentally polluting sites in areas of high unemployment.

Examination of the relationship between cancer incidence and the Carstairs index of deprivation in Scotland reveals that, except for lung cancer in males and cervical cancer in females (many sexual partners and papillomavirus transfer), the differences between the most deprived and least deprived are not very great (Carstairs and Morris, 1991). Similar data for England have been obtained from the OPCS longitudinal study, which uses Social Class as a measure of disadvantagement (Leon, 1988). The direction of the trend in cancer with increasing levels of deprivation may be positive as well as negative. In our work we follow Leon (1988) in defining a 'positive' effect as being an increase in cancer risk in going up the socioeconomic scale and a 'negative' effect as an increase in cancer in going down the socioeconomic scale. A large number of studies have investigated such effects both in the UK and elsewhere in the world and in general, the results indicate that lung cancer in males and cervical cancer in females suffer negative effects. Positive effects exist for colon cancer, female breast cancer, and leukaemia. Carstairs deprivation index and cancer incidence in Scotland are shown in Table 4.1 below.

We used both Carstairs and a Welsh Office index of deprivation for regression analysis in the Wales small area study, but in our cancer mortality studies we carry out an adjustment of the expected numbers of cases in a ward using the relation between mortality and Social Class in England and Wales discovered by Leon in 1988. Since this is a straightforward calculation, I show how it is done. The relationship is given in Table 4.2 for the main cancers of interest. The trends with Social Class are similar to the trends shown by Carstairs indices. In practice, the adjustment is made to the expected numbers of cases. To use our example of breast cancer mortality in Burnham on Sea, first we have to obtain from the Census the number of households in the ward in each of the Social Class categories as a proportion of all the households.

Table 4.1 Standardised Incidence Ratio of cancer sites by Carstairs deprivation score in Scotland 1979-82. (Source: Carstairs and Morris, 1991) The ranges of status in the rural areas we have examined are given in bold.

Cancer site -12 -8 -4 0 +4 +8 +12
All malignancy 95 96 94 100 101 107 122
Oesophagus 78 86 101 92 104 118 148
Stomach 79 80 86 103 109 128 138
Colon 104 106 99 99 98 97 96
Lung 69 77 80 100 110 128 183
F.breast 114 109 99 100 96 95 89
Leukemia 107 107 96 102 89 104 100
Prostate 115 106 99 103 96 90 92
Cervix 67 72 91 94 120 124 166

 

Table 4.2 Standardised Mortality Ratio (0-64yrs) by Social Class for some common cancers (1971-72)

Social Class

All cancers

Lung

Breast

Prostate

Leukemia

1 Male

*Female

75

99

53

73

117

91

113

88