Family-wise error rate

From Infogalactic: the planetary knowledge core

In statistics, the family-wise error rate (FWER) is the probability of making one or more false discoveries, or type I errors, among all the hypotheses when performing multiple hypothesis tests.

History

Tukey coined the terms "experimentwise error rate" and "per-experiment error rate" for the error rate that a researcher should use as a control level in a multiple hypothesis experiment.

Background

Within the statistical framework, there are several definitions for the term "family":

  • First of all, a distinction must be made between exploratory data analysis and confirmatory data analysis: for exploratory analysis, the family comprises all inferences made and those that potentially could be made, whereas for confirmatory analysis, the family must include only the inferences of interest specified prior to the study.
  • Hochberg & Tamhane (1987) define "family" as "any collection of inferences for which it is meaningful to take into account some combined measure of error".[1]
  • According to Cox (1982), a set of inferences should be regarded as a family:
  1. To take into account the selection effect due to data dredging
  2. To ensure simultaneous correctness of a set of inferences so as to guarantee a correct overall decision

To summarize, a family could best be defined by the potential selective inference that is being faced: A family is the smallest set of items of inference in an analysis, interchangeable about their meaning for the goal of research, from which selection of results for action, presentation or highlighting could be made (Benjamini).

Classification of multiple hypothesis tests

The following table defines the outcomes that can occur when testing multiple null hypotheses. Suppose we have m null hypotheses, denoted by H_1, H_2, ..., H_m. Using a statistical test, we reject a null hypothesis if the test is declared significant, and we do not reject it if the test is non-significant. Summing each type of outcome over all H_i gives the following table and related random variables:

                            Null hypothesis is true (H0)    Alternative hypothesis is true (H1)    Total
Declared significant        V                               S                                      R
Declared non-significant    U                               T                                      m - R
Total                       m_0                             m - m_0                                m
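
As an illustration of the table above, the following minimal Python sketch tallies the counts from two parallel lists. The function name `tally_outcomes` and its arguments are hypothetical and introduced only for this example; in practice the truth of each null hypothesis is unknown, so V, S, U and T are unobservable and only R and m can actually be computed.

```python
def tally_outcomes(null_is_true, rejected):
    """Count the cells of the multiple-testing table.

    null_is_true[i] is True when H_i is a true null hypothesis;
    rejected[i] is True when the test of H_i was declared significant.
    """
    if len(null_is_true) != len(rejected):
        raise ValueError("both lists must have one entry per hypothesis")

    counts = {"V": 0, "S": 0, "U": 0, "T": 0}
    for is_null, is_rejected in zip(null_is_true, rejected):
        if is_rejected and is_null:
            counts["V"] += 1   # false discovery (type I error)
        elif is_rejected:
            counts["S"] += 1   # true discovery
        elif is_null:
            counts["U"] += 1   # true null, correctly not rejected
        else:
            counts["T"] += 1   # false null, not rejected (type II error)

    counts["R"] = counts["V"] + counts["S"]     # total rejections
    counts["m_0"] = counts["V"] + counts["U"]   # number of true nulls
    counts["m"] = len(null_is_true)             # total hypotheses
    return counts


example = tally_outcomes(
    null_is_true=[True, True, True, False, False],
    rejected=[True, False, False, True, False],
)
```

In this toy example one true null is rejected, so V = 1 and a family-wise error has occurred.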

Definition

The FWER is the probability of making at least one type I error in the family,

 \mathrm{FWER} = \Pr(V \ge 1),

or equivalently,

 \mathrm{FWER} = 1 -\Pr(V = 0).

Thus, by ensuring \mathrm{FWER} \le \alpha, the probability of making even one type I error in the family is controlled at level \alpha.

A procedure controls the FWER in the weak sense if FWER control at level \alpha is guaranteed only when all null hypotheses are true (i.e. when m_0 = m, so that the global null hypothesis is true).

A procedure controls the FWER in the strong sense if FWER control at level \alpha is guaranteed for any configuration of true and non-true null hypotheses (including the global null hypothesis).
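
To see why control is needed, consider the special case (an assumption made only for this sketch) in which all m null hypotheses are true, the tests are independent, and each is carried out at the same per-test level. Then \Pr(V = 0) is the product of the individual non-rejection probabilities, and the FWER follows from the second form of the definition. The function name `fwer_independent` is hypothetical.

```python
def fwer_independent(per_test_level, m):
    """FWER = 1 - Pr(V = 0) for m independent true nulls,
    each tested at the same per-test level."""
    return 1.0 - (1.0 - per_test_level) ** m


# Testing 20 true nulls at level 0.05 each, with no correction.
uncorrected = fwer_independent(0.05, 20)

# Testing the same 20 nulls at level 0.05 / 20 each.
corrected = fwer_independent(0.05 / 20, 20)
```

Under these assumptions the uncorrected FWER is roughly 0.64, whereas dividing the per-test level by m keeps it just below 0.05.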

Controlling procedures

The following is a concise review of some of the classical solutions that ensure strong level \alpha FWER control, followed by some newer solutions.

The Bonferroni procedure

  • Denote by p_{i} the p-value for testing H_{i}.
  • Reject H_{i} if p_{i} \leq \frac{\alpha}{m}.
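
The rule above can be sketched in a few lines of Python. The function name `bonferroni_reject` and the example p-values are illustrative assumptions, not part of any particular library.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one rejection decision per hypothesis: reject H_i when p_i <= alpha / m."""
    m = len(p_values)
    threshold = alpha / m
    return [p <= threshold for p in p_values]


# Four hypotheses at alpha = 0.05, so each p-value is compared with 0.0125.
decisions = bonferroni_reject([0.001, 0.012, 0.03, 0.2], alpha=0.05)
```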

The Šidák procedure

  • Šidák's multiple testing procedure tests each hypothesis at level \alpha_{SID} = 1-(1-\alpha)^\frac{1}{m}.
  • This procedure is more powerful than Bonferroni, but the gain is small.
  • This procedure can fail to control the FWER when the tests are negatively dependent.
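
How small the gain is can be checked numerically. The sketch below compares the Šidák per-test level with the Bonferroni level \frac{\alpha}{m}; the function name `sidak_level` is a hypothetical helper introduced for this example.

```python
def sidak_level(alpha, m):
    """Per-test level alpha_SID = 1 - (1 - alpha)^(1/m)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


alpha, m = 0.05, 10
sidak = sidak_level(alpha, m)   # approximately 0.005116
bonferroni = alpha / m          # 0.005
```

For \alpha = 0.05 and m = 10 the two levels differ only in the fourth decimal place, which is why the power gain is modest.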

Tukey's procedure

  • Tukey's procedure is only applicable for pairwise comparisons.
  • It assumes independence of the observations being tested, as well as equal variance across observations (homoscedasticity).
  • The procedure calculates for each pair the studentized range statistic: \frac{Y_{A}-Y_{B}}{SE}, where Y_{A} is the larger of the two means being compared, Y_{B} is the smaller, and SE is the standard error of the data in question.
  • Tukey's test is essentially a Student's t-test, except that it corrects for family-wise error-rate.

Holm's step-down procedure (1979)

  • Start by ordering the p-values from lowest to highest, P_{(1)} \ldots P_{(m)}, and let the associated hypotheses be H_{(1)} \ldots H_{(m)}.
  • Let R be the smallest k such that P_{(k)} > \frac{\alpha}{m+1-k}.
  • Reject the null hypotheses H_{(1)} \ldots H_{(R-1)}. If R = 1, none of the hypotheses are rejected; if no such k exists, all of them are rejected.

This procedure is uniformly more powerful than the Bonferroni procedure.[2] It controls the family-wise error rate for all m hypotheses at level α in the strong sense because it is a closed testing procedure in which each intersection hypothesis is tested using the simple Bonferroni test.
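
The step-down steps can be sketched as follows. This is a minimal illustration rather than a reference implementation; the function name `holm_reject` and the example p-values are assumptions made for this sketch.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm's step-down procedure; returns decisions in the original order."""
    m = len(p_values)
    # Indices of the hypotheses sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for k, i in enumerate(order, start=1):
        # Stop at the first k with P_(k) > alpha / (m + 1 - k).
        if p_values[i] > alpha / (m + 1 - k):
            break
        reject[i] = True
    return reject


p_values = [0.03, 0.001, 0.04, 0.015]
holm = holm_reject(p_values, alpha=0.05)
bonferroni = [p <= 0.05 / len(p_values) for p in p_values]
```

With these p-values Holm rejects the hypotheses with p = 0.001 and p = 0.015, while the Bonferroni threshold of 0.0125 rejects only the first of them.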

Hochberg's step-up procedure (1988)

Hochberg's step-up procedure (1988) is performed using the following steps:[3]

  • Start by ordering the p-values from lowest to highest, P_{(1)} \ldots P_{(m)}, and let the associated hypotheses be H_{(1)} \ldots H_{(m)}.
  • For a given \alpha, let R be the largest k such that P_{(k)} \leq \frac{\alpha}{m+1-k}.
  • Reject the null hypotheses H_{(1)} \ldots H_{(R)}. If no such k exists, none of the hypotheses are rejected.

Hochberg's procedure is more powerful than Holm's. However, while Holm's is a closed testing procedure (and thus, like Bonferroni, places no restriction on the joint distribution of the test statistics), Hochberg's is based on the Simes test, so its FWER control holds only under non-negative dependence.
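
A minimal sketch of the step-up steps, under the same caveats as any illustration: the function name `hochberg_reject` and the example p-values are assumptions, and the code does not check the dependence condition mentioned above.

```python
def hochberg_reject(p_values, alpha=0.05):
    """Hochberg's step-up procedure; returns decisions in the original order."""
    m = len(p_values)
    # Indices of the hypotheses sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    # Scan from the largest p-value downwards and stop at the first
    # (i.e. largest) k with P_(k) <= alpha / (m + 1 - k).
    for k in range(m, 0, -1):
        if p_values[order[k - 1]] <= alpha / (m + 1 - k):
            largest_k = k
            break
    reject = [False] * m
    for i in order[:largest_k]:
        reject[i] = True
    return reject


all_rejected = hochberg_reject([0.03, 0.001, 0.04, 0.015], alpha=0.05)
one_rejected = hochberg_reject([0.06, 0.03, 0.02, 0.001], alpha=0.05)
```

In the first call the largest p-value, 0.04, is already below \alpha / 1 = 0.05, so all four hypotheses are rejected; Holm's step-down procedure applied to the same p-values would stop after two rejections.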

Dunnett's correction

Charles Dunnett (1955, 1966) described an alternative alpha error adjustment when k groups are compared to the same control group. Now known as Dunnett's test, this method is less conservative than the Bonferroni adjustment.

Scheffé's method

Resampling procedures

The procedures of Bonferroni and Holm control the FWER under any dependence structure of the p-values (or equivalently the individual test statistics). Essentially, this is achieved by accommodating a "worst-case" dependence structure (which is close to independence for most practical purposes). Such an approach is conservative, however, if the dependence is actually positive. To give an extreme example, under perfect positive dependence there is effectively only one test, and thus the FWER is uninflated.

Accounting for the dependence structure of the p-values (or of the individual test statistics) produces more powerful procedures. This can be achieved by applying resampling methods, such as bootstrapping and permutation methods. The procedure of Westfall and Young (1993) requires a certain condition that does not always hold in practice (namely, subset pivotality).[4] The procedures of Romano and Wolf (2005a,b) dispense with this condition and are thus more generally valid.[5][6]
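
To give a flavour of how resampling uses the dependence structure, the following is a simplified sketch of a single-step max-statistic permutation adjustment for comparing two groups on m measurements. It is not the step-down procedure of Westfall and Young nor the Romano and Wolf procedures; the statistic (an unstudentized absolute difference in means), the function name `maxt_adjusted_p` and the toy data are all assumptions made for illustration.

```python
import random


def maxt_adjusted_p(group_a, group_b, n_perm=999, seed=0):
    """Single-step max-statistic permutation adjustment (illustrative sketch).

    group_a, group_b: lists of observations, each a list of m measurements.
    Returns one adjusted p-value per measurement.
    """
    rng = random.Random(seed)
    pooled = group_a + group_b
    n_a = len(group_a)
    m = len(pooled[0])

    def abs_mean_diffs(rows_a, rows_b):
        # Absolute difference in group means, one value per measurement.
        return [
            abs(sum(r[j] for r in rows_a) / len(rows_a)
                - sum(r[j] for r in rows_b) / len(rows_b))
            for j in range(m)
        ]

    observed = abs_mean_diffs(group_a, group_b)
    exceed = [0] * m
    for _ in range(n_perm):
        shuffled = pooled[:]
        # Whole observations are permuted, so the dependence between
        # the m measurements is preserved in every resample.
        rng.shuffle(shuffled)
        max_stat = max(abs_mean_diffs(shuffled[:n_a], shuffled[n_a:]))
        for j in range(m):
            if max_stat >= observed[j]:
                exceed[j] += 1
    # Each observed statistic is compared with the permutation
    # distribution of the maximum over all m statistics.
    return [(1 + c) / (n_perm + 1) for c in exceed]


group_a = [[10, 1], [11, 2], [12, 3], [13, 4], [14, 5]]
group_b = [[0, 5], [1, 4], [2, 3], [3, 2], [4, 1]]
adjusted = maxt_adjusted_p(group_a, group_b)
```

In this toy data set the first measurement separates the groups completely and receives a small adjusted p-value, while the second has identical group means and receives an adjusted p-value of 1.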

Alternative approaches

FWER control exerts a more stringent control over false discoveries than false discovery rate (FDR) procedures. FWER control limits the probability of at least one false discovery, whereas FDR control limits (in a loose sense) the expected proportion of false discoveries. Thus, FDR procedures have greater power at the cost of increased rates of type I errors, i.e., rejecting null hypotheses of no effect that are in fact true.[7]

On the other hand, FWER control is less stringent than per-family error rate control, which limits the expected number of errors per family. Because FWER control is concerned with at least one false discovery, unlike per-family error rate control it does not treat multiple simultaneous false discoveries as any worse than one false discovery. The Bonferroni correction is often regarded as merely controlling the FWER, but it in fact also controls the per-family error rate.[8]
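
The relationship between the two error rates under the Bonferroni correction can be sketched in one line. The sketch assumes that the p-value of every true null hypothesis satisfies \Pr(p_i \le t) \le t, and it writes I_0 for the index set of the m_0 true null hypotheses (notation introduced only here). The first chain bounds the per-family error rate \mathrm{E}[V]; the second applies Markov's inequality to V.

```latex
\mathrm{E}[V] = \sum_{i \in I_0} \Pr\!\left(p_i \le \frac{\alpha}{m}\right)
  \le m_0 \cdot \frac{\alpha}{m} \le \alpha,
\qquad
\mathrm{FWER} = \Pr(V \ge 1) \le \mathrm{E}[V] \le \alpha .
```

No assumption about the dependence between the p-values is used, which is consistent with the Bonferroni procedure being valid under any dependence structure.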

External links