Test Statistic

A test is based on a statistic that estimates the parameter appearing in the hypotheses – a point estimate.

Values of the estimate far from the parameter value in H0 give evidence against H0.

Ha determines which direction will be counted as “far from the parameter value”.

Commonly, the test statistic has the form

$$ T = \frac{\text{estimate} - \text{hypothesized value}}{\text{standard deviation of the estimate}} $$

One-Sample T Test: Test Statistic

Parameter μ with hypothesized value μ0

Estimate $\bar{X}$ with observed value $\bar{x}$, and estimated standard deviation $s/\sqrt{n}$

Test statistic

$$ T = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} $$

with observed value

$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} $$

State the null and alternative hypotheses:

$$ H_0: \mu = \mu_0 \quad \text{vs.} \quad H_a: \mu \neq \mu_0, \ \ \mu > \mu_0, \ \text{or} \ \mu < \mu_0 $$

The p-value, computed assuming $H_0$ holds, equals

$$ 2P(T \geq |t|), \qquad P(T \geq t), \qquad \text{or} \qquad P(T \leq t) $$

for the alternatives $\mu \neq \mu_0$, $\mu > \mu_0$, and $\mu < \mu_0$, respectively.
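As a numerical sketch (hypothetical sample data; `numpy` and `scipy` assumed available), the one-sample t statistic can be computed directly from the formula and checked against `scipy.stats.ttest_1samp`:

```python
import numpy as np
from scipy import stats

# Hypothetical sample; test H0: mu = 5.0 against Ha: mu != 5.0
x = np.array([5.1, 4.8, 5.6, 5.2, 4.9, 5.4, 5.0, 5.3])
mu0 = 5.0

# t = (xbar - mu0) / (s / sqrt(n)), with s the sample standard deviation
n = len(x)
t_manual = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))

# scipy computes the same statistic plus the two-sided p-value 2*P(T >= |t|)
t_scipy, p_two_sided = stats.ttest_1samp(x, popmean=mu0)

print(t_manual, t_scipy, p_two_sided)
```

The one-sided p-values for $\mu > \mu_0$ or $\mu < \mu_0$ follow from the same t distribution, using a single tail instead of both.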

Hypothesis Testing: Type I and Type II Errors


To limit the chance of a Type I Error to a chosen level α:

  • referred to as significance level
  • upper bound on Type I error
  • commonly set at 5%

Reject H0 when the p-value ≤ α.

If so, we claim that the data support the alternative Ha at level α, or

– The data are statistically significant at level α

Relation between p-value and significance level α:

  • Reject H0 if p-value ≤ α.
  • Do not reject H0 if p-value > α.

Two-Sample t-Test for Equal Means

Purpose: Test if two population means are equal

The two-sample t-test (Snedecor and Cochran, 1989) is used to determine if two population means are equal. A common application is to test if a new process or treatment is superior to a current process or treatment.

There are several variations on this test.

  1. The data may either be paired or not paired. By paired, we mean that there is a one-to-one correspondence between the values in the two samples. That is, if X1, X2, …, Xn and Y1, Y2, … , Yn are the two samples, then Xi corresponds to Yi. For paired samples, the difference Xi - Yi is usually calculated. For unpaired samples, the sample sizes for the two samples may or may not be equal. The formulas for paired data are somewhat simpler than the formulas for unpaired data.

  2. The variances of the two samples may be assumed to be equal or unequal. Equal variances yields somewhat simpler formulas, although with computers this is no longer a significant issue.

  3. In some applications, you may want to adopt a new process or treatment only if it exceeds the current treatment by some threshold. In this case, we can state the null hypothesis in the form that the difference between the two population means is equal to some constant, $\mu_1 - \mu_2 = d_0$, where the constant is the desired threshold.

Definition of Two-Sample t-test

The two-sample t-test for unpaired data is defined as:

$$ H_0: \mu_1 = \mu_2 \qquad H_a: \mu_1 \neq \mu_2 $$

Test Statistic:

$$ T = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{s_1^2/N_1 + s_2^2/N_2}} $$

where $N_1$ and $N_2$ are the sample sizes, $\bar{Y}_1$ and $\bar{Y}_2$ are the sample means, and $s_1^2$ and $s_2^2$ are the sample variances.

Significance Level: α

Critical Region: Reject the null hypothesis that the two means are equal if

$$ |T| > t_{1-\alpha/2,\,\nu} $$

where $t_{1-\alpha/2,\,\nu}$ is the critical value of the t-distribution with $\nu$ degrees of freedom.

For the unequal variance case:

$$ \nu = \frac{\left(s_1^2/N_1 + s_2^2/N_2\right)^2}{\left(s_1^2/N_1\right)^2/(N_1 - 1) + \left(s_2^2/N_2\right)^2/(N_2 - 1)} $$

For the equal variance case:

$$ \nu = N_1 + N_2 - 2 $$
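Both variants are available in `scipy.stats.ttest_ind` (a sketch on hypothetical samples): `equal_var=True` gives the pooled test with $\nu = N_1 + N_2 - 2$, while `equal_var=False` gives the Welch test with the approximate $\nu$ above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical samples of unequal size
y1 = rng.normal(loc=10.0, scale=2.0, size=30)
y2 = rng.normal(loc=12.0, scale=3.0, size=20)

# Pooled (equal-variance) test: v = N1 + N2 - 2 degrees of freedom
t_pooled, p_pooled = stats.ttest_ind(y1, y2, equal_var=True)

# Welch (unequal-variance) test: Satterthwaite approximation for v
t_welch, p_welch = stats.ttest_ind(y1, y2, equal_var=False)

print(t_pooled, p_pooled, t_welch, p_welch)
```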

Two-Sample t-Test Example

The following two-sample t-test was generated for the AUTO83B.DAT data set. The data set contains miles per gallon for U.S. cars (sample 1) and for Japanese cars (sample 2); the summary statistics for each sample are shown below.

SAMPLE 1

    NUMBER OF OBSERVATIONS      = 249
    MEAN                        =  20.14458
    STANDARD DEVIATION          =   6.41470
    STANDARD ERROR OF THE MEAN  =   0.40652

SAMPLE 2

    NUMBER OF OBSERVATIONS      = 79
    MEAN                        = 30.48101
    STANDARD DEVIATION          =  6.10771
    STANDARD ERROR OF THE MEAN  =  0.68717

We are testing the hypothesis that the population means are equal for the two samples. We assume that the variances for the two samples are equal.

H0:μ1=μ2
Ha:μ1μ2

Test statistic: $T = -12.62059$

Pooled standard deviation: $s_p = 6.34260$

Degrees of freedom: $\nu = 326$

Significance level: $\alpha = 0.05$

Critical value (upper tail): $t_{1-\alpha/2,\,\nu} = 1.9673$

Critical region: Reject H0 if |T|>1.9673

The absolute value of the test statistic for our example, 12.62059, is greater than the critical value of 1.9673, so we reject the null hypothesis and conclude that the two population means are different at the 0.05 significance level.
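The example's numbers can be reproduced from the summary statistics alone; a sketch using `scipy.stats.ttest_ind_from_stats`:

```python
from scipy import stats

# Summary statistics for the AUTO83B.DAT example
t_stat, p_value = stats.ttest_ind_from_stats(
    mean1=20.14458, std1=6.41470, nobs1=249,  # U.S. cars
    mean2=30.48101, std2=6.10771, nobs2=79,   # Japanese cars
    equal_var=True,                           # pooled variances, v = 326
)

# Two-sided critical value t_{1-alpha/2, v} at alpha = 0.05
crit = stats.t.ppf(1 - 0.05 / 2, df=249 + 79 - 2)

print(t_stat, crit)  # |T| is about 12.62, well beyond the critical value of about 1.967
assert abs(t_stat) > crit  # reject H0 at the 0.05 level
```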

In general, there are three possible alternative hypotheses and rejection regions for the two-sample t-test:

| Alternative Hypothesis | Rejection Region |
| --- | --- |
| $H_a: \mu_1 \neq \mu_2$ | $\lvert T\rvert > t_{1-\alpha/2,\,\nu}$ |
| $H_a: \mu_1 > \mu_2$ | $T > t_{1-\alpha,\,\nu}$ |
| $H_a: \mu_1 < \mu_2$ | $T < t_{\alpha,\,\nu}$ |

For our two-tailed t-test, the critical value is $t_{1-\alpha/2,\,\nu} = 1.9673$, where α = 0.05 and ν = 326. If we were to perform an upper, one-tailed test, the critical value would be $t_{1-\alpha,\,\nu} = 1.6495$. The rejection regions for the three possible alternative hypotheses using our example data are shown below.

[Figure: rejection regions for the three alternative hypotheses]

(Source: Engineering Statistics Handbook)

One-way ANOVA

One-way ANOVA overview

In an analysis of variance, the variation in the response measurements is partitioned into components that correspond to different sources of variation.

The goal in this procedure is to split the total variation in the data into a portion due to random error and portions due to changes in the values of the independent variable(s).

The variance of n measurements is given by

$$ s^2 = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n - 1}, $$

where $\bar{y}$ is the mean of the $n$ measurements.

The numerator is called the sum of squares of deviations from the mean, and the denominator is called the degrees of freedom.

The SS in a one-way ANOVA can be split into two components, called the “sum of squares of treatments” and “sum of squares of error”, abbreviated as SST and SSE, respectively.

Algebraically, this is expressed by

$$ SS(\text{Total}) = SST + SSE $$

$$ \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(y_{ij} - \bar{y}\right)^2 = \sum_{i=1}^{k} n_i\left(\bar{y}_{i} - \bar{y}\right)^2 + \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(y_{ij} - \bar{y}_{i}\right)^2, $$

where k is the number of treatments and the bar over the y denotes the “grand” or “overall” mean. Each ni is the number of observations for treatment i. The total number of observations is N (the sum of the ni).

We introduced the concept of treatment. The definition is: A treatment is a specific combination of factor levels whose effect is to be compared with other treatments.

The one-way ANOVA model and assumptions

The mathematical model that describes the relationship between the response and treatment for the one-way ANOVA is given by

$$ Y_{ij} = \mu + \tau_i + \epsilon_{ij}, $$

where $Y_{ij}$ represents the j-th observation ($j = 1, 2, \ldots, n_i$) on the i-th treatment ($i = 1, 2, \ldots, k$ levels). So, $Y_{23}$ represents the third observation using level 2 of the factor. $\mu$ is the common effect for the whole experiment, $\tau_i$ represents the i-th treatment effect, and $\epsilon_{ij}$ represents the random error present in the j-th observation on the i-th treatment.

Fixed effects model

The errors $\epsilon_{ij}$ are assumed to be normally and independently distributed (NID), with mean zero and variance $\sigma_\epsilon^2$. $\mu$ is always a fixed parameter, and $\tau_1, \tau_2, \ldots, \tau_k$ are considered to be fixed parameters if the levels of the treatment are fixed and not a random sample from a population of possible levels. It is also assumed that $\mu$ is chosen so that

$$ \sum_{i=1}^{k} \tau_i = 0 $$

holds. This is the fixed effects model.

Random effects model

If the k levels of treatment are chosen at random, the model equation remains the same. However, now the $\tau_i$ values are random variables assumed to be NID($0, \sigma_\tau^2$). This is the random effects model.

Whether the levels are fixed or random depends on how these levels are chosen in a given experiment.

The ANOVA table and tests of hypotheses about means

Sums of Squares help us compute the variance estimates displayed in ANOVA Tables.

The sums of squares SST and SSE previously computed for the one-way ANOVA are used to form two mean squares, one for treatments and the second for error. These mean squares are denoted by MST and MSE, respectively. These are typically displayed in a tabular form, known as an ANOVA Table. The ANOVA table also shows the statistics used to test hypotheses about the population means.

When the null hypothesis of equal means is true, the two mean squares estimate the same quantity (error variance), and should be of approximately equal magnitude. In other words, their ratio should be close to 1. If the null hypothesis is false, MST should be larger than MSE.

Let $N = \sum_i n_i$. Then, the degrees of freedom for treatment are

$$ DFT = k - 1, $$

and the degrees of freedom for error are

$$ DFE = N - k. $$

The corresponding mean squares are:

MST=SST/DFT
MSE=SSE/DFE

F-test

The test statistic used in testing the equality of treatment means is $F = MST/MSE$.

The critical value is the tabular value of the F distribution, based on the chosen α level and the degrees of freedom DFT and DFE.

The calculations are displayed in an ANOVA table, as follows:

| Source | SS | DF | MS | F |
| --- | --- | --- | --- | --- |
| Treatments | SST | $k-1$ | $SST/(k-1)$ | $MST/MSE$ |
| Error | SSE | $N-k$ | $SSE/(N-k)$ | |
| Total (corrected) | SS | $N-1$ | | |

The word “source” stands for source of variation. Some authors prefer to use “between” and “within” instead of “treatments” and “error”, respectively.

ANOVA Table Example

The data below resulted from measuring the difference in resistance resulting from subjecting identical resistors to three different temperatures for a period of 24 hours. The sample size of each group was 5. In the language of design of experiments, we have an experiment in which each of three treatments was replicated 5 times.

| | Level 1 | Level 2 | Level 3 |
| --- | --- | --- | --- |
| | 6.9 | 8.3 | 8.0 |
| | 5.4 | 6.8 | 10.5 |
| | 5.8 | 7.8 | 8.1 |
| | 4.6 | 9.2 | 6.9 |
| | 4.0 | 6.5 | 9.3 |
| mean | 5.34 | 7.72 | 8.56 |

The resulting ANOVA table is

| Source | SS | DF | MS | F |
| --- | --- | --- | --- | --- |
| Treatments | 27.897 | 2 | 13.949 | 9.59 |
| Error | 17.452 | 12 | 1.454 | |
| Total (corrected) | 45.349 | 14 | | |
| Correction Factor | 779.041 | 1 | | |

The test statistic is the F value of 9.59. Using an α of 0.05, we have $F_{0.05;\,2,\,12} = 3.89$.

Since the test statistic is much larger than the critical value, we reject the null hypothesis of equal population means and conclude that there is a (statistically) significant difference among the population means. The p-value for 9.59 is 0.00325, so the test statistic is significant at that level.
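The table can be verified with `scipy.stats.f_oneway` on the three columns of data (a sketch; `scipy` assumed available):

```python
from scipy import stats

# Resistance-change data from the example (three temperature levels)
level1 = [6.9, 5.4, 5.8, 4.6, 4.0]
level2 = [8.3, 6.8, 7.8, 9.2, 6.5]
level3 = [8.0, 10.5, 8.1, 6.9, 9.3]

f_stat, p_value = stats.f_oneway(level1, level2, level3)

# Critical value F_{0.05; 2, 12}
crit = stats.f.ppf(1 - 0.05, dfn=2, dfd=12)

print(f_stat, p_value, crit)  # F is about 9.59, which exceeds the critical value of about 3.89
```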

The populations here are resistor readings while operating under the three different temperatures. What we do not know at this point is whether the three means are all different or which of the three means is different from the other two, and by how much.

There are several techniques we might use to further analyze the differences; one of them, constructing confidence intervals for the differences of treatment means, is described next.

Confidence intervals for the difference of treatment means

This section shows how to construct a confidence interval around $\mu_i - \mu_j$ for the one-way ANOVA by continuing the example above.

The formula for a 100(1−α) % confidence interval for the difference between two treatment means is:

$$ (\hat{\mu}_i - \hat{\mu}_j) \pm t_{1-\alpha/2,\,N-k}\,\sqrt{\hat{\sigma}_\epsilon^2\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}, $$

where $\hat{\sigma}_\epsilon^2 = MSE$.

For the example, we have the following quantities for the formula

  • $\bar{y}_3 = 8.56$
  • $\bar{y}_1 = 5.34$
  • $\sqrt{1.454\,(1/5 + 1/5)} = 0.763$
  • $t_{0.975,\,12} = 2.179$

The degrees of freedom come from $\hat{\sigma}_\epsilon^2 = MSE$, so the t quantile uses $N - k = 15 - 3 = 12$ degrees of freedom.

Substituting these values yields (8.56 - 5.34) ± 2.179(0.763) or 3.22 ± 1.663.

That is, the confidence interval is (1.557, 4.883).

A 95 % confidence interval for μ3−μ2 is: (−0.823, 2.503).

A 95 % confidence interval for μ2−μ1 is: (0.717, 4.043).
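A sketch of the interval computation for $\mu_3 - \mu_1$ (`scipy` assumed available; the same half-width applies to the other pairs here, since every treatment has $n_i = 5$):

```python
import numpy as np
from scipy import stats

MSE, N, k = 1.454, 15, 3                 # from the ANOVA table above
n_i = n_j = 5                            # observations per treatment
t_crit = stats.t.ppf(0.975, df=N - k)    # t_{0.975, 12}, about 2.179

half_width = t_crit * np.sqrt(MSE * (1 / n_i + 1 / n_j))

diff = 8.56 - 5.34                       # ybar_3 - ybar_1
lower, upper = diff - half_width, diff + half_width
print(lower, upper)  # roughly (1.56, 4.88)
```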

Application: Employee Performance Study

“Which of two prospective job candidates should we hire for a position that pays $80,000: the internal manager or the externally recruited manager?”

Data set:

  • 150 managers: 88 internal and 62 external
  • Manager Rating is an evaluation score of the employee in their current job, indicating the “value” of the employee to the firm
  • Origin is a categorical variable that identifies the managers as either External or Internal to indicate from where they were hired
  • Salary is the starting salary of the employee when they were hired. It indicates what sort of job the person was initially hired to do. In the context of this example, it does not measure how well they did that job. That’s measured by the rating variable.

Two Sample Comparison: Manager Rating vs Origin

[Figure: two-sample t-test output, Manager Rating by Origin]

We can recognize a significant difference between the means via a two-sample t-test.

One-way ANOVA

[Figure: one-way ANOVA output, Manager Rating by Origin]

Regress Manager Rating on Origin:

[Figure: regression output, Manager Rating on Origin]

  • The difference in the rating (-0.72) between internal and external managers is significant since the p-value = .003 < .05.
  • In terms of regression, Origin explains significant variation in Manager Rating.
  • Before we claim that the external candidate should be hired, is there a possible confounding variable, another explanation for the difference in rating?
  • Let’s explore the relationship between Manager Rating and Salary.

Scatterplot of Manager Rating vs. Salary

[Figure: scatterplot of Manager Rating vs. Salary]

  • (a) Salary is correlated with Manager Rating, and (b) that external managers were hired at higher salaries
  • This combination indicates confounding: not only are we comparing internal vs. external managers; we are comparing internal managers hired into lower salary jobs with external managers placed into higher salary jobs.
  • Easy fix: compare only those whose starting salary is near $80K. But that leaves too few data points for a reasonable comparison.

Separate Regressions of Manager Rating on Salary

[Figure: separate regressions of Manager Rating on Salary for internal and external managers]

  • Based on the regressions, at any given salary, internal managers are expected to get higher average ratings!
  • In regression, confounding is a form of collinearity.
    • Salary is related to Origin, which was the variable used to explain Rating.
    • With Salary added, the effect of Origin changes sign. Now internal managers look better.

Are the Two Fits Significantly Different?

[Figure: overlapping confidence bands for the two separate fits]

  • The two confidence bands overlap, which makes the comparison indecisive.
  • A more powerful idea is to combine these two separate simple regressions into one multiple regression that will allow us to compare these fits.

Regress Manager Rating on both Salary and Origin

[Figure: regression output, Manager Rating on Salary and Origin]

  • $x_1$ is a dummy variable indicating ‘Internal’: I(Origin = Internal)
  • Notice that we only require one dummy variable to distinguish internal from external managers.
  • This enables two parallel lines for two kinds of managers.
    • If Origin = External, Manager Rating = -2.100459 + 0.107478 Salary
    • If Origin = Internal, Manager Rating = -2.100459 + 0.107478 Salary + 0.514966
  • The coefficient of the dummy variable is the difference between the intercepts.
  • The difference between the intercepts is significantly different from 0, since 0.0149, the p-value for Origin[Internal], is less than 0.05.
  • Thus, if we assume the slopes are equal, a model using a categorical predictor implies that controlling for initial salary, internal managers rate significantly higher.
  • How can we check the assumption that the slopes are parallel?
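A minimal sketch of the parallel-lines (dummy-variable) model on synthetic data (the manager data set is not reproduced here, so the numbers below are illustrative, not the example's):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 150
salary = rng.uniform(30, 120, size=n)            # hypothetical starting salaries ($K)
internal = (rng.random(n) < 0.6).astype(float)   # dummy: 1 = Internal, 0 = External

# Common slope, with a +0.5 intercept shift for internal managers
rating = -2.0 + 0.1 * salary + 0.5 * internal + rng.normal(0, 0.3, size=n)

# Design matrix [1, Salary, I(Origin = Internal)]; least-squares fit
X = np.column_stack([np.ones(n), salary, internal])
b0, b1, b2 = np.linalg.lstsq(X, rating, rcond=None)[0]

# b2 estimates the gap between the two parallel lines' intercepts
print(b0, b1, b2)
```

With real data, a regression package would also report a p-value for the dummy coefficient, which is exactly what the Origin[Internal] row in the output summarizes.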

Model with Interaction: Different Slopes

Interaction. Beyond just looking at the plot, we can fit a model that allows the slopes to differ. This model gives an estimate of the difference between the slopes. This estimate is known as an interaction.

An interaction between a dummy variable $I_k$ and a numerical variable $x_i$ measures the difference between the slopes of the numerical variable in the two groups:

$$ x_i \cdot I_k $$

[Figure: regression output with the Salary × Origin interaction]

Interaction variable – product of the dummy variable and Salary:

$$ \text{originInternal:salary} = \begin{cases} \text{Salary} & \text{if Origin = Internal} \\ 0 & \text{if Origin = External} \end{cases} $$
  • If Origin = External:
    • Manager Rating = -1.94 + 0.11 Salary
  • If Origin = Internal:
    • Manager Rating = (−1.94 + 0.24) + (0.11 + 0.0037) Salary ≈ −1.69 + 0.11 Salary
  • These equations match the simple regressions fit to the two groups separately. The interaction is not significant because its p-value is large.
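A sketch of fitting the interaction model on synthetic data (illustrative values, not the example's): the interaction column is literally the product Salary × dummy, and its coefficient estimates the slope difference between the groups.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 150
salary = rng.uniform(30, 120, size=n)
internal = (rng.random(n) < 0.6).astype(float)   # dummy: 1 = Internal, 0 = External

# True model: internals get a higher intercept and a slightly steeper slope
rating = (-2.0 + 0.10 * salary + 0.3 * internal
          + 0.004 * salary * internal + rng.normal(0, 0.3, size=n))

# Design matrix [1, Salary, Internal, Salary*Internal]
X = np.column_stack([np.ones(n), salary, internal, salary * internal])
b0, b1, b2, b3 = np.linalg.lstsq(X, rating, rcond=None)[0]

# Implied group-specific fits:
#   External: rating = b0 + b1 * salary
#   Internal: rating = (b0 + b2) + (b1 + b3) * salary
print(b0, b1, b2, b3)
```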

Principle of Marginality

  • Leave main effects in the model (here Salary and Origin) whenever an interaction that uses them is present in the fitted model. If the interaction is not statistically significant, remove the interaction from the model.
  • Origin became insignificant when Salary×Origin was added, which is due to collinearity.
  • The assumption of equal error variance should also be checked by comparing box-plots of the residuals grouped by the levels of the categorical variable.

[Figure: box plots of residuals by Origin]

Summary of this example

  • Categorical variables model the differences between groups using regression, while taking account of other variables.
  • In a model with a categorical variable, the coefficients of the categorical terms indicate differences between parallel lines.
  • In a model that includes interactions, the coefficients of the interaction measure the differences in the slopes between the groups.
  • Significant categorical variable ⇒ different intercepts
  • Significant interaction ⇒ different slopes

Statistical Significance and Practical Significance

When drawing conclusions from a hypothesis test, it is important to keep in mind the difference between Statistical and Practical Significance.

  • Statistical Significance: We can be sure that $H_0$ is false, i.e., the difference from the hypothesized value is too large to be attributed to chance. Statistics can answer this question.
  • Practical Significance: Is the difference large enough that in practice we care? Statistics cannot answer this one!
