What is significance & Its Level


Significance in statistics, often referred to as statistical significance or level of significance ($\alpha$), is fundamentally the probability of rejecting a true null hypothesis ($H_0$).



  1. Definition and Measurement of Significance:

    • Hypothesis Testing is sometimes referred to as significance testing. It is the process of inferring from a sample whether to reject a certain statement about a population.
    • The level of significance ($\alpha$) is the probability that the statistical test results in rejecting the null hypothesis ($H_0$) when $H_0$ is actually true. This mistake is known as a Type I error.
    • By convention, an outcome with a probability of less than 5% (a 1-in-20 chance) is usually considered unlikely, so the significance level is conventionally set at $\alpha = 0.05$. If the difference is significant at the 5% level, it is often expressed as $p < 0.05$.
    • When results are considered "statistically significant," it means the sample data are incompatible with the null hypothesis, leading to its rejection in favor of the alternative hypothesis ($H_1$).
    • The $p$-value (or significance probability) is a post hoc measure of error. It is the probability, calculated assuming $H_0$ is true, that the test statistic takes a value equal to or more extreme than the value actually observed. A small $p$-value constitutes strong evidence against $H_0$ (a small R illustration follows the list of tests below).
  2. Statistical Tests for Significance (Hypothesis Testing): A wide range of inferential statistical tests are used to determine significance, typically categorized based on the type of data (continuous/discrete) and assumptions (parametric/nonparametric). These tests compare an observed test statistic (a ratio based on sample data) to a preset critical value or calculate a $p$-value to determine if the result is extreme enough to reject $H_0$.

    Common statistical tests employed for significance testing include:

    • Parametric Procedures (generally assume normality and homogeneity of variance):

      • $t$-Tests (used primarily when comparing one or two means, or paired data):
        • One-Sample $t$-Test.
        • Two-Sample $t$-Test.
        • Matched Pair $t$-Test (Paired $t$-Test).
      • Analysis of Variance (ANOVA) (used for comparing means of three or more groups, relying on the $F$-distribution).
      • $Z$-Tests (used for large samples, especially concerning proportions or means with known population variance):
        • $Z$-Test of Proportions (One-sample or Two-sample case).
    • Tests for Relationships and Association:

      • Correlation and Regression (to test if a relationship exists, usually $H_0: r_{xy} = 0$ or $H_0: \beta_1 = 0$).
      • Chi Square ($\chi^2$) Tests (used when only discrete variables are involved):
        • Chi Square Goodness-of-Fit Test.
        • Chi Square Test of Independence (or Test for Association).
        • Related tests: Fisher’s exact test, McNemar's test, Cochran-Mantel-Haenszel test.
    • Nonparametric Tests (alternatives used when assumptions like normality are not met):

      • Wilcoxon Signed Rank Test (alternative to paired $t$-test).
      • Wilcoxon Rank Sum Test (Mann–Whitney $U$ test, alternative to two-sample $t$-test).
      • Kruskal–Wallis Test (alternative to One-Way ANOVA).
      • Sign Test (an alternative to the one-sample or paired $t$-test when the hypothesis concerns the median rather than the mean).
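As a small illustration of how such a significance test looks in practice, here is a minimal R sketch of a one-sample $t$-test; the sample values and the hypothesized mean of 100 are made up purely for illustration.

```r
# One-sample t-test: H0: mu = 100 vs H1: mu != 100, tested at alpha = 0.05
x <- c(99.1, 98.7, 100.4, 97.9, 99.5, 98.2, 99.9, 98.8)  # made-up sample
res <- t.test(x, mu = 100)
res$p.value                     # significance probability (p-value)
res$p.value < 0.05              # TRUE for this sample, so H0 is rejected at the 5% level
qt(0.975, df = length(x) - 1)   # two-sided 5% critical value, for comparison
```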

Example of Shelf Life Calculation with No Variation


Based on the requirement that the three batches exhibit similarity (no significant difference), the stability data can be combined (pooled) to determine a single, unified shelf life.

The FDA guideline specifies that the expiration dating period (shelf life, $\xi$) is determined as the time point at which the 95% one-sided lower confidence limit for the mean degradation curve intersects the acceptable lower specification limit ($\eta$).

Here is a simulated example demonstrating this process for three similar batches ($K=3$).

1. Simulated Stability Study Data and Parameters

Objective: Determine the shelf life ($\xi$) for a drug product using three validation batches.

  • Acceptable Lower Specification Limit ($\eta$): 90% of label claim.
  • Model: Linear degradation ($Y = \alpha + \beta X + \epsilon$).
  • Time Points ($X_j$): 0, 3, 6, 9, and 12 months ($n = 5$ time points per batch).
  • Total Observations ($N$): $K \times n = 3 \times 5 = 15$.

The observed Potency (% Label Claim) data are simulated to be consistent with a common degradation rate of approximately -0.5% per month, indicating high similarity across batches:

| Batch ($i$) | Time $X_j$ (Months) | Potency $Y_{i,j}$ (%) |
|---|---|---|
| 1 | 0 | 100.2 |
| 1 | 3 | 98.6 |
| 1 | 6 | 97.1 |
| 1 | 9 | 95.3 |
| 1 | 12 | 94.1 |
| 2 | 0 | 99.9 |
| 2 | 3 | 98.3 |
| 2 | 6 | 96.9 |
| 2 | 9 | 95.6 |
| 2 | 12 | 93.8 |
| 3 | 0 | 100.0 |
| 3 | 3 | 98.5 |
| 3 | 6 | 97.0 |
| 3 | 9 | 95.4 |
| 3 | 12 | 94.2 |
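For readers who want to reproduce the calculations, here is one way the simulated data set could be set up in R (the values are copied from the table above; the column names are my own choices):

```r
# Simulated stability data: 3 batches x 5 time points
stab <- data.frame(
  batch   = factor(rep(1:3, each = 5)),
  month   = rep(c(0, 3, 6, 9, 12), times = 3),
  potency = c(100.2, 98.6, 97.1, 95.3, 94.1,   # batch 1
               99.9, 98.3, 96.9, 95.6, 93.8,   # batch 2
              100.0, 98.5, 97.0, 95.4, 94.2)   # batch 3
)
```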

2. Preliminary Test for Batch Similarity

A preliminary statistical test for batch similarity (equality of slopes and intercepts) is conducted at a significance level of $0.25$.

Assumption: The statistical test demonstrates that the three batches are statistically similar (the null hypothesis of no difference in slopes and intercepts is not rejected). This justifies pooling the $N=15$ data points into one overall analysis.
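One common way to carry out this preliminary test is an ANCOVA-type model comparison; here is a minimal sketch, reusing the `stab` data frame built above:

```r
# Poolability check: do the batches share a common intercept and slope?
full    <- lm(potency ~ month * batch, data = stab)  # separate lines per batch
reduced <- lm(potency ~ month, data = stab)          # one common line
anova(reduced, full)   # if the p-value exceeds 0.25, pooling is considered acceptable
```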

3. Statistical Calculation (Pooled Data)

The Ordinary Least Squares (OLS) method is applied to the combined data set to estimate the common intercept ($\hat{\alpha}$) and common slope ($\hat{\beta}$).

| Parameter | Calculation Result (Pooled Data) |
|---|---|
| Mean Time ($\overline{X}$) | 6.0 months |
| Pooled Sum of Squares of X ($K\sum_{j=1}^{n}(x_{j}-\overline{x})^{2}$) | 270 |
| Estimated Intercept ($\hat{\alpha}$) | 100.40 (% of label claim) |
| Estimated Slope ($\hat{\beta}$) | $-0.50$ (% per month) |
| Mean Squared Error (MSE) | 0.038 |
| Degrees of Freedom ($N-2$) | 13 |
| $t$-value ($t(0.95, 13)$) | $\approx 1.771$ |

The pooled mean degradation curve is: $\hat{Y}(X) = 100.40 - 0.50 X$
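A sketch of how the pooled estimates in the table could be obtained in R, again using the `stab` data frame from above (the exact numbers depend on the simulated data):

```r
# Pooled OLS fit on the combined N = 15 observations
pooled <- lm(potency ~ month, data = stab)
coef(pooled)                                  # estimated common intercept and slope
sum(resid(pooled)^2) / pooled$df.residual     # MSE on N - 2 = 13 degrees of freedom
qt(0.95, df = pooled$df.residual)             # one-sided 95% t-value (about 1.771)
```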

4. Determination of Tentative Shelf Life ($\xi$)

The tentative shelf life ($\xi$) is the solution to the equation where the lower 95% confidence bound intersects the lower specification limit ($\eta = 90$):

$$ \eta = \hat{\alpha} + \hat{\beta}\xi - t(.95)S(\xi) $$

Where $S(\xi)$ is the standard error of the estimated mean degradation curve at time $\xi$:

$$S^{2}(\xi) = \text{MSE} \left\{ \frac{1}{N} + \frac{(\xi-\overline{X})^{2}}{K\sum_{j=1}^{n}(x_{j}-\overline{x})^{2}} \right\}$$

Substituting the calculated pooled values:

$$ 90 = 100.40 - 0.50\xi - 1.771 \sqrt{0.038 \left( \frac{1}{15} + \frac{(\xi-6)^{2}}{270} \right)} $$

Solving this equation for $\xi$ yields the estimated shelf life:

$$ \hat{\xi} \approx 20.2 \text{ months} $$
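The same root can be found numerically in R; here is a minimal sketch, plugging in the pooled quantities from the table above:

```r
# Shelf life: time where the one-sided 95% lower confidence bound for the
# mean degradation curve meets the 90% specification limit
alpha_hat <- 100.40; beta_hat <- -0.50
mse <- 0.038; N <- 15; xbar <- 6; sxx <- 270
tval <- qt(0.95, df = 13)

lower_bound <- function(x) {
  se <- sqrt(mse * (1 / N + (x - xbar)^2 / sxx))
  alpha_hat + beta_hat * x - tval * se
}

uniroot(function(x) lower_bound(x) - 90, interval = c(0, 36))$root  # about 20 months
```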

5. Conclusion

The estimated tentative shelf life is approximately $\mathbf{20.2}$ months.

Since the batches were determined to be similar, pooling the data was justified, which gives a narrower confidence limit because of the larger number of degrees of freedom ($N-2=13$) and the improved precision. This yields a statistically determined shelf life of about 20.2 months, the time point at which the lower 95% confidence bound for the mean degradation profile of the combined batches crosses the 90% specification limit.


P.S.: I am using NotebookLM to create this blog.

How to calculate drug shelf life

Shelf life, or the expiration dating period, is defined as the interval during which a drug product is expected to remain within the approved specifications after manufacture. The calculation of the shelf life is the primary objective of a stability study.

The general method for determining the shelf life, as recommended by the FDA and ICH guidelines, involves statistical analysis of stability data:

Primary Calculation Method (Long-Term Stability)

The shelf life is determined as the time point at which the 95% one-sided lower confidence limit for the mean degradation curve intersects the acceptable lower specification limit ($\tau_{\eta}$).

  1. Modeling Degradation: The stability data, typically using percent of label claim as the primary variable, are fitted to a mathematical relationship.

    • The degradation relationship can usually be represented by a linear, quadratic, or cubic function on an arithmetic or logarithmic scale.
    • For characteristics expected to decrease (e.g., strength), the 95% one-sided lower confidence limit is used.
    • For characteristics expected to increase (e.g., degradation products), the 95% one-sided upper confidence limit is used.
  2. Statistical Calculation (Linear Model): Assuming the strength decreases linearly over time (a zero-order reaction), the expected degradation is modeled by linear regression, $E(Y_{j}) = \alpha + \beta X_{j}$.

    • The shelf life ($x_{L}$) is calculated by solving the quadratic equation that results from setting the 95% lower confidence limit for the mean degradation line, $L(x)$, equal to the lower specification limit, $\tau_{\eta}$; $x_{L}$ is the smaller root of this equation (see the R sketch after this list).
    • It is not acceptable to determine the expiration dating period by simply finding where the fitted least-squares line intersects the specification limit (which would only provide a 50% confidence level).
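Here is a sketch of this recipe in R, assuming a linear model fitted as `lm(potency ~ month, ...)`; it uses the fact that the lower limit of a two-sided 90% confidence interval from `predict()` is exactly the one-sided 95% lower confidence limit.

```r
# Shelf life: time where the one-sided 95% lower confidence limit for the mean
# degradation line crosses the lower specification limit (default 90% of label claim)
shelf_life <- function(fit, spec = 90, upper = 60) {
  lcl <- function(x) {
    predict(fit, newdata = data.frame(month = x),
            interval = "confidence", level = 0.90)[, "lwr"]
  }
  uniroot(function(x) lcl(x) - spec, interval = c(0, upper))$root
}
```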

Handling Multiple Batches

When multiple batches (a minimum of three) are tested, the approach depends on batch-to-batch variability:

  • Pooling Data: If analysis shows that the batch-to-batch variability is small (e.g., slopes and intercepts are sufficiently similar, sometimes assessed using a significance level of 0.25), the data from different batches may be combined into one overall estimate to establish a single, more precise shelf life.
  • Minimum Approach (Fixed Effects): If it is inappropriate to combine data due to significant batch-to-batch variability, the overall expiration dating period may be based on the minimum of the individual shelf lives estimated from each batch. This is considered a conservative estimate (a small R sketch follows this list).
  • Random Batch Effects (Advanced Methods): For establishing a shelf life applicable to all future production batches, statistical methods incorporating random batch effects are used (e.g., Chow and Shao's approach or the HLC method). These methods include the between-batch variability when constructing the confidence limit for the mean degradation curve.
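A minimal sketch of the minimum (fixed-effects) approach, reusing the simulated `stab` data frame and the `shelf_life()` helper from the sketches above:

```r
# Fit each batch separately, estimate its shelf life, and take the minimum
by_batch <- sapply(levels(stab$batch), function(b) {
  fit_b <- lm(potency ~ month, data = subset(stab, batch == b))
  shelf_life(fit_b, spec = 90)
})
by_batch        # individual shelf-life estimates
min(by_batch)   # conservative overall expiration dating period
```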

Tentative Shelf Life (Accelerated Testing)

Accelerated stability testing (or stress testing) is used primarily to predict a tentative expiration dating period in a shorter timeframe by increasing the rate of chemical or physical degradation under exaggerated conditions.

The prediction relies on kinetic models:

  1. Reaction Order: The analysis involves empirically determining the order of the reaction (e.g., zero-order for linear degradation or first-order for logarithmic degradation).
  2. Arrhenius Equation: The relationship between the degradation rate and temperature is characterized using the Arrhenius equation.
  3. Extrapolation: The tentative shelf life is obtained by extrapolating the relationship to ambient (marketing) storage conditions.
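A minimal R sketch of this extrapolation, assuming hypothetical zero-order degradation rates (in % of label claim per month) estimated at three elevated temperatures; the rate values are purely illustrative:

```r
# Arrhenius model: ln(k) is linear in 1/T (temperature in Kelvin)
temp_C <- c(40, 50, 60)               # accelerated storage temperatures (Celsius)
k      <- c(0.9, 1.9, 3.8)            # hypothetical zero-order rates, % per month
arr    <- lm(log(k) ~ I(1 / (temp_C + 273.15)))

# Extrapolate the rate to the ambient (marketing) condition, 25 C
k_25 <- exp(predict(arr, newdata = data.frame(temp_C = 25)))

# Zero-order kinetics: months for potency to fall from 100% to the 90% limit
10 / k_25
```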

"lower.tail" confusion in R.

 "lower.tail" in R 

I often get confused about how to use this argument in the "pt" function and similar functions. Here I will focus on the t-distribution, and I will use Minitab for the graphical presentations.

A:- lower.tail is FALSE 

    In R

The code is 
> pt(q = -2.262, df = 9, lower.tail = FALSE)
The output 
[1] 0.9749936

    In Minitab

This is shown in the graph from Minitab below.



So when FALSE is chosen, R computes the area to the right of the given quantile, i.e. the upper-tail probability P(T > q).

B:- "lower.tail" is TRUE

    In R 

The code is
> pt(q = -2.262, df = 9, lower.tail = TRUE)
The output is
[1] 0.02500642

    In Minitab


When TRUE is used, R computes the area to the left of the given quantile, i.e. the lower-tail probability P(T ≤ q).
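The two settings are simply complements of each other, which is easy to verify in R (lower.tail = TRUE is the default):

```r
pt(-2.262, df = 9, lower.tail = FALSE)   # 0.9749936
1 - pt(-2.262, df = 9)                   # same value
```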

How to compute the probability between two values using t-distribution in R.

 To compute the "P" between two cut offs in t-distribution (two points) in R.

The example uses d.f. = 9 (i.e. n = 10); the first quantile is -2.262 and the second quantile is 2.262.

I used Minitab to give a graphical representation, shown below:



The code in R to use is as follow:- 

pt(q = 2.262, df = 9, lower.tail = TRUE) - pt(q = -2.262, df = 9, lower.tail = TRUE)

I recommend playing a little with the code above to get a feel for the lower.tail argument when it is TRUE and when it is FALSE.

The output from R is :-

[1] 0.9499872

which, when rounded, is 0.95, the same as Minitab.
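Equivalent ways to get the same central probability, using vectorisation or the symmetry of the t-distribution:

```r
diff(pt(c(-2.262, 2.262), df = 9))   # difference of the two lower-tail areas
1 - 2 * pt(-2.262, df = 9)           # 1 minus the two (equal) tail areas
```

Both give 0.9499872, the same value as above.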

P.S.

d.f.: degrees of freedom.
