A STUDY OF RIDGE REGRESSION AND SOME EXAMPLES OF ITS APPLICATION IN MULTIVARIATE ANALYSIS
Ridge regression analysis is a procedure for estimating the parameter β in the linear model Y = Xβ + ε when the X matrix is highly ill-conditioned or non-orthogonal. Considerable research has been devoted to the univector case, in which β and ε are q × 1 and n × 1, respectively, and X has dimension n × q. In this paper ridge regression is adapted to the multivector case, using the model Y = XB + E, where X is n × q, B is q × p, and E is n × p. The correlations among the X variates are used to determine the extent of the ill-conditioning of the X matrix. The least squares estimator of B is satisfactory when X is orthogonal or nearly orthogonal; however, when the X matrix is highly ill-conditioned, the least squares estimates may be unstable. The ridge regression estimates are much more stable and tend to be better.

We assume that the rows E_α of E, α = 1, 2, ..., n, are independently distributed with mean 0 and p × p covariance matrix V. Under this assumption, the covariance matrix of the pq least squares estimators, arranged as a single row, was obtained by Anderson. Using the same technique, the covariance matrix of the pq ridge estimators is derived in Chapter II. The determinant of the covariance matrix is used to measure the variance of the respective estimators; this measure, known as the generalized variance of the estimator, is used to compare the least squares and ridge estimators. Other methods for comparing the estimators are also discussed in Chapter II. The generalized mean square errors of the columns of the least squares and ridge estimators are compared under the assumption that the rows of the Y matrix are independent. The methods discussed in Chapter II and the examples in Chapter III indicate that when the X matrix is highly ill-conditioned, the generalized mean square error of the ridge estimator B* is smaller than that of the least squares estimator B. This is shown to hold provided the diagonal elements of the covariance matrix V are not very small. A third measure of comparison is the total mean square error, defined as the sum of the mean square errors of the columns of the estimator; similarly, the total variance of an estimator B is defined as the sum of the variances of the individual elements of B.

Sections 4.1 and 4.2 discuss several examples used to illustrate the effect of small changes in the data matrix, and in the covariance matrix, on the estimates. The results indicate that the least squares estimates are much more sensitive to small changes in a highly correlated data matrix. In Section 4.5, the predicted values obtained from the ridge and least squares estimators are compared for the examples of Sections 4.1 and 4.2. The results show that when the data matrix is highly ill-conditioned and the diagonal elements of V are not small, the ridge estimates give better predictions of Y than the least squares estimates. Finally, in Section 4.6, a Monte Carlo study using one thousand different sets of errors E shows that, for a specific data matrix X and covariance matrix V, the ridge estimates are very close to the true coefficients.
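To make the comparison concrete, the following Python sketch (not part of the dissertation) computes the multivariate least squares estimator and a ridge estimator of the standard form B* = (X'X + kI)⁻¹X'Y for an ill-conditioned design, and evaluates the generalized variance of the least squares estimator as the determinant of V ⊗ (X'X)⁻¹, the covariance of the pq stacked estimators when the rows of E are independent with covariance V. The specific design matrix, covariance matrix V, true coefficients, and ridge constant k are illustrative assumptions, not values from the dissertation.

```python
import numpy as np

def ls_estimator(X, Y):
    """Least squares estimator B_hat = (X'X)^{-1} X'Y for the model Y = XB + E."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

def ridge_estimator(X, Y, k):
    """Ridge estimator B_star = (X'X + kI)^{-1} X'Y (standard ridge form, illustrative k)."""
    q = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(q), X.T @ Y)

# Illustrative ill-conditioned design: the first two predictor columns are nearly collinear.
rng = np.random.default_rng(0)
n, q, p = 50, 3, 2
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 0.01 * rng.normal(size=n), rng.normal(size=n)])
B_true = np.array([[1.0, 0.5], [2.0, -1.0], [0.3, 0.8]])   # assumed true coefficients
V = np.diag([1.0, 2.0])                                     # assumed row covariance of E
E = rng.multivariate_normal(np.zeros(p), V, size=n)
Y = X @ B_true + E

B_hat = ls_estimator(X, Y)
B_star = ridge_estimator(X, Y, k=0.1)                       # k = 0.1 chosen arbitrarily

# Generalized variance of the least squares estimator: determinant of the
# pq x pq covariance V kron (X'X)^{-1} of the stacked estimators.
gen_var_ls = np.linalg.det(np.kron(V, np.linalg.inv(X.T @ X)))
print("least squares:\n", B_hat)
print("ridge:\n", B_star)
print("generalized variance (LS):", gen_var_ls)
```

With a nearly collinear design such as this one, the least squares coefficients for the first two predictors typically swing far from the assumed true values, while the ridge estimates remain closer to them, in line with the comparisons summarized above.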