American University

DETECTING PATTERNS IN DATA: A NEW STATISTIC FOR SMOOTHNESS AND NONRANDOMNESS (RESIDUALS, MODELING, RANDOMNESS TESTS)

thesis
posted on 2023-09-06, 02:57 authored by Peter Jonathan Munson

The problem of detecting a pattern in data obscured by noise is common in much of science. An intuitive and often unstated assumption about this pattern is that it is smooth. We therefore consider the intuitive concept of "smoothness" as the basis of pattern detection. This concept is formalized in the context of mathematical splines and leads to a convenient, powerful nonrandomness measure called the curvature statistic. We define the roughness of a function as the integrated squared second derivative, $R(f) = \int f''(x)^2\,dx$. It is known that, of all functions interpolating a set of data, the cubic spline $g$ is smoothest, i.e. $R(g) = \inf_f R(f)$. We show that $R(g)$ may be calculated as a quadratic form $\mathbf{y}^T A \mathbf{y}$ in the vector of observations $\mathbf{y}$. The matrix $A$ is given as a product of tridiagonal matrices and inverses of tridiagonal matrices. The curvature statistic is defined to be $T(\mathbf{y}) = \mathbf{y}^T A \mathbf{y} / \sum_i (y_i - \bar{y})^2$. Under the null assumption that the $y_i$ are independent and normally distributed, the distribution of $T$ may be approximated by matching its low-order moments to a Beta-Jacobi polynomial series. The curvature statistic $T$ is closely related to the well-known mean square successive difference (MSSD) statistic. The numerator of the MSSD is exactly $\int h'(x)^2\,dx$ for the interpolating linear spline $h$. This observation leads to a generalized MSSD (GMSSD) for unequally spaced data, specifically $\sum_i \frac{(y_{i+1} - y_i)^2}{x_{i+1} - x_i} \Big/ \sum_i (y_i - \bar{y})^2$. An analogous statistic, based on properties of the smoothing spline, is named the penalized curvature statistic, $T_\lambda$. In a Monte Carlo study, the curvature statistic is shown to have more power for detecting smooth patterns than existing approaches when the pattern sought is moderately complex and the expected level of contamination of the underlying deterministic function is small.
The advantages of using the curvature statistic increase when the data are non-uniformly spaced and when the number of data points is increased. The penalized curvature statistic is more resistant to large error contamination levels yet retains most of the power to detect complex patterns, and is thus to be recommended. The calculation and use of this statistic are illustrated on several problems involving regression residuals.
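
The two ratio statistics defined in the abstract can be sketched numerically. The following is a minimal illustration, not the thesis's own implementation. It assumes a natural cubic spline (zero second derivative at the end knots) and builds the roughness matrix from the standard banded construction $A = Q R^{-1} Q^T$, where $R$ is tridiagonal and $Q$ has three nonzero entries per column. The function names `gmssd`, `roughness_matrix`, and `curvature_statistic` are hypothetical, and dense matrices are used for clarity rather than efficiency.

```python
import numpy as np

def gmssd(x, y):
    """Generalized MSSD: sum (dy^2 / dx) divided by sum (y - ybar)^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    num = np.sum(np.diff(y) ** 2 / np.diff(x))
    den = np.sum((y - y.mean()) ** 2)
    return num / den

def roughness_matrix(x):
    """Matrix A with y^T A y = integral of g''(x)^2 for the interpolating
    cubic spline g. ASSUMPTION: natural boundary conditions; x strictly
    increasing with at least 3 points."""
    x = np.asarray(x, dtype=float)
    n = x.size
    h = np.diff(x)                      # knot spacings, length n - 1
    Q = np.zeros((n, n - 2))            # second divided-difference operator
    R = np.zeros((n - 2, n - 2))        # tridiagonal, positive definite
    for j in range(n - 2):              # column j <-> interior knot j + 1
        Q[j, j] = 1.0 / h[j]
        Q[j + 1, j] = -1.0 / h[j] - 1.0 / h[j + 1]
        Q[j + 2, j] = 1.0 / h[j + 1]
        R[j, j] = (h[j] + h[j + 1]) / 3.0
        if j + 1 < n - 2:
            R[j, j + 1] = R[j + 1, j] = h[j + 1] / 6.0
    # A = Q R^{-1} Q^T, computed with a linear solve instead of an inverse
    return Q @ np.linalg.solve(R, Q.T)

def curvature_statistic(x, y):
    """Curvature statistic T(y) = y^T A y / sum (y - ybar)^2."""
    y = np.asarray(y, dtype=float)
    A = roughness_matrix(x)
    return float(y @ A @ y) / np.sum((y - y.mean()) ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)      # illustrative data only
    x = np.sort(rng.uniform(0.0, 10.0, size=40))
    smooth = np.sin(x) + 0.05 * rng.standard_normal(x.size)
    noise = rng.standard_normal(x.size)
    print("T, smooth pattern:", curvature_statistic(x, smooth))
    print("T, pure noise:    ", curvature_statistic(x, noise))
```

Under this construction a small value of either statistic indicates a smooth sequence, so a test for nonrandomness would reject for small values. Exactly linear data give $T = 0$, since the second divided differences $Q^T \mathbf{y}$ vanish.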

History

Publisher

ProQuest

Language

English

Notes

Ph.D. American University 1986.

Handle

http://hdl.handle.net/1961/thesesdissertations:2252

Media type

application/pdf

Access statement

Part of thesis digitization project, awaiting processing.
