American University

Distribution of errors and methods of inference for automated DNA sequencing

thesis
posted on 2023-08-04, 15:00 authored by Gregory E. Alexander

For a given input DNA fragment (clone), we study the distribution of the output sequence (observation) produced by an automated sequencer. Primary emphasis is placed on obtaining likelihood-based procedures for the main computational tasks in a shotgun sequencing strategy: ranking multiple sequence alignments, estimating a consensus sequence, and assessing the confidence of a reconstruction. These tasks depend on deciding which pairs of data sequences arise from overlapping fragments. The overlap detection problem is formulated both as point estimation and as hypothesis testing. Using theoretical models for data with no errors and with substitution-type errors, we compare the performance of maximum likelihood and maximum posterior estimates of overlap, and we evaluate likelihood ratio tests of overlap versus no overlap. The goal is to understand how different error processes affect procedures for detecting when cloned fragments overlap. Because the underlying biochemical mechanisms responsible for sequence reading errors are poorly understood, we propose new definitions and methods for characterizing error events when discrepancies are evident. Using results from our data study, we develop a probabilistic model for sequence read errors, Run Extension and Contraction (RECO). Methods for parameter estimation and overlap detection under the RECO model are described. Analysis of two independent, experimentally derived data sets demonstrates that the RECO model provides a good fit to the majority of sequence read errors.
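
To make the idea of a likelihood ratio test of overlap versus no overlap concrete, the following is a minimal sketch and not the procedure developed in the thesis. It assumes a deliberately simple model that the abstract does not specify: true bases are independent and uniform over A, C, G, T; each read suffers independent substitution errors at rate `eps`, with the wrong base chosen uniformly; and there are no insertions or deletions, so the run extension and contraction events of the RECO model are not represented. The function names `overlap_log_lr` and `best_offset` are hypothetical.

```python
import math


def overlap_log_lr(x, y, offset, eps):
    """Log likelihood ratio of 'x[offset:] overlaps the start of y' vs 'no overlap'.

    Assumed toy model (not from the thesis): uniform i.i.d. true bases,
    independent substitution errors at rate eps in each read, no indels.
    """
    if not 0.0 < eps < 0.75:
        raise ValueError("eps must lie strictly between 0 and 0.75")
    n = min(len(x) - offset, len(y))  # number of aligned positions
    if n <= 0:
        raise ValueError("offset leaves no aligned positions")

    # Probability that two reads of the same true base agree:
    # both correct, or both wrong with the same wrong base.
    p_match = (1 - eps) ** 2 + eps ** 2 / 3

    matches = sum(1 for i in range(n) if x[offset + i] == y[i])
    mismatches = n - matches

    # Under overlap: P(pair agrees on a given base) = p_match / 4,
    # P(a given disagreeing pair) = (1 - p_match) / 12.
    # Under no overlap: every ordered pair of bases has probability 1 / 16.
    return (matches * math.log(4 * p_match)
            + mismatches * math.log(4 * (1 - p_match) / 3))


def best_offset(x, y, eps, min_overlap):
    """Maximum likelihood offset among those with at least min_overlap aligned positions in x."""
    candidates = range(0, len(x) - min_overlap + 1)
    return max(candidates, key=lambda d: overlap_log_lr(x, y, d, eps))


if __name__ == "__main__":
    x = "ACGTACGTGG"
    y = "CGTGGTTACA"
    d = best_offset(x, y, eps=0.05, min_overlap=4)
    print(d, overlap_log_lr(x, y, d, eps=0.05))
```

A positive log likelihood ratio favors overlap at the given offset and a negative one favors no overlap. Choosing the offset that maximizes it corresponds to the point estimation view, while comparing it with a threshold corresponds to the hypothesis testing view. A maximum posterior estimate would additionally weight each offset by a prior, which this sketch omits.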

History

Publisher

ProQuest

Language

English

Notes

Ph.D. American University 1997.

Handle

http://hdl.handle.net/1961/thesesdissertations:2593

Media type

application/pdf

Access statement

Unprocessed
