Distribution at Contingency of Alignment of Two Literal Sequences under Constrains

Lorentz Jäntschi; Sorana D. Bolboaca*

doi:10.11145/77

Authors

Lorentz Jäntschi, Technical University of Cluj-Napoca
Sorana D. Bolboaca*, Department of Medical Informatics and Biostatistics, "Iuliu Hatieganu" University of Medicine and Pharmacy Cluj-Napoca

DOI:

https://doi.org/10.11145/77

Abstract

Alignment of DNA (deoxyribonucleic acid), RNA (ribonucleic acid) or protein (amino-acid) sequences, which identifies similar regions that could reflect functional, structural or evolutionary relationships between sequences [1], is frequently used nowadays due to the huge number of already identified DNA, RNA, and protein sequences [2]. Several algorithms have been developed and implemented for global or local alignment, each with advantages and disadvantages [3], [4]. Our research started from the hypothesis that the distribution of alignments could provide useful information about the probability that a given alignment occurs by chance. We present here a statistical approach based on distribution analysis that is able to identify the thresholds above which an alignment by chance is rejected, under the supposition that each literal has at least one alignment in any case. For two literal sequences, we define the alignment through the frequency of matches (with 0 meaning no alignment and n meaning perfect alignment, where n is the number of nucleotides or amino acids in the two equal-length sequences). A closed form of the probability distribution function of the alignment was obtained. We showed that the cumulative distribution function has (unfortunately) no general closed form. Nevertheless, a series of statistics (including the mode and the central moments up to order 4) were obtained in closed form. By using the formula for the cumulative probability of an alignment, for the particular case of four-literal alignment, thresholds to reject the alignment by chance were obtained as follows: 70% for n > 8; 60% for n > 13; 55% for n > 21; 50% for n > 39; 45% for n > 282; 44% for n → ∞.
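As a rough illustration of the alignment measure defined in the abstract (the number of matching positions, from 0 to n), the sketch below counts matches between two equal-length sequences. The function names `match_count` and `binomial_tail` and the example sequences are hypothetical. The tail probability uses a plain binomial model with independent, equally likely literals, which is a simplifying assumption and not the constrained distribution derived in the paper, so it is not expected to reproduce the thresholds quoted above.

```python
from math import comb


def match_count(a: str, b: str) -> int:
    """Count positions where two equal-length sequences carry the same literal."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b))


def binomial_tail(n: int, k: int, alphabet_size: int = 4) -> float:
    """P(at least k matches in n positions) under an ASSUMED simple model:
    each position matches independently with probability 1/alphabet_size.
    This is not the paper's distribution, which adds the constraint that
    every literal has at least one alignment."""
    p = 1.0 / alphabet_size
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


if __name__ == "__main__":
    s1 = "ACGTACGTAC"  # made-up example sequences
    s2 = "ACGTTCGAAC"
    m = match_count(s1, s2)
    print("matches:", m, "fraction:", m / len(s1))
    print("binomial tail:", binomial_tail(len(s1), m))
```

Comparing the observed match fraction with a tail probability of this kind is the general idea behind a rejection threshold; the thresholds reported in the abstract come from the authors' own closed-form distribution.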

Published

2013-04-27

Issue

Pilot Issue

Section

Conference Contributions

License

The journal Biomath Communications is an open access journal. All published articles are immediately available online and the respective DOI link is activated. All articles can be accessed for free, and no reader registration of any sort is required. No fees are charged to authors for article submission or processing. Online publications are funded through volunteer work, donations and grants.

Authors who publish with this journal agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License 4.0 that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
