[UGENE-1020] Revise multiple alignment similarity/dissimilarity measure - Jira

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.11.1
Component/s: Basic-MSA
Labels: None
Affect Type: Userdefined

Description

Currently the multiple alignment similarity measure is incorrect.
Hamming distance measures the dissimilarity of two sequences, i.e. "how many substitutions are needed to get one sequence from the other".

There must be two distance algorithms: 1) "Hamming distance" for dissimilarity and 2) "Simple similarity" for similarity.
They use the following weight schemes:
1) "Hamming distance":
w("A", "T") = 1
w("A", "-") = w("-", "A") = 0 or 1 (depending on the "Exclude gaps" option that will be added to the dialog)
w("-", "-") = 0
w("A", "A") = 0

2) "Simple similarity":
w("A", "T") = 0
w("A", "-") = w("-", "A") = 0
w("-", "-") = 0
w("A", "A") = 1

A measure is the total weight of all pairs of characters at the same positions in the two sequences. It is recommended to align the sequences first to get a more meaningful value of the measure.
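The two weight schemes can be sketched in Python as follows. This is only an illustration of the rules listed in this issue, not the UGENE implementation: the function names and the `exclude_gaps` parameter are hypothetical, and it is assumed that enabling "Exclude gaps" gives a character/gap pair a weight of 0, while disabling it gives a weight of 1. The sketch also assumes both sequences come from the same alignment and have equal length; if they do not, `zip` silently ignores the tail of the longer one.

```python
GAP = "-"


def hamming_distance(seq1, seq2, exclude_gaps=True):
    """Scheme 1: total dissimilarity weight of two aligned sequences.

    exclude_gaps is a hypothetical stand-in for the "Exclude gaps" option.
    Assumption: when it is True, a character/gap pair weighs 0, otherwise 1.
    """
    total = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            # w("A", "A") = 0 and w("-", "-") = 0
            continue
        if a == GAP or b == GAP:
            # w("A", "-") = w("-", "A") = 0 or 1
            total += 0 if exclude_gaps else 1
        else:
            # w("A", "T") = 1
            total += 1
    return total


def simple_similarity(seq1, seq2):
    """Scheme 2: total similarity weight of two aligned sequences."""
    total = 0
    for a, b in zip(seq1, seq2):
        # Only identical non-gap characters score: w("A", "A") = 1
        if a == b and a != GAP:
            total += 1
    return total
```

For example, for the aligned rows `AC-T` and `A-GT`, `hamming_distance` returns 0 with gaps excluded and 2 with gaps counted, while `simple_similarity` returns 2.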

There are two ways to show the measure: as a pure weight value or as a similarity/dissimilarity estimate in percent. In the percentage case, the value must be calculated as the weight value divided by min(len1, len2), where len1 is the number of non-gap characters in the first sequence and len2 is the number of non-gap characters in the second sequence.
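A minimal sketch of the percentage calculation, assuming the ratio is multiplied by 100 for display. The helper names are hypothetical, and returning 0.0 when one sequence consists only of gaps is an assumption, since the issue does not say how that case should be handled.

```python
GAP = "-"


def non_gap_length(seq):
    """Number of non-gap characters in a sequence (len1 / len2 in the text)."""
    return sum(1 for ch in seq if ch != GAP)


def as_percent(weight, seq1, seq2):
    """Convert a pure weight value to percent of min(len1, len2)."""
    denom = min(non_gap_length(seq1), non_gap_length(seq2))
    if denom == 0:
        # Assumption: a sequence made only of gaps yields 0 percent.
        return 0.0
    return 100.0 * weight / denom
```

For the aligned rows `ACGT-A` and `AC-TTA`, four columns hold identical non-gap characters and each row has five non-gap characters, so `as_percent(4, "ACGT-A", "AC-TTA")` gives 80.0.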

The distance matrix view must also be revised. It must show similarity or dissimilarity depending on the algorithm chosen.

People

Assignee: Yura Vaskin

Reporter: Yura Vaskin

Watchers: 1

Dates

Created: 25/May/12 2:58 PM

Updated: 11/Jun/17 11:34 AM

Resolved: 05/Jul/12 11:20 AM