Does missing data (?) have a significant impact on phylogenetic studies?
18 months ago
hojoun • 0

Hello, I am researching the intraspecific population genetics of snakes (using ML and BI trees and a relaxed molecular clock).

My concatenated sequences are 2392 bp.

gene1 : 1113bp gene2 : 1279bp

In the outgroup species, gene 1 is complete (1113 bp), but gene 2 is only 696 bp.

In this situation, can I still use the outgroup for phylogenetic analysis if I pad gene 2 with missing-data symbols (?) to make up the difference in length?

please help!! I have to graduate!! The black square is missing data.

data missing phylogenetic tree • 944 views

While missing data can have an important effect (there are lots of papers about this, e.g., this one), based on your description, I am confused why you would be adding missing data symbols. Have you already generated a multiple sequence alignment from your data? That would be a step you want to take before any phylogenetic inference.


A MUSCLE alignment was already performed on 74 sequences with two outgroups. I added the missing-data symbol because there was no sequence data for the corresponding region in the outgroup, whereas in the ingroup each population has certain mutations across that region. So, without adding missing data, can I keep the outgroup at 696 bp and use it for the analysis?


I guess I'm still confused. If the outgroup sequences were used in your MSA, then there should be no need to add any "missing data" symbols because the alignment should already look something like this toy example:

>out1
----atcgggc
>out2
----atcgcgc
>in1
ttcgatcgggc
>in2
ttcgatcgggt
>in3
ttcgatcgggt
>in4
ttcgatcggac

In the above, the outgroup sequences have gaps at the beginning of the sequence because they are shorter than the ingroup sequences. Leaving aside the question of why the outgroup sequences are shorter (deletion in the outgroup, insertion in the ingroup, technical artifact, etc.), most mainstream phylogenetics tools (at least those that I am aware of) do not explicitly model insertions/deletions in the analysis and will treat gaps like missing data.
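If you would still rather have the unsequenced ends written as `?` instead of `-`, here is a minimal Python sketch of one way to do it on an alignment that has already been built. The function name `mask_terminal_gaps` is made up for this illustration, and the choice to touch only leading and trailing gaps (and leave internal gaps alone) is my assumption about what you want, not something any particular tool requires.

```python
def mask_terminal_gaps(seq, gap="-", missing="?"):
    """Replace leading and trailing gap runs with the missing-data symbol.

    Internal gaps are left untouched, since they may reflect real indels.
    """
    core = seq.strip(gap)
    if not core:
        # The whole row is gaps: mark every position as missing.
        return missing * len(seq)
    lead = len(seq) - len(seq.lstrip(gap))
    trail = len(seq) - len(seq.rstrip(gap))
    return missing * lead + core + missing * trail


# The toy alignment from this reply, as name -> aligned sequence.
alignment = {
    "out1": "----atcgggc",
    "out2": "----atcgcgc",
    "in1": "ttcgatcgggc",
    "in2": "ttcgatcgggt",
    "in3": "ttcgatcgggt",
    "in4": "ttcgatcggac",
}

masked = {name: mask_terminal_gaps(seq) for name, seq in alignment.items()}

for name, seq in masked.items():
    print(f">{name}\n{seq}")
```

Every row keeps its original length, so the result is still a valid alignment. Whether your tree-building program treats `?` and `-` the same way is worth checking in its documentation.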

Assuming your MSA looks something vaguely like my toy example, you should be able to use it in your phylogenetic analysis. That said, do not be surprised if outgroups that carry dramatically less information than the ingroup introduce some bias into the inference.
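To get a feel for how much less information the outgroup carries, you can count the missing positions per row. Below is a rough Python sketch using the lengths from the question (1113 bp, 1279 bp, and 696 bp). The helper `missing_fraction`, the placeholder base `A`, and the decision to count `-`, `?`, and `N` as missing are assumptions made for this illustration; the padding is placed at the end of gene 2 purely for convenience, since where it really goes depends on the alignment.

```python
def missing_fraction(seq, missing_chars="-?Nn"):
    """Fraction of alignment columns in this row that carry no base call."""
    return sum(ch in missing_chars for ch in seq) / len(seq)


GENE1_LEN = 1113          # gene 1, complete in the outgroup
GENE2_LEN = 1279          # gene 2, full length in the ingroup
OUTGROUP_GENE2_LEN = 696  # gene 2, sequenced portion in the outgroup

# Number of gene 2 positions with no outgroup data.
padded = GENE2_LEN - OUTGROUP_GENE2_LEN

# Stand-in outgroup row: "A" is only a placeholder for real bases.
outgroup_row = "A" * GENE1_LEN + "A" * OUTGROUP_GENE2_LEN + "?" * padded

frac = missing_fraction(outgroup_row)
print(f"missing positions: {padded} of {len(outgroup_row)}")
print(f"missing fraction:  {frac:.1%}")
```

With these numbers the outgroup lacks 583 of 2392 positions, or roughly 24% of the concatenated alignment. Running the same function over every row of the real alignment would show whether the outgroups stand out from the ingroup sequences.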


Thank you!!! It was a great help to me.
