Question

How minimap2 mapping works

0

3.3 years ago

SUDOsundu ▴ 80

Hello,

I mapped trimmed nanopore reads to a single reference FASTA file. Then I mapped the same reads to a file containing multiple FASTA sequences, including the sequence I had mapped to previously. When I visualized the BAM files in UGENE and Tablet, I found more depth on that sequence in the first BAM file than in the second. Why is the coverage different between the two BAM files?

minimap2 UGENE • 1.3k views

score 3 · Answer 1 · 2021-01-06

3

3.3 years ago

GenoMax 141k

If you are widening the search space, then it is not surprising that the overall depth of alignment went down for the smaller of the two references. Reads must be aligning better to the additional sequences. This is the reason why one should not use a reduced reference if the data is from a whole genome.
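As a rough illustration of the point above, here is a toy Python sketch of competitive assignment. It is not how minimap2 works internally: the read names, reference names (`ref_1`, `ref_2`), scores and the `min_score` cutoff are all invented for the example. Each read is simply assigned to whichever available reference it scores best against.

```python
# Toy model of competitive mapping. All names and scores are hypothetical;
# they are not minimap2 output and only illustrate the counting argument.
read_scores = {
    "read1": {"ref_1": 95, "ref_2": 60},
    "read2": {"ref_1": 80, "ref_2": 92},
    "read3": {"ref_1": 88, "ref_2": 90},
    "read4": {"ref_1": 97, "ref_2": 40},
}


def assign_best(read_scores, references, min_score=70):
    """Count reads per reference when each read goes to its best-scoring reference."""
    counts = {ref: 0 for ref in references}
    for scores in read_scores.values():
        # Only consider references that are in the index and score well enough.
        candidates = {
            ref: score
            for ref, score in scores.items()
            if ref in references and score >= min_score
        }
        if candidates:
            best = max(candidates, key=candidates.get)
            counts[best] += 1
    return counts


single = assign_best(read_scores, ["ref_1"])
combined = assign_best(read_scores, ["ref_1", "ref_2"])
print(single)    # {'ref_1': 4}
print(combined)  # {'ref_1': 2, 'ref_2': 2}
```

With only `ref_1` available, all four reads land on it. Once `ref_2` is added, the two reads that fit `ref_2` better move there, so the depth on `ref_1` drops even though the reads are the same.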

0

Thanks. I am aligning to a plant virus genome. It is multi-component, e.g. 3 FASTA sequences (DNA A, B and C), each 4 to 5 kb in size. They also have common regions among them. Should I align to each individually?

3.3 years ago by SUDOsundu ▴ 80

1

Hi, if you post a question, you should above all give enough detail for others to understand your experiment. It is unclear what you even sequenced, so it is hard to give advice. In general it is good practice to align to all sequences that your reads could possibly come from.

3.3 years ago by ATpoint 81k

0

Do you want reads to match to the location that they are most similar to, or do you want to ensure that they only match to one organism (reference)? Choose depending on the goals you have.

3.3 years ago by Istvan Albert 100k

0

Sorry @ATpoint, I was in a hurry. My aim is to characterize the virus genome from an infected plant, so I sequenced the infected plant genome. It is a known virus, but no reference sequence is available for this particular subgroup. It is a tripartite virus: it has 3 different DNAs of 3 kb length, e.g. DNA A, DNA B and DNA C. I am trying to generate a consensus by aligning to the available sequences in that subgroup, so I pulled the available full-length sequences (20 sequences) of the virus from NCBI and mapped against them. From the comments above I understand that I have widened the search space, so the depth for each reference went down. The three DNAs have common regions. To get the consensus sequence, should I align to a single reference file containing the 3 FASTA sequences, or should I map to each one separately? From what I understood, if I map to each individually I may get more depth. But since the 3 DNAs have common regions, I need to map the long reads against all 3 to get the best mapping. Please correct me if I am wrong.
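One way to check how much the common regions matter is to compare per-reference depth with and without low-confidence alignments. The sketch below is only an illustration: it assumes the alignments are in minimap2's default PAF output (tab-separated, with target name, target length, target start, target end and mapping quality in columns 6, 7, 8, 9 and 12), and the example lines, read names and the `mean_depth_per_target` helper are made up for the demonstration.

```python
from collections import defaultdict


def mean_depth_per_target(paf_lines, min_mapq=0):
    """Approximate mean depth per target: aligned target bases / target length.

    Only alignments with mapping quality >= min_mapq are counted.
    """
    aligned = defaultdict(int)
    lengths = {}
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or empty lines
        target = fields[5]
        lengths[target] = int(fields[6])
        tstart, tend = int(fields[7]), int(fields[8])
        mapq = int(fields[11])
        if mapq >= min_mapq:
            aligned[target] += tend - tstart
    return {t: aligned[t] / lengths[t] for t in lengths}


# Hypothetical PAF records (invented values, 3 kb targets as in the post).
example_paf = [
    "read1\t3000\t0\t3000\t+\tDNA_A\t3000\t0\t3000\t2800\t3000\t60",
    "read2\t2000\t0\t2000\t+\tDNA_A\t3000\t500\t2500\t1900\t2000\t0",
    "read3\t1500\t0\t1500\t-\tDNA_B\t3000\t0\t1500\t1400\t1500\t60",
]

depth_all = mean_depth_per_target(example_paf)
depth_confident = mean_depth_per_target(example_paf, min_mapq=1)
print(depth_all)
print(depth_confident)
```

Reads that fit several references about equally well tend to receive low mapping quality, so a large gap between the two results for a reference would suggest that much of its depth comes from regions shared with the other components.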

3.3 years ago by SUDOsundu ▴ 80