Find A (Gapped) Sequence Within A Multiple Sequence Alignment
10.9 years ago
sorrywm ▴ 10

I have a large multiple-sequence alignment (~70,000 columns) from which I would like to extract a small number (~1,000) of columns. Specifically, I want the columns that correspond to particular positions in the ungapped sequence of one of the records in the MSA.

One way to do this (I think) would be to iterate over the columns and keep a running count of how many non-gap bases have been seen for the record in question, or even to build a 1:1 mapping that stores which column contains the n-th non-gap base. But is there a faster way?
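For illustration, the mapping idea described above can be sketched in plain Python as a single pass over the reference row. This is only a sketch under assumptions: the function names `ungapped_to_column` and `extract_columns` are hypothetical, gaps are assumed to be written as `-` or `.`, positions are 0-based, and the toy rows stand in for what `str(record.seq)` would give for each record of a Biopython alignment.

```python
def ungapped_to_column(aligned_seq, gap_chars="-."):
    """Entry n of the result is the alignment column holding the n-th non-gap base (0-based)."""
    return [col for col, char in enumerate(aligned_seq) if char not in gap_chars]


def extract_columns(aligned_seqs, columns):
    """Keep only the requested columns from every aligned row."""
    return ["".join(seq[c] for c in columns) for seq in aligned_seqs]


# Toy alignment (assumption: real rows would come from str(record.seq)).
rows = ["A-CG--T", "ATCGGGT", "A-C-G-T"]
reference = rows[0]                       # record whose ungapped positions matter
mapping = ungapped_to_column(reference)   # one pass over the columns

wanted_positions = [1, 3]                 # 0-based positions in the ungapped reference
wanted_columns = [mapping[p] for p in wanted_positions]
subset = extract_columns(rows, wanted_columns)
```

The mapping is built once in time linear in the number of columns, so for roughly 70,000 columns the pass itself should be cheap; after that, each of the ~1,000 positions is a list lookup.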

I am most familiar with Biopython, so a solution in that framework would be easiest.

Thank you!

msa alignment biopython • 2.3k views

You could use Gblocks to remove all the gapped columns, but that would throw off your position numbers.
