Question: Biopython - new MSA from specific columns of old MSA
Asked 5 months ago by johnbmcarthur:

I want to start by apologizing for the vague title and my overall ignorance. I have no idea what I'm doing, and the few scripts I've gotten to work have been pieced together from scraps of code taken from this and other websites rather than genuinely written by me. I'm working in Python with Biopython, and that's the limit of my coding comfort zone so far.

I'd like to take an MSA and a list of (1-based) column positions and create a new MSA (or other output file type) with all of the same rows but only the columns at those positions. Ideally the sequence identifiers from the input MSA will carry over to the output. I've had problems getting this to work:

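from Bio import AlignIO
from Bio.Align import MultipleSeqAlignment

positions = [18, 20, 26, 33, 83, 86, 87, 88, 133, 517]
outputMSA = MultipleSeqAlignment([])
inputMSA = AlignIO.read("input.fst", "fasta")
for y in inputMSA:  # y is a SeqRecord, not an integer row index
    for x in positions:
        outputMSA.append(inputMSA[y, x - 1])  # raises the TypeError below
print(outputMSA)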

I get "TypeError: list indices must be integers or slices, not SeqRecord"
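
The error comes from `for y in inputMSA`: iterating over a MultipleSeqAlignment yields SeqRecord objects, but `inputMSA[y, x-1]` expects an integer row index. Even with an integer row, `inputMSA[row, col]` returns a single character, while `append` expects a whole SeqRecord. A minimal sketch of one fix, assuming 1-based positions as in the example below: build one new SeqRecord per row from the selected characters and copy over that row's id. `select_columns` and the ids `seq1` to `seq4` are hypothetical names for illustration.

```python
from Bio.Align import MultipleSeqAlignment
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def select_columns(alignment, positions):
    """Return a new alignment keeping only the given 1-based columns."""
    new_records = []
    for record in alignment:  # each row is a SeqRecord
        # pick the character at each requested column (1-based -> 0-based)
        residues = "".join(record.seq[x - 1] for x in positions)
        # keep the original identifier and description for this row
        new_records.append(
            SeqRecord(Seq(residues), id=record.id, description=record.description)
        )
    return MultipleSeqAlignment(new_records)


# in-memory version of the example alignment (hypothetical ids)
rows = ["IAMAWFLATTHIS", "IAMAWFLATTHIS", "IAMAWFLATTHIS", "IA-AWFLATTHIS"]
msa = MultipleSeqAlignment(
    [SeqRecord(Seq(s), id=f"seq{i}") for i, s in enumerate(rows, start=1)]
)
out = select_columns(msa, [3, 8, 9, 11])
for rec in out:
    print(rec.id, rec.seq)
```

With a real file, `select_columns(AlignIO.read("input.fst", "fasta"), positions)` would replace the nested loop. A position beyond the alignment length raises an IndexError here rather than being skipped.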

Assuming I could get this to work, the next step would be to take the output MSA and a reference sequence for those positions, then list every unique sequence in the output MSA with its frequency, along with the frequency of the reference sequence.

Example inputs/outputs:

positions = [3, 8, 9, 11]
input MSA:
IAMAWFLATTHIS
IAMAWFLATTHIS
IAMAWFLATTHIS
IA-AWFLATTHIS

output MSA:
MATH
MATH
MATH
-ATH

referenceseq = MASH

output analysis:
MASH 0%,
MATH 75%,
-ATH 25%
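
For the second step, a minimal sketch using only the standard library: `collections.Counter` tallies identical rows, and the reference is looked up separately so it is reported even when no row matches it (the 0% for MASH above). It assumes the rows are plain strings, which for a Biopython alignment can be obtained with `[str(rec.seq) for rec in outputMSA]`, and that percentages are relative to the number of rows. `variant_frequencies` is a hypothetical helper name.

```python
from collections import Counter


def variant_frequencies(sequences, reference):
    """Percent of rows matching each distinct sequence, reference listed first."""
    if not sequences:
        raise ValueError("no sequences to count")
    counts = Counter(sequences)
    total = len(sequences)
    # the reference is reported even if no row matches it (count 0)
    table = [(reference, 100.0 * counts.get(reference, 0) / total)]
    # remaining distinct sequences, most frequent first
    for seq, n in counts.most_common():
        if seq != reference:
            table.append((seq, 100.0 * n / total))
    return table


rows = ["MATH", "MATH", "MATH", "-ATH"]
for seq, pct in variant_frequencies(rows, "MASH"):
    print(f"{seq} {pct:g}%")
# MASH 0%
# MATH 75%
# -ATH 25%
```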

UPDATE: My probably inelegant solution:

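from Bio import AlignIO

positions = [18, 20, 26, 33, 83, 86, 87, 88, 133, 517]  # 1-based columns
inputMSA = AlignIO.read("input.fst", "fasta")
# start from a zero-width alignment that keeps every row and its id
outputMSA = inputMSA[:, 0:0]
for x in positions:
    # append column x (1-based) to the right of the growing alignment
    outputMSA = outputMSA + inputMSA[:, x - 1:x]
print(outputMSA)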

Still working on the second part.
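
The update's slice-and-concatenate idea can be wrapped in a small function. In Biopython, `alignment[:, i:j]` returns a MultipleSeqAlignment with the same rows (ids included), and `+` joins two alignments that have the same number of rows column by column, so starting from the zero-width slice `alignment[:, 0:0]` preserves the identifiers. A minimal sketch on the example alignment from the question; `columns_by_concatenation` and the ids are hypothetical names.

```python
from Bio.Align import MultipleSeqAlignment
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def columns_by_concatenation(alignment, positions):
    """Keep the given 1-based columns by slicing and adding alignments."""
    # zero-width slice: same rows and ids, no columns yet
    result = alignment[:, 0:0]
    for x in positions:
        # alignment[:, x-1:x] is a one-column alignment; + joins it row by row
        result = result + alignment[:, x - 1:x]
    return result


rows = ["IAMAWFLATTHIS", "IAMAWFLATTHIS", "IAMAWFLATTHIS", "IA-AWFLATTHIS"]
msa = MultipleSeqAlignment(
    [SeqRecord(Seq(s), id=f"seq{i}") for i, s in enumerate(rows, start=1)]
)
out = columns_by_concatenation(msa, [3, 8, 9, 11])
for rec in out:
    print(rec.id, rec.seq)
```

Columns come out in the order given in `positions`. Unlike indexing a single character, a position past the end of the alignment just yields an empty slice, so it is silently skipped instead of raising an error.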


Please add updates as comments instead of editing the post.

(comment by Ram, 5 months ago)

Still working on the second part. I thought this would be the easy part, but I'm stuck again.

(reply by johnbmcarthur, 5 months ago)