I want to start by apologizing for the vague title and my overall ignorance. I have no idea what I'm doing, and the few scripts I've gotten to work have been pieced together from scraps of code taken from this and other websites rather than genuinely written by me. I'm working in Python using some Biopython code, and that's the limit of my coding comfort zone so far.
I'd like to take an MSA and an input list of positions to create a new MSA (or other output file type) with all of the same rows but only the columns for positions specified in the list. Ideally the identifiers for the sequences in the first MSA will carry over to the output. I've had problems getting this to work:
    positions = [18,20,26,33,83,86,87,88,133,517]
    outputMSA = MultipleSeqAlignment()
    inputMSA = AlignIO.read("input.fst", "fasta")
    for y in inputMSA:
        for x in positions:
            outputMSA.append(inputMSA[y,x-1])
    print(outputMSA)
I get "TypeError: list indices must be integers or slices, not SeqRecord"
Assuming I could get this to work, the next step would be to take a reference sequence for those positions and the output MSA, and list all of the unique sequences in the output MSA, the frequency of each one, and the frequency of the reference sequence.
    positions = [3, 8, 9, 11]

    input MSA:
    IAMAWFLATTHIS
    IAMAWFLATTHIS
    IAMAWFLATTHIS
    IA-AWFLATTHIS

    output MSA:
    MATH
    MATH
    MATH
    -ATH

    referenceseq = MASH

    output analysis:
    MASH 0%, MATH 75%, -ATH 25%
UPDATE: My probably inelegant solution:
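    from Bio import AlignIO

    inputMSA = AlignIO.read("input.fst", "fasta")
    # Start from a zero-column slice so the rows and their IDs carry over
    tempMSA = inputMSA[:,0:0]
    for x in positions:
        # Positions are 1-based, so column x is the slice (x-1):x
        outputMSA = tempMSA[:,:] + inputMSA[:, (x-1):x]
        tempMSA = outputMSA
    print(outputMSA)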
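For comparison, here is a minimal sketch of the same column selection done on plain strings instead of alignment slices. The TypeError in the first attempt comes from looping over the alignment, which yields SeqRecord objects rather than row indices, so this version works row by row and picks characters directly. The function name `select_columns` and the IDs `seq1` to `seq4` are made up for illustration. The Biopython line in the comment is an untested assumption about how the dictionary could be filled from the real file.

```python
def select_columns(rows, positions):
    """Keep only the given 1-based alignment columns from each row.

    rows maps a sequence ID to its aligned sequence string.
    """
    return {
        name: "".join(seq[p - 1] for p in positions)
        for name, seq in rows.items()
    }


# Hypothetical IDs. With Biopython the dict could presumably be built as
# {rec.id: str(rec.seq) for rec in AlignIO.read("input.fst", "fasta")}
rows = {
    "seq1": "IAMAWFLATTHIS",
    "seq2": "IAMAWFLATTHIS",
    "seq3": "IAMAWFLATTHIS",
    "seq4": "IA-AWFLATTHIS",
}
positions = [3, 8, 9, 11]

selected = select_columns(rows, positions)

# Print in FASTA style so the original identifiers carry over
for name, seq in selected.items():
    print(f">{name}\n{seq}")
```

Keeping the IDs as dictionary keys is what lets the identifiers from the input survive into the output.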
Still working on the second part.
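A rough sketch of what the second part might look like, assuming the reduced rows are already available as plain strings (for example via `str(record.seq)` on each row of the output MSA). It uses `collections.Counter` from the standard library. The function name `variant_frequencies` is hypothetical, and the input is the toy example from above rather than real data.

```python
from collections import Counter


def variant_frequencies(sequences, reference):
    """Return {sequence: percent of rows}, with the reference listed first.

    The reference is always reported, even if it never occurs (0%).
    """
    if not sequences:
        raise ValueError("no sequences to count")
    counts = Counter(sequences)
    total = len(sequences)
    # Counter returns 0 for a missing key, so an absent reference gives 0.0
    result = {reference: 100 * counts[reference] / total}
    # most_common() orders the unique sequences from most to least frequent
    for seq, n in counts.most_common():
        result[seq] = 100 * n / total
    return result


# Toy rows from the example above
output_rows = ["MATH", "MATH", "MATH", "-ATH"]
freqs = variant_frequencies(output_rows, "MASH")

print(", ".join(f"{seq} {pct:.0f}%" for seq, pct in freqs.items()))
```

Gaps are treated as ordinary characters here, so `-ATH` counts as its own variant, matching the example analysis.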