3.6 years ago
USER
I have a reference sequence (CDS) and an aligned sequence in the same file (format: fasta.aln or aln). The alignment was done with MAFFT.
Input:
RefSeq - - - - - - AAGCTGC
Seq1 AAAAAAGGGGGG
Output I would like:
Seq1 GGGGGG
In other words, every position of Seq1 that lines up with a gap symbol ("-") in RefSeq should be removed, so that only the CDS region remains after the alignment. Is there a way to do this from the command line or in some programming language? I tried to do it with Biopython but was not successful:
f = open('Denv4-X-gb_AY947539.txt', 'r')
con = f.readlines()
con = [i.strip() for i in con]
length = len(con[0].split("-")[0])
result = f'{con[0].split("-")[0]} {con[0].split("-")[0][length:]}'
print(result)
f.close()
f = open('Denv4cds.txt', 'a')
f.write(f'\n{result}')
Do you have a script or module or library that can do this?
I'm using WSL on Windows.
I'm a beginner in bioinformatics.
The generated file only contains the first line ...
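To make the request concrete, here is a minimal sketch of the column removal described above. It is not a tested solution for the real data and rests on several assumptions: the file is an aligned FASTA, the reference is the first record, and every sequence has the same aligned length. The function names `read_fasta` and `trim_to_reference` are hypothetical, and the example sequences are the ones from the question with the gaps written without spaces and one extra base added to Seq1 so that both rows have equal length.

```python
def read_fasta(lines):
    """Parse FASTA lines into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def trim_to_reference(records, ref_index=0, gap="-"):
    """Keep only the alignment columns where the reference has no gap."""
    ref_seq = records[ref_index][1]
    keep = [i for i, base in enumerate(ref_seq) if base != gap]
    trimmed = []
    for header, seq in records:
        # Assumption: all rows of the alignment have the same length.
        if len(seq) != len(ref_seq):
            raise ValueError(f"{header} is not the same length as the reference")
        trimmed.append((header, "".join(seq[i] for i in keep)))
    return trimmed


# Hypothetical example data, adapted from the question (equal lengths assumed).
example = [">RefSeq", "------AAGCTGC", ">Seq1", "AAAAAAGGGGGGT"]
for header, seq in trim_to_reference(read_fasta(example)):
    print(f">{header}\n{seq}")
```

For a real alignment, the open file object can be passed in place of the `example` list, e.g. `with open(path) as handle: records = read_fasta(handle)`, where `path` is whatever the MAFFT output is called. Positions where Seq1 itself has a gap inside the CDS are kept as "-" in this sketch.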