Off topic: How to count a sequence and cut in the other sequence?
3.6 years ago
USER • 0

I have a reference sequence (CDS) and an aligned sequence in the same file, in FASTA alignment format (fasta.aln or aln). The alignment was done with MAFFT.

Input:

RefSeq - - - - - - AAGCTGC

Seq1 AAAAAAGGGGGG

Output I would like:

Seq1 GGGGGG

I want to remove from Seq1 every column where RefSeq has the gap symbol "-", so that only the CDS region remains after the alignment. Is there a way to do this from the command line or in some programming language? I tried to do it with Biopython but was not successful:

f = open('Denv4-X-gb_AY947539.txt', 'r')
con = f.readlines()
con = [i.strip() for i in con]
length = len(con[0].split("-")[0])
result = f'{con[0].split("-")[0]} {con[0].split("-")[0][length:]}'
print(result)
f.close()
f = open('Denv4cds.txt', 'a')
f.write(f'\n{result}')
f.close()

Do you have a script or module or library that can do this?

I'm using WSL on Windows.

I'm a beginner in bioinformatics.

The generated file contained only the first line.
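The removal described above can be sketched in plain Python, as a rough illustration rather than a definitive solution. It assumes both sequences come from the same alignment and therefore have the same length, and that the gap symbol is "-". The function names `read_fasta` and `trim_to_reference` are made up for this example, and the aligned strings in the demo are invented to mirror the input shown earlier.

```python
def read_fasta(path):
    """Read a FASTA alignment into a dict of {record id: aligned sequence}."""
    records = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first word of the header as the record id.
                name = line[1:].split()[0]
                records[name] = []
            elif name is not None:
                records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def trim_to_reference(ref_aligned, seq_aligned, gap="-"):
    """Keep only the columns of seq_aligned where the reference has no gap."""
    if len(ref_aligned) != len(seq_aligned):
        raise ValueError("aligned sequences must have the same length")
    return "".join(s for r, s in zip(ref_aligned, seq_aligned) if r != gap)


# Illustrative strings only: six leading gap columns in the reference.
print(trim_to_reference("------AAGCTG", "AAAAAAGGGGGG"))  # prints GGGGGG

# With a real file (record ids assumed to be "RefSeq" and "Seq1"):
# records = read_fasta("Denv4-X-gb_AY947539.txt")
# cds = trim_to_reference(records["RefSeq"], records["Seq1"])
```

If Seq1 itself has gaps inside the CDS region, they stay in the result; `cds.replace("-", "")` would remove them if an ungapped sequence is wanted.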

alignment sequence gene software error genome • 483 views