How to count a sequence and cut in the other sequence?

3.6 years ago

USER • 0

I have a reference sequence (CDS) and an aligned sequence in the same file. The format is fasta.aln or aln. The alignment was done with MAFFT.

Input:

RefSeq - - - - - - AAGCTGC

Seq1 AAAAAAGGGGGG

Output I would like:

Seq1 GGGGGG

In other words, I want to remove from Seq1 every position where RefSeq has a gap ("-"), so that only the part aligned to the CDS is left after the alignment. Is there a way to do this from the command line or in some programming language? I tried to do it with Biopython but was not successful:

f = open('Denv4-X-gb_AY947539.txt', 'r')
con = f.readlines()
con = [i.strip() for i in con]
length = len(con[0].split("-")[0])
result = f'{con[0].split("-")[0]} {con[0].split("-")[0][length:]}'
print(result)
f.close()
f = open('Denv4cds.txt', 'a')
f.write(f'\n{result}')

Is there a script, module, or library that can do this?

I'm using Windows with WSL.

I'm a beginner in bioinformatics.

The generated file contained only the first line ...
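
For reference, the column removal described above could be sketched in plain Python roughly as follows. This is only a minimal sketch, assuming the MAFFT output is an aligned FASTA file in which every row has the same length. The function names `read_fasta` and `trim_to_reference` are made up for illustration and do not come from any library, and the toy alignment is shortened so that both rows have 12 columns.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a dict {name: sequence}."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            # Sequences may be wrapped over several lines.
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def trim_to_reference(records, ref_name, gap="-"):
    """Keep only the alignment columns where the reference is not a gap."""
    ref = records[ref_name]
    keep = [i for i, base in enumerate(ref) if base != gap]
    trimmed = {}
    for name, seq in records.items():
        if len(seq) != len(ref):
            raise ValueError(f"{name} is not the same length as {ref_name}")
        trimmed[name] = "".join(seq[i] for i in keep)
    return trimmed


# Toy alignment (hypothetical data, both rows 12 columns long).
example = """>RefSeq
------AAGCTG
>Seq1
AAAAAAGGGGGG
""".splitlines()

records = read_fasta(example)
trimmed = trim_to_reference(records, "RefSeq")
for name, seq in trimmed.items():
    print(f">{name}")
    print(seq)
```

For a real file, the lines of the alignment would be passed to `read_fasta` instead of the toy string (an open file handle works, since it yields lines). Gaps that Seq1 itself has inside the kept columns are left in place here; whether to strip them afterwards depends on what the trimmed sequence is used for.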

