edit FASTA file according to range positions
3.7 years ago

Hello everyone,

I am trying to replace the sequences from file_1.fasta with specific sequences from file_2.fasta. The files are set up this way:

head -2 file_1.fasta 
>Scaffolds_1
TAAATCAAATTGGACAGTCAATGCTATTATGTCTCAATCTACAACACATAATACACAATT
CCAAACCACCTTCTTGGAATATCCACATTTTCTTTATTGGAAGAGAAATTAGTTTAACAA
TGACCACTCTTTTTCTCACTAATGTTACAACCACCTAGAAACTGAATTTCAAGCCTATAC

head -6 file_2.fasta 
>Scaffolds_1:327519-327900
AAATCAAATTGGACAGTCAATGCTATTATGTCTCAATCTACAACACATAAT
>Scaffolds_1:344277-344478
ACCTTTGAAACTTTGACTCTAACTCAGCTTGAATATTGGAAGTTAGGGGT
>Scaffolds_1:345134-345287
CCTTCTCTGCTAGAACACGTAGGGCCACTTCAGTAGATTCGCCAATCTTT

I have tried to set up a script to replace the sequences at the right coordinates, but got a bit confused with the sed script.

Would anybody know if there is a simple tool (Samtools, Biopython) designed to make this kind of replacement?

alignment fasta bed samtools • 1.4k views

Hi,

Check the SeqKit toolkit. I'm not sure if it does what you want, but it supports many FASTA/FASTQ manipulations, so it might.

António


Not supported.

But it is easy with Python, for example:

In [11]: S="actgACTG"

In [12]: s="Ga"

In [13]: begin, end = 4, 5

In [14]: S[0:begin-1] + s + S[end:]
Out[14]: 'actGaCTG'
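
The slicing above treats begin and end as 1-based, inclusive positions. A minimal sketch that wraps it in a function could look like this (the name `replace_range` is hypothetical, not part of any library):

```python
def replace_range(seq, begin, end, replacement):
    """Replace 1-based, inclusive positions begin..end of seq with replacement."""
    if not 1 <= begin <= end <= len(seq):
        raise ValueError("range outside sequence")
    # Python slices are 0-based and end-exclusive, hence begin - 1.
    return seq[:begin - 1] + replacement + seq[end:]


print(replace_range("actgACTG", 4, 5, "Ga"))  # prints actGaCTG
```

The replacement does not need to have the same length as the range it replaces.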


Thanks a lot for your answers.

The SeqKit toolkit is very handy for converting FASTA files to tabular files (with seqkit fx2tab).

Your script is nice Pierre, but as I have several sequences to edit in each Scaffold, I cannot use it as it is written.

Thanks for your script shenwei356, I will have to get used to Python to use it and format the files exactly as I want.


Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is for new answers to the original question.


My apologies, I will.

3.7 years ago

Index file_1.fasta with samtools faidx, then extract the unchanged regions and append the replacement sequences in between:

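# Coordinates are assumed to be 1-based and inclusive, as in samtools region syntax.
# Keep everything before the first replaced range (327519-327900).
samtools faidx file_1.fasta Scaffolds_1:1-327518 > out.fa
echo "AAATCAAATTGGACAGTCAATGCTATTATGTCTCAATCTACAACACATAAT" >> out.fa
# Keep the stretch between the first and second replaced ranges.
samtools faidx file_1.fasta Scaffolds_1:327901-344276 | grep -v '^>' >> out.fa
echo "ACCTTTGAAACTTTGACTCTAACTCAGCTTGAATATTGGAAGTTAGGGGT" >> out.fa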

and so on for the remaining ranges.
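
Writing each flanking region by hand gets tedious with many ranges. As a rough sketch, the regions to keep can be computed from the replaced ranges. The function name `kept_regions` and the scaffold length of 400000 are made up for illustration, and the coordinates are assumed to be 1-based, inclusive, and non-overlapping:

```python
def kept_regions(name, length, replaced):
    """Return samtools-style region strings for the parts that are NOT replaced.

    `replaced` holds (start, end) pairs, assumed 1-based, inclusive and
    non-overlapping; `length` is the scaffold length.
    """
    regions, cursor = [], 1
    for start, end in sorted(replaced):
        if start > cursor:
            regions.append(f"{name}:{cursor}-{start - 1}")
        cursor = end + 1
    if cursor <= length:
        regions.append(f"{name}:{cursor}-{length}")
    return regions


# The three ranges shown in file_2.fasta; the length 400000 is hypothetical.
ranges = [(327519, 327900), (344277, 344478), (345134, 345287)]
for region in kept_regions("Scaffolds_1", 400000, ranges):
    print(region)
```

Each printed region could then be passed to samtools faidx, with the matching replacement sequence appended after it.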

As I noted above, it is a very nice script for a single range replacement.

If there are several replacements per scaffold, then I think another tool should be used.

Would anyone know what could be used in this specific case?
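
One possible approach for several replacements per scaffold, sketched in plain Python rather than with a dedicated tool: group the file_2.fasta records by scaffold and apply them from right to left, so that earlier coordinates stay valid even when a replacement changes the sequence length. The helper names `read_fasta` and `apply_patches` are hypothetical, and the sketch assumes headers of the form name:start-end with 1-based, inclusive, non-overlapping coordinates and no extra description after the name.

```python
import re
from collections import defaultdict


def read_fasta(lines):
    """Parse FASTA lines into a dict {header: sequence}, joining wrapped lines."""
    records, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def apply_patches(scaffolds, patches):
    """Return a copy of scaffolds with every patch applied.

    Patch headers are assumed to look like 'name:start-end' with 1-based,
    inclusive, non-overlapping coordinates.
    """
    by_scaffold = defaultdict(list)
    for header, seq in patches.items():
        match = re.fullmatch(r"(.+):(\d+)-(\d+)", header)
        if match is None:
            raise ValueError(f"unexpected header: {header}")
        start, end = int(match.group(2)), int(match.group(3))
        by_scaffold[match.group(1)].append((start, end, seq))

    result = dict(scaffolds)
    for name, ranges in by_scaffold.items():
        seq = result[name]
        # Work from the rightmost range to the leftmost one.
        for start, end, new in sorted(ranges, reverse=True):
            seq = seq[:start - 1] + new + seq[end:]
        result[name] = seq
    return result


# Toy data standing in for file_1.fasta and file_2.fasta.
file_1 = [">Scaffolds_1", "actgACTG", "ttttCCCC"]
file_2 = [">Scaffolds_1:4-5", "Ga", ">Scaffolds_1:9-12", "NN"]
patched = apply_patches(read_fasta(file_1), read_fasta(file_2))
print(patched["Scaffolds_1"])  # prints actGaCTGNNCCCC
```

With real data, an open file handle can be passed to `read_fasta` in place of the toy lists, since it only iterates over lines. If the coordinates in file_2.fasta turn out to be 0-based (as in BED-derived headers), the `start - 1` offset would need adjusting.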
