Question: edit FASTA file according to range positions
19 days ago, Begonia_pavonina wrote:

Hello everyone,

I am trying to replace the sequences from file_1.fasta with specific sequences from file_2.fasta. The files are set up this way:

head -2 file_1.fasta 
>Scaffolds_1
TAAATCAAATTGGACAGTCAATGCTATTATGTCTCAATCTACAACACATAATACACAATT
CCAAACCACCTTCTTGGAATATCCACATTTTCTTTATTGGAAGAGAAATTAGTTTAACAA
TGACCACTCTTTTTCTCACTAATGTTACAACCACCTAGAAACTGAATTTCAAGCCTATAC

head -6 file_2.fasta 
>Scaffolds_1:327519-327900
AAATCAAATTGGACAGTCAATGCTATTATGTCTCAATCTACAACACATAAT
>Scaffolds_1:344277-344478
ACCTTTGAAACTTTGACTCTAACTCAGCTTGAATATTGGAAGTTAGGGGT
>Scaffolds_1:345134-345287
CCTTCTCTGCTAGAACACGTAGGGCCACTTCAGTAGATTCGCCAATCTTT

I have tried to set up a script to replace the sequences at the right coordinates, but got a bit confused with the sed script.

Would anybody know if there is a simple tool (Samtools, Biopython) designed to make this kind of replacement?

Tags: samtools, alignment, bed, fasta

Hi,

Check the SeqKit toolkit. I'm not sure if it does what you want, but it supports a lot of FASTA/Q manipulations, so it might.

António

written 19 days ago by antonioggsousa

Not supported.

But it is easy with Python, for example:

In [11]: S="actgACTG"

In [12]: s="Ga"

In [13]: begin, end = 4, 5

In [14]: S[0:begin-1] + s + S[end:]
Out[14]: 'actGaCTG'
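
The slice above treats begin and end as 1-based, inclusive positions, which is how the ranges in the file_2.fasta headers appear to be written (that is an assumption, not something the thread confirms). A minimal sketch that wraps the same idea in a function; the name replace_range is made up for illustration:

```python
def replace_range(seq, begin, end, replacement):
    """Replace positions begin..end (1-based, inclusive) of seq with replacement."""
    if not 1 <= begin <= end <= len(seq):
        raise ValueError(f"range {begin}-{end} is outside a sequence of length {len(seq)}")
    # Python slices are 0-based and end-exclusive, hence begin - 1 on the left
    # and plain end on the right.
    return seq[:begin - 1] + replacement + seq[end:]


print(replace_range("actgACTG", 4, 5, "Ga"))  # actGaCTG
```

The replacement does not need to have the same length as the range it replaces, which is why later coordinates in the same sequence can shift after an edit.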

modified 19 days ago • written 19 days ago by shenwei356

Thanks a lot for your answers.

The SeqKit toolkit is very handy for converting FASTA files to tabular files (with seqkit fx2tab).

Your script is nice, Pierre, but since I have several sequences to edit in each scaffold, I cannot use it as written.

Thanks for your script, shenwei356. I will have to get used to Python to use it and format the files exactly as I want.

written 18 days ago by Begonia_pavonina

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is for new answers to the original question.

written 18 days ago by genomax

My apologies, I will.

written 18 days ago by Begonia_pavonina
19 days ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

Index file_1.fasta with samtools faidx, then stitch the unchanged flanks and the replacement sequences together:

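# Assumes the ranges in the file_2.fasta headers are 1-based and inclusive (samtools region syntax),
# so each kept flank stops one base before a replaced range and resumes one base after it.
samtools faidx file_1.fasta Scaffolds_1:1-327518 > out.fa
echo "AAATCAAATTGGACAGTCAATGCTATTATGTCTCAATCTACAACACATAAT" >> out.fa
samtools faidx file_1.fasta Scaffolds_1:327901-344276 | grep -v '^>' >> out.fa
echo "ACCTTTGAAACTTTGACTCTAACTCAGCTTGAATATTGGAAGTTAGGGGT" >> out.fa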

and so on for the remaining ranges.
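
The flanking regions to keep between the replaced ranges can be computed rather than typed by hand. A minimal sketch, assuming the ranges are 1-based, inclusive and non-overlapping; the function name flanking_regions and the scaffold length used in the example are made up for illustration:

```python
def flanking_regions(name, length, ranges):
    """Return samtools-style regions of `name` that are NOT covered by `ranges`.

    `ranges` holds (start, end) pairs, 1-based and inclusive, non-overlapping.
    """
    regions = []
    next_start = 1
    for start, end in sorted(ranges):
        if start > next_start:
            regions.append(f"{name}:{next_start}-{start - 1}")
        next_start = end + 1
    if next_start <= length:
        regions.append(f"{name}:{next_start}-{length}")
    return regions


# 400000 is a hypothetical scaffold length, not taken from the thread.
replaced = [(327519, 327900), (344277, 344478), (345134, 345287)]
for region in flanking_regions("Scaffolds_1", 400000, replaced):
    print(region)
```

The real scaffold length is the second column of the file_1.fasta.fai index that samtools faidx writes. Each printed region can be passed to samtools faidx, with the matching replacement sequence appended in between.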
written 19 days ago by Pierre Lindenbaum

As I mentioned above, it is a very nice script for a single range replacement.

If there are several replacements per scaffold, then I think another tool should be used.

Would anyone know what could be used in this specific case?

written 18 days ago by Begonia_pavonina
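
One possible way to handle several replacements per scaffold is a short Python script using only the standard library. This is a sketch under stated assumptions, not a tested tool: it assumes the headers in file_2.fasta have the form name:start-end with 1-based, inclusive coordinates, that the ranges on one scaffold do not overlap, and that each scaffold fits in memory. All function names are made up for illustration.

```python
import re
from collections import defaultdict


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def parse_patch_header(header):
    """Split 'name:start-end' into (name, start, end); assumed 1-based, inclusive."""
    match = re.match(r"(\S+):(\d+)-(\d+)(?:\s|$)", header)
    if match is None:
        raise ValueError(f"unexpected header: {header!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))


def apply_replacements(seq, edits):
    """Apply (start, end, new_seq) edits to seq; edits must not overlap."""
    # Work from the right-most edit so that the coordinates of the remaining
    # edits stay valid even when a replacement changes the sequence length.
    for start, end, new_seq in sorted(edits, reverse=True):
        seq = seq[:start - 1] + new_seq + seq[end:]
    return seq


def patch_fasta(reference_path, patches_path, out_path, width=60):
    """Write reference_path to out_path with all ranges from patches_path replaced."""
    edits = defaultdict(list)
    for header, new_seq in read_fasta(patches_path):
        name, start, end = parse_patch_header(header)
        edits[name].append((start, end, new_seq))
    with open(out_path, "w") as out:
        for header, seq in read_fasta(reference_path):
            name = header.split(maxsplit=1)[0] if header else header
            seq = apply_replacements(seq, edits.get(name, []))
            out.write(f">{header}\n")
            for i in range(0, len(seq), width):
                out.write(seq[i:i + width] + "\n")
```

With the files from the question, a call would look like patch_fasta("file_1.fasta", "file_2.fasta", "file_1.patched.fasta"). Applying the edits from right to left is the design choice that matters here: a replacement whose length differs from its range shifts everything to its right, so editing left to right would invalidate the later coordinates.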