Question: Extract fasta sequence from multifasta file using id and position in range
0
gravatar for Bioinfo_MG
9 weeks ago by
Bioinfo_MG0
Bioinfo_MG0 wrote:

Hi,

I have a multifasta file from which I want to extract sequence based on id and position of sequence in specific range. Please suggest some tool or program that can be useful. My seq file is:

>seq1
ATGGAGAGCCTTGTCCCTGGTTTCAACGAGAAAACACACGTCC

>seq2
AACTCAGTTTGCCTGTTTTACAGGTTCGCGACGTGCTCGTACGTGGCT

>seq3
TTGGAGACTCCGTGGAGGAGGTCTTATCAGAGGCACGTCAACATCTTAAAGATGGCACTTGTGGCTTAG

My id file is:

seq1    5   15
seq2    2     10
seq3    10  20
sequence R gene • 155 views
ADD COMMENTlink modified 9 weeks ago by genomax85k • written 9 weeks ago by Bioinfo_MG0
1

Use seqtk (LINK).

seqtk subseq in.fa reg.bed > out.fa
ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by genomax85k

You can use the following code:

cat id_file.txt | while read id start stop; do echo ">"$id >> output_file.fasta ;  perl -ne 'if(/^>(\S+)/){$c=grep{/^$1$/}qw('$id')}print if $c' seq-file.fasta | tail -n +2 | cut -c $start-$stop >> output_file.fasta ; done

To use this code you first need to delete empty lines from your sequence file:

perl -p -i -e "s/^\n$//g" seq-file.fasta

if you don't want to modify the original file, try:

perl -p -e "s/^\n$//g" seq-file.fasta > new_seq-file.fasta

If each of your sequences is one line, you can use this code too:

cat id_file.txt | while read id start stop; do echo ">"$id >> output_file.fasta ;  grep -A 1 ">$id" seq-file.fasta  | cut -c $start-$stop >> output_file.fasta ; done

If you want to add start and stop to the header, you can use this echo instead of echo ">"$id:

echo ">"$id"_"$start"_"$stop

The output:

>seq1_5_15
AGAGCCTTGTC
>seq2_2_10
ACTCAGTTT
>seq3_10_20
CCGTGGAGGAG
ADD REPLYlink modified 8 weeks ago • written 9 weeks ago by Fatima590

; done was missing from two of my commands. I added them.

ADD REPLYlink written 8 weeks ago by Fatima590

You have tagged R, but are asking for any tools. Can you clarify your requirements?

ADD REPLYlink written 9 weeks ago by Joe17k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 821 users visited in the last hour