Extract fasta sequence from multifasta file using id and position in range
4.0 years ago
MG_19 • 0

Hi,

I have a multi-FASTA file from which I want to extract subsequences, selected by sequence ID and a position range within each sequence. Please suggest a tool or program that can do this. My sequence file is:

>seq1
ATGGAGAGCCTTGTCCCTGGTTTCAACGAGAAAACACACGTCC

>seq2
AACTCAGTTTGCCTGTTTTACAGGTTCGCGACGTGCTCGTACGTGGCT

>seq3
TTGGAGACTCCGTGGAGGAGGTCTTATCAGAGGCACGTCAACATCTTAAAGATGGCACTTGTGGCTTAG

My id file is:

seq1    5   15
seq2    2     10
seq3    10  20
R sequence gene • 1.6k views

Use seqtk (LINK).

seqtk subseq in.fa reg.bed > out.fa
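
One caveat when feeding the id file from the question to seqtk: `seqtk subseq` reads the region file as BED, where the start coordinate is 0-based and the end is exclusive. Assuming the positions in the id file are 1-based and inclusive (an assumption, the question does not say so), a small conversion might look like this. The function name `ids_to_bed` is hypothetical.

```shell
# Convert "id start stop" lines (assumed 1-based, inclusive) into BED
# (tab-separated, 0-based start, end-exclusive): only the start shifts by one.
ids_to_bed() {
    awk 'BEGIN { OFS = "\t" } NF >= 3 { print $1, $2 - 1, $3 }' "$1"
}

# Usage, with the file names used elsewhere in this thread:
#   ids_to_bed id_file.txt > reg.bed
#   seqtk subseq in.fa reg.bed > out.fa
```

Lines with fewer than three columns (for example blank lines) are skipped.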

You can use the following code:

cat id_file.txt | while read id start stop; do echo ">"$id >> output_file.fasta ;  perl -ne 'if(/^>(\S+)/){$c=grep{/^$1$/}qw('$id')}print if $c' seq-file.fasta | tail -n +2 | cut -c $start-$stop >> output_file.fasta ; done

To use this code you first need to delete empty lines from your sequence file:

perl -p -i -e "s/^\n$//g" seq-file.fasta

If you don't want to modify the original file, try:

perl -p -e "s/^\n$//g" seq-file.fasta > new_seq-file.fasta

If each of your sequences is one line, you can use this code too:

cat id_file.txt | while read id start stop; do echo ">"$id >> output_file.fasta ;  grep -w -A 1 ">$id" seq-file.fasta | tail -n +2 | cut -c $start-$stop >> output_file.fasta ; done

If you want to add start and stop to the header, you can use this echo instead of echo ">"$id:

echo ">"$id"_"$start"_"$stop

The output:

>seq1_5_15
AGAGCCTTGTC
>seq2_2_10
ACTCAGTTT
>seq3_10_20
CCGTGGAGGAG
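
Because `cut -c` works line by line, the loops above give the expected result only when each sequence sits on a single line. A minimal awk sketch that first joins wrapped sequence lines (and skips blank ones) could look like the following. It assumes 1-based, inclusive positions and whitespace-separated columns in the id file. The function name `extract_ranges` is hypothetical, and the sketch holds all sequences in memory, so it is meant for files of modest size.

```shell
# extract_ranges ID_FILE FASTA_FILE
# Prints one record per id-file line, with start and stop added to the header.
extract_ranges() {
    awk '
        # First input (the FASTA file): remember each sequence under its id,
        # joining wrapped lines and ignoring blank ones.
        NR == FNR {
            if (/^>/)    { name = substr($1, 2) }
            else if (NF) { seq[name] = seq[name] $1 }
            next
        }
        # Second input (the id file): id, start, stop.
        NF >= 3 && ($1 in seq) {
            printf ">%s_%d_%d\n%s\n", $1, $2, $3, substr(seq[$1], $2, $3 - $2 + 1)
        }
    ' "$2" "$1"
}

# Usage, with the file names used in the commands above:
#   extract_ranges id_file.txt seq-file.fasta > output_file.fasta
```

Ids that do not occur in the FASTA file are skipped silently, and there is no need to delete empty lines beforehand.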

"; done" was missing from two of my commands. I have added it.

You have tagged the question with R, but you are asking for any tool. Can you clarify your requirements?
