Question

Fasta extraction from bed file

0

5.2 years ago

baurumon ▴ 30

Hello,

How can I extract FASTA sequences in reverse order? I have a BED file in which the start position is greater than the stop position, which could mean the feature comes from the reverse strand. How can I extract the sequence for such a record?

NC_037130.1 12295912 12286289

Please help me.

Thanks in advance

alignment • 2.7k views

ADD COMMENT • link updated 5.2 years ago by ATpoint 85k • written 5.2 years ago by baurumon ▴ 30

score 3 · Accepted Answer · 2019-08-15

3

5.2 years ago

alex.zaccaron ▴ 470

It sounds like you want to extract the sequences in the correct orientation, and your 3-column BED file lists reversed coordinates when a sequence is on the reverse strand. Modifying ATpoint's suggestion, you could still use bedtools getfasta to extract each sequence in the correct orientation:

awk 'BEGIN{OFS="\t"} {if ($2 > $3) print $1, $3, $2, ".", ".", "-"; else print $1, $2, $3, ".", ".", "+"}' file.bed | bedtools getfasta -s -fi ref.fasta -bed -
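To see what the awk step feeds into bedtools, here is a minimal sketch that runs it on the single record from the question. The function name `normalize_bed` is made up for illustration, and the input is assumed to be whitespace-delimited with plain integer coordinates.

```shell
# Hypothetical helper (not part of bedtools): rewrite 3-column records as
# 6-column BED, swapping start/end and setting strand to "-" when start > end.
normalize_bed() {
  awk 'BEGIN { OFS = "\t" }
       {
         if ($2 > $3) {
           print $1, $3, $2, ".", ".", "-"
         } else {
           print $1, $2, $3, ".", ".", "+"
         }
       }'
}

# The record from the question, where start > end.
result=$(printf 'NC_037130.1\t12295912\t12286289\n' | normalize_bed)
printf '%s\n' "$result"
```

The record comes out as `NC_037130.1 12286289 12295912 . . -` (tab-separated), which can then be piped into `bedtools getfasta -s -fi ref.fasta -bed -`; with `-s`, records marked `-` are reverse-complemented. BED start coordinates are 0-based, so if the numbers came from a 1-based alignment report the start may also need to be decremented by one; the thread does not say which applies.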

ADD COMMENT • link 5.2 years ago by alex.zaccaron ▴ 470

0

thank you very much.

ADD REPLY • link 5.2 years ago by baurumon ▴ 30

0

Hello baurumon,

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.

ADD REPLY • link 5.2 years ago by finswimmer 16k

score 2 · Accepted Answer · 2019-08-15

2

5.2 years ago

ATpoint 85k

awk 'BEGIN{OFS="\t"} {print $1, $3, $2}' your.file | bedtools getfasta (...)

ADD COMMENT • link 5.2 years ago by ATpoint 85k

0

Thanks,

But will it give the same position that I want?

As I understand it, this awk command will print 12286289 12295912 in that order and then extract the FASTA. After alignment I found some coordinates in reverse order, so I split them into a separate BED file in order to extract those positions.
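The split described here might look like the following sketch. The file names `hits.bed`, `forward.bed`, and `reversed.bed` are placeholders, not names from the thread, and the first input record is invented to give the forward branch something to match.

```shell
# Placeholder input: one invented forward record plus the reversed record
# from the question.
printf 'NC_037130.1\t100\t200\nNC_037130.1\t12295912\t12286289\n' > hits.bed

# Records with start > end go to reversed.bed, all others to forward.bed.
awk '{ if ($2 > $3) print > "reversed.bed"; else print > "forward.bed" }' hits.bed

cat reversed.bed
```

Note that this only separates the records; the coordinates in `reversed.bed` are still start > end and would need swapping before bedtools can use them.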

ADD REPLY • link 5.2 years ago by baurumon ▴ 30

0

From what I understand, the convention in genomics is that the genome itself (in a bioinformatics context) is unstranded, because all positions always refer to the top strand. If you want something from the minus strand, this is indicated by a - in the strand column; the tool then extracts the DNA sequence from the top strand and reverse-complements it because, again by convention, sequences are always written 5'->3'. It is therefore odd that you have a record with $2 > $3 at all; this should probably not happen. Where did you get that file from?
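As a small illustration of that convention, a minus-strand sequence is the reverse complement of the top-strand sequence over the same interval. The sketch below uses a made-up four-base sequence and a hypothetical helper `revcomp`; it handles only A, C, G, and T (any other character passes through unchanged).

```shell
# Hypothetical helper: reverse each input line with awk, then complement
# the bases with tr (A<->T, C<->G, case preserved).
revcomp() {
  awk '{
         out = ""
         for (i = length($0); i > 0; i--) out = out substr($0, i, 1)
         print out
       }' | tr 'ACGTacgt' 'TGCAtgca'
}

top='AACG'                                # made-up top-strand sequence, 5'->3'
minus=$(printf '%s\n' "$top" | revcomp)   # same interval read from the minus strand
printf '%s\n' "$minus"
```

This prints `CGTT`: the interval is still described by top-strand coordinates, and only the reported sequence changes.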

ADD REPLY • link 5.2 years ago by ATpoint 85k