Question: FASTA extraction from BED file
baurumon (Norway), 12 months ago, wrote:

Hello,

How can I extract FASTA sequences in reverse orientation? I have a BED file where the start position is greater than the stop position, which could mean the feature is on the reverse strand. How can I extract the sequence for such an interval?

NC_037130.1 12295912 12286289

Please help me.

Thanks in advance.

alex.zaccaron, 12 months ago, wrote:

It sounds like you want to extract the sequences in the correct orientation, and your 3-column BED file has reversed coordinates whenever the sequence is on the reverse strand. Modifying ATpoint's suggestion, you could still use bedtools getfasta to extract the correct orientation with:

awk 'BEGIN {OFS="\t"} {if ($2 > $3) print $1, $3, $2, ".", ".", "-"; else print $1, $2, $3, ".", ".", "+"}' file.bed | bedtools getfasta -s -fi ref.fasta -bed -
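
For anyone who wants to check what the awk step produces before piping it into bedtools, here is a minimal Python sketch of the same normalization. The function name normalize_bed_line is made up for illustration, and treating start > end as "minus strand" is an assumption taken from the question, not something the BED format itself defines.

```python
def normalize_bed_line(line: str) -> str:
    """Return a 6-column BED line with start <= end and an inferred strand."""
    chrom, start, end = line.split()[:3]
    start, end = int(start), int(end)
    if start > end:
        # Assumption: reversed coordinates mean the feature is on the minus strand.
        start, end, strand = end, start, "-"
    else:
        strand = "+"
    # Columns 4 and 5 (name, score) are placeholders; column 6 carries the strand.
    return "\t".join([chrom, str(start), str(end), ".", ".", strand])


# The interval from the question, written with a space separator as posted.
print(normalize_bed_line("NC_037130.1 12295912 12286289"))
```

The coordinates themselves are unchanged, only their order and the added strand column differ, so the same region of the reference is extracted.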

Thank you very much.

Reply by baurumon, 12 months ago

Hello baurumon,

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.

Reply by finswimmer, 12 months ago
ATpoint (Germany), 12 months ago, wrote:

awk 'BEGIN {OFS="\t"} {print $1, $3, $2}' your.file | bedtools getfasta (...)

Thanks,

But will that give me the same position that I want?

As I understand it, this awk command will print the coordinates as 12286289 12295912 and the FASTA is then extracted from that interval. After the alignment I found some coordinates in reverse order, so I split them off into a separate BED file in order to extract those positions.

Reply by baurumon, 12 months ago

From what I understand, the convention in genomics is that the genome itself (in a bioinformatics context) is unstranded, because all positions always refer to the top strand. If you want something from the minus strand, this is indicated by a - in the strand column, and the tool then extracts the DNA sequence from the top strand and reverse-complements it because, again by convention, sequences are always written 5'->3'. It is therefore odd that you have an interval with $2 > $3 at all; this should probably not happen. Where did you get that file from?
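
To make the convention concrete, here is a small Python sketch of what a strand-aware extraction does conceptually: slice the top strand, then reverse-complement if the strand is "-". The names extract and reverse_complement and the toy sequence are hypothetical, and this is not how bedtools is implemented, just an illustration of the idea.

```python
# Translation table for complementing DNA bases (N stays N, case is preserved).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse, so the result reads 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def extract(top_strand: str, start: int, end: int, strand: str = "+") -> str:
    """Slice a BED-style interval (0-based start, end exclusive) from the top strand."""
    sub = top_strand[start:end]
    return reverse_complement(sub) if strand == "-" else sub


toy = "ATGCCGTA"  # made-up top-strand sequence
print(extract(toy, 1, 5, "+"))  # TGCC
print(extract(toy, 1, 5, "-"))  # GGCA
```

Both calls use the same start and end on the top strand; only the strand flag changes how the bases are reported.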

Reply by ATpoint, 12 months ago