Question: Fasta extraction from bed file
baurumon (Norway) wrote 3 months ago:

Hello,

How can I extract FASTA sequences in reverse order? I have a BED file where the start position is greater than the stop position. This could come from the reverse strand. How can I extract the sequence for such an interval?

NC_037130.1 12295912 12286289

Please help me.

Thanks in advance.

alex.zaccaron wrote 3 months ago:

It sounds like you want to extract the sequences in the correct orientation, and your 3-column BED file has reversed coordinates when the sequence is on the reverse strand. Modifying ATpoint's suggestion, you could still use bedtools getfasta to extract the correct orientation with:

awk 'BEGIN{OFS="\t"} {if($2>$3) print $1, $3, $2, ".", ".", "-"; else print $1, $2, $3, ".", ".", "+"}' file.bed | bedtools getfasta -s -fi ref.fasta -bed -
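
To see what the awk step hands to bedtools, here is a minimal sketch that runs it on the single line from the question. The file names example.bed and stranded.bed are placeholders chosen for illustration, and the input is assumed to be tab-separated.

```shell
# Build a one-line test file from the coordinates in the question (tab-separated).
printf 'NC_037130.1\t12295912\t12286289\n' > example.bed

# Swap start/end when start > end and record the strand in column 6.
# Columns 4 and 5 (name, score) are filled with "." as placeholders.
awk 'BEGIN { OFS = "\t" }
     { if ($2 > $3) print $1, $3, $2, ".", ".", "-";
       else         print $1, $2, $3, ".", ".", "+" }' example.bed > stranded.bed

# Inspect the 6-column result before piping it into bedtools getfasta -s.
cat stranded.bed
```

The resulting line has the smaller coordinate first and a - in the strand column, so with -s bedtools getfasta reverse-complements the extracted sequence. Note that BED start positions are 0-based; if the coordinates came from a tool that reports 1-based positions, the start may need shifting by one, which this sketch does not do.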

Thank you very much.

written 3 months ago by baurumon

Hello baurumon,

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.


written 3 months ago by finswimmer
ATpoint (Germany) wrote 3 months ago:
awk 'BEGIN{OFS="\t"} {print $1, $3, $2}' your.file | bedtools getfasta (...)

Thanks,

But will it be the same position that I want?

As I understand it, this awk command will print 12286289 12295912 and the FASTA will then be extracted from that interval. After alignment I found some coordinates in reverse order, so I split them into a separate BED file in order to extract those positions.

modified 3 months ago • written 3 months ago by baurumon

As I understand it, the convention in genomics is that the genome itself (in a bioinformatics context) is unstranded, because all positions refer to the top strand. If you want something from the minus strand, this is indicated by a - in the strand column; the tool then extracts the DNA sequence from the top strand and reverse-complements it, because by convention sequences are always written 5'->3'. It is therefore odd that you have an entry with $2 > $3 at all; this should probably not happen. Where did you get that file from?

modified 3 months ago • written 3 months ago by ATpoint
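
To illustrate the reverse-complement step described in the reply above, here is a minimal sketch in plain shell. The function name revcomp and the toy sequence are made up for illustration; this is not how bedtools works internally, only what the result amounts to for a minus-strand feature.

```shell
# Reverse-complement a DNA string: reverse the characters with awk,
# then swap complementary bases with tr (A<->T, C<->G, case preserved).
revcomp() {
    printf '%s\n' "$1" | awk '{
        out = ""
        for (i = length($0); i >= 1; i--) out = out substr($0, i, 1)
        print out
    }' | tr 'ACGTacgt' 'TGCAtgca'
}

# Toy top-strand sequence (hypothetical, not taken from NC_037130.1).
top="GATTACA"

# The minus-strand feature over the same interval, written 5'->3'.
revcomp "$top"   # prints TGTAATC
```

The interval itself stays the same on the top strand; only the reported sequence changes, which is why a BED entry normally keeps start < end and carries the orientation in the strand column.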