Question

How to use a BED file to extract sequences from a FASTA file?

8.9 years ago

allyson1115ar ▴ 30

I tried bedtools getfasta and got an error saying that the chromosome was not found in the FASTA file. I have triple-checked: there are no blank spaces, and the chromosome names in the BED file are exactly the same as in the FASTA file. Are there any alternatives to bedtools getfasta for extracting the sequences?
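One way to double-check the "chromosome not found" situation is to compare the names programmatically rather than by eye. The sketch below is only an illustration, not part of bedtools: the helper names (`fasta_names`, `bed_chroms`, `missing_chroms`) are made up for this example, and it assumes that tools take the FASTA sequence name to be the header text up to the first whitespace and that the BED file is tab-delimited. Splitting the BED lines on tabs only means that stray spaces or a space-delimited file show up in the output instead of being silently hidden.

```python
def fasta_names(fasta_lines):
    """Collect sequence names: header text after '>' up to the first whitespace."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def bed_chroms(bed_lines):
    """Collect column 1 of a BED file, splitting on tabs only."""
    chroms = set()
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        # Strip only the newline so trailing spaces stay visible in the name.
        chroms.add(line.rstrip("\n").split("\t")[0])
    return chroms


def missing_chroms(bed_lines, fasta_lines):
    """Return BED chromosome names that have no matching FASTA sequence name."""
    return sorted(bed_chroms(bed_lines) - fasta_names(fasta_lines))


# Toy data (hypothetical): one clean record, one trailing space,
# and one line that uses spaces instead of tabs.
fasta = [">chr1 some description\n", "ACGT\n", ">chr2\n", "GGCC\n"]
bed = ["chr1\t0\t4\n", "chr2 \t0\t2\n", "chr3 5 10\n"]

for name in missing_chroms(bed, fasta):
    print(repr(name))  # repr() makes hidden whitespace visible
```

With real files, the same functions can be called on open file handles, for example `missing_chroms(open("input.bed"), open("genome.fa"))`, where both file names are placeholders.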

ChIP-Seq fasta bed • 14k views

samtools faidx extracts subsequence from indexed reference sequences (http://samtools.sourceforge.net/samtools.shtml)

Also, use the search feature to look for similar posts on this forum.
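A note on coordinates when moving from a BED file to samtools faidx: BED intervals are 0-based with an exclusive end, while samtools region strings (`chr:start-end`) are 1-based and inclusive. Passing BED numbers through unchanged therefore shifts the start by one base. The sketch below shows the conversion; the function names are hypothetical and the coordinates are purely illustrative.

```python
def bed_to_region(chrom, start, end):
    """Convert a BED interval (0-based, end-exclusive) to a samtools-style
    region string (1-based, inclusive)."""
    start, end = int(start), int(end)
    if start < 0 or end <= start:
        raise ValueError(f"invalid BED interval: {chrom}\t{start}\t{end}")
    # Only the start moves: the exclusive 0-based end equals the inclusive 1-based end.
    return f"{chrom}:{start + 1}-{end}"


def regions_from_bed(bed_lines):
    """Yield one region string per BED data line."""
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        yield bed_to_region(chrom, start, end)


# Illustrative BED lines (not taken from the poster's actual file).
bed = ["chr1\t179757196\t179758470\n", "chr1\t201237462\t201238874\n"]
for region in regions_from_bed(bed):
    print(region)
```

Each printed region can then be given to `samtools faidx genome.fa <region>`, where `genome.fa` stands for your own reference file.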

ADD REPLY • link 8.9 years ago by Ashutosh Pandey 12k

I tried. But can't solve it.

ADD REPLY • link 8.9 years ago by allyson1115ar ▴ 30

Solved well. Thanks.

ADD REPLY • link 8.9 years ago by allyson1115ar ▴ 30

There is one problem with this tool: the result is not accurate. The extracted sequence does not match the coordinates in the BED file.

ADD REPLY • link 8.9 years ago by allyson1115ar ▴ 30

It's more likely that you made an error than that samtools faidx did...

ADD REPLY • link 8.9 years ago by Devon Ryan 104k

This is a part of the output I get from samtools faidx:

>chr1:179757197-179758470:1251-1255
TGAGT
>chr1:201237463-201238874:41-45

>chN
>chr1:201237463-201238874:62-80
238874
TGCCACAGCTGN
>chr1:201237463-201238874:62-81
238874

ADD REPLY • link updated 15 months ago by Ram 43k • written 8.9 years ago by allyson1115ar ▴ 30

You appear to have used a heavily mistyped command (e.g., chN; the ranges are also nonsensical). Post the exact command that you used and mention where you got the genome.

ADD REPLY • link 8.9 years ago by Devon Ryan 104k

Ram · Answer 1 · 2015-05-28

8.9 years ago

Matt Shirley 10k

You might try the faidx command from https://github.com/mdshw5/pyfaidx. There is a --default-seq parameter that allows filling in nonexistent sequence, as well as a --lazy parameter that disables bounds checking. You can pass a bed file using --bed.
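For small genomes, or simply to cross-check what another tool returns, the extraction can also be sketched with the Python standard library alone. This is not how pyfaidx or bedtools work internally: it loads the whole FASTA into memory instead of using an index, and the function names and the `chrom:start-end` record naming are choices made for this example. It assumes BED coordinates (0-based start, exclusive end) and raises an error for unknown chromosomes or out-of-range intervals rather than padding them.

```python
def read_fasta(lines):
    """Parse FASTA lines into {name: sequence}; the name is the header text
    up to the first whitespace."""
    seqs = {}
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def extract(bed_lines, seqs):
    """Yield (record_name, subsequence) for each BED interval."""
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        start, end = int(start), int(end)
        if chrom not in seqs:
            raise KeyError(f"chromosome {chrom!r} not found in FASTA")
        seq = seqs[chrom]
        if not 0 <= start < end <= len(seq):
            raise ValueError(f"interval {chrom}:{start}-{end} is outside 0-{len(seq)}")
        # BED coordinates map directly onto Python slicing.
        yield f"{chrom}:{start}-{end}", seq[start:end]


# Toy example with a 10-base sequence split over two lines.
seqs = read_fasta([">chr1 test\n", "ACGTAC\n", "GTAA\n"])
for name, sub in extract(["chr1\t2\t6\n"], seqs):
    print(f">{name}\n{sub}")
```

For a real run, pass open file handles in place of the lists. For large genomes an indexed tool is the better choice, since this sketch keeps every sequence in memory.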

Ram · Answer 2 · 2015-05-29

Use a command like this:

twoBitToFa http://hgdownload.cse.ucsc.edu/gbdb/hg19/hg19.2bit -bed=input.bed test.fa

or for a single region (note that -start is zero-based and -end is non-inclusive):

twoBitToFa http://hgdownload.cse.ucsc.edu/gbdb/hg19/hg19.2bit test.fa -seq=chr21 -start=1 -end=10000

Requires the UCSC tool twoBitToFa, available from http://hgdownload.cse.ucsc.edu/admin/exe/

If you're not working with the hg19 genome, you first have to convert your .fa file to .2bit format with faToTwoBit.