How to use Bed file to extract sequence from FASTA file?
2
2
Entering edit mode
5.9 years ago

I tried bedtools getfasta and i get the errors that chromosome was not found in fasta file but I have triple checked it there is no blank space the chromosome name in bed file is exactly the same as in fasta file. I would like to know is there any alternatives other than using bedtools getfasta in order to extract the sequence. 

ChIP-Seq software error sequence fasta bed • 8.3k views
ADD COMMENT
1
Entering edit mode

samtools faidx extracts subsequence from indexed reference sequences (http://samtools.sourceforge.net/samtools.shtml)

Also use search feature to look for similar posts on this forum.

ADD REPLY
0
Entering edit mode

I tried. But can't solve it.

ADD REPLY
0
Entering edit mode

Solved well. Thanks.

ADD REPLY
0
Entering edit mode

There is one problem in this tool. The result is not accurate. The sequence extracted is not the same as in bed file coordinates

ADD REPLY
0
Entering edit mode

It's more likely that you made an error than that samtools faidx did...

ADD REPLY
0
Entering edit mode

This is a part of the output i get from samtools faidx:

>chr1:179757197-179758470:1251-1255
TGAGT
>chr1:201237463-201238874:41-45

>chN
>chr1:201237463-201238874:62-80
238874
TGCCACAGCTGN
>chr1:201237463-201238874:62-81
238874

ADD REPLY
0
Entering edit mode

You appear to have used a heavily mistyped command (e.g., chN, also the ranges are non-sensical). Post the exact command that you used and mention where you got the genome.

ADD REPLY
1
Entering edit mode
5.9 years ago

You might try the "faidx" command from https://github.com/mdshw5/pyfaidx. There is a --default-seq parameter that allows filling in nonexistent sequence, as well as a --lazy parameter that disables bounds checking. You can pass a bed file using --bed.

ADD COMMENT
1
Entering edit mode
5.9 years ago

Use a command like this:

twoBitToFa http://hgdownload.cse.ucsc.edu/gbdb/hg19/hg19.2bit -bed=input.bed test.fa

or for a single region: 

twoBitToFa http://hgdownload.cse.ucsc.edu/gbdb/hg19/hg19.2bit test.fa -seq=chr21 -start=1 -end=10000

Requires the UCSC tool twoBitToFa, available from http://hgdownload.cse.ucsc.edu/admin/exe/

If you're not on the hg19 genome, you have to index your .fa file first with faToTwoBit

ADD COMMENT

Login before adding your answer.

Traffic: 1394 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6