How To Use Bedtools To Extract Promoters From A Mouse Bed File
9.2 years ago
Anima Mundi ★ 2.8k

Hello, I would like to know how to use Bedtools to extract promoter sequences (as FASTAs) from the mouse genome (mm9) starting from a BED file.

bedtools bed extraction promoter fasta • 24k views
9.2 years ago

As an example, let's say you define a promoter as the 2 kb upstream of each gene, and you have a BED file with the chrom, txStart, txEnd, name, num_exons, and strand for each gene you are interested in. Something like the following:

head -n4 genes.bed
chr1    134212701    134230065    Nuak2    8    +
chr1    134212701    134230065    Nuak2    7    +
chr1    33510655    33726603    Prim2,    14    -
chr1    25124320    25886552    Bai3,    31    -

bedtools flank -i genes.bed -g mm9.chromsizes -l 2000 -r 0 -s > genes.2kb.promoters.bed

This will give you the upstream regions based on strand as follows:

chr1    134210701    134212701    Nuak2    8    +
chr1    134210701    134212701    Nuak2    7    +
chr1    33726603    33728603    Prim2,    14    -
chr1    25886552    25888552    Bai3,    31    -
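
To make the strand logic explicit, here is a minimal awk sketch of the arithmetic behind the flank call above. It is not how bedtools is implemented, only the same calculation under simple assumptions: upstream of a + gene is the 2 kb before txStart, upstream of a - gene is the 2 kb after txEnd, and both are clipped to the chromosome bounds. The function name upstream_flank and the demo.* file names are made up for this illustration.

```shell
# Demo inputs (tab-delimited); the file names are placeholders for this sketch.
printf 'chr1\t134212701\t134230065\tNuak2\t8\t+\n'  > demo.genes.bed
printf 'chr1\t33510655\t33726603\tPrim2,\t14\t-\n' >> demo.genes.bed
# chr1 length as listed for mm9; check it against your own genome file.
printf 'chr1\t197195432\n' > demo.chromsizes

upstream_flank() {
    # usage: upstream_flank genes.bed chromsizes flank_size
    awk -v n="$3" 'BEGIN { FS = OFS = "\t" }
        NR == FNR { size[$1] = $2; next }   # first file: chromosome lengths
        !($1 in size) { next }              # skip chromosomes missing from the genome file
        {
            if ($6 == "-") { s = $3; e = $3 + n }   # minus strand: upstream lies past txEnd
            else           { s = $2 - n; e = $2 }   # plus strand: upstream lies before txStart
            if (s < 0) s = 0                        # clip to chromosome start
            if (e > size[$1]) e = size[$1]          # clip to chromosome end
            $2 = s; $3 = e
            print
        }' "$2" "$1"
}

upstream_flank demo.genes.bed demo.chromsizes 2000 > demo.promoters.bed
cat demo.promoters.bed
```

For the two demo genes this reproduces the Nuak2 and Prim2 coordinates shown above.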

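You can now use this BED file to extract the sequences from the mm9 genome. Add -s so that the extraction is strand-aware; without it, getfasta returns the forward-strand sequence for every interval.

bedtools getfasta -s -fi mm9.fa -bed genes.2kb.promoters.bed -fo genes.2kb.promoters.bed.fa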
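
For minus-strand genes the promoter should be read off the opposite strand, which is what the -s option of bedtools getfasta does by reverse-complementing those intervals. As a rough illustration of that operation only (the helper name revcomp is made up here, and it handles just A/C/G/T in either case, leaving N and other symbols unchanged):

```shell
revcomp() {
    # Reverse the sequence character by character, then swap each base for its complement.
    printf '%s\n' "$1" |
        awk '{ out = ""; for (i = length($0); i >= 1; i--) out = out substr($0, i, 1); print out }' |
        tr 'ACGTacgt' 'TGCAtgca'
}

revcomp ATGCCN
revcomp AAGCTT   # this sequence is its own reverse complement
```

Comparing a minus-strand record in the output FASTA against revcomp of the forward-strand sequence is a quick way to confirm that strand was taken into account.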

NOTE: "mm9.chromsizes" is a tab-delimited file in which each line holds a chromosome name and that chromosome's length. See the bedtools manual for examples. "mm9.fa" stands for the mouse reference genome in FASTA format.
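
If you do not have a genome file yet, one way to build it is to compute each sequence length from the FASTA itself. The sketch below uses plain awk; the function names fasta_to_chromsizes and check_genome_file and the demo.* files are made up for this example. If samtools is installed, the first two columns of the .fai index written by "samtools faidx" carry the same information. The check function is there because a genome file whose columns are separated by spaces rather than tabs is a typical cause of bedtools complaining about too few fields.

```shell
fasta_to_chromsizes() {
    # usage: fasta_to_chromsizes genome.fa  -> prints "name<TAB>length" for each sequence
    awk 'BEGIN { OFS = "\t" }
        /^>/ {
            if (name != "") print name, len
            name = substr($1, 2)   # sequence ID = header text up to the first whitespace
            len = 0
            next
        }
        { sub(/\r$/, ""); len += length($0) }   # tolerate Windows line endings
        END { if (name != "") print name, len }' "$1"
}

check_genome_file() {
    # Succeeds only if every line has at least two tab-separated fields
    # and the second field is a non-negative integer.
    awk -F '\t' 'NF < 2 || $2 !~ /^[0-9]+$/ { bad = 1; print "line " NR ": " $0 > "/dev/stderr" }
        END { exit bad }' "$1"
}

# Tiny made-up FASTA, only to show the behaviour.
printf '>chrA demo sequence\nACGTAC\nGT\n>chrB\nNNNN\n' > demo.fa
fasta_to_chromsizes demo.fa > demo.fa.chromsizes
check_genome_file demo.fa.chromsizes && cat demo.fa.chromsizes
```

On a real genome you would run fasta_to_chromsizes mm9.fa > mm9.chromsizes and then pass that file to -g.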

Did you miss a 0 after -r in the flank command?

@brentp - yep, thank you sir.

Thanks for the solution but I do not understand what kind of error is in the code, sorry. Could you please provide the fixed command for the given example?

Does the edit above help?

One issue is solved, but bedtools still says: "Less than the req'd two fields were encountered in the genome file". I need to work on the input files, but with this hint I think I will be able to solve the problem.
