How To Use Bedtools To Extract Promoters From A Mouse Bed File
10.4 years ago
Anima Mundi

Hello, I would like to know how to use Bedtools to extract promoter sequences (as FASTAs) from the mouse genome (mm9) starting from a BED file.

10.4 years ago

As an example, let's say you define your promoter as the 2 kb upstream of your gene, and you have a BED file with the chrom, txStart, txEnd, name, num_exons, and strand for each gene you are interested in. Something like the following:

head -n4 genes.bed
chr1    134212701    134230065    Nuak2    8    +
chr1    134212701    134230065    Nuak2    7    +
chr1    33510655    33726603    Prim2,    14    -
chr1    25124320    25886552    Bai3,    31    -

bedtools flank -i genes.bed -g mm9.chromsizes -l 2000 -r 0 -s > genes.2kb.promoters.bed


This will give you the upstream regions based on strand as follows:

chr1    134210701    134212701    Nuak2    8    +
chr1    134210701    134212701    Nuak2    7    +
chr1    33726603    33728603    Prim2,    14    -
chr1    25886552    25888552    Bai3,    31    -
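To make the strand logic explicit, here is a minimal awk sketch of the arithmetic behind `bedtools flank -l 2000 -r 0 -s`: on the plus strand the flank ends at txStart, on the minus strand it starts at txEnd, and both are clipped to the chromosome bounds from the genome file. The function name `upstream_flank` is hypothetical; the sketch assumes a 6-column BED with the strand in column 6 and treats any strand other than `-` as plus. It is meant for understanding the output above, not as a replacement for bedtools, whose edge-case handling may differ.

```shell
#!/bin/sh
# Hypothetical helper: strand-aware upstream flank, mirroring the idea of
# `bedtools flank -l LEN -r 0 -s`. Assumes a 6-column BED (strand in column 6)
# and a two-column genome file (chrom, length). This is not bedtools itself.
upstream_flank() {
    # $1 = BED file, $2 = genome (chromsizes) file, $3 = flank length in bp
    awk -v len="$3" 'BEGIN { OFS = "\t" }
        # First file (genome file): remember each chromosome length.
        NR == FNR { size[$1] = $2 + 0; next }
        # Second file (BED): skip chromosomes missing from the genome file.
        !($1 in size) { print "unknown chrom: " $1 > "/dev/stderr"; next }
        {
            if ($6 == "-") {
                # Minus strand: upstream lies to the right of txEnd (column 3).
                start = $3; end = $3 + len
                if (end > size[$1]) end = size[$1]
            } else {
                # Plus strand (or anything else, in this sketch): left of txStart.
                start = $2 - len; end = $2
                if (start < 0) start = 0
            }
            # Overwrite the coordinates; the other columns are kept unchanged.
            $2 = start; $3 = end
            print
        }' "$2" "$1"
}
```

Running `upstream_flank genes.bed mm9.chromsizes 2000` on the four example genes should reproduce the intervals listed above, printed tab-separated.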



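You can now use this BED file to extract the promoter sequences from the mm9 genome. Add -s so that bedtools getfasta honors the strand column and reverse-complements the minus-strand promoters:

bedtools getfasta -s -fi mm9.fa -bed genes.2kb.promoters.bed -fo genes.2kb.promoters.bed.fa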
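For minus-strand promoters, strand-aware extraction (the `-s` option of `bedtools getfasta`) returns the reverse complement of the reference sequence, so every promoter reads 5' to 3' toward its gene. The transformation itself is simple; the hypothetical `revcomp` function below sketches it with `tr` and awk, complementing only upper- and lower-case A, C, G, T and N (other IUPAC codes pass through unchanged).

```shell
#!/bin/sh
# Hypothetical helper: reverse complement of a DNA string.
# Only A/C/G/T/N (either case) are complemented; other characters are kept as-is.
# Step 1 (tr): swap each base for its complement.
# Step 2 (awk): reverse the complemented string one character at a time.
revcomp() {
    printf '%s\n' "$1" |
        tr 'ACGTNacgtn' 'TGCANtgcan' |
        awk '{ s = ""; for (i = length($0); i > 0; i--) s = s substr($0, i, 1); print s }'
}
```

For example, `revcomp GATTACA` prints `TGTAATC`.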

NOTE: The "mm9.chromsizes" file is a tab-delimited file in which each line has a chromosome name and its length. See the bedtools manual for examples. mm9.fa stands for the mouse reference genome (mm9) in FASTA format.
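If you do not have a chromsizes file yet, one way to build it is from the genome FASTA itself. With samtools installed, `samtools faidx mm9.fa` writes `mm9.fa.fai`, whose first two columns are the sequence name and length, so `cut -f1,2 mm9.fa.fai > mm9.chromsizes` is enough. Without samtools, the hypothetical awk function below computes the same two columns directly; like samtools, it takes the sequence name to be the first word of each `>` header.

```shell
#!/bin/sh
# Hypothetical helper: print "name<TAB>length" for every sequence in a FASTA
# file, i.e. a bedtools-style genome file. The name is the first word after ">".
chrom_sizes_from_fasta() {
    awk 'BEGIN { OFS = "\t" }
        /^>/ {
            # New header: emit the previous sequence, then start a new count.
            if (name != "") print name, n
            name = substr($1, 2); n = 0
            next
        }
        {
            # Count bases only, ignoring stray whitespace and Windows line endings.
            gsub(/[ \t\r]/, "")
            n += length($0)
        }
        END { if (name != "") print name, n }' "$1"
}
```

`chrom_sizes_from_fasta mm9.fa > mm9.chromsizes` then gives a file to pass to `-g`.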


did you miss a 0 after -r in flank?


@brentp - yep, thank you sir.


Thanks for the solution, but I do not understand what the error in the code is, sorry. Could you please provide the fixed command for the given example?


Does the edit above help?


One issue is solved, but it still says: "Less than the req'd two fields were encountered in the genome file". I need to work on my input files, but with this hint I think I will be able to solve the problem.
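That message refers to the genome file passed with `-g`, which should hold at least two fields (chromosome, length) per line. Common culprits are a space-delimited file, or passing the FASTA (`mm9.fa`) instead of the chromsizes file. Below is a minimal sketch for checking and repairing the file, assuming tab is the delimiter bedtools splits on; the function names are hypothetical, and blank lines are flagged too even if your bedtools version tolerates them.

```shell
#!/bin/sh
# Hypothetical helpers for a bedtools genome file (chrom<TAB>length per line).

# Report lines with fewer than two tab-separated fields; exit status 1 if any.
check_genome_file() {
    awk -F'\t' 'BEGIN { bad = 0 }
        NF < 2 {
            printf "line %d has %d tab-separated field(s): %s\n", FNR, NF, $0
            bad = 1
        }
        END { exit bad }' "$1"
}

# Rewrite whitespace-separated "chrom length" pairs as tab-delimited,
# keeping only the first two fields and dropping lines with fewer than two.
normalize_genome_file() {
    awk 'NF >= 2 { print $1 "\t" $2 }' "$1"
}
```

For example, `check_genome_file mm9.chromsizes` lists the offending lines, and `normalize_genome_file mm9.chromsizes > mm9.fixed.chromsizes` writes a cleaned copy. If the reported lines start with `>`, the FASTA was passed as the genome file.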