Hello, I would like to know how to use Bedtools to extract promoter sequences (as FASTAs) from the mouse genome (mm9) starting from a BED file.
As an example, let's say you define your promoter as the 2 kb upstream of each gene, and you have a BED file with the chrom, txStart, txEnd, name, num_exons, and strand for each gene you are interested in. Something like the following:
head -n4 genes.bed
chr1 134212701 134230065 Nuak2 8 +
chr1 134212701 134230065 Nuak2 7 +
chr1 33510655 33726603 Prim2, 14 -
chr1 25124320 25886552 Bai3, 31 -
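You can then use the bedtools "flank" tool to create a BED file of the 2 kb upstream of each gene, taking strand into account:
bedtools flank -i genes.bed -g mm9.chromsizes -l 2000 -r 0 -s > genes.2kb.promoters.bed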
This will give you the upstream regions based on strand as follows:
chr1 134210701 134212701 Nuak2 8 +
chr1 134210701 134212701 Nuak2 7 +
chr1 33726603 33728603 Prim2, 14 -
chr1 25886552 25888552 Bai3, 31 -
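The coordinates above follow from simple strand-aware arithmetic: on the plus strand the promoter is the 2000 bp before txStart, and on the minus strand it is the 2000 bp after txEnd. As a rough sketch only (this is not how bedtools is implemented, and it skips the chromosome-end bounds that the -g file provides), the same intervals can be reproduced with awk. The file name example.genes.bed is made up for illustration:

```shell
# Hypothetical stand-in for genes.bed (tab-delimited, strand in column 6).
printf 'chr1\t134212701\t134230065\tNuak2\t8\t+\nchr1\t33510655\t33726603\tPrim2\t14\t-\n' > example.genes.bed

# Plus strand:  upstream = [txStart - 2000, txStart), clamped at 0.
# Minus strand: upstream = [txEnd, txEnd + 2000), not clamped to chromosome length here.
promoters=$(awk 'BEGIN { FS = OFS = "\t"; up = 2000 }
  $6 == "+" { s = $2 - up; if (s < 0) s = 0; print $1, s, $2, $4, $5, $6 }
  $6 == "-" { print $1, $3, $3 + up, $4, $5, $6 }' example.genes.bed)

printf '%s\n' "$promoters"
```

For real data, bedtools flank is the safer choice, since it also keeps intervals within the chromosome lengths listed in the genome file.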
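You can now use this BED file to extract the sequence from the mm9 genome. Include the -s option so that strand is respected; without it, promoters of minus-strand genes are reported as plus-strand sequence rather than being reverse complemented.
bedtools getfasta -s -fi mm9.fa -bed genes.2kb.promoters.bed -fo genes.2kb.promoters.bed.fa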
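To make concrete what strand-aware extraction means for a minus-strand feature: the plus-strand bases of the interval are reversed and each base is complemented, which is what the -s option of bedtools getfasta does for you. A toy illustration, assuming the standard rev and tr utilities are available and using a made-up 5-base sequence:

```shell
# Made-up plus-strand sequence for a minus-strand interval.
plus_seq='ACGTT'

# Reverse the string, then complement each base (case preserved).
rc=$(printf '%s\n' "$plus_seq" | rev | tr 'ACGTacgt' 'TGCAtgca')

printf '%s\n' "$rc"   # prints AACGT
```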
NOTE: The "mm9.chromsizes" file is a tab-delimited file in which each line has a chromosome name and that chromosome's length. See the bedtools manual for examples. "mm9.fa" is meant to represent the name of the mouse reference genome file in FASTA format.
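If you do not already have a chromosome sizes file, one possible way to build it is to sum the sequence line lengths per record in the FASTA itself. The sketch below uses a tiny made-up FASTA (example.fa) rather than the real mm9.fa, and the output name example.chromsizes is likewise only illustrative:

```shell
# Tiny hypothetical FASTA standing in for mm9.fa.
printf '>chrA\nACGTAC\nGT\n>chrB\nGGCC\n' > example.fa

# For each record, add up the lengths of its sequence lines and
# print "name<TAB>length" when the next header (or end of file) is reached.
awk '/^>/ { if (name != "") print name "\t" len; name = substr($1, 2); len = 0; next }
     { len += length($0) }
     END { if (name != "") print name "\t" len }' example.fa > example.chromsizes

cat example.chromsizes
```

On a whole genome this reads every base, so it is slow compared with reusing an existing FASTA index or a sizes file downloaded alongside the assembly, but it has no dependencies beyond awk.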
Thanks for the solution but I do not understand what kind of error is in the code, sorry. Could you please provide the fixed command for the given example?