Question: How To Use bedtools To Extract Promoters From A Mouse BED File
Anima Mundi (2.1k, Italy) wrote, 5.3 years ago:

Hello, I would like to know how to use bedtools to extract promoter sequences (in FASTA format) from the mouse genome (mm9), starting from a BED file.

Aaronquinlan (9.7k, United States) wrote, 5.3 years ago:

As an example, let's say you define your promoter as the 2 kb upstream of each gene's transcription start site, and that you have a BED file with the chrom, txStart, txEnd, name, num_exons, and strand of each gene you are interested in. Something like the following:

head -n4 genes.bed
chr1    134212701    134230065    Nuak2    8    +
chr1    134212701    134230065    Nuak2    7    +
chr1    33510655    33726603    Prim2,    14    -
chr1    25124320    25886552    Bai3,    31    -
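Both flank -s and getfasta -s take the strand from the sixth column, so it can be worth checking the input before running them. A minimal sketch, assuming a tab-separated BED6 file laid out as above; the file name toy_genes.bed and the helper check_bed are hypothetical, and the three checks (at least six fields, start not greater than end, strand "+" or "-") are an illustrative choice rather than bedtools' exact validation rules.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical toy copy of the example genes.bed (tab-separated);
# printf reuses its format string until all arguments are consumed.
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  chr1 134212701 134230065 Nuak2  8  + \
  chr1 134212701 134230065 Nuak2  7  + \
  chr1 33510655  33726603  Prim2, 14 - \
  chr1 25124320  25886552  Bai3,  31 - > toy_genes.bed

# Hypothetical helper: print suspicious lines and return non-zero if any exist.
check_bed() {
  awk -F'\t' '
    NF < 6 || ($2 + 0 > $3 + 0) || ($6 != "+" && $6 != "-") {
      printf "line %d looks malformed: %s\n", NR, $0
      bad++
    }
    END { exit (bad > 0) }
  ' "$1"
}

check_bed toy_genes.bed && echo "toy_genes.bed looks OK"
```

Lines with a "." strand are reported too, because a strand-aware promoter definition cannot be applied to them.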

bedtools flank -i genes.bed -g mm9.chromsizes -l 2000 -r 0 -s > genes.2kb.promoters.bed

This gives you the upstream region of each gene, taking strand into account (to the left of txStart for "+" genes, to the right of txEnd for "-" genes):

chr1    134210701    134212701    Nuak2    8    +
chr1    134210701    134212701    Nuak2    7    +
chr1    33726603    33728603    Prim2,    14    -
chr1    25886552    25888552    Bai3,    31    -
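To make the strand logic concrete, here is a minimal awk sketch that reproduces the four regions above, assuming BED6 input with the strand in column 6, a 2000 bp upstream flank, no downstream flank, and clipping at 0 and at the chromosome end. It only illustrates the coordinate arithmetic; it is not a replacement for bedtools flank, which supports more options and edge cases. The files toy_genes.bed and toy.chromsizes are hypothetical, and the chr1 length is an illustrative value.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical toy inputs (tab-separated), mirroring genes.bed above.
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  chr1 134212701 134230065 Nuak2  8  + \
  chr1 134212701 134230065 Nuak2  7  + \
  chr1 33510655  33726603  Prim2, 14 - \
  chr1 25124320  25886552  Bai3,  31 - > toy_genes.bed
# Illustrative chromosome length; use your real mm9 chrom sizes file instead.
printf 'chr1\t197195432\n' > toy.chromsizes

# Strand-aware 2 kb upstream flank (the intent of "-l 2000 -r 0 -s"):
#   + strand: [start - 2000, start), clipped at 0
#   - strand: [end, end + 2000), clipped at the chromosome length
# Lines whose strand is neither + nor - are skipped by this sketch.
awk -F'\t' -v OFS='\t' -v up=2000 '
  NR == FNR { size[$1] = $2; next }      # first file: chrom -> length
  $6 == "+" {
    s = $2 - up; if (s < 0) s = 0
    print $1, s, $2, $4, $5, $6
    next
  }
  $6 == "-" {
    e = $3 + up
    if (($1 in size) && e > size[$1] + 0) e = size[$1]
    print $1, $3, e, $4, $5, $6
  }
' toy.chromsizes toy_genes.bed > toy_promoters.bed

cat toy_promoters.bed
```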

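You can now use this BED file to extract the promoter sequences from the mm9 genome. Add -s so that getfasta uses the strand in column 6 and reverse-complements regions on the minus strand:

bedtools getfasta -s -fi mm9.fa -bed genes.2kb.promoters.bed -fo genes.2kb.promoters.bed.fa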
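For the extracted sequence to actually follow the strand, getfasta needs its -s flag: a minus-strand feature is then reverse-complemented, so each promoter reads 5' to 3' toward its gene. The awk sketch below only illustrates that behaviour, assuming a hypothetical toy genome (toy.fa with one made-up sequence chrT) and hypothetical BED6 intervals. The ">name" header is my own choice and differs from what bedtools writes, and holding every sequence in memory like this is only sensible for toy inputs, not for mm9.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical toy genome: one 20 bp sequence wrapped over two FASTA lines.
printf '>chrT\nAAAACCCCGG\nGGTTTTACGT\n' > toy.fa
# Hypothetical BED6 intervals (0-based, half-open), one per strand.
printf 'chrT\t0\t4\tgeneA\t0\t+\nchrT\t4\t10\tgeneB\t0\t-\n' > toy.bed

awk -F'\t' '
  BEGIN {
    comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"
    comp["a"] = "t"; comp["c"] = "g"; comp["g"] = "c"; comp["t"] = "a"
  }
  # Reverse-complement a sequence; characters such as N are kept as-is.
  function revcomp(s,    i, c, out) {
    out = ""
    for (i = length(s); i >= 1; i--) {
      c = substr(s, i, 1)
      if (c in comp) c = comp[c]
      out = out c
    }
    return out
  }
  # First file (FASTA): join the sequence lines of each record.
  NR == FNR {
    if (/^>/) {
      chrom = substr($0, 2); sub(/[ \t].*/, "", chrom)
      seq[chrom] = ""
    } else {
      seq[chrom] = seq[chrom] $0
    }
    next
  }
  # Second file (BED6): BED starts are 0-based, awk substr() is 1-based.
  {
    s = substr(seq[$1], $2 + 1, $3 - $2)
    if ($6 == "-") s = revcomp(s)
    printf ">%s\n%s\n", $4, s
  }
' toy.fa toy.bed > toy_promoters.fa

cat toy_promoters.fa
```

Here geneB covers CCCCGG on the forward strand, so the strand-aware sequence is its reverse complement, CCGGGG.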

NOTE: The "mm9.chromsizes" file is a tab-delimited file where each line has a chromosome name and that chromosome's length. See the bedtools manual for examples. mm9.fa stands for the mouse reference genome (mm9) in FASTA format.
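If you do not have a chrom sizes file yet, one option is to derive it from the same FASTA you pass to getfasta. With samtools installed, `samtools faidx mm9.fa` writes an index mm9.fa.fai whose first two columns are the sequence name and length, so `cut -f1,2 mm9.fa.fai > mm9.chromsizes` gives the required layout. The dependency-free awk sketch below computes the same two columns directly, assuming the sequence name is the first word after ">"; toy_two_records.fa is a hypothetical stand-in for mm9.fa.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical toy FASTA: two records, wrapped sequence lines, one description.
printf '>chrA\nACGTACGTAC\nGT\n>chrB description text\nNNNNACGT\n' > toy_two_records.fa

# Emit "name<TAB>length" per record: the two-column, tab-delimited
# layout that bedtools expects for its -g genome file.
awk -v OFS='\t' '
  /^>/ {
    if (name != "") print name, len     # flush the previous record
    name = substr($1, 2); len = 0       # name = first word after ">"
    next
  }
  { len += length($0) }
  END { if (name != "") print name, len }
' toy_two_records.fa > toy.chromsizes

cat toy.chromsizes
```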


brentp (22k), 5.3 years ago: Did you miss a 0 after -r in flank?

Aaronquinlan (9.7k), 5.3 years ago: @brentp - yep, thank you, sir.

Anima Mundi (2.1k), 5.3 years ago: Thanks for the solution, but sorry, I do not understand what the error in the code is. Could you please provide the fixed command for the given example?

Aaronquinlan (9.7k), 5.3 years ago: Does the edit above help?

Anima Mundi (2.1k), 5.3 years ago: That solved one issue, but bedtools still says: "Less than the req'd two fields were encountered in the genome file". I need to fix my input files, but with this hint I think I will be able to solve the problem.
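That message suggests bedtools found a line in the -g file that does not split into two tab-separated fields. Plausible culprits (a guess, not confirmed in this thread) are space-separated columns, blank or one-column lines, or passing a file that is not a chrom sizes file at all. A minimal sketch that reports such lines and writes a cleaned copy, assuming the file should contain only chrom<TAB>length rows; toy_bad.chromsizes and its lengths are made up.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical broken genome file: one space-separated line, one blank line.
printf 'chr1\t1000\nchr2 2000\n\n' > toy_bad.chromsizes

# 1) Report lines that do not split into at least two tab-separated fields.
awk -F'\t' 'NF < 2 { printf "line %d: [%s]\n", NR, $0 }' toy_bad.chromsizes

# 2) Write a cleaned copy: strip a trailing carriage return (Windows line
#    endings), split on runs of spaces/tabs, drop lines without two fields,
#    and re-join the first two fields with a single tab.
awk -v OFS='\t' '{ sub(/\r$/, "") } NF >= 2 { print $1, $2 }' \
  toy_bad.chromsizes > toy_fixed.chromsizes

cat toy_fixed.chromsizes
```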