Question: Trying to get genome for bedtools
radwa.raed wrote (3.2 years ago):

Hi,

I want to add 2000 bp on either side of each element in "Alu elements", a file in BED format.

For that purpose, I need the human genome (hg19) for my bedtools slop command so that it can add these base pairs accordingly. I was not sure which format the genome had to be in, so I downloaded it from UCSC by going to their directory goldenPath/hg19/chromosomes and then typing: mget -a

I unzipped them (they were now all .fa files) and then combined all the chromosomes into one file: cat *.fa > hg19.fasta

But when I run:

bedtools slop -i Alu_elements -g hg19.fasta -b 2000

I get the following error message: Less than the req'd two fields were encountered in the genome file (hg19.fasta) at line 1. Exiting.

  1. I am not sure where the problem is. Is it the genome and how I unzipped/combined it? Before combining, there were separate files for each chromosome: chr1.fa, chr1.gl000194_random.fa, etc.

  2. Does the bedtools command need the genome to be in BED format as well? If so, how do I download the genome in this format? I tried to find it in the UCSC Table Browser, but there are so many options under "Tables" and "Tracks" that I don't know which one to choose to download the whole genome rather than just specific elements within it.

Thanks!


If I am not wrong, the genome file is usually not a FASTA but a file with the format <chrname> <size>. The sizes file should be available here: https://genome.ucsc.edu/goldenpath/help/hg19.chrom.sizes

(reply by microfuge)
igor wrote:

As others have already pointed out, the bedtools genome file is also known as a chrom.sizes file. If you can't download it, you can generate it yourself from an indexed FASTA file:

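# index the FASTA; this writes genome.fa.fai (name, length, offset, ...)
samtools faidx genome.fa
# keep the first two columns: sequence name and length
cut -f 1,2 genome.fa.fai > chrom.sizes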
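If samtools is not installed, roughly the same two-column file can be computed directly from the FASTA. Below is a minimal awk sketch, not part of samtools or bedtools; the function name fasta_to_genome is hypothetical, and it assumes an uncompressed FASTA whose sequence names are the header text up to the first whitespace.

```shell
# Hypothetical helper: print "<name><TAB><length>" for every sequence in a
# FASTA, i.e. a bedtools-style genome file. Assumes uncompressed input.
fasta_to_genome() {
  awk '
    /^>/ {
      # a new header: emit the previous record first
      if (name != "") printf "%s\t%d\n", name, len
      name = substr($1, 2)   # header text up to first whitespace, minus ">"
      len = 0
      next
    }
    {
      gsub(/[ \t\r]/, "")    # ignore stray spaces and CR line endings
      len += length($0)
    }
    END { if (name != "") printf "%s\t%d\n", name, len }
  ' "$@"
}

# Example (file names are placeholders):
# fasta_to_genome hg19.fasta > hg19.genome
```

It streams the file once, so it works on a whole-genome FASTA without building an index, though samtools faidx is the more standard route.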
harold.smith.tarheel wrote:

When in doubt, read the manual:

1.3.10 What is a “genome” file?

Some of the BEDTools (e.g., genomeCoverageBed, complementBed, slopBed) need to know the size of the chromosomes for the organism on which your BED files are based. When using the UCSC Genome Browser, Ensembl, or Galaxy, you typically indicate which species / genome build you are working with. The way you do this for BEDTools is to create a “genome” file, which simply lists the names of the chromosomes (or scaffolds, etc.) and their size (in base pairs).

Genome files must be tab-delimited and are structured as follows (this is an example for C. elegans):

chrI	15072421
chrII	15279323
chrX	17718854
chrM	13794

BEDTools includes predefined genome files for human and mouse in the /genomes directory included in the BEDTools distribution.
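A quick way to see why the concatenated FASTA was rejected is to check a candidate genome file against the layout the manual describes. This is a hypothetical awk helper (check_genome_file is not a bedtools command), and it is stricter than bedtools may be: it stops at the first line that is not exactly a name, a tab, and an integer size. A FASTA fails at line 1 because its header (for example >chr1) is a single field, which matches the "Less than the req'd two fields ... at line 1" error in the question.

```shell
# Hypothetical helper: exit 0 if every line is "<name><TAB><integer size>",
# otherwise print the first offending line and exit 1.
check_genome_file() {
  awk -F'\t' '
    NF != 2 || $2 !~ /^[0-9]+$/ {
      printf "line %d: expected <name><TAB><size>, got: %s\n", NR, $0
      bad = 1
      exit                   # stop at the first bad line
    }
    END {
      if (NR == 0) { print "file is empty"; bad = 1 }
      exit bad
    }
  ' "$1"
}

# Example: check_genome_file hg19.fasta   -> reports line 1 (the >chr1 header)
```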

radwa.raed wrote:

Thank you so much. This makes a lot of sense. I don't need the sequence itself, just the length of each chromosome.

I implemented what you said but am still stuck:

I tried two approaches. First, I went to the /genomes directory of bedtools and copied hg19.genome into the same directory as my Alu_elements file, but I get an error that the file could not be opened:

bedtools slop -i Alu_elements.bed -g hg19.genome -b 2000

Error: The requested file (Alu_elements.bed) could not be opened. Error message: (No such file or directory). Exiting!

Second, I downloaded a BED file from the link posted above:

bedtools slop -i Alu_elements.bed -g hg19.chrom.sizes.BED -b 2000

Error: The requested genome file (hg19.chrom.sizes.BED) could not be opened. Exiting!

Samples from hg19.chrom.sizes.BED:

chr1	249250621
chr2	243199373
chr3	198022430

Samples from Alu_elements.BED:

chr1	16777160	16777470	AluSp	2147	+
chr1	25165800	25166089	AluY	2626	-

Could the "extra" columns in Alu_elements.BED be throwing it off? I am unsure.

Many thanks!
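As for the extra columns: both errors above say a file "could not be opened", which points to a file name or path problem rather than to the BED content. bedtools slop handles BED files with more than three columns and passes the extra columns through. To make the expected result concrete, here is a hypothetical awk emulation of slop -b (slop_sketch is not a bedtools function, and this is not how bedtools is implemented). It widens each interval by b bp on both sides, clamps the start at 0 and the end at the chromosome length from the genome file, and leaves the other columns untouched.

```shell
# Hypothetical emulation of "bedtools slop -g GENOME -b N" for illustration:
# usage: slop_sketch GENOME_FILE BED_FILE N
slop_sketch() {
  local genome=$1 bed=$2 b=$3
  awk -v b="$b" 'BEGIN { FS = OFS = "\t" }
    NR == FNR { size[$1] = $2 + 0; next }   # first file: chromosome sizes
    !($1 in size) {
      print "no size for " $1 > "/dev/stderr"; next
    }
    {
      # extend and clamp; fields 4+ (name, score, strand, ...) are kept as-is
      $2 = ($2 - b < 0) ? 0 : $2 - b
      $3 = ($3 + b > size[$1]) ? size[$1] : $3 + b
      print
    }' "$genome" "$bed"
}
```

With b = 2000, the first sample line, chr1 16777160 16777470 AluSp 2147 +, becomes chr1 16775160 16779470 AluSp 2147 +.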


Have you tried doing this explicitly (assuming both files are in the directory you are running the command from)?

bedtools slop -i ./Alu_elements.bed -g ./hg19.genome -b 2000
(reply by genomax)

Yes, but when I run it, I receive this error message:

Error: The requested genome file (./hg19.chrom.sizes.bed) could not be opened. Exiting!

However, outside of the command line I can open the file itself and see the entries.

(reply by radwa.raed)

I am in the correct directory:

https://postimg.org/image/ph35bjigr/

(reply by radwa.raed)

From that screenshot, I see there is hg19.chrom.sizes.BED.txt, but not hg19.chrom.sizes.bed, which is what you specify.

Also, there is Alu_elements.BED.txt and Alu_elements, but not Alu_elements.BED, which is what you specify.

Run ls and then copy and paste the proper file names into your command. Don't try to type them manually.

(reply by igor)
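The same check can also be scripted. A minimal sketch, assuming bash; check_path is a hypothetical helper, not part of bedtools. It confirms the path exists exactly as typed and, if not, lists files in the same directory whose names start with the same text (ignoring case), which would reveal a hidden .txt suffix or .BED versus .bed.

```shell
# Hypothetical helper: verify a path before passing it to bedtools.
check_path() {
  local path=$1 dir base lower_base f
  if [ -e "$path" ]; then
    echo "ok: $path"
    return 0
  fi
  dir=$(dirname -- "$path")
  base=$(basename -- "$path")
  lower_base=$(printf '%s' "$base" | tr '[:upper:]' '[:lower:]')
  echo "missing: $path" >&2
  for f in "$dir"/*; do
    # case-insensitive prefix match: "x.bed" also finds "x.BED.txt"
    case "$(basename -- "$f" | tr '[:upper:]' '[:lower:]')" in
      "$lower_base"*) echo "  did you mean: $f" >&2 ;;
    esac
  done
  return 1
}
```

Run in the directory from the screenshot, check_path hg19.chrom.sizes.bed should suggest ./hg19.chrom.sizes.BED.txt.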

I see. Please correct me if I am wrong, but I thought a BED file is just a tab-delimited file, and I read that to create one you save the file as tab-delimited and then add .BED to the end of the name. Did I misunderstand?

(reply by radwa.raed)

Yes, but you need to give bedtools the filename exactly as it is (what you see when you run ls). The name you gave the file when saving it and the name you pass to bedtools are not the same.

(reply by igor)
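Besides the exact filename, it can be worth confirming that the file really is tab-delimited, since saving from a spreadsheet or text editor is where that tends to go wrong. Below is a minimal awk sketch (check_bed is a hypothetical helper, not a bedtools command) that checks only the first three BED columns: at least three tab-separated fields, integer start and end with start <= end, and no Windows carriage returns. Lines starting with #, track or browser are skipped.

```shell
# Hypothetical helper: report BED lines that are not tab-delimited records
# with integer chrom/start/end; exit 1 if any are found.
check_bed() {
  awk -F'\t' '
    /^(#|track|browser)/ { next }    # header lines allowed in BED
    NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 > $3 + 0 || /\r$/ {
      printf "line %d: not a valid BED record: %s\n", NR, $0
      bad = 1
    }
    END { exit bad }
  ' "$1"
}
```

A file saved with spaces instead of tabs fails on every line, because each line is then a single tab-separated field.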