Question: Bedtools: problems intersecting datasets after using makewindows
4.1 years ago by
European Union
Verónica Olmos wrote:

Hi everyone!

I'm new here, so I hope I'm posting properly.

I'm using bedtools to find out where several epigenetic marks fall across the mouse genome, with the genome split into fixed-size regions.

To do so, I used the makewindows command to split the genome into 10 kb windows:

bedtools makewindows -g mm9.genome -w 10000 > mm9.windows.bed
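For readers unfamiliar with fixed-width windowing, here is a minimal sketch of the idea in plain awk, so it runs without bedtools or real data. `make_windows` is a hypothetical helper, not part of bedtools, and the toy genome `chrT` is invented for illustration; the sketch assumes a tab-separated `chrom<TAB>size` file and clips the last window to the chromosome size.

```shell
# Hypothetical helper: emulate fixed-width windows over a genome file.
make_windows() {
    # $1 = genome file (chrom<TAB>size), $2 = window width in bp
    awk -v w="$2" 'BEGIN { FS = OFS = "\t" }
        NF >= 2 {
            for (start = 0; start < $2; start += w) {
                # The last window is clipped to the chromosome size.
                end = (start + w < $2) ? start + w : $2
                print $1, start, end
            }
        }' "$1"
}

# Toy genome (invented) so the example is self-contained.
tmp=$(mktemp -d)
printf 'chrT\t25\n' > "$tmp/toy.genome"
make_windows "$tmp/toy.genome" 10
```

With the toy input this prints three BED3 lines (0-10, 10-20, 20-25). If the genome file is not cleanly tab-separated, a sketch like this, and possibly the real tool, can produce no windows at all.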

Then, I tried the intersect function like this:

bedtools intersect -a mm9.windows.bed -b H3K4m1.bed H3K4m2.broadPeak -loj -names H3K4m1 H3K4m2

but I got nothing!
 

For example, there is an H3K4m1 peak with this record:

chr1    4352396    4353107    .    0    .    6.73    -1    -1

I would expect it to intersect this window in mm9.windows.bed:

chr1     4350000    4360000

However, I get no output!
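The expected overlap can be double-checked with simple arithmetic. The sketch below uses a hypothetical helper, `window_for`, that rounds a feature start down to the nearest multiple of the window width; it assumes 0-based BED starts and ignores clipping of the last window on a chromosome.

```shell
# Hypothetical helper: print the fixed-width window containing a start coordinate.
window_for() {
    # $1 = chrom, $2 = feature start (0-based), $3 = window width in bp
    awk -v c="$1" -v s="$2" -v w="$3" 'BEGIN {
        ws = int(s / w) * w            # round down to a multiple of the width
        printf "%s\t%d\t%d\n", c, ws, ws + w
    }'
}

# The peak from the post starts at 4352396 on chr1.
window_for chr1 4352396 10000
```

This prints the chr1 4350000-4360000 window, and the peak end (4353107) is also below 4360000, so the peak lies entirely inside that window. An empty result therefore points at the input files rather than at the coordinates.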

I don't know if this is relevant, but the files I use have the following formats:

  • mm9.windows.bed --> BED3
  • H3K4m1.bed --> BED6
  • H3K4m2.broadPeak --> broadPeak (BED9)

I would really appreciate some help because this is driving me crazy!


All the best,

Verónica


it could be a problem with your bed files. bedtools is very sensitive to carriage return characters. if you edited those files in, e.g., excel and then saved them, you likely have some unwanted characters that make the bed file unreadable for bedtools.

from your shell/terminal, try

head H3K4m1.bed

if you get something like 

$ head GSM1187118_s03_MCF7_ER_rep3_macs_peaks.bed
chr1    7993    8847    MACS_peak_1     964.58
chr1    18728   19835   MACS_peak_2     54.10
chr1    36580   37067   MACS_peak_3     50.05
chr1    41169   41596   MACS_peak_4     72.19
chr1    58399   58921   MACS_peak_5     80.97
chr1    94893   95488   MACS_peak_6     57.28
chr1    125614  127226  MACS_peak_7     1168.78
chr1    144506  144899  MACS_peak_8     72.73
chr1    316644  317465  MACS_peak_9     916.54
chr1    358508  359147  MACS_peak_10    71.39

then you are ok and we need to look into something else.

if instead everything comes out as a single line of overlapping data (the file uses bare carriage returns as line endings), then do

tr '\r' '\n' < H3K4m1.bed > mySecondBedFile.bed

and use mySecondBedFile.bed with bedtools...that did the trick for me a number of times
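One caveat worth sketching: translating `\r` to `\n` suits files whose only line ending is a bare carriage return, but on Windows-style files (`\r\n`) it would turn every line ending into two newlines and leave blank lines. The helpers below, `has_cr` and `normalize_eol`, are hypothetical names, and the two toy BED files are invented; the sketch handles both cases with standard tools only.

```shell
# Hypothetical helper: succeed if the file contains any carriage return.
has_cr() {
    grep -q "$(printf '\r')" "$1"
}

# Hypothetical helper: write the file to stdout with LF-only line endings.
normalize_eol() {
    # Drop a CR at the end of each record (CRLF files), then turn any
    # remaining bare CRs into newlines (CR-only files).
    awk '{ sub(/\r$/, ""); gsub(/\r/, "\n"); print }' "$1"
}

# Invented toy inputs: one CRLF file and one CR-only file.
tmp=$(mktemp -d)
printf 'chr1\t10\t20\r\nchr1\t30\t40\r\n' > "$tmp/crlf.bed"
printf 'chr1\t10\t20\rchr1\t30\t40\r' > "$tmp/cr.bed"

normalize_eol "$tmp/crlf.bed" > "$tmp/crlf.clean.bed"
normalize_eol "$tmp/cr.bed" > "$tmp/cr.clean.bed"
has_cr "$tmp/crlf.bed" && echo "crlf.bed contains carriage returns"
```

Note that `head` output can look perfectly normal for a CRLF file, because most terminals render the trailing `\r` invisibly, so a check like `has_cr` is worth running even when the file looks fine.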

 

written 4.1 years ago by TriS
4.1 years ago by
European Union
Verónica Olmos wrote:

Thanks for the advice, TriS!

However, I'm afraid it's not working...

I think the problem is related to the genome files, because when I intersected the epigenomic data files with each other, the intersect function worked.

Just in case anyone can see something I'm missing, here is the input I use for the function makewindows:

chr1     197195432  
chr2     181748087  
chr3     159599783  
chr4     155630120  
chr5     152537259  
chr6     149517037  
chr7     152524553  
chr8     131738871  
chr9     124076172  
chr10     129993255  
chr11     121843856  
chr12     121257530  
chr13     120284312  
chr14     125194864  
chr15     103494974  
chr16     98319150  
chr17     95272651  
chr18     90772031  
chr19     61342430  
chrX     166650296  
chrY     15902555

And this is the output I get: link.
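A listing pasted into a post cannot show whether the columns are separated by tabs or spaces, or whether stray characters are hiding at line ends. As a rough sketch, the hypothetical helper `check_genome` below flags lines that are not exactly `chrom<TAB>size` with a plain integer size, which is the tab-delimited layout the bedtools documentation describes for genome files. The two toy files are invented for illustration.

```shell
# Hypothetical helper: report lines that are not exactly chrom<TAB>size.
check_genome() {
    awk -F '\t' '
        /\r/             { printf "line %d: contains a carriage return\n", NR; bad = 1 }
        NF != 2          { printf "line %d: expected 2 tab-separated fields, found %d\n", NR, NF; bad = 1; next }
        $2 !~ /^[0-9]+$/ { printf "line %d: size is not a plain integer: [%s]\n", NR, $2; bad = 1 }
        END              { exit bad + 0 }   # non-zero exit status if anything was flagged
    ' "$1"
}

# Invented toy inputs: one clean file, one with spaces and a CRLF line ending.
tmp=$(mktemp -d)
printf 'chr1\t197195432\nchr2\t181748087\n' > "$tmp/good.genome"
printf 'chr1     197195432  \nchr2\t181748087\r\n' > "$tmp/bad.genome"

check_genome "$tmp/good.genome" && echo "good.genome looks OK"
check_genome "$tmp/bad.genome" || echo "bad.genome needs cleaning"
```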


the genome file looks fine to me. it could just be a typo, but in your post you wrote:

bedtools makewindows -g mm9.genome -w 10000 > mm9.windows.bed ## << note the mm9.genome file name

the name of the genome file is mouse.mm9.genome, not mm9.genome, could you please check that?

using the correct commands:

bedtools makewindows -g mouse.mm9.genome -w 10000 > mm9.windows.bed

$ head mm9.windows.bed
chr1    0       10000
chr1    10000   20000
chr1    20000   30000
chr1    30000   40000
chr1    40000   50000
chr1    50000   60000
chr1    60000   70000
chr1    70000   80000
chr1    80000   90000
chr1    90000   100000

so theoretically it works. can you check using

head mm9.windows.bed

how your file looks?

also, unless it's an answer use the comments to reply :)

written 4.1 years ago by TriS

Hi again!

I checked my data again (sorry about the post, it was indeed a typo) with your suggestion in mind, TriS. It seems the original file with the chromosomes and their sizes contained some hidden characters, along the lines of what you described.

I created a new (cleaner) file and now it's working. Thanks a lot for your advice!
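For anyone hitting the same problem, one possible way to rebuild a clean genome file from a messy one is sketched below. `clean_genome` is a hypothetical helper and the messy input is invented; the sketch assumes the first two whitespace-separated columns are the chromosome name and its size, and that chromosome names contain no spaces.

```shell
# Hypothetical helper: re-emit a genome file as strict chrom<TAB>size lines.
clean_genome() {
    # Remove carriage returns, split on any run of spaces/tabs, skip
    # blank lines, and print exactly two tab-separated columns.
    awk '{ gsub(/\r/, "") } NF >= 2 { print $1 "\t" $2 }' "$1"
}

# Invented messy input: spaces, trailing blanks, CRLF endings, an empty line.
tmp=$(mktemp -d)
printf 'chr1     197195432  \r\nchr2\t181748087\r\n\n' > "$tmp/messy.genome"
clean_genome "$tmp/messy.genome" > "$tmp/clean.genome"
cat "$tmp/clean.genome"
```

The cleaned file can then be passed to makewindows in place of the original.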
 

written 4.1 years ago by Verónica Olmos