Question: Bedtools intersect returns 0, intersections present on IGV
0
gravatar for kerrypop
5 months ago by
kerrypop0
kerrypop0 wrote:

Hello,

I am trying to use bedtools to find intersections between H3K4me1 peaks and my regions of interest. Though I can see overlap on IGV, intersect returns 0 without any error messages.

Here is command I'm using and screenshot from IGV:

bedtools intersect -a gene.ROI -b wgEncodeBroadHistoneHuvecH3k4me1StdPk.broadPeak -u | wc -l

IGV screenshot: https://imgur.com/a/JYUzp

Is there something I'm missing?

Thank you, Kerry

chip-seq bedtools • 417 views
ADD COMMENTlink modified 5 months ago • written 5 months ago by kerrypop0
1

do file use the same chromosome notation ? are the files sorted ? are both files BED files ?

ADD REPLYlink written 5 months ago by Pierre Lindenbaum107k

Yes, same chromosome notation. Files are not sorted - I've done this analysis before and didn't sort them. Do you think sorting would overcome this? The ROI file is just a document, created using nano in terminal. The other is a broad peak file.

ADD REPLYlink written 5 months ago by kerrypop0

The ROI file is just a document, created using nano in terminal.

Check columns are tab separated and there are no funny, non-printable characters. If you do something like:

cat -vet gene.ROI | head

You should see columns separated by ^I, e.g.:

chr1^I67208778^I67216822^INM_032291_utr3_24_0_chr1_67208779_f^I0^I+$
chr1^I67208778^I67216822^INM_001308203_utr3_21_0_chr1_67208779_f^I0^I+$
chr1^I33585783^I33586132^INM_001301825_utr3_8_0_chr1_33585784_f^I0^I+$
chr1^I8404073^I8404227^INM_001080397_utr3_8_0_chr1_8404074_f^I0^I+$
ADD REPLYlink written 5 months ago by dariober9.1k

I only used chromosome coordinates but don't see any extraneous characters

cat -vet gene.ROI | head
Chr6^I11711888^I11781280$
Chr6^I35702809^I35718690$
Chr1^I162349520^I162358608$
ADD REPLYlink modified 5 months ago by Pierre Lindenbaum107k • written 5 months ago by kerrypop0

It looks like the dollar sign at the end of each line might be causing trouble. You can use this command to remove the last character from each line:

sed 's/.$//' gene.ROI > geneROI.nolast
ADD REPLYlink written 5 months ago by Hussain Ather860
1

That is because of options (-vet) used for cat command.

-e     equivalent to -vE
-E, --show-ends
              display $ at end of each line
-t     equivalent to -vT

-T, --show-tabs
          display TAB characters as ^I
-v, --show-nonprinting
              use ^ and M- notation, except for LFD and TAB
ADD REPLYlink modified 5 months ago • written 5 months ago by genomax48k
1

Your IGV screenshot seems to show the entire genome. It may be that once you zoom in into one of the ROI genes there is actually no intersection.

ADD REPLYlink written 5 months ago by dariober9.1k

Here's zoomed in screenshot of the first ROI. There should be 4 intersections. https://imgur.com/a/lsiEQ

ADD REPLYlink written 5 months ago by kerrypop0
2
gravatar for harold.smith.tarheel
5 months ago by
United States
harold.smith.tarheel4.1k wrote:

It looks like your chromosome notation is NOT identical. Your IGV screenshot says 'chr6' but your gene.ROI says 'Chr6'. Linux is case-sensitive.

ADD COMMENTlink written 5 months ago by harold.smith.tarheel4.1k

That did it - thank you!

ADD REPLYlink written 5 months ago by kerrypop0
0
gravatar for dariober
5 months ago by
dariober9.1k
Glasgow - UK
dariober9.1k wrote:

I see you have Chr6, Chr1 instead of chr6, chr1 (lowercase c). That may be the problem.

ADD COMMENTlink written 5 months ago by dariober9.1k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 931 users visited in the last hour