Diffbind error : sequences from an unspecified genome; no seqlengths
6.2 years ago
erwan.scaon ▴ 940

I am trying to perform differential binding analysis of ChIPSeq peak data with DiffBind.

Input reads were mapped against GRCm38.p5 (including alt-loci, patches and scaffolds) and peaks were called with MACS2. Alt-loci are of particular interest here: the mice used in this study are not BL6 at one particular locus, and that locus happens to have an alt-sequence in GRCm38.p5, so potentially many reads map to this alt-locus and binding may differ between KO and WT.

The MACS2 output is then loaded into DiffBind. The analysis steps are:
Reading in peaksets (dba)
Counting reads (dba.count)
Differential binding affinity analysis (dba.contrast, dba.analyze)
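
For readers following along, the steps listed above might be chained roughly as follows. This is only a sketch: the function name `run_diffbind`, its `sample_sheet` argument, and the choice of contrasting on the Condition column are assumptions for illustration, not taken from the original analysis, and argument defaults differ between DiffBind releases.

```r
# Hypothetical wrapper around the four steps described in the post.
# Nothing is executed until the function is called with a real sample sheet.
run_diffbind <- function(sample_sheet) {
  # 1. Read in the peaksets listed in the sample sheet (CSV path or data frame)
  peaks <- DiffBind::dba(sampleSheet = sample_sheet)
  # 2. Count reads overlapping the consensus peakset
  counts <- DiffBind::dba.count(peaks)
  # 3. Set up the contrast (ASSUMPTION: groups are defined by the Condition column)
  contrast <- DiffBind::dba.contrast(counts, categories = DiffBind::DBA_CONDITION)
  # 4. Run the differential binding analysis
  analysed <- DiffBind::dba.analyze(contrast)
  # Return the differentially bound sites (a GRanges object by default)
  DiffBind::dba.report(analysed)
}
```

It would be called as `test.DB <- run_diffbind("samples.csv")`, where "samples.csv" is a placeholder file name.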

After the DE analysis, I report the results as a GRanges object (dba.report):

test.DB <- dba.report(test)

GRanges object with 40 ranges and 6 metadata columns:
      seqnames                 ranges strand   Conc Conc_MAR Conc_WT
   81    chr12 [115603698, 115604198]      *   5.88     6.61   -2.02
  309     chr6 [103648967, 103649467]      *    8.3    -1.12    9.62
  seqinfo: 19 sequences from an unspecified genome; no seqlengths

40 differentially bound sites were found, but it seems that I lost an additional 19 differentially bound sites because they occur on sequences with names such as "GL456385.1", "JH584299.1", "KQ030495.1", ...

I did check that the sequence names are the same in the BAM and MACS2 output files.
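
One way such a name check could be done in base R is sketched below. The helper `peak_seqnames`, the mock peak file and the `bam_seqnames` vector are all made up for illustration; in practice the BAM sequence names would come from the BAM header (for example via `samtools idxstats`).

```r
# Collect the distinct sequence names used in a MACS2 "_peaks.xls" file.
# MACS2 writes '#' comment lines and a blank line before the header row;
# read.delim skips both with the settings below.
peak_seqnames <- function(path) {
  peaks <- read.delim(path, comment.char = "#", stringsAsFactors = FALSE)
  unique(as.character(peaks$chr))
}

# Tiny mock peak file with made-up coordinates, for demonstration only.
mock <- tempfile(fileext = ".xls")
writeLines(c(
  "# This file is generated by MACS",
  "",
  "chr\tstart\tend\tlength\tabs_summit\tpileup",
  "chr12\t115603698\t115604198\t501\t115603948\t25",
  "GL456385.1\t1000\t1500\t501\t1250\t12"
), mock)

# ASSUMPTION: sequence names taken from the BAM header by some other means.
bam_seqnames <- c("chr12", "chr6", "GL456385.1")

# Names present in the peak file but absent from the BAM (empty if consistent).
missing_in_bam <- setdiff(peak_seqnames(mock), bam_seqnames)
```

A non-empty `missing_in_bam` would point to a naming mismatch such as "12" versus "chr12".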

How can I keep DE sites on alt-loci, patches and scaffolds in my results?

Best regards

Diffbind MACS2 GRanges MACS ChIP-Seq • 2.5k views

The underlying question could be: in which package or function do you specify the genome in a DiffBind analysis? (I am following the instructions found here: https://bioconductor.org/packages/release/bioc/vignettes/DiffBind/inst/doc/DiffBind.pdf)

Session info mentions "GenomeInfoDb", I'll take a look
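
As a small illustration of where that line comes from: the "unspecified genome; no seqlengths" text is what GenomicRanges prints for any GRanges object whose genome and sequence lengths have not been set. The sketch below builds a toy object (the second range is made up) and attaches a genome label afterwards; the label is only an annotation and does not add or remove ranges.

```r
library(GenomicRanges)

# Toy GRanges with one canonical chromosome and one scaffold.
# The scaffold coordinates are invented for the example.
gr <- GRanges(
  seqnames = c("chr12", "GL456385.1"),
  ranges   = IRanges(start = c(115603698, 1000), width = 501)
)

# Printing gr at this point ends with a seqinfo line reporting
# an unspecified genome and no seqlengths.
seqinfo(gr)

# Attach a genome label to every sequence level (annotation only).
genome(gr) <- "GRCm38.p5"
```

After the assignment, `seqinfo(gr)` reports GRCm38.p5 as the genome for both sequences, and the ranges themselves are unchanged.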


I'm having the same problem as erwan.scaon, with DiffBind version 2.10.0. I used MACS2 to generate peaks, and I get "seqinfo: <<xyz>> sequences from an unspecified genome; no seqlengths".

I am also following the examples in your vignette, Rory: https://bioconductor.org/packages/release/bioc/vignettes/DiffBind/inst/doc/DiffBind.pdf

I used:

macs2 callpeak -t ./H3K27me3/H3K27me-LP-A.sorted.unique.bam -c ./WCEcontrols/WCE-LP-A.sorted.unique.bam -f BAM -g mm -n H3K27me3-LP-A -B -q 0.05 --outdir ./H3K27me3/MACS2output

and then used "H3K27me3-LP-A_peaks.xls" as Peaks, with PeakCaller set to "macs".

thanks for any assistance, cheers, Kieran


Is this an error or a warning? Did you try to use a non-Excel file like the narrowPeak file?

Also, H3K27me3 is a broad mark, so you should probably be using the --broad option in MACS2.

6.2 years ago
Rory Stark ★ 2.0k

Has this been resolved?

You shouldn't need to specify the genome, as DiffBind takes the chromosome names directly from the peak files, so if they are consistent it should work. However, there was a bug at some point that affected this in certain cases. What version of DiffBind are you using?
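
For anyone wanting to gather this information before replying, a minimal sketch follows. The helper name `tabulate_sites_by_sequence` is made up; it assumes the analysed DBA object is passed in and that `dba.report` returns a GRanges object (its default). Setting `th = 1` asks for all sites regardless of significance, which shows whether scaffold or alt-locus sites are present in the analysis at all.

```r
# Hypothetical helper: count reported sites per sequence name.
# ASSUMPTION: dba_obj has already been through dba.analyze().
tabulate_sites_by_sequence <- function(dba_obj) {
  # th = 1 includes every site, not only those passing the FDR threshold
  all_sites <- DiffBind::dba.report(dba_obj, th = 1)
  # as.data.frame() on a GRanges yields a 'seqnames' column
  table(as.character(as.data.frame(all_sites)$seqnames))
}

# Report the installed DiffBind version, if the package is available.
if (requireNamespace("DiffBind", quietly = TRUE)) {
  print(packageVersion("DiffBind"))
}
```

With the object from the question this would be called as `tabulate_sites_by_sequence(test)`.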

Alternatively you can send the DBA object to me and I can have a look at what may be going on.


Rory (DiffBind maintainer)

