Question: DiffBind error: sequences from an unspecified genome; no seqlengths
erwan.scaon (Nantes - France) wrote, 20 months ago:

I am trying to perform a differential binding analysis of ChIP-seq peak data with DiffBind.

Input reads were mapped against GRCm38.p5 (including alt-loci, patches and scaffolds) and peaks were called with MACS2. The alt-loci are of particular interest here: the mice used in this study are not BL6 at one particular locus, and that locus happens to have an alt-sequence in GRCm38.p5, so potentially many reads map to this alt-locus and there may be differential binding between KO and WT.

The MACS2 output is then loaded into DiffBind. The analysis steps are:
Reading in peaksets (dba)
Counting reads (dba.count)
Differential binding affinity analysis (dba.contrast, dba.analyze)
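
The steps above could look roughly like the following in R. This is a minimal sketch, not the poster's actual script: the function name `run_diffbind` and the sample sheet path are hypothetical, and the contrast on `DBA_CONDITION` assumes the KO/WT labels are stored in the sample sheet's Condition column.

```r
# Sketch of the DiffBind steps listed above (assumes DiffBind is installed
# and that `sample_sheet` points to a CSV describing BAM and peak files).
run_diffbind <- function(sample_sheet) {
  # Reading in peaksets
  test <- DiffBind::dba(sampleSheet = sample_sheet)
  # Counting reads in the consensus peaks
  test <- DiffBind::dba.count(test)
  # Set up the contrast; assumes conditions are in the Condition column
  test <- DiffBind::dba.contrast(test, categories = DiffBind::DBA_CONDITION)
  # Differential binding affinity analysis
  test <- DiffBind::dba.analyze(test)
  test
}

# Hypothetical usage:
# test <- run_diffbind("samples.csv")
# test.DB <- DiffBind::dba.report(test)
```

Wrapping the calls in a function keeps the sketch loadable even on a machine without the data files; nothing runs until the function is called.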

After the differential analysis, I report the results as a GRanges object (dba.report):

test.DB <- dba.report(test)
test.DB

GRanges object with 40 ranges and 6 metadata columns:
     seqnames                 ranges strand  Conc Conc_MAR Conc_WT
 81     chr12 [115603698, 115604198]      *  5.88     6.61   -2.02
309      chr6 [103648967, 103649467]      *   8.3    -1.12    9.62
seqinfo: 19 sequences from an unspecified genome; no seqlengths

40 differentially bound sites were found, but it seems that I lost an additional 19 differentially bound sites because they occur on sequences with names such as "GL456385.1", "JH584299.1", "KQ030495.1", ...
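
To see which sequences the report actually contains, the seqlevels of the GRanges can be listed directly. The sketch below uses a toy GRanges (the first two ranges are from the output above; the `GL456385.1` range is invented for illustration) and assumes the GenomicRanges package is installed. It also shows that a GRanges built without genome information carries no seqlengths, which is what the `seqinfo:` line summarises.

```r
library(GenomicRanges)

# Toy stand-in for the object returned by dba.report().
# The scaffold range is made up for illustration.
test.DB <- GRanges(
  seqnames = c("chr12", "chr6", "GL456385.1"),
  ranges = IRanges(
    start = c(115603698, 103648967, 1000),
    end   = c(115604198, 103649467, 1500)
  )
)

# Sequence names known to the object
seqlevels(test.DB)

# Number of reported sites per sequence
sites_per_seq <- table(as.character(seqnames(test.DB)))
sites_per_seq

# Sites on sequences that are not named like primary chromosomes
non_chr <- test.DB[!grepl("^chr", as.character(seqnames(test.DB)))]
non_chr

# No genome or seqlengths were set, so these are all NA
seqlengths(test.DB)
genome(test.DB)
```

On the real object, a non-empty `non_chr` would indicate that sites on scaffolds or patches are present in the report.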

I checked that the sequence names are the same in the BAM and MACS2 output files.
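
One way such a check could be done in R is sketched below. The helper `compare_seqnames` is hypothetical (it is not part of DiffBind), and the toy vectors stand in for the real names; the commented lines at the end assume Rsamtools is installed and use placeholder file names.

```r
# Report sequence names that appear in one source but not the other.
compare_seqnames <- function(peak_chr, bam_chr) {
  peak_chr <- unique(as.character(peak_chr))
  bam_chr  <- unique(as.character(bam_chr))
  list(
    only_in_peaks = setdiff(peak_chr, bam_chr),
    only_in_bam   = setdiff(bam_chr, peak_chr),
    shared        = intersect(peak_chr, bam_chr)
  )
}

# Toy vectors standing in for the real names (illustrative only)
peak_chr <- c("chr12", "chr6", "GL456385.1")
bam_chr  <- c("chr6", "chr12", "GL456385.1", "JH584299.1")
res <- compare_seqnames(peak_chr, bam_chr)
res$only_in_peaks  # empty if every peak sequence is in the BAM header

# With real files, the inputs could be obtained along these lines
# (assumes Rsamtools is installed; file names are placeholders):
# peaks    <- read.delim("sample_peaks.xls", comment.char = "#")
# peak_chr <- peaks$chr
# bam_chr  <- names(Rsamtools::scanBamHeader("sample.bam")[[1]]$targets)
```

Names listed under `only_in_bam` are not a problem in themselves; they simply have no called peaks.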

How can I keep DE sites on alt-loci / patches / scaffolds in my results?

Best regards

modified 4 months ago by kieran.short • written 20 months ago by erwan.scaon

The underlying question could be: in which package or function do you specify the genome in a DiffBind analysis? (I am following the instructions found here: https://bioconductor.org/packages/release/bioc/vignettes/DiffBind/inst/doc/DiffBind.pdf)

Session info mentions "GenomeInfoDb", I'll take a look

modified 20 months ago • written 20 months ago by erwan.scaon

I'm having the same problem as erwan.scaon, with DiffBind version 2.10.0. I used MACS2 to generate peaks. I get "seqinfo: <<xyz>> sequences from an unspecified genome; no seqlengths".

I am also following the example in your vignette, Rory: https://bioconductor.org/packages/release/bioc/vignettes/DiffBind/inst/doc/DiffBind.pdf

I used:

macs2 callpeak -t ./H3K27me3/H3K27me-LP-A.sorted.unique.bam -c ./WCEcontrols/WCE-LP-A.sorted.unique.bam -f BAM -g mm -n H3K27me3-LP-A -B -q 0.05 --outdir ./H3K27me3/MACS2output

... and used "H3K27me3-LP-A_peaks.xls" in the Peaks column, with PeakCaller set to "macs".
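
For reference, a sample sheet row matching this description might be built as below. This is a sketch only: the file paths come from the macs2 command above, while the SampleID, Tissue, Factor, Condition, Replicate and ControlID values are placeholders chosen for illustration.

```r
# One row of a DiffBind sample sheet for the sample described above.
# SampleID, Tissue, Factor, Condition, Replicate and ControlID values are
# placeholders; the file paths are taken from the macs2 command.
samples <- data.frame(
  SampleID   = "LP-A",
  Tissue     = "LP",
  Factor     = "H3K27me3",
  Condition  = "A",
  Replicate  = 1,
  bamReads   = "./H3K27me3/H3K27me-LP-A.sorted.unique.bam",
  ControlID  = "WCE-LP-A",
  bamControl = "./WCEcontrols/WCE-LP-A.sorted.unique.bam",
  Peaks      = "./H3K27me3/MACS2output/H3K27me3-LP-A_peaks.xls",
  PeakCaller = "macs",
  stringsAsFactors = FALSE
)

# Hypothetical next steps:
# write.csv(samples, "samples.csv", row.names = FALSE)
# dbaObj <- DiffBind::dba(sampleSheet = "samples.csv")
```

A real analysis would need one such row per sample, with at least two conditions and replicates for a contrast.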

thanks for any assistance, cheers, Kieran

modified 4 months ago by ATpoint • written 4 months ago by kieran.short

Is this an error or a warning? Did you try to use a non-Excel file like the narrowPeak file?

Also, H3K27me3 is a broad-peak mark, so you should probably be using the --broad option in MACS2.

modified 4 months ago • written 4 months ago by ATpoint
Rory Stark (University of Cambridge, Cancer Research UK - Cambridge Institute) wrote, 20 months ago:

Has this been resolved?

You shouldn't need to specify the genome, as DiffBind takes the chromosome names directly from the peak files, so if they are consistent it should work. However, there was a bug at some point that affected this in certain cases. What version of DiffBind are you using?
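
The installed version can be looked up from within R. The helper below is a small sketch (the name `installed_version` is made up here); it returns NA rather than failing when the package is not installed.

```r
# Return the installed version of a package as a string, or NA if absent.
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

installed_version("DiffBind")
# sessionInfo() additionally lists the versions of all attached packages
```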

Alternatively you can send the DBA object to me and I can have a look at what may be going on.

Cheers-

Rory (DiffBind maintainer)

written 20 months ago by Rory Stark