Can DESeq2 be used for DRIP-seq analysis, and how?
4.9 years ago
H.Chloe ▴ 20

Dear all.
I am using DESeq2 for differential analysis of DRIP-seq data.
For the DRIP-seq data, I have BAM files and BED files (generated with featureCounts).
I have already tried DESeq2 starting from the BAM files, but I heard that BED files can also be used with DESeq2 (I am not sure whether that is true).
If BED files can be used for the analysis, could you tell me how to use them with DESeq2? Thank you!

R DESeq2 DRIP-seq • 1.5k views

How did you use BAM files in the past with DESeq2?

AFAIK, DESeq2 needs a table of counts, so if your BED file has a column of read counts that follow the assumptions made for RNA-seq data, you should be able to use it.
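As a rough illustration of what "a table of counts" means here, the sketch below merges the count column of several BED-like files into one regions-by-samples matrix. It is written in Python rather than R, and the file layout (chrom, start, end, count), the sample names, and the numbers are all assumptions made up for the example, not taken from the poster's data.

```python
import csv
import io


def read_counts(handle, count_col=3):
    """Read one tab-separated BED-like file into {region: count}.

    Assumes columns chrom, start, end, then a count (hypothetical layout).
    """
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        region = f"{row[0]}:{row[1]}-{row[2]}"
        counts[region] = int(row[count_col])
    return counts


def build_count_matrix(samples):
    """Combine per-sample dicts into (regions, sample_names, matrix).

    Regions missing from a sample are filled with 0, which is an
    assumption of this sketch; ideally every file covers the same regions.
    """
    names = list(samples)
    regions = list(dict.fromkeys(r for n in names for r in samples[n]))
    matrix = [[samples[n].get(r, 0) for n in names] for r in regions]
    return regions, names, matrix


# Invented example data standing in for two per-sample files.
control = io.StringIO("chr2L\t0\t1000\t5\nchr2L\t1000\t2000\t0\n")
treated = io.StringIO("chr2L\t0\t1000\t12\nchr2L\t1000\t2000\t3\n")
samples = {"control_1": read_counts(control), "treated_1": read_counts(treated)}

regions, names, matrix = build_count_matrix(samples)
for region, row in zip(regions, matrix):
    print(region, *row, sep="\t")
```

A matrix of this shape, written out as a delimited text file, is the kind of input a count-based tool can read; whether the counts meet the statistical assumptions is a separate question.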


Thanks for your comment, and sorry for my unclear explanation. I created an RNA-seq-style count table from the BAM files (using functions such as BamFileList, plus CSV files). My BED files have a column of read counts, so I will try to use that.


I have added a link to the DRIP-seq Wikipedia entry. It should make the implications quite clear. I have also changed the title to be more specific.

4.9 years ago
Michael 54k

In short, AFAIK, DESeq2 cannot be used for the differential peak analysis that DRIP-seq requires.

Longer answer: That doesn't mean that one could not somehow run DESeq(2) or other DE analysis packages on these data, e.g. after re-formatting, inserting 'pseudo'-transcript regions and counts for peaks, and get some p-values. But you can never be sure they are valid or meaningful (edit) unless the approach has been properly tested for that purpose.

I should note that I have no experience with your specific data type, so take this with a grain of salt. See also Friederike's comment.

DESeq(2) was developed for differential expression analysis of RNA-sequencing data. It hasn't been validated on the data you are trying to use it on, and there is no established protocol (again AFAIK; if you can find a good, published paper describing one, please go ahead and ignore me). I think you could apply the peak calling and differential peak calling methods used for ChIP-seq, because the data are rather similar.

As a bit of general advice, even though it might sound a bit dull: we are not really meant to cook up our own protocols each time we see new data, but to use published, state-of-the-art methods. Unless someone is really at the forefront of experimental sequencing tech, there is almost always a published study that has analyzed similar data. So look for published reviews, methods papers, and analyses of similar data, attempt to replicate their methods, and go from there in case modifications are necessary.

If running an analysis, it always makes sense to be able to state the following:

  • I am running an analysis of DRIP-seq data
  • My experimental design is this: ...
  • My data looks like this: ...
  • I am following the protocol described in: Zeller et al. (2016) (this might fit).

Hi Michael,

I'd be interested to hear more about why you think DESeq2 would not be appropriate per se. After all, DiffBind is basically a wrapper around DESeq2 (it can also use edgeR) and is commonly used for differential peak analysis. There are probably some caveats that should be taken into consideration (in fact, DiffBind might be the more appropriate choice if DRIP-seq is as similar to ChIP-seq as the Wikipedia article implies), but I don't see why DESeq2 would be an inappropriate choice for the analysis. Happy to hear your thoughts!


Hi Friederike, first, thanks a lot for the relevant hint. Maybe you would like to give an answer yourself.

To explain: the reason for not recommending DESeq2 was that I was unable to find any paper that used this combination of tools in its Methods. Whenever the terms appeared in the same paper, the study had used DESeq2 only for a separate RNA-seq data set and a different method for the other data. I should maybe have emphasized the AFAIK more; it was supposed to say that I don't have experience with this type of data, and I am assuming it is not very common either. Also, I didn't know that DiffBind was essentially a wrapper around DESeq2. Still, it is a different package, and OP could benefit from using it directly rather than trying to divide the genome into bins and then running a DE analysis on these bins. Without more background on the original data, it was hard to give further advice.

My main message, to best use established and published protocols, remains valid I suppose.


Thanks for your kind answers.

I thought that DESeq2 would be a good fit for this differential analysis, but I think I have to find another way. Thanks for your guidance on my problem! Your answers helped me a lot to clear my mind!


Hi, thanks for accepting this and I hope it helps. Given the sparsity of the information and also my lack of knowledge of the data type, I am not sure if the votes are justified yet. Please let us know how it turns out.

4.9 years ago
h.mon 35k

The DESeq2 vignette provides ample information on almost all aspects of a differential expression analysis; please read it carefully. DESeq2 expects raw read counts as input data. The vignette has a section on how to import data from count matrices, such as those produced by featureCounts.
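To make "raw read counts" concrete: the values should be non-negative whole numbers, not normalized or fractional quantities. A small pre-flight check along these lines can catch a wrong input before it reaches the R side. This is only a Python sketch; the function name and the example matrices are hypothetical.

```python
import math


def check_raw_counts(matrix):
    """Return a list of problems; an empty list means every entry
    looks like a raw count (a finite, non-negative whole number)."""
    problems = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            where = f"row {i}, column {j}"
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"{where}: not a number ({value!r})")
            elif not math.isfinite(value):
                problems.append(f"{where}: not finite ({value!r})")
            elif value < 0:
                problems.append(f"{where}: negative ({value!r})")
            elif value != int(value):
                problems.append(f"{where}: not a whole number ({value!r})")
    return problems


# Invented matrices: the first looks like raw counts, the second does not.
good = [[0, 5], [12, 3]]
bad = [[0.5, -1], [2, 3]]

print(check_raw_counts(good))
for problem in check_raw_counts(bad):
    print(problem)
```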

DESeq2 cannot analyse BAM or BED files directly, although these files may be used upstream of DESeq2. Can you show a snippet of your BED files? What is the output of:

head file.bed

Thanks for your answers and this is my BED file.

Geneid   Chr    Start  End   Strand  Length  read counts
chr2L_1  chr2L  0      1000  +       1001    0
chr2L_2  chr2L  1000   2000  +       1001    0
chr2L_3  chr2L  2000   3000  +       1001    0
chr2L_4  chr2L  3000   4000  +       1001    0
chr2L_5  chr2L  4000   5000  +       1001    0
chr2L_6  chr2L  5000   6000  +       1001    0
chr2L_7  chr2L  6000   7000  +       1001    0
chr2L_8  chr2L  7000   8000  +       1001    0

Sorry for my poor explanation; what I meant to ask about was making a count table from BAM or BED files for DESeq2. I will study the vignette further. Thank you!
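The snippet above has the column layout of a featureCounts table rather than a plain BED file: six annotation columns (Geneid, Chr, Start, End, Strand, Length) followed by the counts. A minimal Python sketch of splitting such a table into bin IDs and a count matrix could look like the following. It assumes the usual tab-separated featureCounts output with one count column per BAM file; the sample column names and the non-zero counts are invented for illustration.

```python
import csv
import io

ANNOTATION_COLS = 6  # Geneid, Chr, Start, End, Strand, Length


def parse_featurecounts(handle):
    """Split a featureCounts-style table into (bin_ids, sample_names, counts).

    Lines starting with '#' (the command-line comment featureCounts
    writes at the top) are skipped.
    """
    lines = (line for line in handle if not line.startswith("#"))
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    sample_names = header[ANNOTATION_COLS:]
    bin_ids, counts = [], []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        bin_ids.append(row[0])
        counts.append([int(x) for x in row[ANNOTATION_COLS:]])
    return bin_ids, sample_names, counts


# Bin coordinates follow the thread; sample names and counts are made up.
text = (
    "# hypothetical featureCounts comment line\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tcontrol.bam\ttreated.bam\n"
    "chr2L_1\tchr2L\t0\t1000\t+\t1001\t0\t4\n"
    "chr2L_2\tchr2L\t1000\t2000\t+\t1001\t7\t0\n"
)

bin_ids, sample_names, counts = parse_featurecounts(io.StringIO(text))
for bin_id, row in zip(bin_ids, counts):
    print(bin_id, *row, sep="\t")
```

Keeping the annotation columns apart from the counts matters because only the count columns belong in the matrix that goes into a differential analysis.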


Hey H.Chloe, please up-vote and / or Accept answers and comments (where appropriate) that have helped.
