Copy number variation using HTSeq/DESeq2
6.4 years ago
ThePresident

I was wondering whether the combination of HTSeq / DESeq2 (normally used for differential expression analysis in RNA-seq) could be used to detect copy number variation in the case of DNA sequencing. I can't see why not, but then again I haven't looked at the math behind the DESeq package.

Thanks, TP

copy number DESeq HTSeq

Hi ThePresident,

Make sure to choose a more descriptive title for future threads; in this case, "Copy number variation using HTSeq/DESeq2" would have been more informative. I've adapted this thread now, but please keep this in mind.

Cheers,
Wouter


I will, thank you for the advice.

6.4 years ago

If copy number variation analysis is not mentioned in the DESeq / DESeq2 manual, then don't use the package for that purpose. DESeq models read counts with a negative binomial distribution, and the distribution of your CNV data will not match that model. Copy number is a discrete quantity (integer copy states), so something like a hidden Markov model (HMM) is more commonly employed, although copy number can also be estimated on a continuous scale (e.g., as copy ratios).
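
To make the HMM idea more concrete, here is a minimal sketch in Python. It assumes read counts have already been tallied in fixed-size genomic bins and that the expected number of reads contributed by a single copy (`reads_per_copy`) is known; in practice that would have to be estimated, e.g. from bins assumed to be diploid. Emissions are modelled as Poisson with a mean proportional to copy number (an assumption for illustration), and the Viterbi algorithm returns the most likely integer copy state per bin. The function names and parameters are hypothetical, not taken from any CNV package, and real callers also correct for GC content, mappability and overdispersion.

```python
import math
from typing import List, Sequence


def poisson_logpmf(k: int, lam: float) -> float:
    """Log-probability of observing k reads in a bin under a Poisson with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_copy_states(
    counts: Sequence[int],
    reads_per_copy: float,
    states: Sequence[int] = (1, 2, 3, 4),
    stay_prob: float = 0.99,
) -> List[int]:
    """Most likely integer copy-number path for per-bin read counts (hypothetical helper).

    Assumes at least two states and Poisson emissions with mean
    copy_number * reads_per_copy.
    """
    if not counts:
        return []
    n = len(states)
    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (n - 1))

    def log_trans(i: int, j: int) -> float:
        # Staying in the same copy state is much more likely than switching.
        return log_stay if i == j else log_switch

    # Expected depth scales with copy number; the small floor keeps a
    # 0-copy state (if supplied) from producing log(0).
    lams = [max(s, 0.01) * reads_per_copy for s in states]

    # Uniform prior over states for the first bin.
    score = [-math.log(n) + poisson_logpmf(counts[0], lam) for lam in lams]
    back: List[List[int]] = []
    for k in counts[1:]:
        new_score, ptr = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + log_trans(i, j))
            new_score.append(
                score[best_i] + log_trans(best_i, j) + poisson_logpmf(k, lams[j])
            )
            ptr.append(best_i)
        score = new_score
        back.append(ptr)

    # Trace back from the best final state.
    j = max(range(n), key=lambda i: score[i])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [states[j] for j in path]
```

For example, `viterbi_copy_states([100, 98, 103, 101, 149, 152, 148, 151, 99, 102], reads_per_copy=50)` calls the middle four bins as 3 copies and the rest as 2. The `stay_prob` parameter penalises switching state, which is what stops a single noisy bin from being called as a short CNV.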

Note that the same question was asked in relation to edgeR and CNV on the Bioconductor forum: "edgeR for CNV detection".

Also, take a look at this other Biostars question: "Copy Number Variation from paired end RNA-Seq data". Note, in particular, Devon's reply, where he alludes to the "fundamental limitation" of trying to detect CNV from RNA-seq. This limitation arises because a copy number event does not necessarily alter gene expression levels. For example, a gene could be duplicated but, if the extra copy lacks the promoter sequence and/or transcription start site (TSS), it will not be expressed (or will be expressed only at negligible levels).

If you can't afford whole-genome sequencing, then the Affymetrix SNP 6.0 array can determine genome-wide CNV profiles, along with genotyping SNPs. I used it during my PhD years ago.

Best of luck, Kevin


Just to add: calling CNVs with a DE tool, whose distributional assumptions were designed for differential expression of count data (DESeq2 uses a negative binomial model), is not suited to finding CNVs, which are discrete copy states. One needs the right tool and the right distribution for finding CNVs, and there are plenty of technologies for producing the data and tools for generating copy number profiles from them. One important step is properly accounting for allele frequencies while scanning through the genome, and then using segmentation to find copy ratios. This cannot be done with DESeq2. Try to read first about which technology suits which kind of data, to better understand its power and utility.
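
As a rough illustration of the copy ratio + segmentation idea, here is a Python sketch. It assumes matched tumour/normal read counts per bin and reference/alternate allele counts at heterozygous SNPs. `binary_segment` is a simplified recursive binary segmentation, not the circular binary segmentation (CBS) used by tools such as DNAcopy. The total-count normalisation, the pseudocount and the `min_gain` threshold are illustrative assumptions, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def log2_ratios(tumor: Sequence[int], normal: Sequence[int], pseudo: float = 0.5) -> List[float]:
    """Per-bin log2(tumour/normal) after naive library-size normalisation."""
    t_total, n_total = sum(tumor), sum(normal)
    return [
        math.log2(((t + pseudo) / t_total) / ((n + pseudo) / n_total))
        for t, n in zip(tumor, normal)
    ]


def mirrored_baf(ref: Sequence[int], alt: Sequence[int]) -> List[float]:
    """|B-allele frequency - 0.5| at heterozygous SNPs; sites with zero depth are skipped.

    0 means balanced alleles; values towards 0.5 mean allelic imbalance.
    """
    return [abs(a / (r + a) - 0.5) for r, a in zip(ref, alt) if r + a > 0]


def _sse(xs: Sequence[float]) -> float:
    """Sum of squared deviations from the mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)


def binary_segment(
    values: Sequence[float], min_size: int = 3, min_gain: float = 1.0
) -> List[Tuple[int, int, float]]:
    """Recursively split at the breakpoint that most reduces squared error.

    Returns (start, end, mean) tuples with half-open [start, end) bin indices.
    """
    def split(start: int, end: int) -> List[Tuple[int, int, float]]:
        seg = values[start:end]
        total = _sse(seg)
        best_gain, best_k = 0.0, None
        for k in range(start + min_size, end - min_size + 1):
            gain = total - _sse(values[start:k]) - _sse(values[k:end])
            if gain > best_gain:
                best_gain, best_k = gain, k
        if best_k is None or best_gain < min_gain:
            return [(start, end, sum(seg) / len(seg))]
        return split(start, best_k) + split(best_k, end)

    return split(0, len(values)) if values else []
```

With `tumor = [100] * 6 + [200] * 6` and `normal = [100] * 12`, `binary_segment(log2_ratios(tumor, normal))` returns two segments, bins 0 to 6 with a negative mean ratio and bins 6 to 12 with a positive one. The mirrored BAF complements the depth ratio: a copy-neutral loss of heterozygosity looks the same as a normal diploid region in log2 ratios, but shows up as allelic imbalance.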


To add to this, for exome sequencing there is also exomeCopy, from the same author as DESeq2, although dozens of tools for WES CNV analysis are available.


I totally agree with all of you. I was simply curious, but Kevin raised a good point about the different expected distributions in DE vs. CNV analysis. Thank you all.


Great - good luck. Hope that you win the next election, Mr. ThePresident.

PS: note that the DESeq2 documentation does say that it can be used for ChIP-seq data, although it is definitely better known for RNA-seq.


edgeR and DESeq2 can be used for ChIP-seq, mostly for differential binding (differential peak) analysis, which is different from CNV detection. Again, the data are counts, and their distribution is similar to that of RNA-seq data.


If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.
