Question: Recommended Approach For Copy Number Analysis In Non-Human Organisms
2
6.7 years ago, Noushin N (Baltimore, MD) wrote:

Hello everyone,

I was wondering if you could share your experiences with somatic copy number analysis in non-human organisms. Since whole genome sequencing is not feasible for me, I would appreciate hints on array-based approaches (e.g. for mouse). I would also be glad to hear whether such an analysis is possible with whole exome sequencing; in that case, my follow-up question would be which tools are recommended.

Thank you,

Noushin

array mouse copynumber somatic • 2.2k views
modified 6.7 years ago by Stefano Berri • written 6.7 years ago by Noushin N
1
6.7 years ago, Stefano Berri (Cambridge, UK) wrote:

Hi. You don't need high coverage data for copy number detection at the resolution of arrays.

You could use CNAnorm. It is designed to detect somatic CNAs from low-coverage genomic data (2 million reads would be enough, which is very affordable if you multiplex on a run), and it does not assume a particular reference genome (it has extra features if you use hg19, but they are mainly cosmetic).

You could use the development version, which has some nice extra features and a better vignette. It will be released in April. I have tested it on exome capture data, and it works pretty well. You can find further information and a link to the paper here

modified 6.7 years ago • written 6.7 years ago by Stefano Berri

Hi Stefano,

Thanks a lot for your suggestion. I looked at the documentation and it looks very promising. I will give it a try and will update this post based on my experience here. Best!

written 6.7 years ago by Noushin N

When I started running CNAnorm, I realized that one needs to specify a window width. Do you know of any considerations one should be aware of when selecting the window size for exome sequencing data?

written 6.7 years ago by Noushin N

Hi. Exome data is a bit trickier because coverage is uneven, but as a rule of thumb, try to have an average of 50 reads per window. In gene-rich regions you will have more, in gene-poor regions a bit less. How many reads do you have in total? Good luck.

Stefano

written 6.7 years ago by Stefano Berri
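
The rule of thumb above can be turned into a quick calculation. This is only a minimal sketch, not part of CNAnorm: the function name is hypothetical, the 3 Gbp genome size is an assumed round figure (substitute the size of your organism's genome), and the 2 million reads are the low-coverage figure mentioned earlier in the thread. It ignores the uneven coverage of exome data, so treat the result as a starting point.

```python
def window_width_for_target(total_reads, genome_size_bp, target_reads_per_window=50):
    """Fixed window width (bp) that gives the target *average* reads per window.

    Hypothetical helper for illustration; it assumes reads are spread
    evenly over the genome, which is not true for exome capture data.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return target_reads_per_window * genome_size_bp / total_reads


# Assumption: a genome of roughly 3 Gbp; adjust for your organism.
GENOME_SIZE_BP = 3_000_000_000

width = window_width_for_target(2_000_000, GENOME_SIZE_BP)
print(f"Suggested window width: {width:,.0f} bp")
```

With more reads the suggested width shrinks proportionally, so the same helper can be rerun once the total read count is known.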

Thank you for the prompt response. That is exactly what I had in mind. In the exome scenario, doesn't this requirement favor quite large window sizes on average? My naive sense is that if one insists on 50 reads per window with a fixed window size, the 99% of the genome outside coding regions will make the optimal window size quite large. I have in excess of 50 million reads. Thanks again!

written 6.7 years ago by Noushin N

50M reads is quite a lot, actually. In CNAnorm all windows are equally sized. If you set 10 kbp windows, you would get an average of 170 reads per window, which is plenty. From a quick count, 85% of exons are less than 10 kbp apart and 93% are less than 25 kbp apart, so most of your windows will have some reads.

written 6.7 years ago by Stefano Berri
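
The figure of roughly 170 reads per 10 kbp window can be checked with a short sketch. The function names and the toy read positions below are hypothetical, and the 3 Gbp genome size is an assumption (it is the size that makes the quoted average come out at about 170). The second helper shows what "equally sized windows" means in practice by binning read start positions with integer division; it is an illustration, not how CNAnorm is implemented.

```python
from collections import Counter


def mean_reads_per_window(total_reads, genome_size_bp, window_width_bp):
    """Average reads per fixed-width window if reads were spread evenly."""
    return total_reads * window_width_bp / genome_size_bp


def count_reads_per_window(read_starts, window_width_bp):
    """Count reads per equally sized window, keyed by window index."""
    return Counter(pos // window_width_bp for pos in read_starts)


# Assumption: ~3 Gbp genome, 50 million reads, 10 kbp windows.
mean = mean_reads_per_window(50_000_000, 3_000_000_000, 10_000)
print(f"Average reads per window: {mean:.0f}")

# Toy read start positions on one chromosome (made up for illustration).
toy_starts = [120, 9_800, 10_050, 25_000, 25_900]
print(dict(count_reads_per_window(toy_starts, 10_000)))
```

For exome data the per-window counts will be far from uniform, so inspecting the distribution of counts, and not only the mean, is a sensible check before settling on a width.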