Hello everyone,
I was wondering if you could share your experiences with somatic copy number analysis in non-human organisms. Since whole-genome sequencing is not feasible, I would appreciate hints on array-based approaches (e.g. for mouse). I would also be glad to know whether such an analysis is possible using whole-exome sequencing, in which case my follow-up question is which tools are recommended.
Thank you,
Noushin
Hi Stefano,
Thanks a lot for your suggestion. I looked at the documentation and it looks very promising. I will give it a try and will update this post based on my experience here. Best!
When I started running CNAnorm, I realized that one needs to specify a window width. Do you know of any considerations one should be aware of when selecting the window size for exome sequencing data?
Hi. Exome data is a bit trickier because coverage is uneven, but as a rule of thumb, try to have an average of 50 reads per window. In gene-rich regions you will have more, in gene-poor regions a bit less. How many reads do you have in total? Good luck.
Stefano
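For anyone wanting to turn the 50-reads-per-window rule of thumb into a starting window width, here is a minimal Python sketch. It is not part of CNAnorm; the function name, the 20 million read count and the roughly 3 Gbp genome size are all assumptions for illustration. It also assumes reads are spread evenly over the genome, which exome data are not, so the result is only a first guess.

```python
def window_width_for_target(total_reads, genome_size_bp, target_reads_per_window):
    """Window width (bp) that gives the target mean read count per window.

    Assumes reads are spread evenly along the genome. Exome reads are
    concentrated in exons, so treat the result as a starting point only.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return target_reads_per_window * genome_size_bp / total_reads


# Hypothetical inputs: 20 million reads, a genome of roughly 3 Gbp.
width = window_width_for_target(20_000_000, 3_000_000_000, 50)
print(f"Window width for ~50 reads per window: {width:.0f} bp")
```

With more reads the same target is reached with narrower windows, so the width scales inversely with sequencing depth.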
Thank you for the prompt response. That is exactly what I had in mind. In the exome scenario, doesn't this requirement favor quite large window sizes on average? My naive sense is that if one wants to brute-force 50 reads per window for a fixed window size, the 99% of the genome outside coding regions will make this optimal window size quite large. I have in excess of 50 million reads. Thanks again!
50M reads is quite a lot, actually. In CNAnorm all windows are equally sized. If you set 10 kbp windows, you would get an average of about 170 reads per window, which is plenty. From a quick count, 85% of exons are less than 10 kbp apart and 93% are less than 25 kbp apart, so most of your windows will have some reads.
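The two numbers above can be roughly reproduced with a short Python sketch. This is not CNAnorm code: the function names are made up for illustration, the 3 Gbp genome size is an assumption (the post does not state which genome size was used), and the exon coordinates are toy values, not a real annotation. To repeat the "quick count" for real, you would feed in exon intervals from your own annotation file.

```python
def mean_reads_per_window(total_reads, genome_size_bp, window_bp):
    """Average reads per fixed-width window if reads were spread evenly."""
    return total_reads * window_bp / genome_size_bp


def fraction_of_gaps_below(exons, max_gap_bp):
    """Fraction of gaps between consecutive exons that are shorter than max_gap_bp.

    exons: iterable of (chrom, start, end) tuples; gaps are only measured
    between neighbouring exons on the same chromosome.
    """
    by_chrom = {}
    for chrom, start, end in exons:
        by_chrom.setdefault(chrom, []).append((start, end))

    gaps = []
    for intervals in by_chrom.values():
        intervals.sort()
        prev_end = intervals[0][1]
        for start, end in intervals[1:]:
            gaps.append(max(0, start - prev_end))  # overlapping exons count as gap 0
            prev_end = max(prev_end, end)

    if not gaps:
        return 0.0
    return sum(g < max_gap_bp for g in gaps) / len(gaps)


# 50 million reads, 10 kbp windows, assumed genome size of ~3 Gbp.
print(round(mean_reads_per_window(50_000_000, 3_000_000_000, 10_000)))

# Toy exon coordinates, purely illustrative.
toy_exons = [
    ("chr1", 1_000, 1_200),
    ("chr1", 3_000, 3_300),
    ("chr1", 20_000, 20_500),
    ("chr1", 60_000, 60_200),
    ("chr2", 500, 900),
    ("chr2", 5_000, 5_400),
]
print(fraction_of_gaps_below(toy_exons, 10_000))
print(fraction_of_gaps_below(toy_exons, 25_000))
```

Under the 3 Gbp assumption the first value comes out close to the figure of about 170 quoted above. The gap fractions only become meaningful once the toy list is replaced with real exon coordinates.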