How can I determine whether a sample has undergone a whole-genome duplication event?
14 months ago
Hyper_Odin ▴ 310

I have copy number data, and I am using the mean major-allele copy number of each chromosome.

The method is published here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6072608/

My script is below. Is taking the mean for each chromosome correct?

library(dplyr)

# Mean major-allele copy number per autosome
major_cn <- diffa %>%
  filter(chromosome %in% 1:22) %>%
  group_by(chromosome) %>%
  summarize(major_cn = mean(nMajor))

# Fraction of autosomes whose mean major CN is at least 2
prop_major_cn_two_or_more <- sum(major_cn$major_cn >= 2) / nrow(major_cn)

if (prop_major_cn_two_or_more >= 0.5) {
  cat("whole genome duplication.\n")
} else {
  cat("no evidence\n")
}


sequencing wgs • 859 views

Hello there! I think you might be wrong. Your script checks whether at least 50% of the per-chromosome means (each an unweighted average over segments) have a major CN >= 2. Segments and chromosomes differ in length, so that fraction need not equal the fraction of the autosomal genome with major CN >= 2. BTW, I'm also trying to find a way to check whether my sample (WES) has gone through a WGD. Could you please share the paper? Thanks
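
To make the length point concrete, here is a minimal sketch in base R, not the published method itself. It assumes the segment table also carries `start` and `end` columns; those names, the toy values, and the helper `wgd_fraction` are made up for illustration and do not come from the original post.

```r
# Toy segment table. `start` and `end` are assumed column names;
# `chromosome` and `nMajor` match the script in the question.
segs <- data.frame(
  chromosome = c(1, 1, 2, 2),
  start      = c(1, 50e6 + 1, 1, 10e6 + 1),
  end        = c(50e6, 200e6, 10e6, 150e6),
  nMajor     = c(1, 2, 3, 1)
)

# Fraction of autosomal base pairs covered by segments with major CN >= min_major.
wgd_fraction <- function(segs, min_major = 2) {
  auto <- segs[segs$chromosome %in% 1:22, ]
  len  <- auto$end - auto$start + 1
  sum(len[auto$nMajor >= min_major]) / sum(len)
}

frac <- wgd_fraction(segs)

# Unweighted per-chromosome means, as in the original script, for comparison.
chr_means <- tapply(segs$nMajor, segs$chromosome, mean)
prop_chr  <- mean(chr_means >= 2)

cat("length-weighted fraction:", round(frac, 3), "\n")
cat("per-chromosome proportion:", prop_chr, "\n")
```

In this toy table the per-chromosome proportion reaches the 0.5 cutoff while the length-weighted fraction stays below it, so the two approaches can disagree on the same data.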


First, the picture is partially cut off. Second, even if it weren't, the proposed algorithm is imperfect, and I would not use it. Unless more is being done than we know about, it misses one of the major problems with calling WGD from NGS data. Tell me this: if a diploid genome doubles, how do you tell that from NGS data alone? If that's the only genomic event, it is a perfect WGD, but it is also almost perfectly indistinguishable from two diploid cells, is it not? Do you have cell barcoding you didn't tell us about? Do you have a known diploid comparator on the same flow cell?

Technically, you didn't even tell us what technology this is, so we can't answer your question with any certainty.

If you are OK with a rough guesstimate that you haven't benchmarked, then I am sure you can get the code you need, but please be advised that determining definitively whether WGD occurred is pretty difficult in some cases...

It is particularly non-trivial if you don't have some kind of dedicated control for the overall CN level. Typically you'll have other gains and losses that help. For instance, if after a WGD one chromosome is gained to CN 5 and another is lost to CN 3, every chromosome still at CN 4 gives a signal that is hard to tell apart from diploid.

But the 5 and the 3 help you out, because their ratios to the baseline are 5/4 and 3/4. Without WGD, the same single-copy gain and loss would give 3/2 and 1/2.

This isn't always true, depending on the route to WGD, and the problem is that in the worst case it may not hold at all. So overall, writing an algorithm that handles it correctly without a known comparator is tricky.
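
The ratio argument can be checked with a few lines of R. This is only an illustration under idealized assumptions: a pure tumour sample, with read depth taken to scale with copy number relative to the baseline. The helper name `depth_ratio` is made up.

```r
# Expected depth ratio of a chromosome relative to the genome-wide baseline CN.
# Assumes 100% tumour purity (an idealization, not stated in the thread).
depth_ratio <- function(cn, baseline) cn / baseline

# After WGD (baseline CN 4): one chromosome gained to 5, one lost to 3.
post_wgd <- depth_ratio(c(gain = 5, loss = 3), baseline = 4)

# No WGD (baseline CN 2): the same single-copy gain and loss give CN 3 and 1.
no_wgd <- depth_ratio(c(gain = 3, loss = 1), baseline = 2)

print(rbind(post_wgd, no_wgd))
```

The unchanged chromosomes have a ratio of 1 in both scenarios, which is why only the altered ones carry information about the baseline.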


https://doi.org/10.1038/s41586-023-05783-5 The Methods section of this paper describes additional steps for determining WGD, applied after checking whether >50% of the genome has a major CN >= 2. Hope it helps.


Hello, I did not follow up on the problem since it was a bit tricky. I found that tools like ASCAT report whether or not there is WGD in their output, alongside the copy number analysis. I appreciate all your efforts, thank you!

