Identifying germline and tumor samples
9.5 years ago
Kasthuri ▴ 300

Given two genome .bam files, one of which we know is from a tumor sample and the other from normal/germline from the same person, is there an efficient way to correctly identify them bioinformatically?

Thanks. -K.

Germline Tumor Normal • 2.8k views
9.5 years ago

A simple approach is to run a copy-number or allelic-imbalance analysis. Such analyses will almost always show significant abnormalities in the tumor sample. A "normal" genome will also contain some copy-number variation and apparent blocks of loss of heterozygosity, but tumors typically show both to a much larger extent. There are many tools for copy-number analysis; the particular choice will probably not make much difference for such a broad question.
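
As a rough illustration of the copy-number side of this, here is a minimal sketch rather than a replacement for a real copy-number caller. It assumes read counts per fixed-size genomic bin have already been extracted from each BAM by some other tool, and it simply asks which sample has the larger fraction of bins whose depth departs from that sample's own median. The function names, the toy counts, and the 0.3 log2 threshold are assumptions made up for the example.

```python
import math
from statistics import median


def log2_ratio_profile(bin_counts):
    """Return log2(count / median count) for every bin with nonzero coverage."""
    nonzero = [c for c in bin_counts if c > 0]
    med = median(nonzero)
    return [math.log2(c / med) for c in nonzero]


def fraction_altered(bin_counts, threshold=0.3):
    """Fraction of bins whose log2 depth ratio exceeds the (assumed) threshold."""
    profile = log2_ratio_profile(bin_counts)
    return sum(abs(x) > threshold for x in profile) / len(profile)


def guess_tumor_by_depth(counts_a, counts_b):
    """Return 'A' or 'B' for the sample with more altered bins, or None on a tie."""
    frac_a = fraction_altered(counts_a)
    frac_b = fraction_altered(counts_b)
    if frac_a == frac_b:
        return None
    return "A" if frac_a > frac_b else "B"


# Toy data only: sample A has two bins at half depth, sample B is flat.
counts_a = [100] * 8 + [50] * 2
counts_b = [100, 102, 98, 101, 99, 100, 103, 97, 100, 100]
print(guess_tumor_by_depth(counts_a, counts_b))
```

Real depth profiles are noisy (GC content, mappability), so a dedicated tool that corrects for these effects is the safer choice; the sketch only shows the shape of the comparison.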

Good idea. In fact, I did this analysis and saw that one of them had huge variations (in particular, losses). I inferred that it should be the tumor since the other one was clean. I used Control-FREEC. I was thinking of some additional analysis to doubly confirm this. For instance, we could call somatic mutations twice, swapping which sample is treated as the tumor and which as the normal. The run in which the actual normal is treated as the tumor should yield fewer mutations, since in theory a real tumor should contain all SNPs found in the germline plus de novo, purely somatic mutations. But given the noise found in NGS, this seems tricky.
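
The counting step of such a swap test could be sketched as follows. This is only an illustration under assumptions: it takes variant calls that were already made on each sample separately, represented as hypothetical `(chrom, pos, ref, alt)` tuples, and reports how many calls are private to each sample. It deliberately does not pick a winner, because the asymmetry is only suggestive: loss of heterozygosity in the tumor can also make germline alleles look private to the normal.

```python
def private_call_counts(variants_a, variants_b):
    """Count calls seen only in A, only in B, and in both.

    Each variant is a hashable key such as (chrom, pos, ref, alt);
    this representation is an assumption for the example.
    """
    set_a = set(variants_a)
    set_b = set(variants_b)
    return {
        "only_in_a": len(set_a - set_b),
        "only_in_b": len(set_b - set_a),
        "shared": len(set_a & set_b),
    }


# Toy data only: two shared germline calls, two calls private to A, one private to B.
calls_a = [("1", 100, "A", "G"), ("1", 250, "C", "T"),
           ("2", 500, "G", "A"), ("3", 42, "T", "C")]
calls_b = [("1", 100, "A", "G"), ("1", 250, "C", "T"),
           ("4", 77, "G", "C")]
print(private_call_counts(calls_a, calls_b))
```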

I would suggest pairing your copy number analysis with an analysis of regions of allelic imbalance. Your suggestion of doing a comparison of somatic variants should work, in theory, but somatic variant calling is, in my experience, not as quantitative as one might hope. However, allelic imbalance is fairly robust and should be present in the vast majority of tumor samples. Note that Control-FREEC should have this information readily available.
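
To make the allelic-imbalance comparison concrete, here is a minimal sketch, assuming reference and alternate read counts at known SNP positions have already been tabulated for both samples (for instance from a pileup). It keeps sites that look heterozygous in at least one sample and compares how far each sample's B-allele frequency sits from 0.5 on average; the sample with the larger deviation is the more likely tumor. The input layout, function name, and cutoffs (`min_depth`, `het_low`, `het_high`) are hypothetical choices for the example, not Control-FREEC's method.

```python
def mean_imbalance(sites_a, sites_b, min_depth=20, het_low=0.3, het_high=0.7):
    """Mean |BAF - 0.5| per sample over sites heterozygous in at least one sample.

    sites_a / sites_b map (chrom, pos) -> (ref_count, alt_count).
    Returns (mean_dev_a, mean_dev_b), or None if no site qualifies.
    """
    dev_a, dev_b = [], []
    for key in sites_a.keys() & sites_b.keys():
        ref_a, alt_a = sites_a[key]
        ref_b, alt_b = sites_b[key]
        depth_a, depth_b = ref_a + alt_a, ref_b + alt_b
        if depth_a < min_depth or depth_b < min_depth:
            continue  # too little coverage to estimate an allele frequency
        baf_a, baf_b = alt_a / depth_a, alt_b / depth_b
        if het_low <= baf_a <= het_high or het_low <= baf_b <= het_high:
            dev_a.append(abs(baf_a - 0.5))
            dev_b.append(abs(baf_b - 0.5))
    if not dev_a:
        return None
    return sum(dev_a) / len(dev_a), sum(dev_b) / len(dev_b)


# Toy data only: sample A is skewed at two of three sites, sample B is balanced.
sites_a = {("1", 100): (30, 10), ("1", 200): (36, 4), ("1", 300): (20, 20)}
sites_b = {("1", 100): (20, 20), ("1", 200): (21, 19), ("1", 300): (19, 21)}
dev_a, dev_b = mean_imbalance(sites_a, sites_b)
print("A looks like the tumor" if dev_a > dev_b else "B looks like the tumor")
```

Selecting sites that are heterozygous in either sample avoids having to know in advance which one is the normal.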

9.5 years ago
Manvendra Singh ★ 2.2k

Yes,

You can analyze their expression values alongside other available human lines that you want to compare them with.

Then cluster the transcriptomes using Spearman's correlation; each sample belongs to whichever group it clusters with.

I think the data are from genomic sequencing, not transcriptomic? Perhaps @Kasthuri could comment.

Yes, these are genomic data, not transcriptomic. You are right, Sean.

Yes, I realize that now. I withdraw my answer.

9.5 years ago
Renesh ★ 2.2k

Use a gene expression analysis approach: count the reads in the two samples and compare the fold change in expression between them.

Sorry, I should have been more specific. This is WGS and not RNA-seq. Thanks.
