Question: Identifying germline and tumor samples
Kasthuri (United States) wrote, 5.4 years ago:

Given two genome .bam files from the same person, one from a tumor sample and the other from normal/germline tissue, is there an efficient bioinformatic way to tell which is which?

Thanks. -K.

Tags: germline, tumor, normal
Sean Davis (National Institutes of Health, Bethesda, MD) wrote, 5.4 years ago:

A simple approach is to run a copy-number or allelic-imbalance analysis. Such analyses will almost always show significant abnormalities in the tumor sample. A "normal" genome will also show some copy-number variation and apparent blocks of loss of heterozygosity, but tumors typically show both to a much greater extent. There are many tools for copy-number analysis; the particular choice will probably not make much difference for such a broad question.

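As a rough illustration of the copy-number idea in this answer, here is a minimal Python sketch. It assumes you already have read depth per fixed-size genomic bin for each sample, produced by whatever coverage tool you prefer. The function names, the pseudocount, and the 0.3 log2 threshold are illustrative assumptions, not part of any published method, and a dedicated copy-number caller will be far more reliable.

```python
import math
from statistics import median


def abnormal_bin_fraction(depths, threshold=0.3):
    """Fraction of bins whose depth deviates strongly from the sample's median.

    A pseudocount of 1 keeps zero-depth bins (possible deletions) in the
    calculation instead of failing on log2(0).
    """
    med = median(depths)
    log_ratios = [math.log2((d + 1) / (med + 1)) for d in depths]
    return sum(abs(r) > threshold for r in log_ratios) / len(log_ratios)


def guess_tumor(depths_a, depths_b, threshold=0.3):
    """Return the label of the sample with more abnormal bins, plus both fractions."""
    frac_a = abnormal_bin_fraction(depths_a, threshold)
    frac_b = abnormal_bin_fraction(depths_b, threshold)
    if frac_a == frac_b:
        return "undetermined", frac_a, frac_b
    return ("A" if frac_a > frac_b else "B"), frac_a, frac_b


# Toy per-bin depths (made up for illustration only).
sample_a = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101]   # flat profile
sample_b = [100, 101, 52, 49, 51, 150, 148, 99, 100, 102]    # losses and gains

label, frac_a, frac_b = guess_tumor(sample_a, sample_b)
print(f"likely tumor: {label} (A={frac_a:.2f}, B={frac_b:.2f})")
```

On real data, GC content and mappability bias both samples in the same way, so only a clear difference between the two fractions should be trusted.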

Good idea. In fact, I did this analysis and saw that one of them had huge variations (in particular, losses). I inferred that it should be the tumor, since the other one was clean. I used Control-FREEC. I was thinking of a complementary analysis to doubly confirm this. For instance, we could call somatic mutations in both directions: once treating one sample as the tumor and the other as the normal, and once the other way around. The run in which the *actual* normal is treated as the tumor should yield fewer mutations, since in theory a real tumor should contain all SNPs found in the germline plus de novo, purely somatic mutations. But given the noise found in NGS, this seems tricky.

Reply written 5.4 years ago by Kasthuri (modified 5.4 years ago)
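The swap idea in this reply can be approximated without running a somatic caller twice. The sketch below is only an assumption-laden simplification: it supposes each sample was genotyped separately with the same caller and filters, and that variants are represented as (chrom, pos, ref, alt) tuples. The function name and the toy variants are hypothetical. It counts the variants private to each sample; the sample with clearly more private variants is the candidate tumor.

```python
def private_variant_counts(variants_a, variants_b):
    """Count variants seen only in A, only in B, and in both samples."""
    set_a, set_b = set(variants_a), set(variants_b)
    return len(set_a - set_b), len(set_b - set_a), len(set_a & set_b)


# Toy call sets (made up for illustration only).
shared = [("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"), ("chr2", 500, "G", "A")]
calls_a = shared + [("chr5", 77, "A", "C")]
calls_b = shared + [("chr3", 42, "T", "C"), ("chr7", 900, "G", "T"), ("chr9", 314, "C", "A")]

only_a, only_b, both = private_variant_counts(calls_a, calls_b)
print(f"private to A: {only_a}, private to B: {only_b}, shared: {both}")
```

Two caveats apply. Loss of heterozygosity in the tumor can remove germline alleles, so the normal sample may also show "private" variants. Calling noise adds private variants to both sides, which is why this count is best treated as supporting evidence only.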

I would suggest pairing your copy number analysis with an analysis of regions of allelic imbalance.  Your suggestion of doing a comparison of somatic variants should work, in theory, but somatic variant calling is, in my experience, not as quantitative as one might hope.  However, allelic imbalance is fairly robust and should be present in the vast majority of tumor samples.  Note that Control-FREEC should have this information readily available.  

Reply written 5.4 years ago by Sean Davis
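To make the allelic-imbalance check concrete, here is a minimal Python sketch. It assumes you have reference and alternate read counts at the same set of known SNP positions in both samples (for example from a pileup, or from the B-allele frequency output of a copy-number tool). The function name, the heterozygosity window of 0.3 to 0.7, and the minimum depth of 20 are illustrative assumptions. Sites are kept if they look heterozygous in at least one sample, so that sites that lost heterozygosity in the tumor are still counted.

```python
def imbalance_scores(counts_a, counts_b, het_low=0.3, het_high=0.7, min_depth=20):
    """Mean |BAF - 0.5| per sample over sites heterozygous in at least one sample.

    counts_a and counts_b are parallel lists of (ref_reads, alt_reads) tuples.
    Returns (score_a, score_b, n_sites), or None if no site qualifies.
    """
    dev_a, dev_b = [], []
    for (ref_a, alt_a), (ref_b, alt_b) in zip(counts_a, counts_b):
        depth_a, depth_b = ref_a + alt_a, ref_b + alt_b
        if depth_a < min_depth or depth_b < min_depth:
            continue  # too few reads for a stable allele fraction
        baf_a, baf_b = alt_a / depth_a, alt_b / depth_b
        if het_low <= baf_a <= het_high or het_low <= baf_b <= het_high:
            dev_a.append(abs(baf_a - 0.5))
            dev_b.append(abs(baf_b - 0.5))
    if not dev_a:
        return None
    n = len(dev_a)
    return sum(dev_a) / n, sum(dev_b) / n, n


# Toy allele counts (made up for illustration only).
counts_a = [(25, 25), (24, 26), (26, 24), (50, 0), (23, 27)]   # balanced
counts_b = [(40, 10), (38, 12), (11, 39), (50, 0), (24, 26)]   # skewed

score_a, score_b, n_sites = imbalance_scores(counts_a, counts_b)
print(f"A={score_a:.3f}, B={score_b:.3f} over {n_sites} sites")
```

The sample with the clearly larger score is the candidate tumor. On real data it helps to look at the deviation along the genome rather than as one average, since imbalance tends to occur in contiguous blocks.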
Manvendra Singh (Berlin, Germany) wrote, 5.4 years ago:

Yes.

You can analyze their expression values alongside other available human lines that you want to compare them with.

Then cluster their transcriptomes using Spearman's correlation; wherever a sample clusters, it belongs to the same


I think the data are from genomic sequencing, not transcriptomic?  Perhaps @Kasthuri could comment.

Reply written 5.4 years ago by Sean Davis

Yes, these are genomic data, not transcriptomic. You are right, Sean.

Reply written 5.4 years ago by Kasthuri

Yes, I realize that now. I withdraw my answer.

Reply written 5.4 years ago by Manvendra Singh
Renesh (United States) wrote, 5.4 years ago:

Use a gene expression analysis approach: count the reads in the two samples and compare the fold change in expression between them.


Sorry, I should have been more specific. This is WGS and not RNA-seq. Thanks.

Reply written 5.4 years ago by Kasthuri