Question: Problem Analyzing Tumor-Normal Pairs With Varscan
7.9 years ago, tommivat wrote:

I am using VarScan 2.2.11 to analyze a tumor-normal pair of whole-exome NGS data. I used the 1000 Genomes reference genome (from here), and my .bam files are sorted. I call VarScan with a shell script similar to this and get the following summary after the run:

2 015 987 104 positions in tumor
2 015 519 315 positions shared in normal
   90 913 571 had sufficient coverage for comparison
            0 were called Reference
          496 were mixed SNP-indel calls and filtered
   90 834 408 were called Germline
            0 were called LOH
            0 were called Somatic
        78667 were called Unknown
            0 were called Variant

Something is clearly wrong, since almost all positions with sufficient coverage are called Germline and none are called Reference. Can you point out what I am doing wrong, please?


I suspect that the reference used for the alignment (which I did not do myself) and the reference used for the paired analysis have to be the same. In this case, I do not (yet) know which reference was used for the alignment. Can you confirm whether a mismatch could cause the problem above?

written 7.9 years ago by tommivat

The header of your BAM file should contain information about which reference was used for alignment.

written 7.9 years ago by Chris Miller
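
The header check suggested above can be sketched in shell as follows. This is only an illustration: the helper name `reference_hints` and the file name `tumor.bam` are hypothetical, and it assumes the aligner recorded a @PG line and, optionally, AS or UR tags on the @SQ lines, which not every pipeline does.

```shell
# Print reference hints from SAM header text read on stdin:
# every @PG line (it usually carries the aligner command line), plus any
# AS (assembly) or UR (reference URI) tag found on an @SQ line.
# NOTE: reference_hints is a hypothetical helper, not part of samtools.
reference_hints() {
    awk -F'\t' '
        $1 == "@PG" { print }
        $1 == "@SQ" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^(UR|AS):/) print $2 "\t" $i
        }'
}

# Usage (tumor.bam is a placeholder file name):
# samtools view -H tumor.bam | reference_hints
```

If the output is empty, the header does not record the reference, and it has to be tracked down from whoever ran the alignment.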
7.9 years ago, Matt Shirley (Cambridge, MA) wrote:

You most likely need to realign one or both of your samples to the same reference. To check whether the BAM files were actually aligned to different references:

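# Extract the sequence dictionary (@SQ lines) from each BAM header
samtools view -H normal.bam | grep "^@SQ" > normal
samtools view -H tumor.bam | grep "^@SQ" > tumor
# Show the two dictionaries side by side; differing lines are flagged
diff -y normal tumor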

The above is just a simple way of grabbing the sequence dictionary from the header and comparing. If there are any differences between your files, then you will need to align them both to the same reference.


Thank you for the answer! There are no differences in the headers. Do you still think I need to realign both samples to my current reference? How is such a realignment done? (I realize this must be a simple task, but I haven't worked with samtools much.)

written 7.9 years ago by tommivat

I think that you are probably using a reference genome sequence that does not match the one your two BAM files were aligned to. Try tracking down the reference that your BAM files were aligned to and specifying that as the reference for VarScan2.

written 7.9 years ago by Matt Shirley
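
One rough way to test a candidate reference against a BAM file is to compare contig names and lengths between the FASTA index and the BAM header. The sketch below assumes a `.fai` index made with `samtools faidx` and a header dumped with `samtools view -H`. The function and file names are hypothetical. Matching names and lengths are a necessary condition only and do not prove the sequences are identical.

```shell
# Extract "name<TAB>length" pairs from SAM header text on stdin.
header_contigs() {
    awk -F'\t' '$1 == "@SQ" {
        name = ""; len = ""
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^SN:/) name = substr($i, 4)
            if ($i ~ /^LN:/) len  = substr($i, 4)
        }
        print name "\t" len
    }'
}

# Extract "name<TAB>length" pairs from a FASTA index (.fai) on stdin.
fai_contigs() {
    cut -f1,2
}

# Succeed only if both contig lists are identical, order included.
# $1 = file holding the BAM header text, $2 = .fai file of the reference.
same_contigs() {
    [ "$(header_contigs < "$1")" = "$(fai_contigs < "$2")" ]
}

# Usage (all file names are placeholders):
# samtools faidx reference.fa                  # writes reference.fa.fai
# samtools view -H tumor.bam > tumor.header
# same_contigs tumor.header reference.fa.fai && echo match || echo MISMATCH
```

A mismatch here (for example `chr1` versus `1`, or differing lengths) means the FASTA is not the build the reads were aligned to.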

This solved the problem. Thanks for the help!

written 7.9 years ago by tommivat