How to calculate genotype concordance, comparing two .vcf files with Picard or SnpSift
3
0
Entering edit mode
3.6 years ago
anamaria ▴ 220

Hello,

I was running:

java -jar picard.jar GenotypeConcordance -CALL_VCF new.vcf -CALL_SAMPLE 0_fam0110_G110 -O gc_concordance.vcf -TRUTH_VCF old.vcf -TRUTH_SAMPLE 0_fam0110_G110

and I got this error:

; Provider GCS is not available; Picard version: Version:2.23.3 [Wed Sep 02 20:01:28 CDT 2020] picard.vcf.GenotypeConcordance done. Elapsed time: 0.00 minutes. Runtime.totalMemory()=2020081664 To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp Exception in thread "main" htsjdk.samtools.util.SequenceUtil$SequenceListsDifferException: Sequences at index 0 don't match: 0/247185616/1 0/249139234/1 at htsjdk.samtools.util.SequenceUtil.assertSequenceListsEqual(SequenceUtil.java:272) at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(SequenceUtil.java:334) at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(SequenceUtil.java:320) at picard.vcf.GenotypeConcordance.doWork(GenotypeConcordance.java:344) at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:301) at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103) at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

java -version
openjdk version "1.8.0_161"
OpenJDK Runtime Environment (build 1.8.0_161-b14)
OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode)

Can someone please tell me how to resolve this issue?

I also tried calculating concordance between those two .vcf files via:

java -jar SnpSift.jar concordance -v old.vcf new.vcf > concordance.txt

I transposed the results and for one sample I am getting this:

                             sample MISSING_ENTRY_old.MISSING_ENTRY_new 
                    0_fam6013_G6013                                   0 
   MISSING_ENTRY_old.MISSING_GT_new               MISSING_ENTRY_old.REF 
                                  0                              899539 
            MISSING_ENTRY_old.ALT_1             MISSING_ENTRY_old.ALT_2 
                             743281                              232363 
   MISSING_GT_old.MISSING_ENTRY_new       MISSING_GT_old.MISSING_GT_new 
                                  0                                   0 
                 MISSING_GT_old.REF                MISSING_GT_old.ALT_1 
                                  0                                   0 
               MISSING_GT_old.ALT_2               REF.MISSING_ENTRY_new 
                              0                             1603978 
             REF.MISSING_GT_new                             REF.REF 
                              0                                  96 
                      REF.ALT_1                           REF.ALT_2 
                             73                                  14 
        ALT_1.MISSING_ENTRY_new                ALT_1.MISSING_GT_new 
                         766721                                   0 
                      ALT_1.REF                         ALT_1.ALT_1 
                             34                                  39 
                    ALT_1.ALT_2             ALT_2.MISSING_ENTRY_new 
                             11                              172654 
           ALT_2.MISSING_GT_new                           ALT_2.REF 
                              0                                   9 
                    ALT_2.ALT_1                         ALT_2.ALT_2 
                              5                                   4 
                          ERROR 
                           1684

Can you please help me understand what does this output tells about difference of my sample 0_fam6013_G6013 in old and new vcf, which are the key parameters to look for and is there is any way to visualize this?

picard GenotypeConcordance SnpSift • 2.7k views
ADD COMMENT
0
Entering edit mode
2.7 years ago

Hi @anamaria. Did you ever figure out how to interpret the output of the concordance SNPSift output files?

ADD COMMENT
0
Entering edit mode
2.7 years ago
sbstevenlee ▴ 480

Just sharing another possible solution using Python API with the pyvcf.compare method I wrote:

>>> from fuc import pyvcf
>>> data = {
...     'CHROM': ['chr1', 'chr1', 'chr1', 'chr1', 'chr1'],
...     'POS': [100, 101, 102, 103, 104],
...     'ID': ['.', '.', '.', '.', '.'],
...     'REF': ['G', 'CT', 'T', 'C', 'A'],
...     'ALT': ['A', 'C', 'A', 'T', 'G,C'],
...     'QUAL': ['.', '.', '.', '.', '.'],
...     'FILTER': ['.', '.', '.', '.', '.'],
...     'INFO': ['.', '.', '.', '.', '.'],
...     'FORMAT': ['GT', 'GT', 'GT', 'GT', 'GT'],
...     'A': ['0/1', '0/0', '0/0', '0/1', '0/0'], # "Test" sample
...     'B': ['1/1', '0/1', './.', '0/1', '0/0'], # "Truth" sample
... }
>>> vf = pyvcf.VcfFrame.from_dict([], data)
>>> # vf = pyvcf.VcfFrame.from_file('in.vcf')
>>> vf.df
  CHROM  POS ID REF  ALT QUAL FILTER INFO FORMAT    A    B
0  chr1  100  .   G    A    .      .    .     GT  0/1  1/1
1  chr1  101  .  CT    C    .      .    .     GT  0/0  0/1
2  chr1  102  .   T    A    .      .    .     GT  0/0  ./.
3  chr1  103  .   C    T    .      .    .     GT  0/1  0/1
4  chr1  104  .   A  G,C    .      .    .     GT  0/0  0/0
>>> # Return (Ab, aB, AB, ab) == (FP, FN, TP, TN)
>>> vf.compare('A', 'B', mode='all')
(0, 1, 2, 1)
>>> vf.compare('A', 'B', mode='snv')
(0, 0, 2, 1)
>>> vf.compare('A', 'B', mode='indel')
(0, 1, 0, 0)
ADD COMMENT
0
Entering edit mode
2.7 years ago

Can someone please tell me how to resolve this issue?

you're comparing two VCFs that were not called with the very same reference.

See the lines starting with ##contig= in both VCF header.

ADD COMMENT

Login before adding your answer.

Traffic: 2555 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6