Question: VCF evaluation using RTG.jar vcfeval
Sara wrote (5 months ago):

Hi there,

I am trying to benchmark my pipeline, which analyzes germline WES and WGS data and generates VCF files of SNVs and INDELs. To do so,

I downloaded the WES data for NA12878 from here:

https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/Garvan_NA12878_HG001_HiSeq_Exome/

and processed these two datasets separately:

NIST7035_TAAGGCGA_L001_R1_001_trimmed.fastq.gz
NIST7035_TAAGGCGA_L001_R2_001_trimmed.fastq.gz

NIST7086_CGTACTAG_L002_R1_001_trimmed.fastq.gz
NIST7086_CGTACTAG_L002_R2_001_trimmed.fastq.gz

I also downloaded the VCF file from the same link to use as my reference (gold standard):

project.NIST.hc.snps.indels.vcf

Then I used the following command to evaluate my VCF file (generated from the files above by my pipeline):

java -Xmx4G -jar  RTG.jar vcfeval -t Homo_sapiens.GRCh37.GATK.illumina.SDF  -T 6 --baseline=[GIAB truth VCF] --calls=[SNV/INDEL VCF] --all-records --bed-regions=[Exome BED file]

I created the Homo_sapiens.GRCh37.GATK.illumina.SDF folder with this command:

rtg format --output Homo_sapiens.GRCh37.GATK.illumina.SDF hg19.fasta

For --baseline I used the VCF file above (the gold standard), and for --calls I used the VCF file that I made. I also got the BED file from the same link. When I run RTG.jar with the command above, I get this error:

Error: No sample name provided but baseline is a multi-sample VCF.
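The error means the baseline VCF contains more than one sample column, and vcfeval was not told which one to compare. A quick way to see which sample names a VCF contains is to read the columns after FORMAT on its #CHROM header line. The sketch below assumes plain or gzip/bgzip-compressed VCF files; vcf_samples is a hypothetical helper name, not part of RTG Tools.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of RTG Tools): print the sample names of a VCF,
# one per line. Sample names are the #CHROM header fields from column 10 onwards
# (after FORMAT). A sites-only VCF prints nothing.
vcf_samples() {
  local vcf="$1"
  # Decompress .gz files (gzip can read bgzip output); read other files as-is.
  case "$vcf" in
    *.gz) gzip -dc -- "$vcf" ;;
    *) cat -- "$vcf" ;;
  esac | awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }'
}
```

For example, vcf_samples project.NIST.hc.snps.indels.vcf lists the sample names in the baseline; more than one line of output means it is a multi-sample VCF.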

Do you know how to fix this?

Thanks


I would contact Len Trigg at RTG: https://www.realtimegenomics.com/products/rtg-tools

Reply written 5 months ago by Kevin Blighe

Hi Sara,

I am also benchmarking my pipeline using the same dataset you used. My question is: why did you select project.NIST.hc.snps.indels.vcf as the gold-standard (truth) VCF rather than HG001_*.vcf.gz (located at ftp://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/NA12878_HG001/latest/GRCh37/)?

-Akshay

Reply written 5 months ago by Akshay Zawar

You might find the answer to your problem in this thread: "vcfeval Error: No sample name provided but calls is a multi-sample VCF"

Reply written 5 months ago by Akshay Zawar
Len Trigg (New Zealand) wrote (5 months ago):

Just circling back here for anyone coming from the future :-)

The --sample flag can be used to specify which sample to select from the baseline or calls VCF in the case of multi-sample VCFs. Check the vcfeval user manual for more information.
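Applied to the command from the question, a sketch might look like the following. According to the vcfeval manual, --sample takes either a single name (used for both files) or baseline_sample,calls_sample when the two VCFs use different names. As far as I know, vcfeval also requires an output directory via -o/--output, which the original command was missing. The function name vcfeval_cmd, the DRY_RUN switch, and all file names, sample names and the output directory are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the vcfeval command from the question, with --sample added.
# Arguments: baseline VCF, calls VCF, BED file, sample spec, output directory.
# The sample spec is either NAME or BASELINE_NAME,CALLS_NAME.
# With DRY_RUN=1 the command is only printed, not executed.
vcfeval_cmd() {
  local baseline="$1" calls="$2" bed="$3" sample="$4" out_dir="$5"
  local -a cmd
  cmd=(java -Xmx4G -jar RTG.jar vcfeval
    -t Homo_sapiens.GRCh37.GATK.illumina.SDF
    -T 6
    --baseline="$baseline" --calls="$calls"
    --bed-regions="$bed"
    --sample="$sample"
    --all-records
    -o "$out_dir")  # output directory; should not already exist
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

The sample names to pass are the header names of the baseline and calls VCFs, respectively.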
