as far as i know, GIAB provides High_Conf.bed + High_Conf.vcf (Available here ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/release/NA12878_HG001/), No Fastq files are provided.
but Garvan data is available here (ftp://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/Garvan_NA12878_HG001_HiSeq_Exome/) with the bed file where you need it for variants calling and benchmark process.
1- download the reference genome (1-22, X, Y and MT) from any database (i used USCS) join all fasta files into one fasta by 'cat' command in lunix.
2- index your reference
bwa index -a bwtsw 'filesName'.fasta
3- align your read by bwa
bwa mem 'filesName'.fasta 'FileName'_1.fastq 'FileName'_2.fastq > 'FileName'.sam
4-Convert Sam file to Bam file
samtools view -@ '#of Thread you want to use as int.' -bS 'FileName'.sam -o 'FileName'.bam
5-Add read group to bam files as needed by GATK
picard-tools AddOrReplaceReadGroups I='InFileName'.bam O='OutFileName'.bam RGLB=lib1 RGPL=illumina RGPU=unit1 RGSM='SampleName' SORT_ORDER=coordinate CREATE_INDEX=TRUE VALIDATION_STRINGENCY=LENIENT
6-Remove duplicated read
picard-tools MarkDuplicates I='InFileName'.bam O='OutFileName'.bam M=marked_dup_metrics.txt REMOVE_DUPLICATES=true
7-Index last Bam for Varients calling
samtools index -@ '#of Thread you want to use as int.' 'FileName'.bam
gatk HaplotypeCaller --native-pair-hmm-threads ''#of Thread you want to use as int.'' -I='InFileName'.bam -O='OutFileName'.vcf -R='Ref_SeqName'.fasta -L='BedFileName'.bed
9-Now you are ready to compare you VCF (Quary.vcf) with GIAB VCF (True.vcf)
rtg vcfeval -b True.vcf regions='GIAB_BedFile'.bed -c Quary.vcf --evaluation-regions='YourBedFile' -o 'Path to output the reult' -t 'Path to SDF file'
- Do not forget to create SDF file from you reference sequence by rtg tools format
- Please see 'https://cdn.rawgit.com/RealTimeGenomics/rtg-tools/master/installer/resources/tools/RTGOperationsManual/index.html' for better understanding
- Use Fastqc 'or similar' to check the data quality and trim them if needed
after you achieve the result you want, try to make scrip from the above code in order to process sample in batches, but it the result still not reach you needs read in each tool documents and change the parameter accordingly.
I hope it was clear!