Question: GATK: joint genotyping and then filtering, or filtering and then joint genotyping?
cristianrohr768 (Spain) wrote, 2.7 years ago:

Hello,

I have 11 samples from a custom TruSeq design.

I ran HaplotypeCaller on each of my 11 samples:

java -jar /home/horus/Instaladores/GenomeAnalysisTK-3.4-0/GenomeAnalysisTK.jar -T HaplotypeCaller \
    -R /home/horus/Escritorio/GATK/GATK/2.8/b37/human_g1k_v37.fasta \
    -ERC BP_RESOLUTION \
    -I $file/alineamiento/recal_reads.bam \
    -L ../../../../data/bed/disenio_ENP_illumina2_ordenado.bed \
    --genotyping_mode DISCOVERY \
    -stand_emit_conf 10 -stand_call_conf 30 \
    -o $file/alineamiento/raw_variants_bed.gvcf \
    -variant_index_type LINEAR -variant_index_parameter 128000

Then I ran joint genotyping on all 11 samples:
java -jar /home/horus/Instaladores/GenomeAnalysisTK-3.4-0/GenomeAnalysisTK.jar -T GenotypeGVCFs \
    -R /home/horus/Escritorio/GATK/GATK/2.8/b37/human_g1k_v37.fasta \
    -D /home/horus/Escritorio/GATK/GATK/2.8/b37/dbsnp_138.b37.vcf \
    --variant ../Analisis/SAR093-2015/alineamiento/raw_variants_bed.gvcf \
    --variant ../Analisis/SAR094-2015/alineamiento/raw_variants_bed.gvcf \
    --variant ../Analisis/SAR095-2015/alineamiento/raw_variants_bed.gvcf \
    --variant ../Analisis/SAR096-2015/alineamiento/raw_variants_bed.gvcf \
    --variant ../Analisis/SAR097-2015/alineamiento/raw_variants_bed.gvcf \
    --variant ../Analisis/SAR098-2015/alineamiento/raw_variants_bed.gvcf \
    --variant ../Analisis/SAR099-2015/alineamiento/raw_variants_bed.gvcf \
    --variant ../Analisis/SAR100-2015/alineamiento/raw_variants_bed.gvcf \
    --variant ../Analisis/SAR101-2015/alineamiento/raw_variants_bed.gvcf \
    --variant ../Analisis/SAR102-2015/alineamiento/raw_variants_bed.gvcf \
    --variant ../Analisis/SAR103-2015/alineamiento/raw_variants_bed.gvcf \
    -o output.vcf
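With 11 samples, typing one --variant argument per gVCF by hand is long and error-prone. The minimal Python sketch below generates the same GenotypeGVCFs argument list, assuming the SARnnn-2015 directory layout and the paths shown in the command above; the helper names `sample_ids` and `genotype_gvcfs_cmd` are hypothetical, and the script only builds and prints the command rather than running it.

```python
import shlex

# Paths copied from the commands in the question; adjust for your setup.
GATK_JAR = "/home/horus/Instaladores/GenomeAnalysisTK-3.4-0/GenomeAnalysisTK.jar"
REFERENCE = "/home/horus/Escritorio/GATK/GATK/2.8/b37/human_g1k_v37.fasta"
DBSNP = "/home/horus/Escritorio/GATK/GATK/2.8/b37/dbsnp_138.b37.vcf"


def sample_ids(first: int, last: int, year: int = 2015) -> list[str]:
    """Expand the SARnnn-<year> naming pattern, e.g. 93..103 -> SAR093-2015..SAR103-2015."""
    return [f"SAR{n:03d}-{year}" for n in range(first, last + 1)]


def genotype_gvcfs_cmd(samples: list[str], output: str = "output.vcf") -> list[str]:
    """Build a GATK 3 GenotypeGVCFs argument list with one --variant per sample gVCF."""
    cmd = ["java", "-jar", GATK_JAR, "-T", "GenotypeGVCFs",
           "-R", REFERENCE, "-D", DBSNP]
    for sample in samples:
        # Same per-sample directory layout as in the question.
        cmd += ["--variant", f"../Analisis/{sample}/alineamiento/raw_variants_bed.gvcf"]
    cmd += ["-o", output]
    return cmd


if __name__ == "__main__":
    cmd = genotype_gvcfs_cmd(sample_ids(93, 103))
    # Print a copy-pasteable shell command; subprocess.run(cmd, check=True) would run it.
    print(shlex.join(cmd))
```

Building the command as a list (rather than one long string) keeps each path a separate argument, which is also the form `subprocess.run` expects.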


After joint genotyping I have output.vcf, a multisample VCF. I need to filter the variants (hard filtering), and I don't know which is the proper approach:

- Apply the hard filters directly to this multisample VCF
- Split the multisample VCF by sample and apply the filters to each single-sample VCF (which seems odd to me, since after the split the INFO column is identical in every VCF)

or

- Apply hard filters to each per-sample VCF first, and then run the joint genotyping step?
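On the reasoning in the second option: the annotations used by GATK's commonly cited SNP hard filters (QD, FS, MQ, MQRankSum, ReadPosRankSum) are site-level values in the INFO column, so a decision based on them applies to every sample at that site. The minimal Python sketch below illustrates this on a single VCF data line. It is not GATK's VariantFiltration: the thresholds are the commonly cited GATK starting values for SNPs (an assumption; check the GATK documentation for your version), missing annotations are simply skipped, the variant type is not checked, and any existing FILTER value is overwritten. `parse_info` and `hard_filter_line` are hypothetical helper names.

```python
# Commonly cited GATK starting thresholds for SNP hard filtering.
# Assumption: verify them against the GATK documentation for your version.
# A site fails if ANY (annotation, operator, cutoff) comparison is true.
SNP_FILTERS = [
    ("QD", "<", 2.0),
    ("FS", ">", 60.0),
    ("MQ", "<", 40.0),
    ("MQRankSum", "<", -12.5),
    ("ReadPosRankSum", "<", -8.0),
]


def parse_info(info: str) -> dict[str, str]:
    """Turn an INFO string like 'AC=2;DB;QD=1.5' into {'AC': '2', 'DB': '', 'QD': '1.5'}."""
    fields: dict[str, str] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def hard_filter_line(line: str, filter_name: str = "snp_hard_filter") -> str:
    """Return a VCF data line with its FILTER column (index 6) set to PASS or filter_name.

    Only the site-level INFO column (index 7) is read, so the verdict is the
    same for every sample column in a multisample record. Annotations that
    are absent (or '.') are skipped instead of failing the site.
    """
    cols = line.rstrip("\n").split("\t")
    info = parse_info(cols[7])
    failed = False
    for key, op, cutoff in SNP_FILTERS:
        raw = info.get(key, "")
        if raw in ("", "."):
            continue
        value = float(raw)
        if (op == "<" and value < cutoff) or (op == ">" and value > cutoff):
            failed = True
            break
    cols[6] = filter_name if failed else "PASS"
    return "\t".join(cols)


if __name__ == "__main__":
    # Hypothetical two-sample SNP record with made-up annotation values.
    record = "1\t12345\t.\tA\tG\t50.0\t.\tQD=1.5;FS=3.2;MQ=60.0\tGT\t0/1\t0/0"
    print(hard_filter_line(record).split("\t")[6])  # snp_hard_filter (QD < 2.0)
```

In this sketch, dropping sample columns from a record leaves the INFO column, and therefore the verdict, unchanged.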

Thanks

nchuang (United States) wrote, 2.7 years ago:

Maybe I'm a bit confused, but are you following the GATK Best Practices?

I haven't checked in a while, but I believe that after joint genotyping you run VQSR, then the genotype refinement steps, and only then do the filtering. I think the idea is to filter only once you are done processing the data. Is this human data? Why do you need to hard filter?

In any case, you do not need to split by sample to filter (where did you get that idea?).


cristianrohr768 replied, 2.7 years ago:

Hello @nchuang, I am following the GATK Best Practices. To use VQSR you need a lot of samples (more than 30 exomes). I only have 11 samples from targeted sequencing, so I have to stick with hard filters.
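Since VQSR is not an option here, the hard-filtering step can be sketched with GATK 3's SelectVariants and VariantFiltration walkers, filtering SNPs and indels separately on the joint output.vcf. The minimal Python sketch below only builds and prints the commands. The filter expressions are the commonly cited GATK starting thresholds (an assumption to check against the GATK hard-filtering documentation for your version), and the output file names and filter names are hypothetical.

```python
import shlex

# Paths copied from the question; output and filter names below are hypothetical.
GATK_JAR = "/home/horus/Instaladores/GenomeAnalysisTK-3.4-0/GenomeAnalysisTK.jar"
REFERENCE = "/home/horus/Escritorio/GATK/GATK/2.8/b37/human_g1k_v37.fasta"

# Commonly cited GATK starting expressions (assumption: check the GATK
# hard-filtering documentation for your version before relying on them).
SNP_EXPR = "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0"
INDEL_EXPR = "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0"


def gatk(tool: str, *args: str) -> list[str]:
    """Prefix a GATK 3 walker invocation with the jar and reference."""
    return ["java", "-jar", GATK_JAR, "-T", tool, "-R", REFERENCE, *args]


def hard_filter_cmds(joint_vcf: str = "output.vcf") -> list[list[str]]:
    """Select SNPs and indels from the joint VCF, then filter each set separately."""
    cmds = []
    for vtype, expr in (("SNP", SNP_EXPR), ("INDEL", INDEL_EXPR)):
        raw = f"raw_{vtype.lower()}s.vcf"
        filtered = f"filtered_{vtype.lower()}s.vcf"
        cmds.append(gatk("SelectVariants", "-V", joint_vcf,
                         "-selectType", vtype, "-o", raw))
        cmds.append(gatk("VariantFiltration", "-V", raw,
                         "--filterExpression", expr,
                         "--filterName", f"{vtype.lower()}_hard_filter",
                         "-o", filtered))
    return cmds


if __name__ == "__main__":
    # shlex.join quotes the filter expressions so they survive the shell.
    for cmd in hard_filter_cmds():
        print(shlex.join(cmd))
```

VariantFiltration does not delete failing records; it writes the filter name into their FILTER column, so the filtered VCFs still contain every site and PASS records can be selected afterwards.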