Question: DE novo call from trio
0
gravatar for cristina_sabiers
5 weeks ago by
Spain
cristina_sabiers70 wrote:

Hi!

I had done

  • Alignment
  • Sorting sam file
  • Mark duplicates
  • Realignment call variants
  • BQSR
  • Call raw snps and Apply hard filtering

Then I merge my three vcf files (dad, mum, child)

 java -jar GenomeAnalysisTK.jar -T CombineVariants -R hg38.fa --variant sample02.vcf --variant sample06.vcf --variant sample08.vcf -o 268merged.vcf -genotypeMergeOptions UNIQUIFY

and apply next genotype refinement workflow

Im wondering if its common to dont get any results as hiConfDeNovo and lowConfDeNovo in the results (raw snps or after hard filtering)

 INFO=<ID=hiConfDeNovo,Number=1,Type=String,Description="High confidence possible de novo mutation (GQ >= 20 for all trio members)=[comma-delimited list of child samples]">
 INFO=<ID=loConfDeNovo,Number=1,Type=String,Description="Low confidence possible de novo mutation (GQ >= 10 for child, GQ > 0 for parents)=[comma-delimited list of child samples]">
CHROM   POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  sample02.variant    sample06.variant2   sample08.variant3
1   917495  .   C   T   30.74   PASS    AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;PG=0,0,0;QD=15.37;SOR=0.693;set=variant3   GT:AD:DP:FT:GQ:PL:PP    ./.:.:.:PASS    ./.:.:.:PASS    1/1:0,2:2:lowGQ:6:58,6,0:58,6,0

Thanks :)

exome • 232 views
ADD COMMENTlink modified 5 weeks ago • written 5 weeks ago by cristina_sabiers70
1

Its possible. But I think following change would help.

  1. After BQSR, create g.vcf files for each sample using HaplotypeCaller -ERC GVCF option.
  2. Create a combined genotype using GenotypeGVCFs
  3. Separate SNPs and Indels using SelectVariants and perform hard filtration
  4. Combine SNPs and Indels using CombineVariants

Can you confirm that following steps were followed after hard filtration?

  1. Perform genotype refinement using CalculateGenotypePosteriors. A ped file will have to be provided.
  2. Filter low quality genotypes using VariantFiltration
  3. Annotate de novos using VariantAnnotator using -A PossibleDeNovo option. Again ped file will have to be provided.
ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by Satyajeet Khare1.4k
0
gravatar for cristina_sabiers
5 weeks ago by
Spain
cristina_sabiers70 wrote:

Thanks for the help Khare :)

Its does really matter if you use gvcf format instead vcf? I thought was the same. Will try with -ERC GVCF option. and see if I get any different result. Thanks for the advice.

If I use Varscan using mpileup at the end I get similar results. I am just a bit surprised didnt appear any low or high De novo variants using Gatk.

"Can you confirm that following steps were followed after hard filtration?"

I do. I perform this pipeline after doing hard filtration (I only select snps variants atm Im not interested on indels)

CombineVariants

java -jar GenomeAnalysisTK.jar -T CombineVariants -R hg38.fa --variant 04PASS.vcf --variant 06PASS.vcf --variant 08PASS.vcf -o 468mergedPASS.vcf -genotypeMergeOptions UNIQUIFY

Convert to biallelic, If I dont perform those steps gatk gives me error.

java -jar GenomeAnalysisTK.jar -T SelectVariants -R hg38.fa --variant 468mergedPASS.vcf -restrictAllelesTo BIALLELIC -o bi468merged.vcf

java -jar GenomeAnalysisTK.jar -T SelectVariants -R hg38.fa --variant 1000G_phase1.snps.high_confidence.b37.vcf.gz -restrictAllelesTo BIALLELIC -o 1000G_phase1.snps.high_confidence.b37_BIALLELIC_ONLY.vcf.gz

Step 1: Derive posterior probabilities of genotypes

java -jar GenomeAnalysisTK.jar -R hg38.fa -T CalculateGenotypePosteriors --supporting 1000G_phase1.snps.high_confidence.b37_BIALLELIC_ONLY.vcf.gz -ped fam10.ped -V bi468merged.vcf -o recalibratedVariants.postCGP.vcf

Step 2: Filter low quality genotypes

java -jar GenomeAnalysisTK.jar -T VariantFiltration -R hg38.fa -V recalibratedVariants.postCGP.vcf -G_filter "GQ < 20.0" -G_filterName "lowGQ" -o recalibratedVariants.postCGP.Gfilteredcall.vcf

Step 3: Annotate possible de novo mutations

java -jar GenomeAnalysisTK.jar -T VariantAnnotator -R hg38.fa -V recalibratedVariants.postCGP.Gfilteredcall.vcf -A PossibleDeNovo -ped fam10.ped -o recalibratedVariants.postCGP.Gfiltered.deNovos.vcf
ADD COMMENTlink written 5 weeks ago by cristina_sabiers70
1

No, gvcf wont change the results. Just that adding samples later will be easier. I am not sure why "convert to biallelic" step is necessary. I also worked only on SNPs but did not run this command. Everything else looks fine. What were the hard filtration parameters?

Important: You may want to use the same genome version for all files. I mean the genome fasta file should also be from the b37 bundle.

Best,

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by Satyajeet Khare1.4k

Ok, thanks

"I am not sure why "convert to biallelic" step is necessary."

I followed the tips at gatk forum

CalculateGenotypePosteriors error

"What were the hard filtration parameters?"

I used this pipeline:

https://gencore.bio.nyu.edu/variant-calling-pipeline/

 java -jar GenomeAnalysisTK.jar -T VariantFiltration -R ref -V raw_snps.vcf --filterExpression 'QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0 || SOR > 4.0' --filterName "basic_snp_filter" -o filtered_snps.vcf

.

"Important: You may want to use the same genome version for all files. I mean the genome fasta file should also be from the b37 bundle."

Im bit confuse at that point, when I did my alignment I used

wget ftp://ftp.ensembl.org/pub/release-97/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.chromosome.*.fa.gz

I should use instead --supporting 1000G_phase1.snps.high_confidence.b37_BIALLELIC_ONLY.vcf.gz that one?

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by cristina_sabiers70
1

Oh, its select biallelic. Anyway, as I mentioned, I never had to do this step.

Regarding the last point (and this may get rid of this biallelic issue), you need to use b37 package for all files. Assuming you downloaded the vcf file 1000G_phase1.snps.high_confidence.b37.vcf.gz from this link either using Google cloud or FTP server access, you should download the genome fasta file also from the same folder. It is better to use all files from the same bundle to avoid compatibility issues.

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by Satyajeet Khare1.4k

Ok, but If I use the reference genome from ftp.broadinstitute.org/bundle then I asume I should do the alignment over again from the start with my fastq files right? ( I used from start the reference genome from ensembl.org)

Should really affect that point on my results to get high or low denovo variants?

Thanks

ADD REPLYlink written 4 weeks ago by cristina_sabiers70
1

Yes, you need to perform the alignment again. I think it will affect the outcome because the coordinates of the vcf file won't match the alignment files in current scenario.

ADD REPLYlink written 4 weeks ago by Satyajeet Khare1.4k

thanks for your time Khare, I really appreciate it :)

I have a last question

I could just use my reference genome from ensembl.org and use as --supporting file ftp://ftp.1000genomes.ebi.ac.uk/../../vol1/ftp/release/20130502/phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.....files?

then my files should have the same compatibility right?

The reason bc I use the reference genome from ensembl is the compatibility for do the annotations with snpEff. In my pc I take a hole day to do alignment so if I can avoid to do over again would save me lot time.

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by cristina_sabiers70

The link did not open, but assuming that you have a vcf file from the same source and corresponding to the same genome version you used for alignment, it should work. But I would still recommend using GATK bundle since you may need some files in future which are exclusively present in the GATK bundle and then there will be compatibility issues.

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by Satyajeet Khare1.4k

Hello Khare

sorry, I didnt want to write you back until I perfom again the alignment. this time I used the reference genome from GATK and supporting file from GATK, and run the process from the start. My results are almost the same and dint get any high or lowdenovo variants.

ADD REPLYlink written 22 days ago by cristina_sabiers70

Sorry about the late reply. Did you get an error without "convert to biallelic" step this time too?

ADD REPLYlink written 6 days ago by Satyajeet Khare1.4k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1155 users visited in the last hour