What's next after the GATK variant calling pipeline?
5 weeks ago
mgranada3 ▴ 30

I have 63 DNA-seq files, which I put through the GATK variant calling pipeline (https://gencore.bio.nyu.edu/variant-calling-pipeline-gatk4/).

This is my first time doing this and I am confused about what my next steps are. How do I know which information I need to create figures like a Muller plot? Can anyone recommend a good guide (preferably using R) that is compatible with my output?

My outputs were:

  1. A .csv file with compiled statistics which included: # of Reads, # of Aligned Reads, % Aligned, # Aligned Bases, Read Length, % Paired, % Duplicate, Mean Insert Size, # SNPs, # Filtered SNPs, # SNPs after BQSR, # Filtered SNPs after BQSR, Average Coverage

  2. Annotated SNPs and predicted effects in a .html and a .txt file. The text file contained the columns #GeneName, GeneId, TranscriptId, BioType, variants_impact_HIGH, variants_impact_LOW, variants_impact_MODERATE, variants_impact_MODIFIER, variants_effect_3_prime_UTR_variant, variants_effect_5_prime_UTR_premature_start_codon_gain_variant, variants_effect_5_prime_UTR_variant, variants_effect_downstream_gene_variant, variants_effect_intron_variant, variants_effect_missense_variant, variants_effect_non_coding_transcript_variant, variants_effect_stop_lost, variants_effect_synonymous_variant, variants_effect_upstream_gene_variant

  3. In the HTML file: Summary, Variant rate by chromosome, Variants by type, Number of variants by impact, Number of variants by functional class, Number of variants by annotation, Quality histogram, InDel length histogram, Base variant table, Transition vs transversions (ts/tv), Allele frequency, Allele Count, Codon change table, Amino acid change table, Chromosome variants plots, Details by gene

GATK pipeline figures DNA-seq • 216 views

The link didn't work for me. After getting variants from HaplotypeCaller, there are many options for follow-up analyses, but which ones make sense depends heavily on the scientific question and the organism. Are you interested in single variants and their effects, or in genome-level analysis? I know too little about your samples, so I will just sketch some options:

  • run bcftools stats and generate the plots and report document (worth doing routinely)
  • filter further by minor allele frequency (MAF), Hardy-Weinberg equilibrium, etc.
  • create summary statistics such as the number and type of variants, heterozygosity, theta, pi; detect loss of heterozygosity (LOH)
  • perform linkage analysis
  • detect sites under selection and selective sweeps
  • population genomics: detect admixture, infer population history
  • create phylogenetic trees
  • look for known phenotype-associated SNPs in dbSNP, OMIM, etc. (human only)
  • look at variants overlapping your genes of interest
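
To make the summary-statistics and MAF-filter items a bit more concrete, here is a minimal sketch in plain Python (standard library only). It assumes an uncompressed, multi-sample VCF with a GT entry in the FORMAT column and looks only at biallelic SNPs. The function names and the 0.05 MAF cutoff are my own illustrative choices, not part of GATK or bcftools, and for real work the established tools are the safer route.

```python
# Sketch only: count transitions/transversions and apply a simple MAF filter
# to biallelic SNPs read from VCF text lines. All names here are illustrative.

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def parse_sites(vcf_lines):
    """Yield (chrom, pos, ref, alt, genotypes) for biallelic SNP records."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        # Keep only single-base REF/ALT pairs (drops indels and multiallelic sites).
        if len(ref) != 1 or len(alt) != 1 or alt not in "ACGT":
            continue
        gt_index = fields[8].split(":").index("GT")
        genotypes = []
        for sample in fields[9:]:
            gt = sample.split(":")[gt_index]
            # Treat phased (|) and unphased (/) genotypes the same way.
            genotypes.append(gt.replace("|", "/").split("/"))
        yield chrom, pos, ref, alt, genotypes


def alt_allele_frequency(genotypes):
    """Fraction of called alleles that are ALT; None if no allele was called."""
    called = [allele for gt in genotypes for allele in gt if allele != "."]
    if not called:
        return None
    return sum(allele == "1" for allele in called) / len(called)


def summarize(vcf_lines, min_maf=0.05):
    """Return SNP count, Ts/Tv ratio, and the sites passing the MAF cutoff."""
    n_ts = n_tv = 0
    kept = []
    for chrom, pos, ref, alt, genotypes in parse_sites(vcf_lines):
        if (ref, alt) in TRANSITIONS:
            n_ts += 1
        else:
            n_tv += 1
        af = alt_allele_frequency(genotypes)
        if af is None:
            continue
        maf = min(af, 1 - af)  # minor allele frequency at this site
        if maf >= min_maf:
            kept.append((chrom, pos, af))
    ts_tv = n_ts / n_tv if n_tv else None
    return {"n_snps": n_ts + n_tv, "ts_tv": ts_tv, "kept": kept}
```

You could call it as `summarize(open("cohort.vcf"))`, where `cohort.vcf` is a placeholder file name; for a bgzipped file, `gzip.open(path, "rt")` from the standard library gives the same line-by-line interface.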

