Best way to call low-frequency SNVs in antibody sequence from FASTQ
4.1 years ago

I'm looking for a "standardized" workflow to align 150 bp Illumina reads in FASTQ format and call low-frequency variants from them. We currently use Bowtie 1.0 (indels are not a concern) plus a custom algorithm that reads the output MAP file and reports only variants where the base has a Phred score >20, the average Phred score of the neighboring bases is >20, and the variant is present in >1% of the reads aligned at that position. I would like to replace this with something more modern and closer to an industry standard.
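To make the current filter concrete, here is a minimal Python sketch of the rule described above. It is an illustration only, not our actual code: the function name `call_low_freq_variants`, the per-position input layout (one `(base, base_qual, neighborhood_mean_qual)` tuple per aligned read), and the choice of all reads covering the position as the denominator are assumptions made for the example.

```python
from collections import Counter

def call_low_freq_variants(pileup, ref_base, min_qual=20,
                           min_neighbor_qual=20, min_fraction=0.01):
    """Return {alt_base: fraction} for one reference position.

    pileup: list of (base, base_qual, neighborhood_mean_qual) tuples,
            one per read aligned at this position (hypothetical layout).
    """
    depth = len(pileup)  # assumed denominator: all reads covering the position
    if depth == 0:
        return {}
    # Count only non-reference bases that pass both quality thresholds.
    counts = Counter(
        base for base, qual, neighbor_qual in pileup
        if base != ref_base
        and qual > min_qual
        and neighbor_qual > min_neighbor_qual
    )
    # Keep alternate bases seen in more than min_fraction of the reads.
    return {base: n / depth for base, n in counts.items()
            if n / depth > min_fraction}

if __name__ == "__main__":
    reads = [("A", 30, 30)] * 980 + [("G", 30, 30)] * 15 + [("T", 30, 30)] * 5
    print(call_low_freq_variants(reads, "A"))
```

In this toy pileup, G passes the 1% cutoff (15 of 1000 reads) while T does not (5 of 1000).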

The purpose is to identify point mutations in B-cell subcloning experiments; the reference is the antibody sequence of the parental clone.

For the preprocessing steps, does using BWA and GATK make sense for this type of data, or would Bowtie 2 or another aligner be a better choice for speed with such a small reference?

I also scanned through some of the published GATK "best practices" and tried to extract the relevant steps. The pipeline below looks like it might be a good preprocessing workflow, but if anyone could comment, that would be very much appreciated:

  1. bwa mem to align the FASTQ reads to ref.fa (output in SAM format)
  2. SortSam with Picard (and convert to BAM)
  3. MarkDuplicates with Picard
  4. BuildBamIndex with Picard
  5. Call variants with HaplotypeCaller (GATK)
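
As a rough sketch, the five steps above might translate into the commands below. The Python helper only builds and prints the command lines; it does not run anything. The helper name `build_pipeline`, the output file names, the `picard.jar` path, and the read-group string are placeholders I made up for illustration. It also assumes the reference has already been indexed (`bwa index`, plus the `.fai` and `.dict` files GATK expects) and uses Picard's legacy `KEY=value` argument style with the GATK4 `gatk` wrapper.

```python
import shlex

def build_pipeline(ref, fastq, sample, picard_jar="picard.jar"):
    """Return the shell commands for the five steps, in order (not executed)."""
    ref_q, fastq_q, jar_q = (shlex.quote(p) for p in (ref, fastq, picard_jar))
    s = shlex.quote(sample)
    # Read group passed to bwa mem -R; HaplotypeCaller needs one.
    # The ID/SM/PL values here are placeholders.
    rg = shlex.quote(f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA")
    picard = f"java -jar {jar_q}"
    return [
        # 1. Align reads; bwa mem writes SAM to stdout.
        f"bwa mem -R {rg} {ref_q} {fastq_q} > {s}.sam",
        # 2. Coordinate-sort and convert to BAM.
        f"{picard} SortSam I={s}.sam O={s}.sorted.bam SORT_ORDER=coordinate",
        # 3. Mark duplicate reads and write a metrics file.
        f"{picard} MarkDuplicates I={s}.sorted.bam O={s}.dedup.bam "
        f"M={s}.dup_metrics.txt",
        # 4. Index the final BAM.
        f"{picard} BuildBamIndex I={s}.dedup.bam",
        # 5. Call variants against the reference.
        f"gatk HaplotypeCaller -R {ref_q} -I {s}.dedup.bam -O {s}.vcf.gz",
    ]

if __name__ == "__main__":
    for cmd in build_pipeline("ref.fa", "reads.fastq", "clone1"):
        print(cmd)
```

The printed lines can be pasted into a shell script and adjusted as needed.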
antibody SNP variant calling alignment • 970 views
