Question: Best way to call low-frequency SNVs in antibody sequence from FASTQ
2.7 years ago by maquino1985 wrote:

I'm looking for a "standardized" workflow to align reads and call low-frequency variants from 150 bp Illumina sequencing data in FASTQ format. We currently use bowtie 1.0 (indels are not a concern) and a custom algorithm that, working from the output MAP file, keeps only variants with a Phred score >20, an average neighborhood Phred score >20, and a frequency of >1% among the reads aligned at that base. I would like to replace this with something more modern and closer to an industry standard.
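To make the current filter concrete, here is a minimal Python sketch of the rule as described above. It is not our actual code: the function name `call_low_freq_snvs`, the `(start, seq, quals)` read tuples, and the neighborhood size (`window=2` bases on each side, including the base itself) are assumptions for illustration, and it assumes ungapped alignments with 0-based start positions.

```python
from collections import Counter


def call_low_freq_snvs(reference, reads, min_q=20, min_neighborhood_q=20,
                       min_fraction=0.01, window=2):
    """Return (pos, ref_base, alt_base, alt_count, depth) tuples.

    reads is an iterable of (start, seq, quals): a 0-based ungapped
    alignment start, the read bases, and per-base Phred scores (ints).
    window is a hypothetical neighborhood half-width.
    """
    depth = [0] * len(reference)
    alt_counts = [Counter() for _ in reference]

    for start, seq, quals in reads:
        for i, (base, q) in enumerate(zip(seq, quals)):
            pos = start + i
            if pos < 0 or pos >= len(reference):
                continue
            # Every aligned base counts toward depth, whatever its quality.
            depth[pos] += 1
            if base == reference[pos]:
                continue
            # Average Phred over the neighborhood, clipped at the read ends.
            neighborhood = quals[max(0, i - window):i + window + 1]
            mean_q = sum(neighborhood) / len(neighborhood)
            if q > min_q and mean_q > min_neighborhood_q:
                alt_counts[pos][base] += 1

    calls = []
    for pos, counts in enumerate(alt_counts):
        for base, n in sorted(counts.items()):
            # Keep alleles seen in more than min_fraction of aligned reads.
            if depth[pos] and n / depth[pos] > min_fraction:
                calls.append((pos, reference[pos], base, n, depth[pos]))
    return calls
```

Depth here counts every aligned base at a position, so a low-quality mismatch lowers the allele fraction of the others without being counted as a variant itself; whether the denominator should also be quality-filtered is a design choice the description above leaves open.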

The purpose is to identify point mutations in B-cell subcloning experiments; the reference is the antibody sequence of the parental clone.

For the preprocessing steps, does using BWA and GATK make sense for this type of data, or would bowtie2 or some other aligner be a better choice for speed with such a small reference?

Otherwise, I scanned through some of the published GATK "best practices" and tried to extract the relevant steps. The following looks like it might be a reasonable preprocessing pipeline, but if anyone could comment it would be very much appreciated:

  1. bwa mem to align the FASTQ to ref.fa, producing SAM
  2. SortSam with Picard (and convert to BAM)
  3. MarkDuplicates with Picard
  4. BuildBamIndex with Picard
  5. Call variants with HaplotypeCaller (GATK)
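The five steps above could be strung together roughly as follows. This is only a sketch under assumptions: the file names, the sample name, the read-group fields, and the `picard.jar` path are placeholders; the reference is assumed to be already indexed (`bwa index`, plus the `.fai` and `.dict` files GATK expects); `bwa mem -o` needs a reasonably recent bwa (older releases write SAM to stdout only); and the Picard calls use the legacy `KEY=value` argument style. It only builds the commands, and it does not address whether HaplotypeCaller is suitable for variants at around 1% frequency.

```python
import shlex


def build_pipeline(ref, fastq, sample, picard_jar="picard.jar"):
    """Return the five pipeline steps as argument lists (nothing is run)."""
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    metrics = f"{sample}.dup_metrics.txt"
    vcf = f"{sample}.vcf.gz"
    # GATK requires read groups; bwa expands the literal "\t" itself.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    picard = ["java", "-jar", picard_jar]

    return [
        # 1. Align reads to the reference, writing SAM.
        ["bwa", "mem", "-R", read_group, "-o", sam, ref, fastq],
        # 2. Coordinate-sort and convert to BAM.
        picard + ["SortSam", f"I={sam}", f"O={sorted_bam}",
                  "SORT_ORDER=coordinate"],
        # 3. Mark duplicate reads.
        picard + ["MarkDuplicates", f"I={sorted_bam}", f"O={dedup_bam}",
                  f"M={metrics}"],
        # 4. Index the BAM.
        picard + ["BuildBamIndex", f"I={dedup_bam}"],
        # 5. Call variants.
        ["gatk", "HaplotypeCaller", "-R", ref, "-I", dedup_bam, "-O", vcf],
    ]


if __name__ == "__main__":
    # Print the commands for inspection instead of executing them.
    for cmd in build_pipeline("ref.fa", "reads.fastq", "clone1"):
        print(shlex.join(cmd))
```

To actually run the steps, each list could be passed to `subprocess.run(cmd, check=True)` in order, so that a failing step stops the pipeline.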