Question

Best way to call low-frequency SNVs in antibody sequence from FASTQ

6.0 years ago

maquino1985 • 0

I'm looking for a "standardized" workflow to align 150 bp Illumina sequencing reads (FASTQ format) and call low-frequency variants from them. We currently use bowtie 1.0 (indels are not a concern) and a custom algorithm that reads the output MAP file and reports only variants that have a Phred score >20, an average neighborhood Phred score >20, and are present in >1% of the aligned reads at that base. I would like to replace this with something more modern and industry standard.
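For reference, the filter described above could be sketched roughly as follows. This is not the actual in-house code, only a minimal illustration under stated assumptions: reads are ungapped alignments given as `(start, seq, quals)` tuples with a 0-based start, the "neighborhood" is taken to be the base plus `flank` bases on each side within the read, and the 1% is computed against all aligned bases covering the position. The function name and all parameters are hypothetical.

```python
from collections import Counter, defaultdict


def call_low_freq_snvs(ref, reads, min_q=20, flank=2, min_frac=0.01):
    """Report confident mismatches seen in more than min_frac of aligned reads.

    ref   : reference sequence (str)
    reads : iterable of (start, seq, quals); ungapped, 0-based start,
            quals is a list of integer Phred scores (assumed input layout)
    """
    depth = Counter()                   # all aligned bases per reference position
    alt_counts = defaultdict(Counter)   # confident non-reference bases per position

    for start, seq, quals in reads:
        for i, (base, q) in enumerate(zip(seq, quals)):
            pos = start + i
            if pos >= len(ref):
                break
            depth[pos] += 1
            # Neighborhood = this base plus `flank` bases either side (assumption).
            window = quals[max(0, i - flank): i + flank + 1]
            neighborhood_q = sum(window) / len(window)
            if base != ref[pos] and q > min_q and neighborhood_q > min_q:
                alt_counts[pos][base] += 1

    calls = []
    for pos in sorted(alt_counts):
        for base, n in sorted(alt_counts[pos].items()):
            if n / depth[pos] > min_frac:
                calls.append((pos, ref[pos], base, n, depth[pos]))
    return calls
```

Each returned tuple is `(position, reference base, variant base, supporting reads, depth)`. Whether low-quality bases should count toward the depth is a design choice the description leaves open; here they do.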

The purpose is to identify point mutations in B-cell subcloning experiments; the reference is the parental clone's antibody sequence.

For the preprocessing steps, do BWA and GATK make sense for this type of data, or would bowtie2 or another aligner be a better choice for speed with such a small reference?

Otherwise, I scanned through some of the published GATK "best practices" and tried to extract the relevant steps. The pipeline below looks like it might be a good preprocessing workflow, but if anyone could comment, that would be very much appreciated:

bwa mem to align the FASTQ to ref.fa (output as SAM)
SortSam with Picard (and convert to BAM)
MarkDuplicates with Picard
BuildBamIndex with Picard
Call variants with HaplotypeCaller (GATK)
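
The steps listed above might translate into command lines along these lines. This is a sketch, not a validated pipeline: it assumes GATK4's `gatk` launcher and a `picard` wrapper script (as installed by bioconda) are on the PATH, and the file names, the sample name, and the `build_pipeline` helper are all hypothetical. Three reference-preparation steps and a read group are added because HaplotypeCaller needs an indexed reference, a sequence dictionary, and read-group tags. The function only builds the commands; nothing is executed.

```python
import shlex


def build_pipeline(ref, fastq, sample, prefix):
    """Return the pipeline as a list of argument lists (nothing is run)."""
    # bwa converts the literal "\t" sequences in the read-group string to tabs.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    sam = f"{prefix}.sam"
    sorted_bam = f"{prefix}.sorted.bam"
    dedup_bam = f"{prefix}.dedup.bam"
    return [
        # Reference preparation (needed once per reference).
        ["bwa", "index", ref],
        ["samtools", "faidx", ref],
        ["gatk", "CreateSequenceDictionary", "-R", ref],
        # Align and write SAM.
        ["bwa", "mem", "-R", read_group, "-o", sam, ref, fastq],
        # Sort by coordinate; the .bam extension makes Picard write BAM.
        ["picard", "SortSam", f"I={sam}", f"O={sorted_bam}",
         "SORT_ORDER=coordinate"],
        ["picard", "MarkDuplicates", f"I={sorted_bam}", f"O={dedup_bam}",
         f"M={prefix}.dup_metrics.txt"],
        ["picard", "BuildBamIndex", f"I={dedup_bam}"],
        ["gatk", "HaplotypeCaller", "-R", ref, "-I", dedup_bam,
         "-O", f"{prefix}.vcf.gz"],
    ]


if __name__ == "__main__":
    # Print shell-quoted commands for inspection (shlex.join needs Python 3.8+).
    for cmd in build_pipeline("ref.fa", "reads.fastq", "clone1", "clone1"):
        print(shlex.join(cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` once the commands have been checked against the installed tool versions.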

antibody SNP variant calling alignment • 1.1k views
