Question

How to align, filter, and call TruSeq Custom Amplicon data?


9.8 years ago

idedios ▴ 30

For the longest time I've been using BaseSpace to process my TruSeq Amplicon data; however, my company wants to move away from it for data security and reliability reasons.

So I've been tasked with figuring out how to streamline and eventually automate the processing of TruSeq Cancer Panel and TruSight Myeloid data generated on a NextSeq 500. The samples are run 2x150bp and have an average coverage of 5,000x with a maximum coverage of 50,000x. The NextSeq flow cell has 4 lanes, so each sample yields 8 FASTQ files (4 lanes x 2 reads), with ~40 samples per run. The TruSeq Cancer Panel samples are all FFPE and the TruSight Myeloid samples are all whole blood.
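As a minimal sketch of how the 8 FASTQ files per sample could be organised before alignment, the snippet below groups file names by sample and lane. It assumes the default bcl2fastq2 naming scheme (`<SampleName>_S<n>_L00<lane>_R<read>_001.fastq.gz`); the function name `group_fastqs` and the sample names in the usage note are hypothetical.

```python
import os
import re
from collections import defaultdict

# Assumed default bcl2fastq2 naming:
#   <SampleName>_S<n>_L00<lane>_R<read>_001.fastq.gz
FASTQ_RE = re.compile(
    r"^(?P<sample>.+)_S(?P<index>\d+)_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq\.gz$"
)


def group_fastqs(paths):
    """Return {sample: {lane: {"R1": path, "R2": path}}} for paired FASTQ files."""
    groups = defaultdict(lambda: defaultdict(dict))
    for path in sorted(paths):
        match = FASTQ_RE.match(os.path.basename(path))
        if match is None:
            continue  # not a FASTQ file in the expected naming scheme
        sample = match.group("sample")
        if sample == "Undetermined":
            continue  # reads that could not be assigned to any sample
        lane = int(match.group("lane"))
        groups[sample][lane]["R" + match.group("read")] = path
    # Convert to plain dicts so missing keys raise instead of being created.
    return {sample: dict(lanes) for sample, lanes in groups.items()}
```

With 4 lanes and paired reads, each sample should end up with four lane entries, each holding an `R1` and an `R2` path; a lane with only one of the two is a sign that demultiplexing or file transfer went wrong.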

I have some familiarity with Linux and have been using Fedora to demultiplex NextSeq data with bcl2fastq2. I've also looked into the whole workflow: aligning the data with BWA, converting the .sam files to .bam with Samtools, filtering with Samtools/PicardTools, and variant calling with GATK. However, I've never been able to process a single sample successfully all the way through, due to incorrect use of filtering or a mistake in some other early step.
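The alignment part of that workflow could be sketched as follows, under stated assumptions rather than as a validated pipeline. The helpers only build argument lists for `bwa mem`, `samtools sort`, `samtools merge`, and `samtools index`; the function names, output file names, and read-group values are hypothetical. Note that for amplicon data, duplicate marking is commonly skipped, because reads from the same amplicon share start positions by design. Filtering and variant calling depend on the GATK version and the panel, so they are left out here.

```python
import shlex


def alignment_commands(sample, lane, r1, r2, reference, threads=4):
    """Build `bwa mem` and `samtools sort` argument lists for one lane of one sample."""
    # bwa expects a literal backslash-t in the -R string and converts it to a tab.
    read_group = (
        f"@RG\\tID:{sample}.L{lane}\\tSM:{sample}\\tLB:{sample}\\tPL:ILLUMINA"
    )
    out_bam = f"{sample}.L{lane}.sorted.bam"  # hypothetical output name
    bwa = ["bwa", "mem", "-t", str(threads), "-R", read_group, reference, r1, r2]
    # "-" makes samtools read the SAM stream from stdin.
    sort = ["samtools", "sort", "-o", out_bam, "-"]
    return bwa, sort, out_bam


def merge_and_index_commands(sample, lane_bams):
    """Build commands that merge the per-lane BAMs of a sample and index the result."""
    merged = f"{sample}.merged.bam"  # hypothetical output name
    return [
        ["samtools", "merge", "-f", merged, *lane_bams],
        ["samtools", "index", merged],
    ]


def as_shell(bwa, sort):
    """Render the two commands as one quoted shell pipeline, e.g. for logging."""
    return " | ".join(shlex.join(cmd) for cmd in (bwa, sort))
```

Streaming `bwa mem` directly into `samtools sort` avoids writing an intermediate .sam file, and giving each lane its own read-group ID while sharing the `SM` field keeps the lanes distinguishable after merging.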

What I'm asking is: what commands and options should I use with each program to process the samples from beginning to end? From there I'll work on a bash or Python script that pipes the output of one program into the next, and eventually on a daemon that automatically detects new data and starts processing it.
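The two automation pieces mentioned here, piping one program into the next and detecting new data, might look roughly like this in Python. This is a sketch under assumptions: it relies on the sequencer writing an `RTAComplete.txt` file into the run folder when the run finishes, and the `pipeline.done` marker file and both function names are hypothetical.

```python
import subprocess
from pathlib import Path


def run_piped(producer, consumer):
    """Run `producer | consumer` without a shell and raise if either step fails."""
    p1 = subprocess.Popen(producer, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(consumer, stdin=p1.stdout)
    # Close our copy of the pipe so the producer sees a broken pipe
    # if the consumer exits early.
    p1.stdout.close()
    rc2 = p2.wait()
    rc1 = p1.wait()
    if rc1 != 0 or rc2 != 0:
        raise RuntimeError(f"pipeline failed: producer={rc1}, consumer={rc2}")


def find_finished_runs(root, done_marker="pipeline.done"):
    """Return run folders that are complete but not yet processed.

    Assumes a finished run contains RTAComplete.txt; `done_marker` is a
    hypothetical file the pipeline would create after processing a run.
    """
    ready = []
    for run_dir in sorted(Path(root).iterdir()):
        if not run_dir.is_dir():
            continue
        finished = (run_dir / "RTAComplete.txt").exists()
        processed = (run_dir / done_marker).exists()
        if finished and not processed:
            ready.append(run_dir)
    return ready
```

A daemon could call `find_finished_runs` in a loop with a sleep between polls (or from a cron job), process each returned folder, and then create the marker file so the run is not picked up again.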

BWA SBS Samtools PicardTools GATK • 3.9k views



So bcbio-nextgen looks like the workflow I'm looking for. The only problem is that it seems to get stuck during installation. Apparently ant-nodeps is a nonexistent package (at least for the latest version of ant).

What should I do about this issue?

Also, I saw on the git repo that bcbio-nextgen is targeted at RNA-seq. Is it also compatible with DNA-seq?

written 9.8 years ago by idedios

Ram · Answer 1 · 2015-01-13

I don't have specific advice for amplicon sequencing, but for the less specialized task of going from raw sequencer data to mapped reads, I'd recommend using an established automated pipeline such as bcbio-nextgen. It sounds like it could do most of what you want on the front end, starting with BCL files (see the "Sequencer integration" page in their documentation) and going to BAM files with mapped reads. After that, you could probably customize their variant calling parameters to optimize for amplicon sequencing.