Question: From pooled fastq data to SNPs
13 months ago by
Fedster wrote:

I have some pooled GBS data (96 samples) that was generated by two runs on an Illumina HiSeq4000. The DNA was size-selected, between 100 and 200 bp. The output is two fastq.gz files (run 1 and run 2). Each sample has a unique barcode; the barcodes are listed in a CSV file.
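
To make the demultiplexing step concrete, here is a minimal Python sketch. It assumes the barcodes are inline at the 5' end of each read, that the CSV has two columns (sample, barcode) with no header row, and that only exact barcode matches are accepted; none of that is confirmed by the post, and the function names are made up for illustration.

```python
import csv
import gzip
import io
from collections import defaultdict


def load_barcodes(csv_text):
    """Parse 'sample,barcode' rows (assumed layout, no header) into {barcode: sample}."""
    reader = csv.reader(io.StringIO(csv_text))
    return {row[1].strip().upper(): row[0].strip() for row in reader if len(row) >= 2}


def read_fastq(handle):
    """Yield (header, sequence, quality) tuples from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def demultiplex(records, barcodes):
    """Assign each read to a sample by exact match of its leading bases to a barcode."""
    # Try longer barcodes first so a short barcode cannot shadow a longer one.
    ordered = sorted(barcodes, key=len, reverse=True)
    by_sample = defaultdict(list)
    for header, seq, qual in records:
        for bc in ordered:
            if seq.startswith(bc):
                # Trim the barcode from both the sequence and the quality string.
                by_sample[barcodes[bc]].append((header, seq[len(bc):], qual[len(bc):]))
                break
        else:
            by_sample["unassigned"].append((header, seq, qual))
    return dict(by_sample)


def demultiplex_file(fastq_gz_path, barcode_csv_path):
    """Convenience wrapper for a gzipped FASTQ file and a barcode CSV on disk."""
    with open(barcode_csv_path, newline="") as fh:
        barcodes = load_barcodes(fh.read())
    with gzip.open(fastq_gz_path, "rt") as fq:
        return demultiplex(read_fastq(fq), barcodes)
```

This keeps every read in memory, which is fine for a test subset but not for two full HiSeq runs; a real run would stream each read to a per-sample output file, or use a dedicated tool such as the demultiplexer that ships with Stacks.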

In addition I have a reference genome (as a single fasta file) for the organism in question.

What I want is the genotype (as SNPs) for each sample. So I need to demultiplex my data, ideally discard reads of too low quality, align the remaining reads to the reference, generate a pileup, call SNPs, etc.
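
For the "throw out low-quality reads" step, a simple filter might look like the sketch below. It assumes Phred+33 quality encoding (the standard for recent Illumina output), and the thresholds (mean quality 20, minimum length 30) are arbitrary placeholders rather than recommended values.

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_filter(seq, quality, min_mean_q=20, min_length=30):
    """Keep a read only if it is long enough, has no N, and has adequate mean quality."""
    return (
        len(seq) >= min_length
        and "N" not in seq.upper()
        and mean_phred(quality) >= min_mean_q
    )
```

Averaging Phred scores is a crude measure, since the scores are logarithmic; dedicated trimmers typically use sliding windows or expected-error estimates instead.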

Writing in January 2018, is there a preferred pipeline for this? I think I could do everything using Stacks, but other approaches might be available and offer greater speed or some other benefit. All else being equal, a faster approach would be preferred.

13 months ago by
bari.ballew wrote:

You probably want to check out Broad's GATK Best Practices. Specifically, look mainly at the sections on data pre-processing and germline SNPs + indels. Alternatively, you could check out the bcbio pipeline.
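
As a rough idea of what the per-sample part of such a workflow might look like once the reads are demultiplexed, here is a sketch that only builds the command lines and does not run them. It assumes single-end reads, that bwa, samtools and GATK4 are installed, and that the reference has already been indexed for each tool; the file names and the function itself are hypothetical.

```python
def per_sample_commands(sample, fastq, reference):
    """Build (not run) an align -> sort -> index -> call chain for one sample."""
    # Read group tags so the caller can tell samples apart; bwa expands the literal "\t".
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    bam = f"{sample}.sorted.bam"
    return [
        # Align the reads; the SAM output is meant to be piped into the sort step.
        ["bwa", "mem", "-R", read_group, reference, fastq],
        # Coordinate-sort the alignments read from stdin ("-").
        ["samtools", "sort", "-o", bam, "-"],
        ["samtools", "index", bam],
        # Per-sample GVCF, intended for joint genotyping across all samples afterwards.
        ["gatk", "HaplotypeCaller", "-R", reference, "-I", bam,
         "-O", f"{sample}.g.vcf.gz", "-ERC", "GVCF"],
    ]
```

Calling `per_sample_commands("sample01", "sample01.fq.gz", "ref.fa")` returns four argument lists that could be handed to `subprocess` or written to a job script, with the first two joined by a pipe.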

Note that you're basically asking how to analyze sequencing data from start to finish, so depending on your background, this will likely take a good bit of effort, both in reading and understanding the pipelines and in implementing them. Best of luck!


I used PyRAD before, and it was slow as a rock. I would have hoped that natural selection acting on the proliferation of alternative methods would have produced a sensible standard by now: sensible in terms of ease of use, results and speed.

Reply written 13 months ago by Fedster