I am working with raw GBS (genotyping-by-sequencing) reads. My final goal is to obtain a SNP data set using the Stacks pipeline (de novo version).
After demultiplexing, I get about 2 million single-end reads per sample.
However, after removing the Illumina adapter, filtering the reads, and trimming them all to the same length (the de novo pipeline requires all reads to be the same length), I am left with an average of only 50,000 reads per sample, roughly 2.5% of what I started with.
Is this normal for GBS-derived data? Is it enough to run the pipeline and call SNPs? I suspect that most of my reads are shorter than the length I trim to, and that is why they get filtered out. However, if I set the minimum length below 90 bp, the FastQC reports for my samples do not look very good.
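One way I could check the "reads are too short" suspicion is to tabulate read lengths after adapter removal but before the length filter, and see what fraction would survive at each cutoff. Below is a minimal Python sketch of that idea, assuming standard 4-line FASTQ records. The function names and the cutoffs are my own illustrative choices, not part of Stacks or FastQC.

```python
import gzip
from collections import Counter


def read_lengths(lines):
    """Count sequence lengths from an iterable of FASTQ lines.

    Assumes plain 4-line records: header, sequence, '+', qualities.
    """
    counts = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the second line of each record is the sequence
            counts[len(line.strip())] += 1
    return counts


def fraction_surviving(counts, min_len):
    """Fraction of reads whose length is at least min_len."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    kept = sum(n for length, n in counts.items() if length >= min_len)
    return kept / total


def summarize(path, cutoffs=(50, 60, 70, 80, 90)):
    """Return {cutoff: surviving fraction} for one FASTQ file.

    The path and the cutoffs are hypothetical; gzip is assumed
    only when the file name ends in '.gz'.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        counts = read_lengths(handle)
    return {c: fraction_surviving(counts, c) for c in cutoffs}
```

Calling `summarize("sample_01.fq.gz")` (a made-up file name) on the adapter-trimmed file for one sample would show how steeply read retention drops as the minimum length approaches 90 bp.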
Does anyone have any tips or thoughts about this?
Thanks so much in advance