Can anyone give advice or opinions on my data quality? I have 64 samples from a single plant species with a genome size of 270 Mb, paired-end RAD-sequenced with TaqI. The files I've been given are already demultiplexed and range in size from 3.6Gb down to 0.06Gb of 150 bp reads. Most worryingly to my eyes, 60% of the files are at least an order of magnitude smaller than the largest one. TotalReads(M) ranges from 9.69 to 0.01 per sample.
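A 9.69 M to 0.01 M spread is roughly a 1000-fold difference between the best and worst samples. A quick way to list the samples that are an order of magnitude or more behind the best one is sketched below. It assumes you have already put the per-sample read counts (for example, the TotalReads(M) column) into a dictionary. The function name `flag_low_samples` and the sample names in the usage example are hypothetical, and all values except 9.69 and 0.01 are made up for illustration.

```python
from __future__ import annotations


def flag_low_samples(reads_by_sample: dict[str, float], fold: float = 10.0) -> list[str]:
    """Return the samples with at least `fold`-times fewer reads than the best sample.

    reads_by_sample maps a (hypothetical) sample name to its read count;
    any consistent unit works, e.g. millions of reads.
    """
    top = max(reads_by_sample.values())
    # r * fold <= top is the same test as r <= top / fold, without a division.
    return sorted(s for s, r in reads_by_sample.items() if r * fold <= top)


if __name__ == "__main__":
    # Hypothetical counts in millions of reads; only 9.69 and 0.01 come from the post.
    counts = {"S01": 9.69, "S02": 4.2, "S03": 0.5, "S04": 0.01}
    low = flag_low_samples(counts)
    print(low, f"{len(low) / len(counts):.0%} of samples flagged")
```

Running this over all 64 samples shows which individuals drive the "60% of files" figure, and these are the candidates to drop or re-sequence.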

I have been using Stacks. After process_radtags (-e taqI -r -c -q), file sizes drop to between 0.35Gb and 0.007Gb. Running denovo_map.pl with several parameter settings (-m 3, with -n/-M set to 2/2, 4/4, 5/5, 5/3 and 8/8) gives 15,000 to 20,000 loci. However, after populations (-r 0.7; I don't even set --max-obs-het) I am left with only 100 to 200 loci, which is far too few.
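A rough back-of-envelope check helps show why so few loci survive. The sketch below assumes reads are spread evenly over the catalog loci, which real RAD data never are, so it gives an optimistic average depth. It also assumes all 64 samples form one population and that a locus passes -r when (samples present / total samples) >= r. The helper names are hypothetical, and the read counts used are the raw TotalReads values, before process_radtags filtering, so actual depths will be lower still.

```python
def mean_depth_per_locus(reads: float, n_loci: int) -> float:
    """Average reads per locus if reads were spread evenly over n_loci.

    Coverage is never this even, so a value below the -m threshold means
    most loci in that sample will fail -m.
    """
    return reads / n_loci


def min_samples_present(n_samples: int, r: float) -> int:
    """Smallest number of samples a locus must appear in to pass an -r style
    filter, assuming the test is (present / n_samples) >= r."""
    return next(k for k in range(n_samples + 1) if k / n_samples >= r)


if __name__ == "__main__":
    m = 3            # -m value used with denovo_map.pl in the post
    n_samples = 64
    for reads_m in (9.69, 0.01):          # max and min TotalReads(M) from the post
        for n_loci in (15_000, 20_000):   # catalog sizes reported in the post
            depth = mean_depth_per_locus(reads_m * 1e6, n_loci)
            status = "below -m" if depth < m else "ok"
            print(f"{reads_m:>5} M reads, {n_loci} loci: {depth:8.2f}x ({status})")

    need = min_samples_present(n_samples, 0.7)
    print(f"-r 0.7 with {n_samples} samples: a locus must be in >= {need} samples, "
          f"so at most {n_samples - need} may be missing")
```

Under these assumptions, the smallest samples average well under 1x per locus, far below -m 3. With -r 0.7 a locus can be missing from at most 19 of 64 samples. If roughly 38 samples are low-coverage, most loci will be absent from far more than 19 of them. Dropping the worst samples before running populations, or lowering -r, is one way to test whether sample depth rather than the -n/-M settings is the bottleneck.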

Can anyone suggest a way to increase the number of loci recovered, or offer any thoughts on whether the sequencing protocol may have caused the problem?

