Question

Finding an acceptable min/max read count in two groups of samples


2.1 years ago

jobie1 ▴ 30

I have 15 seabird gut fecal samples and 30 sediment samples that were sequenced on an Illumina MiSeq (300+300 bp PE). The files I am working with are in demultiplexed FASTQ format, and we expected to get a MAXIMUM of 50K reads per sample. When I ran MultiQC, I noticed that the read counts for many samples were very low, while some samples had much higher read counts than we expected. The samples were isolated using a commercially available kit, and 16S V4-V5 primers were used for sequencing. Here is a short breakdown:

4/15 birds received read counts of ~2000, 18000, 32000 and 16000.
All the other birds received ~100K reads, and 1 bird received ~1M reads?


Only 2 sediment samples reached over 50K reads.
Sediments had an average read count of ~27000.
Most sediment samples only reached around 3000 reads.

Essentially, the samples either went way over or stayed well under the expected maximum read count. Why would some samples go way over the expected maximum while others are stuck at such low numbers?

It seems like the sediment samples I am working with did particularly badly. Some things to note:

The overall quality check shows that many of the samples have more duplicated reads than unique reads.
Overall, the Phred scores of all the samples stay well above 20.

My real question is whether there is an accepted threshold for read count that I can use for my analysis. I realize a lot of my sediment samples will have to be tossed, but what is an acceptable minimum/maximum number of reads per sample if I want to continue the analysis with some of the samples? I am running the samples through the QIIME 2 workflow.
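To make the question concrete, here is a minimal Python sketch of how per-sample read counts could be tallied from demultiplexed FASTQ files and compared against a candidate cutoff. It assumes standard four-line FASTQ records. The function names `count_fastq_reads` and `flag_samples` and the `min_reads`/`max_reads` parameters are hypothetical, and any cutoff values passed in are placeholders rather than recommended thresholds.

```python
import gzip


def count_fastq_reads(path):
    """Count reads in a FASTQ file (plain or gzipped), assuming 4 lines per record."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def flag_samples(read_counts, min_reads, max_reads=None):
    """Sort samples into keep / too_low / too_high using placeholder cutoffs.

    read_counts: dict mapping sample name -> read count
    min_reads:   samples below this count are flagged as too low
    max_reads:   optional; samples above this count are flagged as too high
    """
    result = {"keep": [], "too_low": [], "too_high": []}
    for sample, n in sorted(read_counts.items()):
        if n < min_reads:
            result["too_low"].append(sample)
        elif max_reads is not None and n > max_reads:
            result["too_high"].append(sample)
        else:
            result["keep"].append(sample)
    return result
```

For example, `flag_samples({"bird_01": 2000, "bird_02": 18000}, min_reads=5000)` would put `bird_01` under `too_low` and `bird_02` under `keep`; the sample names and the 5000 cutoff are purely illustrative.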

Thanks!

v4v5 amplicon 16s readcount Qiime2 • 273 views
