Question: Filtering and Trimming Metagenomic Data
3 months ago, MK2000 wrote:

I am trying to run a whole-genome shotgun metagenomics pipeline on raw reads from a hypersaline soil microbial community study deposited at NCBI, and I am stuck on the initial quality control step (I have very little experience in this field). I used FastQC and Prinseq to get a quick picture of the data quality. Per-base sequence quality drops fairly quickly into the poor range, at about position 300 in the read (read lengths range from 52 to 1780). There does not appear to be any adapter contamination or sequence duplication. The GC content panel also failed, but there is evidence in the literature that salt-tolerant microorganisms can vary widely in GC content, so I am hesitant to correct for that.

I am looking for advice on how to quality-filter my data, or pointers to good resources on filtering and trimming based on quality assessment results. I am not sure how much I should trim my reads or, if I do trim, how to choose the read-length cutoff, since I don't want to bias coverage. I will be using Prinseq for this quality processing.
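
To make the question concrete, here is a minimal Python sketch of one common approach: trim from the 3' end while the mean quality of a trailing window is below a threshold, then discard reads that end up too short. This is not Prinseq's implementation. The function names, the window size of 5, the mean-quality threshold of 20, and the minimum length of 50 are all hypothetical placeholders, and Phred+33 quality encoding is assumed.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 is an assumption)."""
    return [ord(ch) - offset for ch in quality_line]


def trim_right(seq, qual, min_mean_q=20, window=5):
    """Drop bases from the 3' end while the trailing window's mean quality is too low.

    Thresholds are illustrative placeholders, not recommended values.
    """
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0:
        start = max(0, end - window)
        win = scores[start:end]
        if sum(win) / len(win) >= min_mean_q:
            break  # trailing window is acceptable, stop trimming
        end -= 1   # otherwise remove one more base from the 3' end
    return seq[:end], qual[:end]


def keep_read(seq, min_len=50):
    """Length filter applied after trimming (min_len is a hypothetical cutoff)."""
    return len(seq) >= min_len
```

Because the window is averaged, a few low-quality bases can survive at the trimmed end. A smaller window or a higher threshold trims more aggressively at the cost of shorter reads, which is exactly the trade-off I am unsure how to set.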

Thanks much!

Modified 11 weeks ago by Biostar • written 3 months ago by MK2000