Question: Filtering and Trimming Metagenomic Data
gravatar for MK2000
15 months ago by
MK200030 wrote:

I am attempting to run through a whole genome shotgun metagenomics pipeline on some raw metagenomic reads from a hypersaline soil microbial community study on NCBI, and I am stuck on the initial quality control step (I have very little experience in this field). I utilized FastQC and Prinseq to get a quick picture of the quality of my data. My per-base sequence quality falls off pretty quickly into the poor region (at about position 300 in the read, with read range being 52-1780). It appears I don't have any adapter contamination or duplicated sequences. My GC content panel was another one that did fail, but there is evidence in the literature that different salt-tolerant microorganisms can have widely varying GC content, so I am hesitant to correct for that.

I am looking for advice on how to quality correct my data, or if anyone could point me to some good resources for reading more on filtering/trimming data based on quality assessment results. I am not sure how much I should be trimming my reads, and if I did how I should decide what read length should be my trimming cutoff, because I don't want biased coverage. I will be using Prinseq for this quality processing.

Thanks much!

ADD COMMENTlink modified 15 months ago by Biostar ♦♦ 20 • written 15 months ago by MK200030
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1717 users visited in the last hour