Filtering and Trimming Metagenomic Data
3.0 years ago
MK2000 ▴ 30

I am attempting to run a whole-genome shotgun metagenomics pipeline on some raw metagenomic reads from a hypersaline soil microbial community study on NCBI, and I am stuck on the initial quality control step (I have very little experience in this field). I used FastQC and Prinseq to get a quick picture of the quality of my data. My per-base sequence quality falls off quickly into the poor region, at around position 300, with read lengths ranging from 52 to 1,780 bp. There does not appear to be any adapter contamination or sequence duplication. The per-sequence GC content module also failed, but there is evidence in the literature that different salt-tolerant microorganisms can have widely varying GC content, so I am hesitant to correct for that.

I am looking for advice on how to quality-trim my data, or pointers to good resources on filtering/trimming based on quality assessment results. I am not sure how aggressively I should trim my reads, or how to choose a minimum read-length cutoff if I do, because I don't want to bias my coverage. I will be using Prinseq for this quality processing.
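To make sure I understand what the trimming step is actually doing, here is my sketch of the sliding-window 3'-quality trimming that tools like Prinseq perform. The window size of 10 and the Q20 cutoff are illustrative assumptions I picked, not recommendations:

```python
def trim_right_by_quality(seq, quals, window=10, min_mean_q=20):
    """Trim the 3' end of a read where the mean Phred quality of a
    sliding window falls below min_mean_q (a toy model of windowed
    quality trimming; window/cutoff values are assumptions)."""
    assert len(seq) == len(quals)
    end = len(seq)
    # Slide the window in from the 3' end until its mean quality passes.
    while end >= window and sum(quals[end - window:end]) / window < min_mean_q:
        end -= 1
    if end < window:
        end = 0  # no window ever passed; discard the whole read
    # Also drop any remaining terminal bases below the cutoff.
    while end > 0 and quals[end - 1] < min_mean_q:
        end -= 1
    return seq[:end], quals[:end]

# Example: quality collapses toward the 3' end, so the tail is trimmed.
seq   = "ACGTACGTACGTACGTACGT"
quals = [35] * 12 + [8] * 8
trimmed_seq, trimmed_quals = trim_right_by_quality(seq, quals)
print(len(trimmed_seq))  # 12
```

If I have read the prinseq-lite manual correctly, this should roughly correspond to something like `prinseq-lite.pl -fastq reads.fastq -trim_qual_right 20 -trim_qual_type mean -trim_qual_window 10 -min_len 50` (please correct me if those flags or values are off).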

Thanks much!

metagenomics microbial prinseq fastqc

