Filtering and Trimming Metagenomic Data
3.0 years ago
MK2000 ▴ 30

I am attempting to run a whole-genome shotgun metagenomics pipeline on raw reads from a hypersaline soil microbial community study on NCBI, and I am stuck on the initial quality-control step (I have very little experience in this field). I used FastQC and Prinseq to get a quick picture of the quality of my data. Per-base sequence quality falls off quickly into the poor region at about position 300 in the read (read lengths range from 52 to 1780 bp). There does not appear to be any adapter contamination or sequence duplication. The per-sequence GC content module also failed, but there is evidence in the literature that salt-tolerant microorganisms can vary widely in GC content, so I am hesitant to correct for that.
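To decide where the quality actually drops, it can help to compute the per-position mean quality directly from the FASTQ file rather than eyeballing the FastQC plot. A minimal sketch, assuming standard Phred+33 quality encoding (the function name `per_position_mean_quality` is illustrative, not part of FastQC or Prinseq):

```python
import io

def per_position_mean_quality(handle, phred_offset=33):
    """Mean Phred quality at each read position across a FASTQ file.

    Reads may have different lengths; each position is averaged only
    over the reads that actually cover it.
    """
    totals = []  # sum of quality scores per position
    counts = []  # number of reads covering each position
    while True:
        header = handle.readline()
        if not header:
            break
        handle.readline()  # sequence line (unused here)
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        for i, ch in enumerate(qual):
            if i >= len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - phred_offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]

# Toy example: two short reads whose quality drops toward the 3' end.
# With the +33 offset, 'I' encodes Phred 40 and '#' encodes Phred 2.
fastq = io.StringIO(
    "@read1\nACGTACGT\n+\nIIIIII##\n"
    "@read2\nACGTAC\n+\nIIII##\n"
)
means = per_position_mean_quality(fastq)
# means: 40.0 for the first four positions, then dropping as the
# low-quality tails start to dominate
```

Scanning the returned list for the first position where the mean falls below, say, Q20 gives a concrete candidate for a trim position.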

I am looking for advice on how to quality-trim and filter my data, or pointers to good resources on filtering/trimming based on quality-assessment results. I am not sure how aggressively I should trim my reads, or how to decide on a minimum read-length cutoff, because I don't want to bias my coverage. I will be using Prinseq for this quality processing.
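For intuition, the kind of 3'-end quality trimming plus length filtering that Prinseq performs (e.g. `-trim_qual_right` and `-min_len` in prinseq-lite) can be sketched like this; the helper names are illustrative, and this is a simplification of what the real tool does:

```python
def trim_3prime(seq, qual, threshold=20, phred_offset=33):
    """Trim low-quality bases from the 3' end: drop trailing positions
    whose Phred score is below `threshold` (similar in spirit to
    prinseq-lite's -trim_qual_right)."""
    end = len(seq)
    while end > 0 and (ord(qual[end - 1]) - phred_offset) < threshold:
        end -= 1
    return seq[:end], qual[:end]

def keep_read(seq, min_len=50):
    """Length filter applied after trimming (cf. -min_len): discarding
    reads that end up too short avoids keeping uninformative fragments,
    at the cost of dropping coverage from low-quality regions."""
    return len(seq) >= min_len

# Example: the three trailing '#' (Q2) bases are trimmed; with a
# min_len of 50 the remaining 5 bp fragment would then be discarded.
seq, qual = trim_3prime("ACGTACGT", "IIIII###")
```

The trade-off the question asks about shows up directly in `min_len`: a high cutoff keeps only high-confidence reads but can bias coverage against regions that systematically sequence poorly, while a low cutoff keeps more (noisier) data.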

Thanks much!

metagenomics microbial prinseq fastqc