Question: What stage of pre-processing is EBI data at?
22 months ago by
CAnna20 wrote:


I am trying to figure out the necessary pre-processing steps before using sequencing data retrieved from online databases. I work with metagenomes from the human gut microbiome. As far as I can tell, the three main steps for this type of data are:

1) Identify and mask human reads

2) Remove duplicate reads

3) Trim low quality bases

Here is an example of a study from which I would like to use data.

I can't figure out what stage those data are at. I believe human read masking should already have been performed, as this relates to subject privacy/ethics, but I can't find any clear statement of whether that is the case. Are sequencing data available in online repositories always already cleared of human reads?

Thank you, Camille

written 22 months ago by CAnna20

Assume that the provided data are raw unless there are notes saying they have been processed. If all reads are the same length, then the data have not even been scanned/trimmed.
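To make the read-length check concrete, here is a minimal Python sketch. It assumes strict 4-line FASTQ records; the function names (`open_fastq`, `read_length_histogram`, `looks_untrimmed`) are made up for illustration. Treat the result as a hint only: data hard-trimmed to a fixed length would also show a single read length.

```python
import gzip
from collections import Counter
from itertools import islice


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_length_histogram(lines, max_reads=100_000):
    """Count read lengths from an iterable of FASTQ lines.

    Assumes strict 4-line records: header, sequence, '+', qualities.
    Only the first `max_reads` reads are sampled.
    """
    # The sequence is the 2nd line of every record: indices 1, 5, 9, ...
    seq_lines = islice(lines, 1, None, 4)
    return Counter(len(s.strip()) for s in islice(seq_lines, max_reads))


def looks_untrimmed(lines, max_reads=100_000):
    """Return True if every sampled read has the same length."""
    return len(read_length_histogram(lines, max_reads)) == 1
```

For a downloaded file (the file name here is hypothetical), you could run `with open_fastq("sample_1.fastq.gz") as fh: print(looks_untrimmed(fh))`.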

I suggest that you use the removehuman decontamination protocol from the BBMap suite. Other tools in the suite will help you remove duplicates and trim the data.
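If it helps to see what duplicate removal and quality trimming actually do to a read, here is a toy Python sketch. It is not how the BBMap tools work internally; it assumes Phred+33 quality encoding, single-end reads, and exact-sequence duplicates only, and the function names and the `min_q=10` threshold are illustrative choices. Human read removal is not shown, since it needs alignment against a human reference.

```python
def trim_low_quality_tail(seq, qual, min_q=10, offset=33):
    """Trim bases from the 3' end while their Phred score is below min_q.

    Assumes Phred+33 encoded quality strings (the `offset` default).
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def drop_exact_duplicates(records):
    """Keep the first record for each distinct sequence.

    `records` is an iterable of (name, sequence, quality) tuples.
    Only exact sequence matches count as duplicates in this sketch.
    """
    seen = set()
    unique = []
    for name, seq, qual in records:
        if seq not in seen:
            seen.add(seq)
            unique.append((name, seq, qual))
    return unique
```

For real data the dedicated tools are the better choice, as they handle paired reads, near-duplicates and large files.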

written 22 months ago by genomax78k

Great, thanks for the tool recommendations!

written 22 months ago by CAnna20