What stage of pre-processing are EBI data at?
6.1 years ago
CAnna ▴ 20

Hi,

I am trying to figure out which pre-processing steps are necessary before using sequencing data retrieved from online databases. I work with metagenomes from the human gut microbiome. As far as I can tell, the three main steps for this type of data are:

1) Identify and mask human reads

2) Remove duplicate reads

3) Trim low quality bases

Here is an example of a study from which I would like to use data.

I can't figure out what stage those data are at. I believe human read masking should already have been performed, since it relates to subject privacy and ethics, but I can't find any clear statement of whether that is the case. Are sequencing data available in online repositories always cleared of human reads?

Thank you, Camille

pre-processing EBI • 1.1k views

Assume that the provided data are raw unless there are notes saying they have been processed. If all reads are the same length, the data have most likely not even been quality-scanned or trimmed.
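As a rough way to check the read-length point, a minimal Python sketch along these lines could tally read lengths in a downloaded FASTQ file. The function names `read_length_counts` and `looks_untrimmed` are made up for illustration, and the code assumes standard four-line FASTQ records (no wrapped sequence lines), either plain text or gzip-compressed.

```python
import gzip
from collections import Counter
from pathlib import Path


def read_length_counts(fastq_path):
    """Count how many reads have each length in a FASTQ file (plain or .gz).

    Assumes four-line FASTQ records, i.e. sequences are not line-wrapped.
    """
    path = Path(fastq_path)
    opener = gzip.open if path.suffix == ".gz" else open
    counts = Counter()
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # In a four-line record the sequence is the second line.
            if line_number % 4 == 1:
                counts[len(line.strip())] += 1
    return counts


def looks_untrimmed(counts):
    """True if every read has the same length, which hints at no trimming."""
    return len(counts) == 1
```

For example, `looks_untrimmed(read_length_counts("reads.fastq.gz"))` returning `True` would be consistent with raw, untrimmed data. It is only a hint, not proof: fixed-length reads could still have had human reads or duplicates removed.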

I suggest that you use the removehuman decontamination protocol from the BBMap suite. Other tools in the suite cover the remaining steps: clumpify.sh will help you remove duplicates, and bbduk.sh will help trim the data.
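To make the three steps concrete, here is a hedged Python sketch that only builds the command lines, without running them. The helper name `build_cleaning_commands`, the file names, and the index path are placeholders. The sketch calls `bbmap.sh` directly against a prebuilt human index instead of the removehuman wrapper, and the parameters shown (`outu`, `outm`, `path`, `minid`, `dedupe`, `qtrim`, `trimq`) and their values are from my reading of the BBTools documentation, so please check each script's built-in help before relying on them.

```python
import shlex


def build_cleaning_commands(raw_fastq, human_index_dir, prefix="sample"):
    """Return BBTools command lines for the three steps, in the order asked.

    Nothing is executed here; the caller decides how to run the commands.
    All output file names are placeholders derived from `prefix`.
    """
    no_human = f"{prefix}.nohuman.fq.gz"
    human = f"{prefix}.human.fq.gz"
    deduped = f"{prefix}.dedup.fq.gz"
    trimmed = f"{prefix}.trimmed.fq.gz"
    return [
        # 1) Map against a human reference index; keep the unmapped reads.
        ["bbmap.sh", f"in={raw_fastq}", f"outu={no_human}", f"outm={human}",
         f"path={human_index_dir}", "minid=0.95"],
        # 2) Remove duplicate reads.
        ["clumpify.sh", f"in={no_human}", f"out={deduped}", "dedupe"],
        # 3) Quality-trim both read ends (threshold is an assumed value).
        ["bbduk.sh", f"in={deduped}", f"out={trimmed}", "qtrim=rl", "trimq=10"],
    ]


if __name__ == "__main__":
    for command in build_cleaning_commands("raw.fq.gz", "/path/to/human_index"):
        print(shlex.join(command))
```

Each printed line can be pasted into a shell once the paths are adjusted. The step order simply follows the question and may not be the order the tool author recommends.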


Great, thanks for the tool recommendations!
