Decontaminating plant genome assembly
Asked 22 months ago by memy

When non-target sequence is detected inside the scaffolds of a plant genome assembly (not at the start or end of a scaffold), either by the NCBI submission quality check or by local contaminant screening, how do you decontaminate the assembly?

In my case, most contaminant hits are ~38 bases long and match fragments of the Staphylococcus aureus YfhO family protein (NGB00725.1) and M42 family metallopeptidase (NGB00843.1).

Here is a frequency plot of the contaminant sequences.

[Figure: frequency plot of the contaminant sequences (image not available)]

I would appreciate your thoughts on whether to split the scaffolds at the contaminant regions or to exclude those regions. Does the best approach to decontaminating an assembly depend on the sequencing platform used to generate the data and on the genome assembly method?

Thanks,

Tags: decontaminate, contaminants, Genome assembly, genome
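A minimal sketch of the "split" option, assuming the contaminant hits for one scaffold are already available as 0-based, half-open (start, end) intervals (for example, parsed from a local screening report). The function name, the min_len cut-off, the trimming of N runs at the new piece ends and the _1, _2 naming of the pieces are all hypothetical choices made for illustration, not NCBI requirements:

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def split_at_contaminants(
    name: str, seq: str, hits: List[Interval], min_len: int = 200
) -> List[Tuple[str, str]]:
    """Cut one scaffold at its contaminant hits and return the clean pieces."""
    pieces: List[str] = []
    pos = 0
    for start, end in sorted(hits):
        if start > pos:
            pieces.append(seq[pos:start])  # clean sequence before this hit
        pos = max(pos, end)  # skip the hit; max() also handles overlapping hits
    pieces.append(seq[pos:])  # clean sequence after the last hit
    # Trim scaffold-gap Ns left at the new piece ends, then drop short pieces.
    trimmed = [p.strip("Nn") for p in pieces]
    kept = [p for p in trimmed if p and len(p) >= min_len]
    # Hypothetical naming scheme: <scaffold>_1, <scaffold>_2, ...
    return [(f"{name}_{i}", p) for i, p in enumerate(kept, start=1)]
```

For example, split_at_contaminants("scf1", "A" * 10 + "C" * 5 + "G" * 10, [(10, 15)], min_len=5) returns two pieces, scf1_1 (ten As) and scf1_2 (ten Gs); the five contaminant bases are discarded.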

This is really weird. I've seen some bacterial 16S in plant genomes, but this length distribution is odd. I would go back to the raw reads and search for these sequences there to see where they came from: whether entire reads are contaminated and contain bacterial genes, or only this region, etc.
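To check whether whole reads or only short stretches are bacterial, one rough option is to measure what fraction of each read is covered by k-mers shared with the contaminant sequences. This is a minimal k-mer sketch, not an aligner; the choice of k, the helper names and the simple four-line FASTQ parser (no multi-line records) are assumptions:

```python
from typing import Iterable, Iterator, Set, Tuple

COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(s: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return s.translate(COMP)[::-1]


def contaminant_kmers(contaminants: Iterable[str], k: int) -> Set[str]:
    """All k-mers of the contaminant sequences, on both strands."""
    kmers: Set[str] = set()
    for c in contaminants:
        c = c.upper()
        for s in (c, revcomp(c)):
            for i in range(len(s) - k + 1):
                kmers.add(s[i:i + k])
    return kmers


def covered_fraction(read: str, kmers: Set[str], k: int) -> float:
    """Fraction of read bases covered by at least one contaminant k-mer."""
    read = read.upper()
    covered = [False] * len(read)
    for i in range(len(read) - k + 1):
        if read[i:i + k] in kmers:
            for j in range(i, i + k):
                covered[j] = True
    return sum(covered) / len(read) if read else 0.0


def read_fastq(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) from a plain four-line-per-record FASTQ."""
    with open(path) as fh:
        while True:
            header = fh.readline()
            if not header.strip():
                break  # end of file (or a trailing blank line)
            seq = fh.readline().strip()
            fh.readline()  # '+' separator line
            fh.readline()  # quality line
            yield header[1:].split()[0], seq
```

Reads with a fraction close to 1.0 would suggest whole contaminating reads, while small fractions would point to short shared stretches. Note that k must not exceed the length of the supplied contaminant sequences, otherwise the k-mer set is empty.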

I think that would only help to trace those non-target sequences back to the raw reads. How would it help decide how to clean up the assembly? The assembly needs to be decontaminated for submission.

What Asaf means is to:

  • align all reads to the contaminant
  • then exclude these reads, e.g. with samtools fastq, bedtools bamtofastq or similar
  • then reassemble
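The "exclude these reads" step could look like this, assuming the IDs of the reads that aligned to the contaminant have already been written one per line to a file, e.g. with samtools view -F 4 aln.bam | cut -f1 | sort -u > contaminant_ids.txt (file names hypothetical). The sketch is a plain-Python filter over four-line FASTQ records; it strips a trailing /1 or /2 from FASTQ names so that the same ID set can be applied to both mate files and keep pairs in sync:

```python
import itertools
from typing import Iterable, Iterator, Set


def load_ids(path: str) -> Set[str]:
    """Read one read ID per line into a set."""
    with open(path) as fh:
        return {line.strip() for line in fh if line.strip()}


def read_id(header: str) -> str:
    """'@read1/1 extra' -> 'read1' (drops '@', any comment and a /1 or /2 suffix)."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def filter_fastq(lines: Iterable[str], drop: Set[str]) -> Iterator[str]:
    """Yield the lines of FASTQ records (4 lines each) whose ID is not in `drop`."""
    it = iter(lines)
    while True:
        record = list(itertools.islice(it, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        if read_id(record[0]) not in drop:
            yield from record
```

Typical use: with open("reads_1.fq") as fin, open("clean_1.fq", "w") as fout: fout.writelines(filter_fastq(fin, load_ids("contaminant_ids.txt"))), then the same for the second mate file before reassembling.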

Yes and no. You might find that the contamination is more extensive than the algorithm predicted, in which case just masking those 38-mers won't be sufficient.
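If the reported hits are suspected to under-represent the contaminated region, one cautious option (an assumption here, not something proposed in the thread) is to pad each hit by a flank and merge hits that then overlap, before masking or splitting. The flank size is a hypothetical parameter, and intervals are 0-based and half-open:

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def pad_and_merge(hits: List[Interval], flank: int, seq_len: int) -> List[Interval]:
    """Widen each hit by `flank` bp on both sides (clipped to the scaffold)
    and merge intervals that overlap or touch after padding."""
    padded = sorted((max(0, s - flank), min(seq_len, e + flank)) for s, e in hits)
    merged: List[Interval] = []
    for s, e in padded:
        if merged and s <= merged[-1][1]:
            # Overlaps or abuts the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged
```

For example, pad_and_merge([(100, 138), (150, 188), (500, 538)], flank=20, seq_len=1000) gives [(80, 208), (480, 558)]: the two nearby 38 bp hits collapse into a single region.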

Thanks, colindaven, I hadn't looked at it that way; that could be one way to clean them out of the raw reads and then reassemble. But I won't be able to do that for various reasons, so I will instead try to work out a plausible way to deal with the assembly at hand.

You could rather brutally mask the contaminants with Ns:

https://github.com/colindaven/blacklister
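The masking idea itself is simple. The sketch below is an independent illustration, not the code used by blacklister; it hard-masks 0-based, half-open intervals with N while leaving the scaffold length, and hence the coordinates of all other bases, unchanged:

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def hard_mask(seq: str, hits: Iterable[Interval], mask_char: str = "N") -> str:
    """Replace every [start, end) hit with mask_char; length is preserved."""
    chars = list(seq)
    for start, end in hits:
        start, end = max(0, start), min(len(chars), end)  # clip to the sequence
        if end > start:
            chars[start:end] = mask_char * (end - start)
    return "".join(chars)
```

For example, hard_mask("ACGTACGTAC", [(2, 5)]) returns "ACNNNCGTAC".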

It does mask out the contaminants, but I don't see the rationale for replacing contaminant sequences with gaps of Ns.
