Where does my RNA-Seq contamination fit on the Tree of Life?
4.2 years ago

I have RNA-Seq data from algal cultures, and my samples are not really axenic: the presence of some bacteria or fungi is to be expected. I assembled my reads with Trinity, and now I would like to estimate the origin of each individual contig. Ideally, I would like to visualize where the contamination comes from on the Tree of Life, in order to:

1. check, as a quality metric, that the origin of my contamination makes sense (i.e. that I see what I expect to see for algal cultures)
2. remove contaminants and "clean" the assembly

Is there any tool that could do this for me?

I started by automatically outputting the "best" BLAST hit for each contig, but I am getting a large variety of hits and I am not sure how to summarize them or properly assign them phylogenetically.

Thanks for your help.

RNA-Seq rna-seq contamination species algae • 965 views

NCBI has a new pre-made BLAST database, ref_prok_rep (representative prokaryotic genomes). Since you have assembled sequences, you could run a quick BLAST against it to see whether you can pick off any low-hanging fruit in terms of identification.
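As a rough illustration of how such BLAST results could be summarized per contig, here is a minimal Python sketch. It assumes (this is not from the thread) that BLAST+ was run with a custom tabular format along the lines of `-outfmt "6 qseqid bitscore sskingdoms sscinames"`, which needs the NCBI taxonomy database files to be available to BLAST for the last two columns to be filled in. The function names `best_hit_per_contig` and `tally` are made up for this example.

```python
from collections import Counter


def best_hit_per_contig(lines):
    """Keep the highest-bitscore hit for each contig.

    Assumes tab-separated columns in this order (an assumption, see above):
    qseqid, bitscore, sskingdoms, sscinames.
    Returns {contig: (bitscore, kingdom, scientific_name)}.
    """
    best = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        contig, kingdom, name = fields[0], fields[2], fields[3]
        bitscore = float(fields[1])
        if contig not in best or bitscore > best[contig][0]:
            best[contig] = (bitscore, kingdom, name)
    return best


def tally(best, field):
    """Count contigs per 'kingdom' or per scientific 'name'."""
    index = {"kingdom": 1, "name": 2}[field]
    return Counter(hit[index] for hit in best.values())


if __name__ == "__main__":
    # Tiny in-memory example instead of a real BLAST output file.
    demo = [
        "contig1\t50.0\tBacteria\tEscherichia coli\n",
        "contig1\t200.0\tEukaryota\tChlamydomonas reinhardtii\n",
        "contig2\t80.5\tBacteria\tPseudomonas putida\n",
    ]
    hits = best_hit_per_contig(demo)
    for kingdom, count in tally(hits, "kingdom").most_common():
        print(kingdom, count)
```

A per-kingdom tally like this only gives a coarse first look; a single best hit can be misleading for short or chimeric contigs, so treat the counts as a sanity check rather than a final assignment.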


Why not start with filtering out reads that can be mapped to known bacterial/fungal species?


Can you point me to such a list/database?


I'm a big fan of Kraken for contamination screening. The program assigns a taxid to each read, and with a little legwork you can filter on that. If you use the kraken-translate tool, you should be able to get the whole taxonomy for each read and filter on keywords. E.g., get a list of reads with the word "bacteria" in their kraken-translate entry, then remove all of those reads from your data set.

https://ccb.jhu.edu/software/kraken/MANUAL.html#output-format
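The keyword filter described above could be sketched in Python roughly as follows. This assumes the two-column, tab-delimited kraken-translate output (sequence ID, then a semicolon-separated taxonomy string) and a plain four-line-per-record FASTQ file; the function names `contaminant_ids` and `filter_fastq` are made up for this example, and the match is a simple case-insensitive substring test, so "bacteria" also matches names such as "Cyanobacteria".

```python
import io


def contaminant_ids(labels, keyword="bacteria"):
    """Collect sequence IDs whose taxonomy string contains `keyword`.

    `labels` is an iterable of kraken-translate lines: "<id>\t<taxonomy>".
    The match is a case-insensitive substring test on the taxonomy column.
    """
    keyword = keyword.lower()
    ids = set()
    for line in labels:
        seq_id, sep, taxonomy = line.rstrip("\n").partition("\t")
        if sep and keyword in taxonomy.lower():
            ids.add(seq_id)
    return ids


def filter_fastq(fastq_in, fastq_out, drop_ids):
    """Copy 4-line FASTQ records, skipping reads whose ID is in `drop_ids`.

    Returns the number of records written.
    """
    kept = 0
    while True:
        header = fastq_in.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines
        record = [header] + [fastq_in.readline() for _ in range(3)]
        read_id = header[1:].split()[0]  # drop '@' and any description
        if read_id not in drop_ids:
            fastq_out.writelines(record)
            kept += 1
    return kept


if __name__ == "__main__":
    # In-memory stand-ins for the kraken-translate output and a FASTQ file.
    labels = io.StringIO(
        "read1\troot;cellular organisms;Bacteria;Proteobacteria\n"
        "read2\troot;cellular organisms;Eukaryota;Viridiplantae\n"
    )
    reads = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nTTTT\n+\nIIII\n")
    cleaned = io.StringIO()
    n_kept = filter_fastq(reads, cleaned, contaminant_ids(labels))
    print(n_kept, "read(s) kept")
```

For paired-end data, the IDs in the FASTQ headers may carry suffixes such as /1 and /2 that differ from the IDs Kraken reports, so the ID comparison would need adjusting, and both mates should be dropped together.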

I am honestly not sure which is better: clean before assembly or after assembly.