Question: Cross-species contamination in NGS data
T (Germany) wrote 3.5 years ago:

Dear all,

In a certain sequencing study, the large majority of reads do not map to the organisms of interest (human/mouse and yeast), so I am looking for a tool or approach to check the reads for cross-species contamination.

A quick BLAST of some sequences revealed bacterial RNA, but I want to classify all of the reads. Can you recommend a tool, approach, or best practice for doing this at high throughput?

I have found a few tools online, but as far as I can see most of them are designed for bacterial metagenomic studies. Perhaps some of the more experienced users here have a quick hack or a best practice.

Thank you very much.

Tags: sequencing, qc

Devon Ryan replied:

Do you have an (ideally small) set of species you want to check? If you want to check everything, then you're largely restricted to BLASTing a smallish number of reads.
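If you do BLAST only a subset of reads, it helps to draw a uniform random sample from the whole library rather than taking the first N reads of the file. A minimal Python sketch, assuming an uncompressed FASTQ file with four lines per record (the function names, the fixed seed and the toy data are illustrative assumptions, not part of any tool), that samples a fixed number of reads in one pass with reservoir sampling and writes them as FASTA for BLAST:

```python
import io
import random
from typing import Iterable, Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) from a plain 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line, not needed for BLAST
        yield header[1:].split()[0], seq


def reservoir_sample(records: Iterable[Tuple[str, str]], k: int,
                     seed: int = 42) -> List[Tuple[str, str]]:
    """Uniformly sample k records in a single pass (reservoir sampling, Algorithm R)."""
    rng = random.Random(seed)
    sample: List[Tuple[str, str]] = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = rec
    return sample


def to_fasta(records: List[Tuple[str, str]]) -> str:
    """Format (id, sequence) pairs as FASTA text."""
    return "".join(f">{rid}\n{seq}\n" for rid, seq in records)


if __name__ == "__main__":
    # Toy FASTQ; in practice pass open("reads.fastq") (hypothetical path) instead.
    demo = "".join(f"@read{i} extra\nACGTACGT\n+\nIIIIIIII\n" for i in range(1000))
    subset = reservoir_sample(read_fastq(io.StringIO(demo)), k=10)
    print(to_fasta(subset), end="")
```

The resulting FASTA can then be BLASTed against nt or a RefSeq subset; because the sample is drawn from the whole file, hit counts should roughly reflect the composition of the full library.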
genomax (United States) wrote 3.5 years ago:

In this case I suggest binning the reads you are interested in and separating the "others" into a different bin. Take a look at BBSplit from BBMap, which would be perfect for this. If you want to find out what the "other" bin contains, you can do that separately later.
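BBSplit does this by mapping reads against all of the references at once with BBMap and binning each read by its best match. To illustrate only the binning idea (this is not BBSplit's algorithm), here is a toy Python sketch that assigns each read to the reference sharing the largest fraction of its canonical k-mers, sends ties to an "ambiguous" bin and weak matches to "other". The k-mer size, the 0.5 threshold, the function names and the toy sequences are all assumptions for the example; storing every k-mer of a real human or mouse genome in Python sets is not practical, so use a real tool for actual data.

```python
from collections import Counter
from typing import Dict, Iterable, Set

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def canonical_kmers(seq: str, k: int) -> Set[str]:
    """Set of canonical k-mers: the smaller of each k-mer and its reverse complement."""
    seq = seq.upper()
    out: Set[str] = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        rc = kmer.translate(_COMPLEMENT)[::-1]
        out.add(min(kmer, rc))  # strand-independent representation
    return out


def build_index(references: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map each reference name to its canonical k-mer set."""
    return {name: canonical_kmers(seq, k) for name, seq in references.items()}


def assign_bin(read: str, index: Dict[str, Set[str]], k: int,
               min_frac: float = 0.5) -> str:
    """Return the best-matching reference name, 'ambiguous', or 'other'."""
    read_kmers = canonical_kmers(read, k)
    if not read_kmers or not index:
        return "other"  # read shorter than k, or no references
    scores = {name: len(read_kmers & ref) / len(read_kmers)
              for name, ref in index.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_name, best_score = ranked[0]
    if best_score < min_frac:
        return "other"  # too few shared k-mers with any reference
    if len(ranked) > 1 and ranked[1][1] == best_score:
        return "ambiguous"  # tie between two or more references
    return best_name


def bin_reads(reads: Iterable[str], index: Dict[str, Set[str]], k: int,
              min_frac: float = 0.5) -> Counter:
    """Count how many reads land in each bin."""
    return Counter(assign_bin(r, index, k, min_frac) for r in reads)


if __name__ == "__main__":
    # Toy stand-ins for the human/mouse/yeast references.
    refs = {"ref_A": "AAAAACCCCCGGGGGTTTTT", "ref_B": "ATATATGCGCGCTATATGCA"}
    idx = build_index(refs, k=5)
    print(bin_reads(["AAACCCCCGG", "CCGGGGGTTT", "GATTACAGATTACA"], idx, k=5))
```

Using canonical k-mers means a read from either strand of a reference is binned the same way; the "other" bin is what you would then go on to characterise.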


T replied:

Thanks, this is a very nice approach for getting the "unmapped" reads.

However, what the "other" reads actually are is the main point of my question.

genomax replied:

So you are interested in the content of the "other" bin? Your original question made it sound like that was "contamination" (but not in the real meaning of the word, then?). As @Devon said, it depends on whether you expect only a few species to be present. Otherwise you would have to BLAST against RefSeq/bacterial to try to identify what is there.
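Once the reads (or a subsample) have been BLASTed, a quick way to see what dominates the "other" bin is to keep the best hit per read and count hits per subject sequence. A minimal Python sketch, assuming BLAST+ tabular output from `-outfmt 6` with the default 12 columns (so e-value and bitscore are columns 11 and 12); the 1e-5 e-value cutoff, the function names and the toy rows are arbitrary choices for the example:

```python
import csv
import io
from collections import Counter
from typing import Dict, TextIO, Tuple


def best_hits(handle: TextIO, max_evalue: float = 1e-5) -> Dict[str, Tuple[str, float]]:
    """Keep the highest-bitscore hit per query from BLAST -outfmt 6 output."""
    best: Dict[str, Tuple[str, float]] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (as in -outfmt 7)
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # hit too weak to count
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def tally_subjects(best: Dict[str, Tuple[str, float]]) -> Counter:
    """Count how many reads have each subject as their best hit."""
    return Counter(subject for subject, _ in best.values())


if __name__ == "__main__":
    # Toy BLAST rows: qseqid sseqid pident length mismatch gapopen
    # qstart qend sstart send evalue bitscore
    demo = (
        "read1\tseqA\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-40\t180\n"
        "read1\tseqB\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-20\t120\n"
        "read2\tseqA\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-38\t170\n"
        "read3\tseqC\t80.0\t50\t10\t0\t1\t50\t1\t50\t0.5\t30\n"
    )
    print(tally_subjects(best_hits(io.StringIO(demo))).most_common())
```

To count by organism rather than by sequence ID, one option is to request an extra taxonomy column (e.g. `-outfmt "6 std staxids"`) when the database was built with taxonomy IDs, and tally that column instead.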
Daniel Swan (Aberdeen, UK) wrote 3.5 years ago:

You could try Kontaminant: https://github.com/TGAC/kontaminant

h.mon (Brazil) wrote 3.5 years ago:

I would take the unmapped reads, assemble them with metaSPAdes or MEGAHIT, and use GC content, mapping coverage, and BLAST searches to examine the assembled contigs; a great tool for this is blobtools.
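blobtools plots each contig by GC content against read coverage, coloured by its taxonomic hit. As a quick first look before setting that up, here is a small Python sketch, assuming a plain FASTA file of assembled contigs, that reports length and GC fraction per contig (N bases are left out of the GC denominator). The function names and toy contigs are illustrative; coverage would come separately, from mapping the unmapped reads back to the contigs.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) from a FASTA stream; the name is the first word of the header."""
    name: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if name is not None:
        yield name, "".join(chunks)


def gc_fraction(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A/C/G/T); NaN if there are none."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return float("nan")
    return (seq.count("G") + seq.count("C")) / acgt


def contig_table(handle: TextIO) -> List[Tuple[str, int, float]]:
    """One (name, length, GC fraction) row per contig."""
    return [(name, len(seq), gc_fraction(seq)) for name, seq in read_fasta(handle)]


if __name__ == "__main__":
    # Toy assembly; in practice pass open() on your assembler's contig FASTA.
    demo = ">contig_1 cov=12.5\nGGCC\nAATT\n>contig_2\nGGGN\n"
    for name, length, gc in contig_table(io.StringIO(demo)):
        print(f"{name}\t{length}\t{gc:.3f}")
```

Contigs whose GC fraction sits far from the values expected for human/mouse or yeast are natural candidates to BLAST first.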
