How to tell which organisms (possibly contaminants) reads come from?
3.7 years ago
Marcel • 0

Hello,

We recently ran a sequencing experiment that produced a lot of "undetermined" reads (i.e. reads that could not be properly demultiplexed). I have now been asked which species these reads map to. Is there an easy way to find out, other than manually aligning the reads to all sorts of different species?

sequencing next-gen • 563 views

Extract the unmapped reads, convert them to FASTA, and BLAST them against nr.
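The conversion step could look roughly like the sketch below. It assumes a plain 4-line-per-record FASTQ and simply takes the first `max_reads` records, since BLASTing every read is rarely practical. The function name and the file names in the comment are made up for illustration.

```python
import itertools


def fastq_to_fasta(fastq_handle, fasta_handle, max_reads=1000):
    """Write the first max_reads records of a 4-line FASTQ stream as FASTA.

    Returns the number of records written.
    """
    written = 0
    while written < max_reads:
        # One FASTQ record = header, sequence, '+' separator, qualities.
        record = list(itertools.islice(fastq_handle, 4))
        if len(record) < 4:
            break  # end of file (or a truncated final record)
        header = record[0].rstrip()
        sequence = record[1].rstrip()
        # FASTQ headers start with '@'; FASTA headers start with '>'.
        fasta_handle.write(">" + header[1:] + "\n" + sequence + "\n")
        written += 1
    return written


# Hypothetical usage on a gzipped file (file names are placeholders):
#   import gzip
#   with gzip.open("Undetermined.fastq.gz", "rt") as fq, \
#           open("undetermined_subset.fasta", "w") as fa:
#       fastq_to_fasta(fq, fa, max_reads=1000)
```

The resulting FASTA file can then be submitted to BLAST. Taking the first records is the simplest option, not necessarily a representative sample of the run.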


You could run your FASTQ data through a program like Kaiju (LINK), which is normally used for taxonomic classification. That said, regarding

"reads that could not be properly demultiplexed"

are you getting unexpected indexes? If so, you should investigate the indexes themselves first rather than the reads associated with them.
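One way to check for unexpected indexes is to tally the index sequences seen in the undetermined reads. The sketch below assumes the index is the last colon-separated field of each FASTQ header (e.g. `@... 1:N:0:ATCGGTA`), which is how Illumina-style headers are commonly written. If your headers differ, the parsing line needs adjusting. The function name is hypothetical.

```python
from collections import Counter


def count_indexes(fastq_lines):
    """Count index sequences found in the headers of a 4-line FASTQ stream."""
    counts = Counter()
    for line_number, line in enumerate(fastq_lines):
        if line_number % 4 == 0:  # header line of each 4-line record
            # Assumption: the index is everything after the last ':'.
            index = line.rstrip().rsplit(":", 1)[-1]
            counts[index] += 1
    return counts


# Hypothetical usage (file name is a placeholder):
#   import gzip
#   with gzip.open("Undetermined.fastq.gz", "rt") as fq:
#       for index, n in count_indexes(fq).most_common(20):
#           print(index, n)
```

If a handful of index sequences dominate the list, comparing them with the sample sheet may reveal a mistyped or missing barcode.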


Thanks for the suggestion. I was told that these undetermined reads either have no barcode at all or have a barcode with one or more mismatches.


Reads without indexes are likely PhiX (which is generally spiked in as a control during sequencing). If your indexes were well designed (or you were using commercial indexes), they should tolerate 1-2 errors. You can thus retrieve data that would otherwise go to waste (e.g. ATCGCTA and ATCGGTA differ at a single position, so they would be considered equivalent).
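As a rough illustration of the mismatch idea, the sketch below assigns an observed index to an expected barcode when exactly one expected barcode lies within `max_mismatches` substitutions (Hamming distance). The function names and the second barcode in the usage comment are made up for the example, and this is not how any particular demultiplexer is implemented.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def rescue_barcode(observed, expected_barcodes, max_mismatches=1):
    """Return the single expected barcode within max_mismatches, else None.

    None is also returned when several barcodes match equally well,
    because the read cannot then be assigned unambiguously.
    """
    hits = [
        barcode
        for barcode in expected_barcodes
        if len(barcode) == len(observed)
        and hamming(observed, barcode) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


# Hypothetical usage with the pair from the post plus an invented barcode:
#   rescue_barcode("ATCGGTA", ["ATCGCTA", "GGATTCA"])
```

Allowing mismatches is only safe when the expected barcodes differ from one another by more than twice the tolerated number of errors. Otherwise a read could match two samples.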
