How can I tell which organisms (possibly contaminants) my reads come from?

Asked by Marcel • 3.7 years ago

Hello,

We recently had a sequencing run that produced a lot of "undetermined" reads (i.e. reads that could not be properly demultiplexed). I have now been asked which species these reads map to. Is there an easy way to find out, other than manually aligning the reads to all sorts of different species?

Tags: sequencing, next-gen • 563 views

Extract the unmapped reads, convert them to FASTA and BLAST them against nr.

Reply by Pierre Lindenbaum (161k) • 3.7 years ago
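To follow the BLAST suggestion, the undetermined FASTQ first has to be turned into FASTA. Because BLASTing millions of reads is slow, a random subsample is often enough to see which organisms dominate. Below is a minimal Python sketch under those assumptions: it expects plain 4-line FASTQ records (optionally gzipped), and open_fastq, read_fastq and fastq_to_fasta are hypothetical helper names, not part of any existing tool.

```python
import gzip
import random
from typing import Iterator, TextIO, Tuple


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) pairs from 4-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline()
        if not header.startswith("@") or not plus.startswith("+") or not qual:
            raise ValueError(f"malformed FASTQ record: {header.strip()!r}")
        fields = header[1:].split()
        # Keep only the read ID (text before the first whitespace).
        yield (fields[0] if fields else ""), seq


def fastq_to_fasta(in_handle: TextIO, out_handle: TextIO,
                   fraction: float = 1.0, seed: int = 0) -> int:
    """Write a random fraction of reads as FASTA; return the number written."""
    rng = random.Random(seed)  # fixed seed -> reproducible subsample
    written = 0
    for read_id, seq in read_fastq(in_handle):
        if fraction < 1.0 and rng.random() >= fraction:
            continue  # read not selected for the subsample
        out_handle.write(f">{read_id}\n{seq}\n")
        written += 1
    return written
```

For example, fastq_to_fasta(open_fastq(path), out, fraction=0.01) keeps roughly 1% of the reads. Note that nr is NCBI's protein database, so nucleotide reads are searched against it with blastx; blastn against nt is the nucleotide-level alternative.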

You could run a program like kaiju (LINK), which is normally used for taxonomic classification, on your fastq data. That said, regarding

"reads that could not be properly demultiplexed"

are you getting unexpected indexes? If so, you should investigate those indexes rather than the reads associated with them.

Reply by GenoMax (141k) • 3.7 years ago
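One way to check for unexpected indexes is to count which index sequences actually occur in the undetermined reads. A minimal Python sketch, assuming Casava 1.8+/bcl2fastq-style headers in which the index is the last ':'-separated field of the header comment (dual indexes then look like ATCACG+GTACCA and are counted as a single key); index_from_header and count_indexes are hypothetical helpers.

```python
from collections import Counter
from typing import Iterable, Optional


def index_from_header(header: str) -> Optional[str]:
    """Return the index field of a FASTQ header such as
    '@M00123:45:000-ABCDE:1:1101:15589:1331 1:N:0:ATCACG'.

    Assumes the index is the last ':'-separated field of the comment
    (the part after the first whitespace); returns None if there is no comment.
    """
    parts = header.rstrip("\n").split(maxsplit=1)
    if len(parts) < 2:
        return None
    return parts[1].split(":")[-1]


def count_indexes(lines: Iterable[str]) -> Counter:
    """Count index sequences over FASTQ lines (every 4th line is a header)."""
    counts: Counter = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 0:
            idx = index_from_header(line)
            if idx is not None:
                counts[idx] += 1
    return counts
```

Calling count_indexes(handle).most_common(20) on the undetermined file lists the most frequent indexes; a highly abundant index that is missing from your sample sheet would point to a sample-sheet error or an unexpected library rather than to contamination.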

Thanks for the suggestion. I was told that these undetermined reads are reads either without a barcode at all or with a barcode containing one or more mismatches.

Reply by Marcel • 3.6 years ago

Reads without indexes are likely phiX (which is generally spiked in as a control during sequencing). If your indexes were well designed (or you used commercial indexes), they should be far enough apart to tolerate 1-2 mismatches. You can thus recover data that would otherwise go to waste (e.g. ATCGCTA and ATCGGTA would be considered equivalent).

Reply by GenoMax (141k) • 3.6 years ago
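The mismatch tolerance described above comes down to Hamming distance: an observed index is assigned to a sample if it lies within m mismatches of exactly one barcode, and this is unambiguous only when every pair of barcodes differs at 2m+1 or more positions. The Python sketch below is a simplified illustration of that idea, not how any particular demultiplexer is implemented; the sample-to-barcode dictionary and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def max_safe_mismatches(barcodes: Iterable[str]) -> int:
    """Largest m such that all barcode pairs are >= 2m+1 apart, so a read
    with at most m mismatches can lie within m of only one barcode."""
    bcs = list(barcodes)
    if len(bcs) < 2:
        raise ValueError("need at least two barcodes")
    min_dist = min(hamming(a, b) for a, b in combinations(bcs, 2))
    return (min_dist - 1) // 2


def assign(observed: str, samples: Dict[str, str],
           max_mismatches: int = 1) -> Optional[str]:
    """Return the sample whose barcode is within max_mismatches of the
    observed index, or None if zero or several barcodes qualify."""
    hits = [name for name, bc in samples.items()
            if hamming(observed, bc) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None
```

With samples = {"s1": "ATCGCTA", "s2": "GGTTAAC"}, the observed index ATCGGTA is one mismatch from s1's barcode, so assign("ATCGGTA", samples) returns "s1", whereas with max_mismatches=0 it would stay undetermined.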
