Question

Any way to save contaminated samples?

3.2 years ago

demoraesdiogo2017 ▴ 100

Hello

I aligned a set of reads to the C. elegans genome. The alignment rates were around 80%, except for two samples, which came in at 40%. I BLASTed the unaligned reads and they seem to come from Drosophila (we have no idea why). I aligned the samples again, this time to the Drosophila genome, and those two samples had an alignment rate of around 40% as well. Because the sample size is small, I have been considering discarding the unmapped reads instead of discarding the whole samples. I assume a normalization like TMM could reduce the possible noise caused by the reduced counts, and if the PCA clusters make sense, I would use the data in downstream analysis. Any opinions on this? Should I just discard those samples?

RNA-Seq • 604 views

ADD COMMENT • link 3.2 years ago by demoraesdiogo2017 ▴ 100

I would be very skeptical of the reads unless you figure out why there was so much contamination from an exogenous organism. Was your sequencing run shared with anyone else? Perhaps there could have been a mixup with barcodes or something.

ADD REPLY • link 3.2 years ago by rpolicastro 13k

We do have Drosophila samples that we sent to the same place for sequencing, so mislabeling was the initial suspicion. So I tried aligning the samples to the Drosophila genome: the clean samples had less than 1% alignment and the dirty samples had ~40%. I also aligned the Drosophila samples to the C. elegans genome and that was less than 1% too, with ~94% alignment to Drosophila. Very impressive from HISAT2, I guess. So I think it is more likely that the samples actually got mixed somehow.

ADD REPLY • link 3.2 years ago by demoraesdiogo2017 ▴ 100

Sounds like there was definitely some sample mixup or problem somewhere. I wouldn't be confident in the reads you did recover from the samples, because there is no guarantee the labels are correct for those.

Your best bet would be to talk to the sequencing provider and also go back and see if there was any problem during sample collection.

ADD REPLY • link 3.2 years ago by rpolicastro 13k

If you are sure that your original samples were actually from C. elegans then you can ask your sequencing provider to re-make and resequence the libraries. Or at least check to make sure nothing amiss happened on their end.

ADD REPLY • link 3.2 years ago by GenoMax 141k

I guess one possibility, if you are sure you have both C. elegans and Drosophila, would be to combine the reference genomes you are aligning against, then align all the reads to this 'hybrid' and see if they partition between the two genomes. The alignment rate you have already looked at should improve, which serves as an indicator. I would consider that the safest way of being able to use the reads. Then it depends on what analysis is planned downstream.
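
A minimal sketch of the combined-reference idea, not a definitive pipeline. It assumes plain-text FASTA references and SAM output from whichever aligner is used; the prefixes `cel_` and `dme_` and the function names are made up for illustration. Each sequence name gets a species prefix before the two references are concatenated, so that after aligning to the hybrid the primary alignments can be tallied per species:

```python
from collections import Counter


def prefix_fasta(lines, prefix):
    """Yield FASTA lines with every sequence name prefixed by `prefix`."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield ">" + prefix + line[1:]
        elif line:
            yield line


def build_hybrid(references):
    """Concatenate FASTA references into one list of lines.

    `references` maps a hypothetical species prefix (e.g. "cel_")
    to an iterable of FASTA lines for that genome.
    """
    hybrid = []
    for prefix, lines in references.items():
        hybrid.extend(prefix_fasta(lines, prefix))
    return hybrid


def tally_species(sam_lines, prefixes):
    """Count primary alignments per species prefix from SAM text lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        if flag & 0x900:  # skip secondary (0x100) and supplementary (0x800)
            continue
        if flag & 0x4 or rname == "*":  # read is unmapped
            counts["unmapped"] += 1
            continue
        for prefix in prefixes:
            if rname.startswith(prefix):
                counts[prefix] += 1
                break
        else:
            counts["other"] += 1
    return counts
```

In practice the hybrid FASTA would be written to disk and indexed with the aligner before mapping; reads that place equally well on both genomes are ambiguous, so one might additionally filter on mapping quality before counting.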

ADD REPLY • link 3.2 years ago by samuel.a.odonnell ▴ 520