Question: RNASeq with mixed tissues
0
3.0 years ago by
ddzhangzz90
United States
ddzhangzz90 wrote:

I got some RNASeq fastq data from a customer, and he told me the samples were mainly from human cell lines but may have some contamination with mouse cells. My question is whether I should align those sequences against both the human and the mouse reference genomes or just the human one. Any suggestions?

rna-seq • 1.2k views
3
3.0 years ago by
United States
informatics bot560 wrote:
  1. First align all the samples to the human genome.
  2. Then align the un-mapped reads to the mouse genome.

If you get a large portion of (un-mapped) reads mapping to mouse, then it's very likely the sample was contaminated.
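As a rough illustration of the second step, the sketch below counts what fraction of the reads left over from the human alignment end up mapped in the mouse alignment. It assumes the mouse alignment was written as plain-text SAM; the function name `mapped_fraction` is made up for this example, and what counts as a "large portion" is left to the reader.

```python
def mapped_fraction(sam_lines):
    """Return the fraction of primary SAM records that are mapped.

    Header lines (starting with '@') are skipped, and secondary (0x100) and
    supplementary (0x800) records are ignored so each read is counted once.
    For paired-end data each mate is counted as its own record.
    """
    total = 0
    mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x100 or flag & 0x800:
            continue
        total += 1
        if not flag & 0x4:  # 0x4 = read unmapped
            mapped += 1
    return mapped / total if total else 0.0


# Tiny hand-written example: two mapped reads, one unmapped, one secondary record.
example = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t10\t60\t50M\t*\t0\t0\t*\t*",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r3\t16\tchr1\t200\t30\t50M\t*\t0\t0\t*\t*",
    "r3\t272\tchr1\t500\t0\t50M\t*\t0\t0\t*\t*",
]
print(f"{mapped_fraction(example):.2%} of leftover reads map to mouse")
```

In practice one would read the SAM from a file (or use `samtools flagstat`) rather than a hand-written list.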


Thanks @Lando Ringel. One problem could be that (maybe very likely) a sequence was actually from mouse but can be mapped to both human and mouse.

Reply written 3.0 years ago by ddzhangzz90

That is true, but many of the mouse reads will remain unmapped. You can use BLAST (or SNAP) to look at the unmapped reads more closely (i.e., to determine which organism they belong to).

Do you plan on trying to use the contaminated samples? I personally would advise against that.

Reply written 3.0 years ago by informatics bot560

In this setting, wouldn't it make more sense to align against a conjoined human/mouse reference, or to align separately to both human and mouse and select the species origin of each read based on the quality of its alignment in sp1 vs. sp2?
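A minimal sketch of the second option (align to each species separately, then compare), assuming you have already extracted one alignment score per read from each alignment, e.g. from the aligner's `AS` tag. The function name `assign_species`, the `min_margin` parameter and the scores in the example are all made up for illustration.

```python
def assign_species(human_scores, mouse_scores, min_margin=1):
    """Assign each read to 'human', 'mouse' or 'ambiguous'.

    human_scores / mouse_scores map read name -> alignment score (higher is
    better). A read missing from a dict is treated as unaligned in that species.
    min_margin is a hypothetical knob: the score difference required before a
    read aligned in both species is assigned to one of them.
    """
    calls = {}
    for read in set(human_scores) | set(mouse_scores):
        h = human_scores.get(read)
        m = mouse_scores.get(read)
        if m is None:
            calls[read] = "human"
        elif h is None:
            calls[read] = "mouse"
        elif h - m >= min_margin:
            calls[read] = "human"
        elif m - h >= min_margin:
            calls[read] = "mouse"
        else:
            calls[read] = "ambiguous"
    return calls


# Illustrative scores only.
human = {"r1": 100, "r2": 80, "r3": 90}
mouse = {"r2": 95, "r3": 90, "r4": 70}
for read, call in sorted(assign_species(human, mouse).items()):
    print(read, call)
```

Reads that score equally well in both species are the ones raised as a concern earlier in the thread; here they are labelled ambiguous rather than forced into either bin.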

Reply written 3.0 years ago by russhh4.2k
3
3.0 years ago by
Devon Ryan88k
Freiburg, Germany
Devon Ryan88k wrote:

First subset the files (seqtk) and then use fastq_screen to get an idea of what the contamination rate is. I've found it useful to only pay close attention to the "single alignment in a single organism" (or whatever that's called) category, since the others are more an indicator of sequence complexity. I happen to do this with all sequencing runs produced at our institute, since it immediately allows us to flag problematic samples (anything over 0.5% off-species unique alignment is a problem).
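The flagging rule could be sketched like this. It does not parse fastq_screen output; it assumes you have already pulled out, per sample, the number of screened reads and the number of reads aligning uniquely to each single organism. The data layout, the function name `flag_samples` and the choice of denominator (all screened reads) are assumptions made for this example; only the 0.5% cutoff comes from the text above.

```python
def flag_samples(samples, expected="human", threshold=0.005):
    """Return the names of samples whose off-species unique rate exceeds threshold.

    samples maps sample name -> {"total": reads screened,
                                 "unique": {organism: uniquely aligned reads}}.
    The rate is off-species unique reads divided by all screened reads
    (an assumption about what the 0.5% refers to).
    """
    flagged = []
    for name, counts in samples.items():
        off_species = sum(
            n for organism, n in counts["unique"].items() if organism != expected
        )
        rate = off_species / counts["total"]
        if rate > threshold:
            flagged.append(name)
    return flagged


# Made-up counts for two subsampled libraries.
samples = {
    "sampleA": {"total": 100_000, "unique": {"human": 90_000, "mouse": 200}},
    "sampleB": {"total": 100_000, "unique": {"human": 80_000, "mouse": 3_000}},
}
print(flag_samples(samples))
```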

Ideally you won't have much contamination and if you do you can just exclude the sample. If you can't exclude the sample, then you'll need to simultaneously align to both genomes (get one from Ensembl and the other from UCSC, so the chromosome names differ, and then concatenate them). Align against the concatenated genome and then extract only the human reads with some meaningful MAPQ threshold. One can get more elegant with this, but that should suffice 99.9% of the time.
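The last step (keep only confidently placed human reads from the concatenated alignment) might look roughly like the sketch below. It assumes plain-text SAM input and that you can list the human contig names; the function name `keep_human_reads`, the example contig names (human without a `chr` prefix, mouse with one) and the MAPQ cutoff of 10 are illustrative only, so pick a threshold that makes sense for your aligner.

```python
def keep_human_reads(sam_lines, human_contigs, min_mapq):
    """Yield SAM records that are mapped to a human contig with MAPQ >= min_mapq.

    Header lines are skipped here for brevity; a real pipeline would carry
    them along (or do the filtering with samtools instead).
    """
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        contig = fields[2]
        mapq = int(fields[4])
        if flag & 0x4:  # unmapped
            continue
        if contig in human_contigs and mapq >= min_mapq:
            yield line


# Hypothetical naming: human contigs "1", "2", ...; mouse contigs "chr1", "chr2", ...
human_contigs = {"1", "2", "X", "MT"}
example = [
    "r1\t0\t1\t100\t60\t50M\t*\t0\t0\t*\t*",     # human, confident
    "r2\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",  # mouse
    "r3\t0\t1\t300\t0\t50M\t*\t0\t0\t*\t*",      # human, but MAPQ 0
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",          # unmapped
]
for record in keep_human_reads(example, human_contigs, min_mapq=10):
    print(record.split("\t")[0])
```

Because the aligner sees both genomes at once, a read that fits equally well in human and mouse should receive a low MAPQ, which is why the MAPQ filter does most of the work here.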

1
3.0 years ago by
genomax63k
United States
genomax63k wrote:

BBSplit from BBMap was designed for this kind of situation: it bins reads by reference (to the best extent they can be assigned by alignment). It is a one-step process.
