Question: How To Separate Reads From Two Different Species In Exome Dataset?
8
6.5 years ago, 2184687-1231-83- wrote:

I would like to know if there is a clever protocol for separating the reads in an Illumina exome dataset sequenced from a human tumour heterotransplanted into an immunodeficient rodent (BALB/c strain):

http://www.nature.com/nprot/journal/v2/n2/full/nprot.2007.25.html

The sequenced exome sample would contain reads from the human cancer cells as well as reads from the surrounding mouse cells, which would have been co-enriched through cross-species sequence annealing during the exome enrichment protocol.

Is there any clever way of separating the read sets from human and mouse in such a case?

exome • 2.0k views
modified 18 months ago by Biostar • written 6.5 years ago by 2184687-1231-83-
8
6.5 years ago, brentp (Salt Lake City, UT) wrote:

If you use an aligner that supports references larger than 4 GB and lets you pull out uniquely mapped reads (e.g. BWA 0.6+), then you could combine the mouse and human genomes into a single reference FASTA file. You'd have to prefix the sequence names, so that human chromosome 1 is hg19chr1 and mouse chromosome 1 is mm9chr1.
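The prefixing step could be sketched roughly as follows. This is only an illustration: the function names are hypothetical, the tiny in-memory sequences stand in for the real hg19 and mm9 FASTA files, and the prefixes are the ones suggested above.

```python
import io


def prefix_fasta(lines, prefix):
    """Yield FASTA lines with `prefix` prepended to every sequence name."""
    for line in lines:
        if line.startswith(">"):
            # Header line: ">chr1 ..." becomes ">hg19chr1 ..."
            yield ">" + prefix + line[1:]
        else:
            yield line


def combine_references(sources):
    """Concatenate several references, given as (prefix, lines) pairs."""
    for prefix, lines in sources:
        for line in prefix_fasta(lines, prefix):
            # Make sure each record ends with a newline before the next file starts
            yield line if line.endswith("\n") else line + "\n"


# Toy stand-ins for the real genome files (assumption for the example)
human = io.StringIO(">chr1\nACGT\n")
mouse = io.StringIO(">chr1\nTTGA\n")

combined = "".join(combine_references([("hg19", human), ("mm9", mouse)]))
print(combined)
```

With real data you would pass open file handles instead of `io.StringIO` objects and write the result to a new FASTA file before indexing it with the aligner.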

Then, when you pull out uniquely mapped reads, you'll know which organism they came from.

You will not be able to use this to separate reads that map equally well to either reference.
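As a rough sketch of the "pull out uniquely mapped reads" step, one could walk the SAM output and bin each record by the prefix of its reference name. The mapping-quality cutoff (`min_mapq=20`) is an assumption used here as a stand-in for "uniquely mapped", and the function name and example records are made up for illustration.

```python
def classify_sam_line(line, min_mapq=20):
    """Assign one SAM alignment record to 'human', 'mouse', 'ambiguous',
    'unmapped' or 'other', based on the prefixed reference name."""
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])   # bitwise FLAG
    rname = fields[2]       # reference sequence name, e.g. hg19chr1
    mapq = int(fields[4])   # mapping quality

    if flag & 0x4 or rname == "*":
        return "unmapped"
    if mapq < min_mapq:
        # Low MAPQ: the read may map about equally well elsewhere,
        # possibly in the other species, so it cannot be assigned
        return "ambiguous"
    if rname.startswith("hg19"):
        return "human"
    if rname.startswith("mm9"):
        return "mouse"
    return "other"


# Made-up example records (tab-separated SAM fields)
records = [
    "r1\t0\thg19chr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tmm9chr2\t200\t37\t4M\t*\t0\t0\tTTGA\tIIII",
    "r3\t0\thg19chr1\t300\t0\t4M\t*\t0\t0\tGGCC\tIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tAAAA\tIIII",
]
labels = [classify_sam_line(r) for r in records]
print(labels)
```

Reads labelled `ambiguous` are the ones that cannot be separated this way and would need to be handled separately.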

2

+David Quigley, that would only happen if a miscall happened to make a read from human more like mouse, right? BWA should still be able to find the correct mapping in human, and infer it's correct by pairing, right?

written 6.5 years ago by brentp
1

If you have paired data (which you probably do) you're going to get hosed when one read in the pair maps to human Chr1 and the other maps to mouse Chr2. BWA will penalize the alignment score because the apparent read gap distance is huge. Just something to keep in mind.
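To see how often this happens in a given alignment, one could flag pairs whose two mates land on different species. The sketch below assumes the hg19/mm9 name prefixes from the combined reference and uses hypothetical function names; it only inspects the RNAME and RNEXT columns of each SAM record.

```python
def species_of(rname):
    """Map a prefixed reference name to a species label (None if unknown)."""
    if rname.startswith("hg19"):
        return "human"
    if rname.startswith("mm9"):
        return "mouse"
    return None


def is_cross_species_pair(line):
    """Return True if a paired read and its mate map to different species."""
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    rname, rnext = fields[2], fields[6]

    # Require a paired read with both the read and its mate mapped
    if not flag & 0x1 or flag & 0x4 or flag & 0x8:
        return False
    # "=" in RNEXT means the mate is on the same reference sequence
    if rnext == "=":
        return False
    return species_of(rname) != species_of(rnext)


# Made-up example records
cross = "r1\t65\thg19chr1\t100\t60\t4M\tmm9chr2\t500\t0\tACGT\tIIII"
same = "r2\t65\thg19chr1\t100\t60\t4M\t=\t300\t204\tACGT\tIIII"
print(is_cross_species_pair(cross), is_cross_species_pair(same))
```

Counting such pairs gives a feel for how much of the data is affected before deciding how to treat them.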

written 6.5 years ago by David Quigley

+1 for this solution, which is how we have dealt with reference-based mapping from hybrid genome sequences.

written 6.5 years ago by Casey Bergman

So if there are reads that map equally well to both species, will their coverage be split between the two? That would probably produce sudden drops in coverage for exons that are highly conserved between mouse and human, is that right?

written 6.5 years ago by 2184687-1231-83-

Somewhat, but those reads will still be mapped; you'll just have to pull them out another way.

written 6.5 years ago by brentp
2
6.5 years ago, Larry_Parnell (Boston, MA USA) wrote:

Good and interesting question, but do you need to separate the reads based on species? Could you not succeed in the goals of exome sequencing with mouse and human reads mixed, then sorting out species based on alignments to something like RefSeq mRNAs? I would think that would all work fine.

I would filter beforehand for common repeats like the human Alu, which is known to be expressed as mRNA. Mouse B1 elements can be filtered as well.


Good idea about using information from species specific TE lineages.

written 6.5 years ago by Casey Bergman
0
6.5 years ago, Damian Kao (USA) wrote:

This paper might be of interest to you.

Also check out barcode of life


He wants to assign all reads to either mouse or human; what you suggest is to identify a few unique sequence snippets to see which species are present (which is already known in this case).

written 6.5 years ago by Michael Kuhn

I think this method would probably not be much good at separating reads; it would only tell me that there are human and mouse reads in my dataset.

written 6.5 years ago by 2184687-1231-83-