How To Separate Reads From Two Different Species In An Exome Dataset?
12.3 years ago

I would like to know if there is a clever protocol to separate the reads in an Illumina exome dataset sequenced from a human tumour heterotransplanted into an immunodeficient rodent (BALB/c strain):

http://www.nature.com/nprot/journal/v2/n2/full/nprot.2007.25.html

The sequenced exome sample would contain reads from the human cancer cells and also reads from the surrounding mouse cells, which would have been captured because mouse DNA cross-hybridizes to the human exome baits during the enrichment protocol.

Is there any clever way of separating the read sets from human and mouse in such a case?

exome • 3.7k views
12.3 years ago
brentp 24k

If you use an aligner that supports references larger than 4 GB and that lets you pull out uniquely mapped reads (e.g. BWA 0.6+), then you could combine the mouse and human genomes into a single reference FASTA file. You'd have to prefix the sequence names so that human chromosome 1 is hg19chr1 and mouse chromosome 1 is mm9chr1.
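A minimal sketch of the renaming step, assuming plain (uncompressed) FASTA input. The function names and file paths are hypothetical; the prefixes follow the hg19/mm9 convention above.

```python
from typing import Iterable, Iterator, List, Tuple


def prefix_fasta_headers(lines: Iterable[str], prefix: str) -> Iterator[str]:
    """Yield FASTA lines, prepending `prefix` to every sequence name.

    '>chr1 description' becomes '>hg19chr1 description' when prefix == 'hg19'.
    Sequence lines are passed through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            yield ">" + prefix + line[1:]
        else:
            yield line


def write_combined_reference(inputs: List[Tuple[str, str]], out_path: str) -> None:
    """Concatenate several FASTA files into one, renaming headers on the way.

    `inputs` is a list of (fasta_path, prefix) pairs, e.g.
    [("hg19.fa", "hg19"), ("mm9.fa", "mm9")] (hypothetical paths).
    """
    with open(out_path, "w") as out:
        for path, prefix in inputs:
            with open(path) as handle:
                out.writelines(prefix_fasta_headers(handle, prefix))
```

The combined file would then be indexed as usual; for a genome this size BWA needs the `bwtsw` algorithm (`bwa index -a bwtsw combined.fa`).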

Then, when you pull out uniquely mapped reads, you'll know which organism they came from.

You will not be able to use this to separate reads that map equally well to either reference.
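A sketch of the "pull out uniquely mapped reads" step, reading SAM text with only the standard library. Two assumptions here: mapping quality (MAPQ) is used as the proxy for "uniquely mapped", with a hypothetical cut-off of 20 (BWA gives MAPQ 0 to reads that hit several loci equally well, which is exactly the cross-species case), and the hg19/mm9 prefixes match the renamed reference.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Hypothetical prefixes; they must match the names used in the combined reference.
SPECIES_PREFIXES = {"hg19": "human", "mm9": "mouse"}
# Assumption: MAPQ >= 20 is treated as "uniquely mapped".
MIN_MAPQ = 20


def classify_sam_line(line: str, min_mapq: int = MIN_MAPQ) -> Optional[Tuple[str, str]]:
    """Return (read_name, label) for one SAM alignment line.

    label is 'human', 'mouse', 'unmapped', 'ambiguous' (MAPQ below the
    cut-off) or 'unknown' (reference name without a known prefix).
    Header lines and secondary/supplementary records return None.
    """
    if line.startswith("@"):
        return None  # header line
    fields = line.rstrip("\n").split("\t")
    qname, flag, rname, mapq = fields[0], int(fields[1]), fields[2], int(fields[4])
    if flag & 0x900:
        return None  # secondary (0x100) or supplementary (0x800) alignment
    if flag & 0x4 or rname == "*":
        return qname, "unmapped"
    if mapq < min_mapq:
        return qname, "ambiguous"
    for prefix, species in SPECIES_PREFIXES.items():
        if rname.startswith(prefix):
            return qname, species
    return qname, "unknown"


def count_species(lines: Iterable[str]) -> Counter:
    """Tally labels over alignment records (each mate of a pair counts once)."""
    counts: Counter = Counter()
    for line in lines:
        result = classify_sam_line(line)
        if result is not None:
            counts[result[1]] += 1
    return counts
```

Reads labelled `ambiguous` are the ones this approach cannot separate; keeping them in their own bin rather than discarding them makes it possible to inspect them later.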

+David Quigley, that would only happen if a miscall happened to make a read from human more like mouse, right? BWA should still be able to find the correct mapping in human, and infer it's correct by pairing, right?

If you have paired data (which you probably do), you're going to get hosed when one read in the pair maps to human chr1 and the other maps to mouse chr2. BWA will penalize the alignment score because the apparent insert size is huge. Just something to keep in mind.
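One way to handle split pairs is to decide species per pair rather than per mate. A minimal sketch, assuming each mate has already been labelled ('human', 'mouse', 'ambiguous' or 'unmapped'), for example from its reference-name prefix and mapping quality. The resolution rules are an assumption and may need tuning: a pair whose mates point to different species is flagged as a conflict instead of being forced into one bin.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

SPECIES = ("human", "mouse")


def group_by_read(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect the per-mate labels under their shared read name."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for name, label in records:
        grouped[name].append(label)
    return dict(grouped)


def resolve_pairs(mate_labels: Dict[str, List[str]]) -> Dict[str, str]:
    """Assign each read pair a single label from its mates' labels.

    Rules (an assumption):
      * exactly one species among the mates -> that species
        (covers both mates agreeing, or one mate ambiguous/unmapped)
      * mates point to different species    -> 'conflict'
      * no species at all                   -> 'ambiguous'
    """
    resolved = {}
    for name, labels in mate_labels.items():
        species = {lab for lab in labels if lab in SPECIES}
        if len(species) == 1:
            resolved[name] = species.pop()
        elif len(species) > 1:
            resolved[name] = "conflict"
        else:
            resolved[name] = "ambiguous"
    return resolved
```

Counting the `conflict` pairs gives a rough idea of how often the problem described above actually occurs in a given dataset.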

+1 for this solution, which is how we have dealt with reference-based mapping from hybrid genome sequences.

So reads that map equally well to both species will split their coverage between the two? This will probably cause sudden drops in coverage for exons that are highly conserved between mouse and human, is that right?

Somewhat, but those reads will still be mapped; you'll just have to pull them out another way.

12.3 years ago

Good and interesting question, but do you need to separate the reads by species? Could you not achieve the goals of exome sequencing with mouse and human reads mixed, and then sort out species based on alignments to something like RefSeq mRNAs? I would expect that to work fine.

I would filter beforehand for common repeats like the human Alu, which is known to be expressed as mRNA. Mouse B1 elements can be filtered as well.
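A minimal sketch of such a pre-filter, using exact k-mer matching against repeat consensus sequences. This is a swapped-in, deliberately simple technique: a dedicated tool such as RepeatMasker would normally be used, and the consensus sequences themselves (e.g. Alu and B1 from a repeat library such as Dfam or Repbase) must be supplied by the user. The k-mer size of 21 and the 50% shared-k-mer cut-off are hypothetical defaults.

```python
from typing import Iterable, List, Set

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def kmers(seq: str, k: int) -> Set[str]:
    """All length-k substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def build_repeat_index(repeat_seqs: Iterable[str], k: int = 21) -> Set[str]:
    """k-mers of each repeat consensus on both strands."""
    index: Set[str] = set()
    for seq in repeat_seqs:
        index |= kmers(seq, k)
        index |= kmers(revcomp(seq), k)
    return index


def is_repeat_read(read: str, index: Set[str], k: int = 21,
                   min_fraction: float = 0.5) -> bool:
    """True if at least min_fraction of the read's k-mers occur in the index."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return False  # read shorter than k
    return len(read_kmers & index) / len(read_kmers) >= min_fraction


def filter_repeat_reads(reads: Iterable[str], index: Set[str], k: int = 21,
                        min_fraction: float = 0.5) -> List[str]:
    """Keep only reads that do not look like the indexed repeats."""
    return [r for r in reads if not is_repeat_read(r, index, k, min_fraction)]
```

Indexing both strands means a read is caught whichever orientation it was sequenced in; the fraction cut-off trades sensitivity against discarding reads that only partly overlap a repeat.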

Good idea about using information from species specific TE lineages.

12.3 years ago

This paper might be of interest to you.

Also check out the Barcode of Life project.

He wants to assign every read to either mouse or human; what you suggest identifies a few unique sequence snippets to show which species are present, which is already known in this case.

I think this method would not be good at separating reads; it would only tell me that there are human and mouse reads in my dataset.
