Let me rephrase your question a bit to make it easier to follow: you took human tumor cells, implanted them into lab mice, let the tumors grow, harvested them, and performed next-generation exome sequencing to detect variants. Now your reads are, understandably, contaminated with mouse DNA. What to do? Unfortunately, I haven't had the 'luck' to get such contaminated data, so here is what I would do if I were asked to analyse such a data set and wanted to publish the results:
The best way of removing contamination is to avoid it in the first place (if …
I don't believe there is any reliable way to remove contamination, especially when the contaminating sequences are highly similar to the target. To salvage this case, I would apply rigorous filtering:
- Align the reads against both the mouse and the human reference genome.
- Remove reads that align better to the mouse reference than to the human one, or equally well to both.
- Check the alignment positions and discard all reads that align to non-exonic or intergenic regions; in exome data they should not be there anyway.
- Run SNP detection. I don't think copy number variation detection is feasible here.
- After detecting a SNP, align the genomic sequence flanking its position against the mouse genome using e.g. FASTA or SSEARCH. If the mouse sequence is highly similar, don't report the SNP.
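The read-level filters in steps 2 and 3 could be sketched roughly as below. This is a minimal, hypothetical sketch: it assumes you have already extracted, for each read, the best alignment score against each reference (higher is better, as with the `AS` tag written by aligners such as BWA or Bowtie2), the read's position on the human reference, and a sorted, non-overlapping list of exon intervals per chromosome. The function names and input layout are illustrative only and do not come from any particular tool.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# Hypothetical input: read name -> best alignment score, or None if unmapped.
Scores = Dict[str, Optional[int]]
# Hypothetical input: chromosome -> sorted, non-overlapping [start, end) exons.
Exons = Dict[str, List[Tuple[int, int]]]


def human_specific_reads(human: Scores, mouse: Scores) -> List[str]:
    """Keep reads that map to human and score strictly better than on mouse.

    Reads scoring equally well on both references are dropped, as are
    reads that do not map to the human reference at all.
    """
    keep = []
    for read, h in human.items():
        if h is None:
            continue  # not aligned to the human reference
        m = mouse.get(read)
        if m is None or h > m:
            keep.append(read)
    return sorted(keep)


def in_exons(chrom: str, pos: int, exons: Exons) -> bool:
    """True if pos lies inside one of the half-open exon intervals of chrom."""
    intervals = exons.get(chrom, [])
    starts = [start for start, _ in intervals]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < intervals[i][1]


def filter_reads(human: Scores, mouse: Scores,
                 positions: Dict[str, Tuple[str, int]], exons: Exons) -> List[str]:
    """Apply the species filter, then drop reads placed outside exons."""
    return [r for r in human_specific_reads(human, mouse)
            if r in positions and in_exons(*positions[r], exons)]


# Toy example with made-up scores and coordinates.
human = {"r1": 60, "r2": 55, "r3": 40, "r4": None}
mouse = {"r1": 30, "r2": 55, "r3": None, "r4": 50}
positions = {"r1": ("chr1", 150), "r3": ("chr2", 10)}
exons = {"chr1": [(100, 200), (300, 400)], "chr2": [(500, 600)]}
print(filter_reads(human, mouse, positions, exons))  # → ['r1']
```

Here `r2` is dropped because it aligns equally well to both genomes, `r4` because it only maps to mouse, and `r3` because it lies outside any exon. Treating ties as mouse-derived is deliberately conservative, in line with the goal of being specific rather than sensitive.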
This way you will probably achieve quite high specificity; the question is whether you will have many reads left.