Full Exome Sequencing Of Xenografted Tumor
13.1 years ago
Orca ▴ 140

I have to analyze NGS data from a targeted enrichment (Agilent SureSelect) of a xenografted tumor. We know there is roughly 20% contamination from murine stroma. How should I handle this so that I can be sure the annotated mutations are human-specific? Thanks,

Orc@

next-gen sequencing • exome
13.1 years ago
Michael 54k

Let me rephrase your question a bit to make it clearer: you took human tumor cells, implanted them into lab mice, let the tumors grow, harvested them, and performed next-gen exome sequencing to detect variants. Now your reads are, understandably, contaminated with mouse DNA. What to do? Unfortunately, I haven't had the 'luck' to get such contaminated data, so here is what I would do if I were asked to analyse such a data set and wanted to publish the results:

The best way to deal with contamination is to avoid it in the first place, if possible. I don't believe there is any reliable way to remove contamination, especially from highly similar sequences. To salvage this case I would apply rigorous filtering:

  • Align the reads against both the mouse and the human genome
  • Remove reads that align better to, or as well to, the mouse reference as to the human reference
  • Check the alignment positions and discard all reads that align to non-exonic or intergenic regions; they should not be there anyway
  • Run SNP detection; I don't think copy number variation detection is feasible
  • After detecting a SNP, align the genomic sequence flanking its position against mouse using e.g. FASTA or SSEARCH. If the mouse sequence is highly similar, don't report it.
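A minimal Python sketch of the second and third filtering steps, assuming each read has already been aligned to both references and summarized by its best alignment score per genome (for example the `AS` tag written by aligners such as BWA-MEM or Bowtie2, where higher is better) plus its 0-based human alignment start. The `ReadHit` record, the target-interval dictionary and the example coordinates are hypothetical; in practice the intervals would come from the SureSelect target BED file (0-based, half-open). Ties are treated as mouse, as suggested above, and only the read start is checked against the targets, which is a simplification.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical per-read summary; in practice built from two BAM files
# (one aligned to human, one to mouse), e.g. from each read's AS tag.
@dataclass
class ReadHit:
    read_id: str
    human_score: Optional[int]  # best alignment score vs human, None if unmapped
    mouse_score: Optional[int]  # best alignment score vs mouse, None if unmapped
    chrom: Optional[str]        # human chromosome of the alignment
    pos: Optional[int]          # 0-based human alignment start

# chrom -> sorted, non-overlapping, half-open (start, end) target intervals
Targets = Dict[str, List[Tuple[int, int]]]

def in_targets(chrom: str, pos: int, targets: Targets) -> bool:
    """Return True if pos falls inside one of the target intervals on chrom."""
    intervals = targets.get(chrom, [])
    starts = [start for start, _ in intervals]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]

def keep_read(hit: ReadHit, targets: Targets) -> bool:
    """Keep a read only if it is clearly human and lies in a targeted region."""
    if hit.human_score is None or hit.chrom is None or hit.pos is None:
        return False  # no usable human alignment
    # Step 2: discard reads that align as well as or better to mouse.
    if hit.mouse_score is not None and hit.mouse_score >= hit.human_score:
        return False
    # Step 3: discard reads outside the enriched (exonic) target regions.
    return in_targets(hit.chrom, hit.pos, targets)

if __name__ == "__main__":
    targets = {"chr1": [(1000, 2000), (5000, 6000)]}  # made-up intervals
    hits = [
        ReadHit("r1", 60, 40, "chr1", 1500),    # better human hit, on target
        ReadHit("r2", 60, 60, "chr1", 1500),    # tie with mouse
        ReadHit("r3", 60, None, "chr1", 3000),  # off target
        ReadHit("r4", None, 55, None, None),    # mouse only
    ]
    print([h.read_id for h in hits if keep_read(h, targets)])  # prints ['r1']
```

Score comparison only makes sense if both alignments were produced with the same aligner and scoring scheme; the SNP calling and flanking-sequence checks in the last two steps are not covered here.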

That way your calls will probably be quite specific; the question is whether you will have many reads left.

13.1 years ago
Darked89 4.6k

Human and mouse sequences are quite different at the nucleotide level (just blastn NM_000546.4, Homo sapiens tumor protein p53 (TP53), transcript variant 1, mRNA, against mouse RefSeq). Even at 80% similarity there are hardly any identical 60 bp fragments. So if your reads are long enough, there is practically no chance that a tumor mutation will turn a human read into an exact mouse sequence. And we are only talking about exons here, but (correct me if I am wrong) you should also get some "dangling" intronic sequence flanking the exons. Unless you land in some very peculiar part of the genome, similarity drops there, so such reads should not cross-map.
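To check the claim about identical stretches on your own sequences, one minimal Python sketch is to compute the longest exact substring shared by a human and a mouse sequence with standard dynamic programming. The function names, the 60 bp default threshold and the simulated ~80%-identity pair in the demo are illustrative assumptions, not real TP53 data; the quadratic algorithm is only suitable for short sequences such as single exons.

```python
import random

def longest_shared_run(a: str, b: str) -> int:
    """Length of the longest exact substring shared by a and b.

    Classic O(len(a) * len(b)) dynamic programming over matching suffixes,
    so only use it on short sequences such as single exons.
    """
    best = 0
    prev = [0] * (len(b) + 1)  # prev[j]: common suffix length of a[:i-1] and b[:j]
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def has_identical_window(a: str, b: str, k: int = 60) -> bool:
    """True if a and b share an exact stretch of at least k bases."""
    return longest_shared_run(a, b) >= k

def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Substitute each base with a different one with probability `rate`."""
    return "".join(
        rng.choice([b for b in "ACGT" if b != c]) if rng.random() < rate else c
        for c in seq
    )

if __name__ == "__main__":
    rng = random.Random(1)
    human = "".join(rng.choice("ACGT") for _ in range(500))
    mouse_like = mutate(human, 0.2, rng)  # simulated ~80% identity, no indels
    print(longest_shared_run(human, mouse_like),
          has_identical_window(human, mouse_like))
```

On real data you would pass a human exon and its mouse orthologous sequence instead of the simulated pair. Random, uniformly spread substitutions make long perfect runs unlikely, but real exons are not uniformly divergent, so highly conserved genes are worth checking individually.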

13.1 years ago
Lythimus ▴ 210

I've been working on this problem and am looking for data sets like yours, with a known degree of contamination, to test my approach on. Take a look and contact me if you are interested in pursuing this: https://github.com/Lythimus/PARSES/
