Hi, I am analyzing genomic (RNA-seq) data from Patient Derived Xenograft tumor samples, where cancer patient tumors are transplanted into and grown in a mouse, then harvested, and the extracted DNA and RNA are sequenced. I have never done this before and am wondering whether the human or the mouse reference genome should be used. I would guess the human reference genome should be used for alignment, but is it possible that some mouse cells would be mixed in? Thanks, - Pankaj
First, use a wet lab technique to filter out mouse cells prior to DNA/RNA extraction (just google "mouse cell depletion"). Then, in the bioinformatics analysis, map all reads to the human and the mouse genome/transcriptome in parallel and independently. Reads that map better to mouse are discarded from the analysis and only "human reads" are kept.
As poisonAlien mentioned, some tools exist to remove contamination, but you have to evaluate them carefully. For example, some do this based on k-mer counting and can be too sensitive or too specific. We found that the approach described above gives the best results, even though it is computationally demanding. In particular, if you want to call variants, this approach removes the fewest reads while still giving high sensitivity.
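To make the "map to both, keep what maps better to human" step concrete, here is a minimal sketch in Python. It is only an illustration under a few assumptions that are not part of the answer above: both alignments are available as SAM text, the aligner writes the optional `AS:i:` alignment score tag, and scores from the two runs are comparable because the same aligner and settings were used. The function names `best_alignment_scores` and `classify_reads` are made up for this example, and treating ties as "ambiguous" is one possible policy, not a recommendation from the poster.

```python
def best_alignment_scores(sam_lines):
    """Return {read_name: best AS score} over the mapped records in SAM text lines.

    Assumption: the aligner emits the optional AS:i: tag. Records without it are ignored.
    Mates of a pair share a read name, so a pair is treated as one unit here.
    """
    scores = {}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & 0x4:
            continue  # FLAG bit 0x4 means the read is unmapped
        for tag in fields[11:]:  # optional tags start at column 12
            if tag.startswith("AS:i:"):
                score = int(tag[5:])
                if name not in scores or score > scores[name]:
                    scores[name] = score
                break
    return scores


def classify_reads(human_scores, mouse_scores):
    """Label each read 'human', 'mouse', or 'ambiguous' by comparing its best scores."""
    labels = {}
    for name in set(human_scores) | set(mouse_scores):
        h = human_scores.get(name)  # None if the read did not map to human
        m = mouse_scores.get(name)  # None if the read did not map to mouse
        if m is None or (h is not None and h > m):
            labels[name] = "human"
        elif h is None or m > h:
            labels[name] = "mouse"
        else:
            labels[name] = "ambiguous"  # equal scores; how to treat these is a policy choice
    return labels
```

You would call `best_alignment_scores` once on the human alignment and once on the mouse alignment, pass both results to `classify_reads`, and carry only the reads labelled "human" into the downstream analysis. Whether ambiguous reads are kept or dropped is something to decide and evaluate for your own data.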
My company specializes in xenograft research, so feel free to contact me for more information.
Hi, you would use the human reference for alignment, of course.
There are some methods to remove possible contaminating reads that originate from mouse cells. You can keep only uniquely mapped reads with high mapping quality. There is also a tool available specifically for separating reads from xenograft samples.
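As a rough illustration of the mapping-quality filter, here is a small Python sketch that works on SAM text lines. The function name `filter_by_mapq` and the default cutoff of 30 are assumptions made for this example only. How MAPQ relates to "uniquely mapped" depends on the aligner, so check its documentation before picking a threshold.

```python
def filter_by_mapq(sam_lines, min_mapq=30):
    """Keep header lines plus mapped records whose MAPQ is at least min_mapq.

    min_mapq=30 is an arbitrary illustrative cutoff, not a recommended value.
    """
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            kept.append(line)  # keep header lines unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record (11 mandatory columns)
        flag = int(fields[1])  # column 2: bitwise FLAG
        mapq = int(fields[4])  # column 5: mapping quality
        if flag & 0x4:
            continue  # unmapped read
        if mapq >= min_mapq:
            kept.append(line)
    return kept
```

In practice the same filter is usually applied with an existing tool rather than hand-written code; the sketch is just meant to show which SAM columns are involved.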