Question

Aligning reads from mouse samples that express 1 human gene

20 months ago

bompipi95 ▴ 160

Hi bioinformaticians!

I have a set of mouse samples genetically engineered to express a single human gene. I have aligned the reads against the mouse genome with STAR, and I am trying to find a way to recover the reads that derive from this human gene.

My current thinking is to identify the unmapped reads from each sample and then realign them against the human reference genome, subset to include only the chromosomal region corresponding to this human gene of interest. I will also filter the GTF annotation file to include only the entries corresponding to this gene.
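The GTF filtering step described above could be sketched roughly as follows. This is a minimal illustration, not a tested pipeline: the function name `filter_gtf_by_gene` and the gene ID in the usage note are hypothetical, and it assumes the annotation stores the identifier in a standard `gene_id "..."` attribute (some annotations append a version suffix to the ID, which would need to be matched exactly).

```python
import re

# Matches the gene_id attribute in column 9 of a GTF record.
GENE_ID_PATTERN = re.compile(r'gene_id "([^"]+)"')


def filter_gtf_by_gene(gtf_lines, gene_id):
    """Yield comment lines plus every record whose gene_id equals gene_id."""
    for line in gtf_lines:
        if line.startswith("#"):
            # Keep header/comment lines so the output is still a valid GTF.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            # Skip malformed or empty lines.
            continue
        match = GENE_ID_PATTERN.search(fields[8])
        if match and match.group(1) == gene_id:
            yield line
```

With real files this might be used as `out.writelines(filter_gtf_by_gene(open("human.gtf"), "ENSG_PLACEHOLDER"))`, where both the file name and the ID are placeholders.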

Any thoughts on the approach above and alternative suggestions are welcome!

realignment • 985 views

updated 20 months ago by swbarnes2 14k • written 20 months ago by bompipi95 ▴ 160

Hi! Your approach sounds reasonable, but if you want some alternatives, it may be worth taking a look at these two related posts: "Extract uniquely mapped reads from one species" and "Tool to separate human and mouse rna seq reads".

Reply • 20 months ago by iraun 6.2k

Thank you for linking these helpful posts.

Reply • 20 months ago by bompipi95 ▴ 160

score 1 · Answer 1 · 2022-11-16

The problem with your approach is that, given the sequence similarity between human and mouse, you will likely get reads of human origin that map to the mouse genome and vice versa. As an alternative to what has already been proposed, if you know what human sequence was inserted and where it sits in the mouse genome, you could edit your reference genome accordingly. That way you would have a reference genome that matches your samples' genome.

score 1 · Answer 2 · 2022-11-16

You might not want to hear it, but the proper way to do this is to make a new reference of mouse + human gene (with an updated GTF), then realign and recount.
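One way the "updated GTF" part could look is sketched below. The assumption here is that the human gene is added to the mouse FASTA as its own extra contig, cut from the human genome starting at a known 1-based coordinate, so the human GTF records need a new sequence name and shifted coordinates. The names `rebase_gtf_record`, `combine_gtf`, and the contig name are made up for illustration. If the construct is a cDNA rather than a genomic region, the exon structure would differ and the records would have to be written by hand instead.

```python
def rebase_gtf_record(line, contig_name, region_start):
    """Move one GTF record onto a contig that starts at region_start (1-based)."""
    fields = line.rstrip("\n").split("\t")
    fields[0] = contig_name
    # GTF coordinates are 1-based and inclusive, so the first base of the
    # extracted region becomes position 1 on the new contig.
    fields[3] = str(int(fields[3]) - region_start + 1)
    fields[4] = str(int(fields[4]) - region_start + 1)
    return "\t".join(fields) + "\n"


def combine_gtf(mouse_lines, human_gene_lines, contig_name, region_start):
    """Return the mouse annotation followed by the rebased human gene records."""
    combined = list(mouse_lines)
    for line in human_gene_lines:
        if line.startswith("#"):
            # Drop the human file's header lines; the mouse header is kept.
            continue
        combined.append(rebase_gtf_record(line, contig_name, region_start))
    return combined
```

The matching FASTA would simply be the mouse genome with the extracted human sequence appended under the same contig name, and the STAR index (and any RSEM reference) would then be rebuilt from that pair of files.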

Also, you might want to make sure that your method of gene counting is smart about handling reads that align to very similar sequences. featureCounts and htseq-count (whose default counting STAR's built-in gene counts reproduce) are not very smart: by default they do not assign multi-mapping or ambiguous reads to any gene. RSEM, or pseudo aligners like Kallisto or Salmon, are. If you want to stick with STAR, STAR's transcriptome output (--quantMode TranscriptomeSAM) is suitable for use with RSEM.
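For the STAR + RSEM route, the two commands might be assembled along these lines. Treat this as a sketch under several assumptions: paired-end gzipped FASTQ input, a STAR index and an RSEM reference (from `rsem-prepare-reference`) already built from the combined mouse + human gene FASTA and GTF, and a recent RSEM that accepts `--alignments`. The helper names and all paths are hypothetical, and strandedness options are left out. The code only builds the argument lists; it does not run anything.

```python
import shlex


def star_align_cmd(genome_dir, fastq1, fastq2, prefix, threads=8):
    """Build a STAR command that also writes transcriptome-coordinate alignments."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq1, fastq2,
        "--readFilesCommand", "zcat",
        "--outSAMtype", "BAM", "SortedByCoordinate",
        # TranscriptomeSAM makes STAR write <prefix>Aligned.toTranscriptome.out.bam
        "--quantMode", "TranscriptomeSAM",
        "--outFileNamePrefix", prefix,
    ]


def rsem_quant_cmd(prefix, rsem_reference, sample_name, threads=8):
    """Build an RSEM command that quantifies STAR's transcriptome BAM."""
    return [
        "rsem-calculate-expression",
        "--alignments",
        "--paired-end",
        "-p", str(threads),
        prefix + "Aligned.toTranscriptome.out.bam",
        rsem_reference,
        sample_name,
    ]


if __name__ == "__main__":
    # Placeholder paths, printed for inspection rather than executed.
    print(shlex.join(star_align_cmd("star_idx", "s1_R1.fq.gz", "s1_R2.fq.gz", "s1_")))
    print(shlex.join(rsem_quant_cmd("s1_", "rsem_ref/mouse_plus_gene", "s1")))
```

Each list could be passed to `subprocess.run(cmd, check=True)` once the paths are real.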