Question: Proper Way To Map Rna-Seq Data Against A Single (Or Small) Number Of Genes
Jason (60), United States, 5.8 years ago, wrote:

I have a large Illumina RNA-Seq dataset that I have already mapped to the reference genome with STAR and quantified. Now I want to look at expression of GFP, which is not native to the species (this is a transgenic mouse).

I imagine the 'proper' way to do this is to build a new reference genome with the GFP gene added as an extra chromosome, but that would mean a lot of duplicated work, disk space, and time.
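If you do go the "extra chromosome" route, the GFP contig needs both a FASTA record and a GTF annotation so that quantification tools can assign reads to it. Below is a minimal sketch, assuming GFP is annotated as a single-exon transcript spanning the whole contig. The source label `transgene` and the transcript ID `GFP-001` are hypothetical placeholders, and the sequence itself is whatever construct you actually used.

```python
def fasta_record(name: str, seq: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record, wrapped at `width` columns."""
    lines = [f">{name}"]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


def gtf_lines(contig: str, length: int, gene_id: str, source: str = "transgene") -> str:
    """Minimal GTF for a transgene contig: one gene, one transcript, one exon.

    GTF coordinates are 1-based and inclusive, so each feature runs 1..length.
    The source label and transcript ID are illustrative assumptions.
    """
    gene_attrs = f'gene_id "{gene_id}"; gene_name "{gene_id}";'
    tx_attrs = f'gene_id "{gene_id}"; transcript_id "{gene_id}-001"; gene_name "{gene_id}";'
    rows = [
        (contig, source, "gene", 1, length, ".", "+", ".", gene_attrs),
        (contig, source, "transcript", 1, length, ".", "+", ".", tx_attrs),
        (contig, source, "exon", 1, length, ".", "+", ".", tx_attrs),
    ]
    # Nine tab-separated columns per row, as the GTF format requires.
    return "".join("\t".join(map(str, row)) + "\n" for row in rows)
```

Usage: write `fasta_record("GFP", seq)` to its own FASTA file and append `gtf_lines("GFP", len(seq), "GFP")` to a copy of the host GTF before rebuilding the index.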

What I tried instead was to build a new reference index containing only the GFP gene and align against that, but STAR creates a 1.5 GB index for this single gene, and what if I want to do this with more genes? This seems to be using STAR outside the kind of work it was designed for. Or is this in fact the correct approach?
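A likely reason for the 1.5 GB index: the size of STAR's pre-index table is governed by `--genomeSAindexNbases` (default 14), not by the length of the reference. The STAR manual recommends scaling it down for small genomes to min(14, log2(GenomeLength)/2 - 1). The sketch below computes that value and assembles a `genomeGenerate` command; the `max(1, ...)` guard and the file names `gfp_index` and `gfp.fa` are my own assumptions.

```python
import math


def star_sa_index_nbases(genome_length: int) -> int:
    """Scaled-down --genomeSAindexNbases per the STAR manual's small-genome
    recommendation, min(14, log2(GenomeLength)/2 - 1), floored to an integer.
    The max(1, ...) guard for tiny inputs is an assumption, not from the manual."""
    return max(1, min(14, int(math.log2(genome_length) / 2 - 1)))


def star_genome_generate_cmd(genome_dir, fasta_files, genome_length, threads=4):
    """Build (but do not run) a STAR genomeGenerate command as an argument list."""
    return [
        "STAR", "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", *fasta_files,
        "--genomeSAindexNbases", str(star_sa_index_nbases(genome_length)),
    ]
```

For a 720 bp reference this gives 3 instead of the default 14; the list can be passed to `subprocess.run(cmd, check=True)`.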


Am I missing anything obvious here, like using BLAST or BLAT (I don't have any experience with these older tools)? Thanks.

Tags: rnaseq, gene, alignment, mapping
Modified 4.3 years ago by cpad0112 (12k).

Is GFP fused to something or is it being expressed by itself? You might just try bowtie2 or bwa, which should have smaller indexes and be fast enough for your purposes.

BTW, do you have the unmapped reads (this is an option for STAR)?

Reply by Devon Ryan (93k), 5.8 years ago; modified 3 months ago by RamRS (25k).
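For reference, keeping unmapped reads is a one-flag change in STAR (`--outReadsUnmapped Fastx`, which writes `<prefix>Unmapped.out.mate1`, plus `mate2` for paired-end data). A minimal sketch, assuming single-end uncompressed FASTQ and hypothetical file names, that builds the STAR command and a follow-up bowtie2 command aligning only those leftover reads to a small GFP index:

```python
def star_align_keep_unmapped(genome_dir, reads, prefix):
    """STAR alignment that also writes unmapped reads as FASTQ/FASTA."""
    return [
        "STAR", "--runMode", "alignReads",
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--outFileNamePrefix", prefix,
        "--outReadsUnmapped", "Fastx",
    ]


def bowtie2_on_unmapped(index_prefix, star_prefix, sam_out, threads=4):
    """Align STAR's unmapped single-end reads against a small bowtie2 index.
    Assumes single-end data; paired-end would use -1/-2 with mate1/mate2."""
    return [
        "bowtie2", "-p", str(threads),
        "-x", index_prefix,
        "-U", f"{star_prefix}Unmapped.out.mate1",
        "-S", sam_out,
    ]
```

Gzipped input would additionally need `--readFilesCommand zcat` on the STAR side.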

Expressed by itself. Does that make a difference? And no, I didn't save the unmapped reads from the original mapping.

Reply by Jason (60), 5.8 years ago.

Only in that, if it were fused to something else, you might get somewhat better results by including the whole fusion construct in the reference. Otherwise, no, that doesn't matter much. Too bad you didn't save the unmapped reads; that would have made life simple :)

Reply by Devon Ryan (93k), 5.8 years ago.

Wouldn't that affect the alignment rate, so the counts from the native genes wouldn't be comparable to the GFP counts?

Reply by igor (9.1k), 4.5 years ago.
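One way to soften that concern, sketched under assumptions: express every gene, including GFP counted in the separate alignment, relative to the same library size, for example the "Number of input reads" line STAR writes to `Log.final.out`. This does not remove every bias (GFP reads that STAR placed elsewhere never reach the second alignment), but it keeps the denominator identical. The log-line layout assumed here is the `label |<tab>value` format; the gene names in the usage are hypothetical.

```python
def star_input_reads(log_text: str) -> int:
    """Pull 'Number of input reads' from the text of a STAR Log.final.out.
    Assumes lines of the form '   <label> |\t<value>'."""
    for line in log_text.splitlines():
        label, sep, value = line.partition("|")
        if sep and label.strip() == "Number of input reads":
            return int(value.strip())
    raise ValueError("'Number of input reads' not found in log")


def cpm_table(counts: dict, library_size: int) -> dict:
    """Counts per million using one shared denominator for all genes."""
    if library_size <= 0:
        raise ValueError("library size must be positive")
    return {gene: c * 1e6 / library_size for gene, c in counts.items()}
```

Usage: `cpm_table({**native_counts, "GFP": gfp_count}, star_input_reads(open("Log.final.out").read()))`.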

Why All The Capitals haha ;-) ?

Reply by Irsan (7.1k), 4.3 years ago.
Answer by seidel (6.9k), United States, 5.8 years ago:

I've done this with bowtie to count GFP or the ERCC spike-in controls. A bowtie index of GFP and a few other genes came out to 4 MB.

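With bowtie2 the equivalent workflow would be something like `bowtie2-build gfp_ercc.fa gfp_ercc` followed by `bowtie2 -x gfp_ercc -U reads.fq -S hits.sam` (file names hypothetical). A minimal sketch for tallying reads per reference from the resulting SAM text, counting only primary mapped records:

```python
from collections import Counter
from typing import Iterable

# SAM FLAG bits: 0x4 unmapped, 0x100 secondary, 0x800 supplementary.
SKIP_FLAGS = 0x4 | 0x100 | 0x800


def count_primary_hits(sam_lines: Iterable[str]) -> Counter:
    """Count primary, mapped alignments per reference name (RNAME, column 3).
    Paired-end mates are counted separately in this sketch."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip header and blanks
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        if flag & SKIP_FLAGS:
            continue
        counts[rname] += 1
    return counts
```

Usage: `with open("hits.sam") as fh: print(count_primary_hits(fh))`.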

I didn't think the indices would be that much smaller, but I guess the Burrows-Wheeler transform of a short sequence is itself small (unlike STAR's suffix-array index, whose pre-index table has a large default size regardless of the reference length). Thanks!

Reply by Jason (60), 5.8 years ago.
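To illustrate why a BWT-based index scales with the reference: the transform of a sequence has exactly the same length as the sequence plus one end marker. The sketch below is a toy that sorts all rotations; real aligners such as bowtie and bwa build the BWT and FM-index from a suffix array far more efficiently, so treat this purely as an illustration of the idea.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform: sort all rotations of text+sentinel
    and take the last character of each. O(n^2 log n), for illustration only."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)
```

Usage: `bwt("banana")` returns `"annb$aa"`, seven characters for a six-character input.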
Answer by mathieu.bahin (50), 4.3 years ago:


I have a similar question. I have a TE FASTA file (generated with bedtools) that looks like this:



How can I index this 'genome' with STAR?

I would like to map reads to it. The TEs are also in the original complete FASTA file, so maybe it would be better to identify them after mapping to the whole genome?




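If you do index the TE FASTA directly (the reply below recommends whole-genome alignment instead), the STAR manual suggests two adjustments for small references made of many sequences: `--genomeSAindexNbases` = min(14, log2(GenomeLength)/2 - 1) and `--genomeChrBinNbits` = min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))). A minimal sketch that reads record lengths from FASTA lines and computes both; flooring to integers and the `max(1, ...)` guard are my own assumptions.

```python
import math
from typing import Dict, Iterable


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Sequence length per record; the record name is the first header word."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def star_small_genome_params(lengths: Dict[str, int], read_length: int) -> Dict[str, int]:
    """STAR-manual scaling for small, many-sequence references (floored)."""
    total = sum(lengths.values())
    n_refs = len(lengths)
    sa_nbases = max(1, min(14, int(math.log2(total) / 2 - 1)))
    chr_bin_nbits = min(18, int(math.log2(max(total / n_refs, read_length))))
    return {"--genomeSAindexNbases": sa_nbases, "--genomeChrBinNbits": chr_bin_nbits}
```

Usage: `with open("tes.fa") as fh: print(star_small_genome_params(fasta_lengths(fh), 100))` (file name hypothetical).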

Please post things like this as new questions.

I would recommend that you do the following:

  1. Delete the TE FASTA file; you don't need it.
  2. Align against the whole genome.
  3. Use the BED file that you used with BEDtools to subset the alignments according to whether they overlap one of your TEs.

Doing it that way will produce fewer false positives and a higher overall alignment rate.

Reply by Devon Ryan (93k), 4.3 years ago.
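Step 3 is usually done directly with something like `bedtools intersect -a aligned.bam -b tes.bed -u > te_reads.bam` (file names hypothetical). To show the logic behind it, here is a minimal sketch: BED intervals are 0-based and half-open, so a SAM record's 1-based POS is shifted by one and its reference span is taken from the reference-consuming CIGAR operations. Using the whole span is a simplification; a spliced read whose intron covers a TE would also count as overlapping.

```python
from __future__ import annotations

import re
from collections import defaultdict


def read_bed(lines) -> dict[str, list[tuple[int, int]]]:
    """Chromosome -> list of (start, end) intervals, 0-based half-open."""
    intervals: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals[chrom].append((int(start), int(end)))
    return intervals


def sam_ref_interval(pos: int, cigar: str) -> tuple[int, int]:
    """Convert SAM POS (1-based) + CIGAR to a 0-based half-open interval.
    M, D, N, = and X consume reference bases; I, S, H and P do not."""
    span = sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar) if op in "MDN=X")
    return pos - 1, pos - 1 + span


def overlaps_any(intervals, chrom: str, start: int, end: int) -> bool:
    """True if [start, end) overlaps any interval on chrom (linear scan)."""
    return any(s < end and start < e for s, e in intervals.get(chrom, ()))
```

A linear scan is fine for a sketch; for genome-scale data an interval tree or bedtools itself is the practical choice.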
Answer by cpad0112 (12k), 4.3 years ago:

My understanding is that STAR's genome generation step accepts multiple FASTA files (--genomeFastaFiles). You could probably keep the host genome and GFP as separate FASTA files (with the corresponding GTF annotation) and index them together. Afterwards, check that STAR actually included the GFP reference in the index.

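A minimal sketch of that approach, assuming the GFP GTF lines have been appended to a copy of the host GTF (this sketch passes a single, concatenated GTF via `--sjdbGTFfile`). STAR writes the names of the indexed sequences to `chrName.txt` in the genome directory, which gives a quick way to confirm the GFP contig made it in. All directory and file names here are hypothetical.

```python
from pathlib import Path


def genome_generate_cmd(genome_dir, fasta_files, gtf, threads=8, overhang=None):
    """STAR genomeGenerate with several FASTA files and one (merged) GTF."""
    cmd = [
        "STAR", "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", str(genome_dir),
        "--genomeFastaFiles", *map(str, fasta_files),
        "--sjdbGTFfile", str(gtf),
    ]
    if overhang is not None:
        cmd += ["--sjdbOverhang", str(overhang)]  # typically read length - 1
    return cmd


def contigs_in_index(chr_name_text: str) -> set:
    """Parse the contents of STAR's chrName.txt (one sequence name per line)."""
    return set(chr_name_text.split())


def index_contains(genome_dir, contig: str) -> bool:
    """Check whether a contig (e.g. 'GFP') is listed in <genome_dir>/chrName.txt."""
    return contig in contigs_in_index(Path(genome_dir, "chrName.txt").read_text())
```

Usage: run the command with `subprocess.run(cmd, check=True)`, then `index_contains("mm10_gfp", "GFP")` should be `True`.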
Powered by Biostar version 2.3.0