Question: "Un-Projecting" genome alignments onto a transcriptome
4.9 years ago by Rob (United States):


  I'm curious to know if there is a standard tool or method to perform the following function:

  Given a set of genomic alignments (e.g. a set of .bam files aligned to hg19), generate alignments to a set of transcripts from that genome, described by e.g. a GTF file.

I've seen people talk about doing the opposite before: going from alignments to the transcriptome and projecting them back to genomic coordinates. I want to go the other way, a sort of "un-projection". In particular (and this is key), an alignment to a single genomic locus that corresponds to multiple isoforms of a gene should generate multiple output alignments, one per isoform.

Does anyone know of any software that would allow me to perform such processing?

Tags: rna-seq, alignment, genome

Do you actually want the transcriptome coordinates or do you just want counts of things? The latter is more common since the former tends to not be useful.

written 4.9 years ago by Devon Ryan

Hi Devon,

  I actually would like the transcriptome coordinates.  Literally, I want to project the genomic alignments onto all annotated transcripts.  I realize this makes the problem more burdensome, which is why I came here to see if anyone has attempted something similar.

written 4.9 years ago by Rob

I'd be surprised if there's not something prewritten to do this, but I'm not personally aware of it. If you've not found anything then you could always write something up. Using Rsamtools and GenomicFeatures should make this an easy enough thing to code (yes, that will be a bit slow).

written 4.9 years ago by Devon Ryan

What is the output? A SAM record, or an otherwise reasonably complete alignment to the transcript?

written 4.9 years ago by Istvan Albert

Yes.  The tool I'd imagine would look something like this.

Input: GTF file describing potential target transcripts, BAM/SAM alignment to the genome.

Output: SAM/BAM alignment to the target transcripts identified in the GTF file, where genomic alignments have been "expanded" to all of the transcripts they cover (i.e. a read may be unique in genomic location but map to potentially many transcripts, and all of these alignments should be output).

Like I said before --- I know of tools for going the other way, but not for going from genome -> transcriptome.

written 4.9 years ago by Rob
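
The "expansion" step in the input/output description above could be sketched roughly as follows. This is only an illustration under stated assumptions, not an existing tool: the read is represented by its aligned genomic blocks (the alignment split at spliced gaps), each transcript by its sorted exons, all intervals are 0-based and half-open, and only forward-strand transcripts are handled. The names `project_blocks` and `compatible_transcripts` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # 0-based, half-open genomic interval (start, end)


def project_blocks(blocks: List[Interval], exons: List[Interval]) -> Optional[int]:
    """Return the transcript start of a read, or None if it is incompatible.

    Assumes `blocks` and `exons` are sorted and the transcript is on the
    forward strand.
    """
    offset = 0  # exonic bases to the left of the current exon
    first_start = blocks[0][0]
    for i, (e_start, e_end) in enumerate(exons):
        if e_start <= first_start < e_end:
            for j, (b_start, b_end) in enumerate(blocks):
                k = i + j  # each block must sit in the next consecutive exon
                if k >= len(exons):
                    return None
                x_start, x_end = exons[k]
                if b_start < x_start or b_end > x_end:
                    return None
                # Splice junctions in the read must match exon boundaries.
                if j > 0 and b_start != x_start:
                    return None
                if j < len(blocks) - 1 and b_end != x_end:
                    return None
            return offset + (first_start - e_start)
        offset += e_end - e_start
    return None


def compatible_transcripts(
    blocks: List[Interval], transcripts: Dict[str, List[Interval]]
) -> List[Tuple[str, int]]:
    """Expand one genomic alignment into (transcript_id, start) pairs."""
    hits = []
    for tid, exons in transcripts.items():
        start = project_blocks(sorted(blocks), sorted(exons))
        if start is not None:
            hits.append((tid, start))
    return hits
```

With this sketch, an unspliced read that falls in an exon shared by several isoforms yields one output per isoform, while a spliced read is only reported for isoforms whose exon boundaries match its junctions.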

Interesting concept, I don't know of a tool that does this but it feels quite useful and possibly not that complicated (though I might not fully understand all the implications). 

Wouldn't it just be a matter of shifting coordinates by a translation? For the POS field: new POS = alignment POS - the transcript's leftmost POS. The CIGAR is already relative to the alignment.

written 4.9 years ago by Istvan Albert
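
A minimal sketch of the coordinate shift described above. For a single-exon transcript the shift is a constant, but for a spliced transcript the lengths of any introns to the left of the position also have to be subtracted, and a minus-strand transcript has to be flipped. The sketch assumes exons are given as 0-based, half-open genomic intervals; the function name `genome_to_transcript_pos` is hypothetical. It covers only the POS part: spliced gaps (`N` operations) in a genomic CIGAR would presumably also need to be dropped where they match introns.

```python
from typing import List, Optional, Tuple

Exon = Tuple[int, int]  # (start, end), 0-based, half-open, genomic coordinates


def genome_to_transcript_pos(
    pos: int, exons: List[Exon], strand: str = "+"
) -> Optional[int]:
    """Map a 0-based genomic position to a 0-based transcript position.

    Returns None if the position is intronic or outside the transcript.
    """
    exons = sorted(exons)
    transcript_len = sum(end - start for start, end in exons)
    offset = 0  # exonic bases to the left of the current exon
    for start, end in exons:
        if start <= pos < end:
            tpos = offset + (pos - start)
            # A minus-strand transcript runs right-to-left on the genome.
            return tpos if strand == "+" else transcript_len - 1 - tpos
        offset += end - start
    return None
```

For example, with exons `[(100, 200), (300, 400)]`, genomic position 350 maps to transcript position 150 rather than 250, because the 100-base intron is skipped.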

Well, I agree that it's not that complicated, conceptually (though I see it taking a little time to round out all the rough edges).  The motivation (mine at least) would be to be able to use existing alignments to a genome with RNA-seq quantification tools like RSEM, eXpress and (my new tool) Salmon, that work based off of alignments relative to a transcriptome.

written 4.9 years ago by Rob

Aha, now I get the rationale: not having to realign the sequences would indeed make it a whole lot easier to evaluate another transcript-based methodology, and would head off the criticism of not using the whole genome.

I think the conversion tool on its own would be quite a helpful addition to our arsenal!

modified 4.9 years ago • written 4.9 years ago by Istvan Albert