"Un-Projecting" genome alignments onto a transcriptome
9.7 years ago
Rob 6.7k

Hi,

I'm curious to know if there is a standard tool or method to perform the following function:

Given some set of genomic alignments (e.g. a set of .bam files aligned to hg19), generate alignments to a set of transcripts from this genome, represented by e.g. a GTF file.

I've seen people talk about doing the opposite before: taking alignments to the transcriptome and projecting them back to genomic coordinates. I want to go the other way --- a sort of "un-projection". In particular (and this is key), an alignment to a single genomic location that corresponds to multiple isoforms of a gene should generate multiple output alignments, one per isoform.

Does anyone know of any software that would allow me to perform such processing?

RNA-Seq alignment genome • 2.6k views

Do you actually want the transcriptome coordinates or do you just want counts of things? The latter is more common since the former tends to not be useful.

Hi Devon,

I actually would like the transcriptome coordinates. Literally, I want to project the genomic alignments onto all annotated transcripts. I realize this makes the problem more burdensome, which is why I came here to see if anyone has attempted something similar.

I'd be surprised if there's not something prewritten to do this, but I'm not personally aware of it. If you've not found anything then you could always write something up. Using Rsamtools and GenomicFeatures should make this an easy enough thing to code (yes, that will be a bit slow).

What is the output? A SAM record, or some otherwise reasonably complete alignment to the transcript?

Yes. The tool I'd imagine would look something like this.

Input: GTF file describing potential target transcripts, BAM/SAM alignment to the genome.

Output: SAM/BAM alignment to the target transcripts identified in the GTF file, where genomic alignments have been "expanded" to all of the transcripts they cover (i.e. a read may be unique in genomic location, but map to potentially many transcripts --- all of these alignments should be output).
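
To make the "expansion" step concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration, not an existing tool: the function names `is_compatible` and `compatible_transcripts` are hypothetical, exons and read blocks are 1-based inclusive `(start, end)` pairs sorted by genomic position, and the read's aligned blocks are assumed to have been extracted from the CIGAR already (one block per stretch between skipped regions). A read is reported for every transcript whose exon structure it fits.

```python
def is_compatible(blocks, exons):
    """Return True if the aligned blocks of one read fit this exon chain.

    blocks, exons: lists of (start, end), 1-based inclusive, sorted by start.
    A spliced read must jump between consecutive exons exactly at their
    boundaries; an unspliced read just has to sit inside one exon.
    """
    # Locate the exon that contains the first aligned block.
    for first, (es, ee) in enumerate(exons):
        if es <= blocks[0][0] and blocks[0][1] <= ee:
            break
    else:
        return False  # first block is intronic or outside the transcript

    for k, (bs, be) in enumerate(blocks):
        j = first + k
        if j >= len(exons):
            return False  # read has more blocks than exons remain
        es, ee = exons[j]
        if not (es <= bs and be <= ee):
            return False  # block is not contained in the expected exon
        if k > 0 and bs != es:
            return False  # block after a junction must start at the exon start
        if k < len(blocks) - 1 and be != ee:
            return False  # block before a junction must end at the exon end
    return True


def compatible_transcripts(blocks, transcripts):
    """List every transcript (name -> exon list) the read is compatible with."""
    return [name for name, exons in transcripts.items()
            if is_compatible(blocks, exons)]


# Hypothetical annotation: three isoforms of one gene.
transcripts = {
    "tx1": [(100, 199), (300, 399)],
    "tx2": [(100, 199), (300, 399), (500, 599)],
    "tx3": [(100, 250)],
}
unspliced_hits = compatible_transcripts([(150, 180)], transcripts)
spliced_hits = compatible_transcripts([(190, 199), (300, 310)], transcripts)
```

Here the unspliced read falls inside an exon of all three isoforms, so it would yield three output alignments, while the spliced read only fits the two isoforms that share that junction.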

Like I said before --- I know of tools for going the other way, but not for going from genome -> transcriptome.

Interesting concept, I don't know of a tool that does this but it feels quite useful and possibly not that complicated (though I might not fully understand all the implications).

Wouldn't it just be a matter of shifting coordinates by a translation? For the POS field: new POS = alignment POS - the transcript's leftmost POS. The CIGAR is already relative to the alignment.
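
A minimal sketch of that coordinate shift, under some stated assumptions: positions are 1-based (as in SAM POS and GTF), exons are `(start, end)` pairs sorted by genomic start, and the function name `genome_to_transcript_pos` is hypothetical. For a single-exon transcript it reduces to the simple translation above; for multi-exon transcripts the lengths of the upstream introns also have to be skipped, and on the minus strand the position is counted from the other end.

```python
def genome_to_transcript_pos(pos, exons, strand="+"):
    """Map a 1-based genomic position to a 1-based transcript position.

    exons: list of (start, end), 1-based inclusive, sorted by genomic start.
    Returns None if pos is intronic or lies outside the transcript.
    """
    upstream = 0  # exonic bases to the left of the current exon
    for start, end in exons:
        if start <= pos <= end:
            # 0-based offset from the transcript's leftmost genomic base
            offset = upstream + (pos - start)
            if strand == "+":
                return offset + 1
            # Minus strand: the transcript starts at its rightmost base.
            total = sum(e - s + 1 for s, e in exons)
            return total - offset
        upstream += end - start + 1
    return None


exons = [(100, 199), (300, 399)]  # hypothetical two-exon transcript
first_base = genome_to_transcript_pos(100, exons)
after_intron = genome_to_transcript_pos(300, exons)
intronic = genome_to_transcript_pos(250, exons)
minus_first = genome_to_transcript_pos(399, exons, strand="-")
```

This only maps a single position. For spliced reads the CIGAR would also need adjusting, since skipped-region (N) operations have no meaning in transcript space; that part is not shown here.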

Well, I agree that it's not that complicated, conceptually (though I see it taking a little time to round out all the rough edges). The motivation (mine at least) would be to be able to use existing alignments to a genome with RNA-seq quantification tools like RSEM, eXpress and (my new tool) Salmon, that work based off of alignments relative to a transcriptome.

Aha, now I get the rationale. Not having to realign the sequences would indeed make it a whole lot easier to evaluate another transcript-based methodology, and would head off the criticism of not using the whole genome.

I think the conversion tool on its own would be quite a helpful tool in our arsenal!
