Question

Building A Splice Junction Library For Aligning mRNA Against The Rat Genome: chrX_random.fa Problem

1


13.7 years ago

Jimbo ▴ 120

Hi

I am mapping RNAseq reads to the rat genome.

I am building a splice junction library to align the reads against.

Each chromosome comes with a corresponding chrX_random.fa FASTA file.

Should I ignore these when building the splice junction libraries? It seems no genes map to the chrX_random.fa files anyway, according to the Ensembl annotation I got from UCSC (though I might be wrong about this).

I realise I should still keep them for aligning reads against.

I am following these instructions:

http://useq.sourceforge.net/usageRNASeq.html

Many thanks.

Also: I am not using e.g. TopHat because my reads are 34bp, and the TopHat manual states explicitly: "The software is optimized for reads 75bp or longer."

Plus, I am not sure I like the idea of TopHat realigning the originally unmapped reads in a "second round". Surely this is problematic with shorter reads, since they are more likely to map ambiguously.

Wouldn't it be better to align against the genome plus junctions in the same round, to give the junctions an "equal chance" of being mapped to as genomic regions? This matters especially with short reads, which could map erroneously to pseudogenes more easily than longer or paired-end reads.
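To make the "genome plus junctions in the same round" idea concrete, here is a minimal sketch of how one exon-exon junction sequence could be cut out of a chromosome sequence so that it can be appended to the genome FASTA before aligning. The function names, the 0-based half-open exon coordinates, and the `flank` parameter are all assumptions made for illustration, not the behaviour of USeq or any other tool. Choosing a flank a few bases shorter than the read (e.g. 30 for 34bp reads) is likewise only an illustrative choice; it forces a read to span the junction in order to align in full.

```python
def junction_sequence(chrom_seq, donor_exon, acceptor_exon, flank):
    """Join the last `flank` bases of the upstream exon to the first
    `flank` bases of the downstream exon (plus strand, hypothetical helper).

    Exons are (start, end) tuples, 0-based and half-open.
    """
    d_start, d_end = donor_exon
    a_start, a_end = acceptor_exon
    # Never reach past the exon boundaries when the exon is shorter than flank.
    left = chrom_seq[max(d_start, d_end - flank):d_end]
    right = chrom_seq[a_start:min(a_end, a_start + flank)]
    return left + right


def transcript_junctions(chrom_seq, exons, flank):
    """Return junction sequences for each pair of adjacent exons."""
    ordered = sorted(exons)
    return [
        junction_sequence(chrom_seq, ordered[i], ordered[i + 1], flank)
        for i in range(len(ordered) - 1)
    ]


# Toy chromosome: exon (0-8), intron (8-16), exon (16-24).
toy_chrom = "AAAACCCC" + "GTNNNNAG" + "GGGGTTTT"
toy_exons = [(0, 8), (16, 24)]
junctions = transcript_junctions(toy_chrom, toy_exons, flank=4)
print(junctions)
```

Minus-strand transcripts and non-adjacent (exon-skipping) junctions are deliberately left out of this sketch.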

rna genome • 4.8k views

updated 11.8 years ago by Biostar 20 • written 13.7 years ago by Jimbo ▴ 120

score 1 · Answer 1 · 2012-01-31

1


13.4 years ago

Malachi Griffith 20k

Yes, for such reads it may be a good idea to use a junction database. You can download a pre-computed, exon-exon junction database here: ALEXA-seq


0


I want to make an exon junction BED file for the CAST mouse strain. I can't get ALEXA-seq to work because of an unknown host error with CVS to Ensembl, I imagine because it's quite old. Do you happen to have an updated version of the alternativeExpressionDatabase/createExonJunctionDatabase.pl program? Also, Perl is legit :)

6.3 years ago by QVINTVS_FABIVS_MAXIMVS ★ 2.6k

score 0 · Answer 2 · 2012-03-29

0


13.3 years ago

Dm Church ▴ 30

In many cases, the chr*_random sequences do contain annotation. This is even true in human and mouse (two well curated, high quality assemblies). These are just sequences that can't be ordered and oriented on the assembly, but they may still contain features.
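One way to check this for a particular annotation is simply to count how many records sit on `_random` sequences. The sketch below assumes a tab-delimited annotation file; the function name and the `chrom_col` parameter are hypothetical, and the column holding the chromosome name depends on the format (0 for BED or GTF, and typically 2 for a UCSC gene table that begins with a `bin` column).

```python
from collections import Counter


def count_random_features(lines, chrom_col=0):
    """Count annotation records per chr*_random sequence.

    `lines` is any iterable of tab-delimited annotation lines;
    `chrom_col` is the 0-based column holding the chromosome name.
    """
    counts = Counter()
    for line in lines:
        # Skip blank lines, comments, and UCSC track/browser headers.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= chrom_col:
            continue
        chrom = fields[chrom_col]
        if chrom.endswith("_random"):
            counts[chrom] += 1
    return counts


# Made-up BED-like records, for illustration only.
example = [
    "chr1\t100\t200\tgeneA\n",
    "chrX_random\t50\t90\tgeneB\n",
    "chrX_random\t300\t400\tgeneC\n",
    "chr2_random\t10\t20\tgeneD\n",
]
random_counts = count_random_features(example)
print(dict(random_counts))
```

For a real file, pass the open file handle instead of the example list, e.g. `count_random_features(open(path), chrom_col=2)`. An empty result would support leaving the `_random` files out of the junction library; any non-zero counts mean they carry annotation.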

ADD COMMENT • link 13.3 years ago by Dm Church ▴ 30