Building A Splice Junction Library For Aligning mRNA Against The Rat Genome: chrX_random.fa Problem
12.5 years ago
Jimbo ▴ 120

Hi

I am mapping RNAseq reads to the rat genome.

I am building a splice junction library.

Each chromosome has an accompanying chrX_random.fa FASTA file.

Should I ignore these when building the splice junction libraries? It seems no genes map to the chrX_random.fa files anyway, according to the Ensembl annotation I got from UCSC (though I might be wrong about this).

I realise I should still keep them for aligning reads against.

I am following these instructions:

http://useq.sourceforge.net/usageRNASeq.html

Many thanks.

Also: I am not using e.g. tophat because my reads are 34 bp, and the tophat manual states explicitly: "The software is optimized for reads 75bp or longer."

Plus, I am not sure I like the idea of tophat realigning the originally unmapped reads in a "second round". Surely this is problematic with shorter reads, since they are more likely to map ambiguously.

Wouldn't it be better to align against the genome plus junctions in the same round, to give the junctions an "equal chance" of being mapped to as genomic regions? Short reads in particular could map erroneously to pseudogenes more easily than longer reads or paired-end reads would.

12.2 years ago

Yes, for such reads it may be a good idea to use a junction database. You can download a pre-computed, exon-exon junction database here: ALEXA-seq
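If you would rather build the junction sequences yourself, the core idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used by ALEXA-seq or USeq: the function name `junction_sequence`, the 0-based half-open exon coordinates, and the default `min_anchor=4` are all hypothetical choices. Each side of the junction contributes at most `read_len - min_anchor` bases, so any read that fits entirely inside the junction sequence must overlap both exons by at least `min_anchor` bases.

```python
def junction_sequence(chrom_seq, up_exon, down_exon, read_len=34, min_anchor=4):
    """Return the exon-exon junction sequence for one splice event.

    chrom_seq  : chromosome sequence as a string
    up_exon    : (start, end) of the upstream exon, 0-based, end exclusive
    down_exon  : (start, end) of the downstream exon, 0-based, end exclusive
    read_len   : read length (34 bp in the question above)
    min_anchor : assumed minimum number of bases a read must place on each
                 side of the junction (hypothetical default)
    """
    flank = read_len - min_anchor
    up_start, up_end = up_exon
    down_start, down_end = down_exon
    # Clip each flank to its exon so no intronic sequence leaks in.
    left = chrom_seq[max(up_start, up_end - flank):up_end]
    right = chrom_seq[down_start:min(down_end, down_start + flank)]
    return left + right


# Toy example: two 5 bp exons separated by a 10 bp intron, 6 bp reads.
toy = "AAAAACCCCCGGGGGTTTTT"
print(junction_sequence(toy, (0, 5), (15, 20), read_len=6, min_anchor=2))
```

The resulting sequences can be written out as extra FASTA records next to the genomic chromosomes, so that junctions and genome are searched in the same alignment round. Junctions whose exons are shorter than the flank would need to span more than two exons, which this sketch does not handle.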


I want to make an exon junction BED file for the CAST mouse strain. I can't get ALEXA-seq to work because of an unknown host error with cvs to ensembl, I imagine because it's quite old. Do you happen to have an updated version of the alternativeExpressionDatabase/createExonJunctionDatabase.pl program? Also perl is legit :)

12.1 years ago
Dm Church ▴ 30

In many cases, the chr*_random sequences do contain annotation. This is even true in human and mouse (two well-curated, high-quality assemblies). These are just sequences that can't be ordered and oriented on the assembly, but they may still contain features.
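A quick way to check this for a particular annotation is to count how many records sit on `*_random` sequences. The sketch below assumes a tab-separated table with the chromosome name in the third column (as in a UCSC table dump with a leading `bin` column); the function name `count_random_features` and the `chrom_col` parameter are hypothetical, so adjust the column index to match your file.

```python
from collections import Counter


def count_random_features(lines, chrom_col=2):
    """Count annotation records per chr*_random sequence.

    lines     : iterable of tab-separated annotation lines
    chrom_col : 0-based index of the chromosome column (assumed to be 2)
    """
    counts = Counter()
    for line in lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= chrom_col:
            continue  # malformed or truncated line
        chrom = fields[chrom_col]
        if chrom.endswith("_random"):
            counts[chrom] += 1
    return counts


# Made-up records for illustration only, not real annotation.
example = [
    "#bin\tname\tchrom\tstrand",
    "585\tgeneA\tchr1\t+",
    "585\tgeneB\tchr1_random\t-",
    "586\tgeneC\tchrX_random\t+",
    "586\tgeneD\tchr1_random\t+",
]
print(count_random_features(example))
```

With a real file you would pass the open file handle instead of the example list. An empty result means it is safe to skip the random sequences when building junctions; otherwise they should be included.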
