I am mapping RNAseq reads to the rat genome.
I am building
For each chromosome there is also an accompanying chrX_random.fa FASTA file.
Should I ignore these when building the splice-junction libraries? According to the Ensembl annotation I got from UCSC, no genes seem to map to the chrX_random.fa files anyway (though I might be wrong about this).
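One way to check whether any annotated transcripts actually sit on the `_random` contigs is to count them per contig in the annotation file. This is a minimal sketch, assuming the annotation was exported from UCSC as GTF (tab-separated, `transcript_id "..."` in column 9); the function names and the file name `ensGene.gtf` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def count_transcripts_per_contig(gtf_lines: Iterable[str]) -> Counter:
    """Count distinct transcript_ids per contig (GTF column 1), using exon rows only."""
    seen = set()
    counts: Counter = Counter()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        contig, attributes = fields[0], fields[8]
        # Pull the value out of: transcript_id "XYZ";
        tid = None
        for attr in attributes.split(";"):
            attr = attr.strip()
            if attr.startswith("transcript_id"):
                tid = attr.split(" ", 1)[1].strip('"')
                break
        if tid is not None and (contig, tid) not in seen:
            seen.add((contig, tid))
            counts[contig] += 1
    return counts


def annotated_random_contigs(counts: Counter) -> Dict[str, int]:
    """Keep only *_random contigs that carry at least one transcript."""
    return {c: n for c, n in counts.items() if c.endswith("_random") and n > 0}


# Usage (hypothetical file name):
# with open("ensGene.gtf") as fh:
#     print(annotated_random_contigs(count_transcripts_per_contig(fh)))
```

An empty result would support skipping the `_random` contigs when building the junction library, since junctions can only be derived from annotated exons.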
I realise I should still keep them in the reference that reads are aligned against.
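The split between "everything goes into the alignment reference" and "only assembled chromosomes feed the junction library" can be made explicit when collecting input files. A small sketch, assuming the per-chromosome files are named like `chr1.fa` and `chr1_random.fa`; `split_reference_files` is a hypothetical helper, not part of any aligner.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def split_reference_files(fasta_paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (files for the alignment reference, files for junction building)."""
    alignment_ref: List[str] = []
    junction_src: List[str] = []
    for p in sorted(fasta_paths):
        # Every contig, including *_random, stays in the alignment reference,
        # so reads originating there are not forced onto the wrong locus.
        alignment_ref.append(p)
        # Only assembled chromosomes are used to derive splice junctions.
        if not Path(p).stem.endswith("_random"):
            junction_src.append(p)
    return alignment_ref, junction_src
```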
I am following these instructions:
Also, I am not using tools such as TopHat, because my reads are only 34 bp and the TopHat manual states explicitly: "The software is optimized for reads 75bp or longer."
Plus, I am not sure I like the idea of TopHat realigning the originally unmapped reads in a "second round". Surely this is problematic with shorter reads, since they are more likely to map ambiguously.
Wouldn't it be better to align against the genome plus junctions in a single round, giving the junctions an "equal chance" of being mapped to alongside genomic regions? This seems especially important for short reads, which could map erroneously to pseudogenes more easily than longer reads or paired-end reads.
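A single-round approach can be sketched by building junction sequences from consecutive exons and appending them to the genome before indexing. Each junction sequence takes `read_len - 1` bases from each side of the intron, so any read that crosses the junction by at least one base fits entirely inside it, while a read lying wholly within one exon cannot, which avoids duplicate hits against the genome copy. This is a sketch under stated assumptions: 0-based half-open coordinates, exons at least `read_len - 1` long (shorter exons would need flanks stitched across several exons), and sequences kept on the + strand, relying on the aligner to search both strands. All function names and the `chrom:exonEnd-nextExonStart` naming scheme are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# (chrom, exon_end, next_exon_start), 0-based half-open:
# the intron is genome[chrom][exon_end:next_exon_start].
Junction = Tuple[str, int, int]


def junctions_from_exons(chrom: str, exons: List[Tuple[int, int]]) -> List[Junction]:
    """Junctions between consecutive exons of one transcript; exons are (start, end)."""
    ordered = sorted(exons)
    return [(chrom, ordered[i][1], ordered[i + 1][0]) for i in range(len(ordered) - 1)]


def junction_records(genome: Dict[str, str], junctions: Iterable[Junction],
                     read_len: int) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) for each unique junction with read_len - 1 flanks."""
    flank = read_len - 1
    for chrom, exon_end, next_start in sorted(set(junctions)):
        seq = genome[chrom]
        left = seq[max(0, exon_end - flank):exon_end]   # end of upstream exon
        right = seq[next_start:next_start + flank]      # start of downstream exon
        yield f"{chrom}:{exon_end}-{next_start}", left + right


def junction_offset_to_genome(name: str, offset: int, read_len: int) -> Tuple[str, int]:
    """Translate a 0-based offset within a junction sequence back to the genome."""
    chrom, span = name.rsplit(":", 1)
    exon_end, next_start = (int(x) for x in span.split("-"))
    left_len = exon_end - max(0, exon_end - (read_len - 1))
    if offset < left_len:
        return chrom, exon_end - left_len + offset
    return chrom, next_start + (offset - left_len)


def to_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Format (name, sequence) pairs as FASTA text."""
    lines: List[str] = []
    for name, seq in records:
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

The FASTA text from `to_fasta` would then be concatenated with the genome FASTA files (including the `_random` contigs) and indexed once, so that genomic loci, pseudogenes and junctions all compete in the same alignment pass. Hits on junction sequences can be converted back to genomic coordinates with `junction_offset_to_genome`.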