Dear All,
I have been using the mouse sequences and annotations from the TopHat website (the iGenomes page), specifically the UCSC mm10 build.
I see that the sequence file (genome.fa) contains chr1 to chr19, chrX, chrY and chrM, while the corresponding GTF file also contains features on unplaced "random" sequences such as chr4_JH584294_random.
I was trying to calculate RPKM values for my alignments (TopHat BAM files produced against this genome sequence) using the GTF.
library(IRanges)
library(Rsamtools)
library(GenomicFeatures)
library(GenomicRanges)
# "filepath" is a placeholder for the directory holding the TopHat output
aligns <- readBamGappedAlignments("filepath/accepted_hits.bam")
txdb <- makeTranscriptDbFromGFF("genes.gtf", format="gtf", species="Mus musculus", dataSource="http://tophat.cbcb.umd.edu/igenomes.shtml")
# Exons grouped by gene
exonRanges.gene <- exonsBy(txdb, "gene")
seqlevels(aligns)
[1] "chr1"  "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18" "chr19" "chr2"  "chr3"  "chr4"  "chr5"  "chr6" 
[17] "chr7"  "chr8"  "chr9"  "chrM"  "chrX"  "chrY" 
seqlevels(exonRanges.gene)
[1] "chr13"                "chr9"                 "chr6"                 "chrX"                 "chr17"               
[6] "chr2"                 "chr7"                 "chr18"                "chr8"                 "chr4"                
[11] "chr19"                "chr5"                 "chr16"                "chr11"                "chr10"               
[16] "chr14"                "chr1"                 "chr3"                 "chr15"                "chr12"               
[21] "chrY"                 "chrX_GL456233_random" "chr5_JH584299_random" "chr5_JH584298_random" "chr4_GL456216_random"
[26] "chr4_GL456350_random" "chr4_JH584294_random" "chr4_JH584293_random" "chr5_GL456354_random" "chr7_GL456219_random"
[31] "chr5_JH584296_random" "chr5_JH584297_random" "chr4_JH584292_random" "chr1_GL456221_random" "chrUn_JH584304"
Because the two objects have different chromosomes (the GTF has the extra chrN_XXXXX_random sequences), I get the following warnings:
1: In .deduceExonRankings(exs, format = "gtf") :
  Infering Exon Rankings.  If this is not what you expected, then please be sure that you have provided a valid attribute for exonRankAttributeName
2: In matchCircularity(chroms, circ_seqs) :
  None of the strings in your circ_seqs argument match your seqnames.
3: In .Seqinfo.mergexy(x, y) :
  Each of the 2 combined objects has sequence levels not in the other:
  - in 'x': chrX_GL456233_random, chr5_JH584299_random, chr5_JH584298_random, chr4_GL456216_random, chr4_GL456350_random, chr4_JH584294_random, chr4_JH584293_random, chr5_GL456354_random, chr7_GL456219_random, chr5_JH584296_random, chr5_JH584297_random, chr4_JH584292_random, chr1_GL456221_random, chrUn_JH584304
  - in 'y': chrM
  Make sure to always combine/compare objects based on the same reference
  genome (use suppressWarnings() to suppress this warning).
I understand that this is due to the mismatch in chromosome names, and that I can handle it either by deleting the chrN_XXXXXX_random features from the .gtf file or by re-aligning to a genome that includes these sequences. Are there any advantages or disadvantages (right/wrong) to the former or the latter?
Is there any other workaround for this? Additionally, I would like to know whether there is a genome/GTF build available for download that is consistent in this respect for the mouse, rat and human genomes. Thanks in advance!
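For reference, the first option (dropping the extra sequences) might also be done inside R rather than by editing the .gtf file. Below is a minimal sketch of that idea, not a tested solution for my real data: the two small GRanges objects are made-up stand-ins for the annotation and the alignments, and I am assuming a GenomicRanges version in which keepSeqlevels() accepts the pruning.mode argument.

```r
library(GenomicRanges)

# Toy stand-ins (hypothetical data) for the GTF features and the aligned reads
exons <- GRanges(
  seqnames = c("chr1", "chr4_JH584294_random", "chrX"),
  ranges   = IRanges(start = c(100, 200, 300), width = 50)
)
reads <- GRanges(
  seqnames = c("chr1", "chrM", "chrX"),
  ranges   = IRanges(start = c(110, 10, 320), width = 25)
)

# Sequence names present in both objects
common <- intersect(seqlevels(exons), seqlevels(reads))

# Drop every range on a sequence outside the common set,
# together with the now-unused sequence levels
exons_common <- keepSeqlevels(exons, common, pruning.mode = "coarse")
reads_common <- keepSeqlevels(reads, common, pruning.mode = "coarse")

# Both objects now share the same seqlevels, so overlaps can be counted
counts <- countOverlaps(exons_common, reads_common)
```

With the real objects, the same intersect() and keepSeqlevels() steps would presumably be applied to exonRanges.gene and aligns. Any features on the random sequences (and any reads on chrM) are simply discarded this way.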
Thanks! That sounds promising. I should nevertheless try the Ensembl annotations.