I am attempting to use MapSplice2 for circRNA detection in mouse. I downloaded the DNA sequence of each chromosome from Ensembl and built my Bowtie index from the Ensembl primary assembly file. Running MapSplice gives this error:
Error: Reference name in Bowtie Index contains space: '1 dna:chromosome chromosome:GRCm38:1:1:195471971:1 REF' contains space
So I used sed to replace every space in the headers of the chromosome FASTA files with an underscore, and did the same for the primary assembly FASTA. I then rebuilt my Bowtie index from the space-free files and now receive these errors:
Error: Bowtie Index not consistent with Reference Sequence '1_dna:chromosome_chromosome:GRCm38:1:1:195471971:1_REF' does not exist in Reference Sequence
Error: Bowtie Index not consistent with Reference Sequence '2_dna:chromosome_chromosome:GRCm38:2:1:182113224:1_REF' does not exist in Reference Sequence
Error: Bowtie Index not consistent with Reference Sequence '3_dna:chromosome_chromosome:GRCm38:3:1:160039680:1_REF' does not exist in Reference Sequence
Error: Bowtie Index not consistent with Reference Sequence '4_dna:chromosome_chromosome:GRCm38:4:1:156508116:1_REF' does not exist in Reference Sequence
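Replacing spaces with underscores keeps the whole Ensembl description in the sequence name. A possibly simpler alternative is to keep only the first whitespace-delimited token of each header (e.g. `>1`), so the name lines up with a chromosome file name such as `1.fasta`. Below is a minimal sed sketch; the function name `trim_fasta_headers` is hypothetical, and whether MapSplice then accepts these short names is an assumption to verify.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only the first whitespace-delimited token of each
# FASTA header, e.g.
#   ">1 dna:chromosome chromosome:GRCm38:1:1:195471971:1 REF"  ->  ">1"
# Sequence lines (not starting with '>') are copied unchanged.
trim_fasta_headers() {
    local in="$1" out="$2"
    sed -E 's/^>([^[:space:]]+).*/>\1/' "$in" > "$out"
}
```

For example, `trim_fasta_headers <primary assembly> mm10.trimmed.fa` followed by `bowtie-build mm10.trimmed.fa bt_mm10` would rebuild the index from the trimmed names (the output file name here is illustrative).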
So I looked at the headers of my reference sequences. Here they are:
==> 1.fasta <==
==> 2.fasta <==
==> 3.fasta <==
==> 4.fasta <==
To summarize: my Bowtie index is built from the Ensembl primary assembly file, and the per-chromosome reference sequences were also downloaded from Ensembl. Running MapSplice with that setup produced the first error above. I then edited every header in the chromosome files and in the primary assembly, replacing spaces with underscores, and rebuilt the Bowtie index from the edited primary assembly. That produced the second error.
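If MapSplice looks up each index name as a separate file in the `-c` directory (an assumption, since the manual was unavailable), one way to keep that directory and the index in step is to generate both from the same edited assembly. Below is a minimal awk sketch that writes each record to `<name>.fa`, where `<name>` is the first header token; the function name and the `.fa` extension are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a multi-record FASTA (e.g. the primary assembly)
# into one file per record, named <first header token>.fa, with the header
# trimmed to that token so the file name and sequence name agree.
split_fasta_by_record() {
    local in="$1" outdir="$2"
    mkdir -p "$outdir"
    awk -v dir="$outdir" '
        /^>/ {
            if (out != "") close(out)      # close the previous record file
            name = substr($1, 2)           # ">1" -> "1"
            out = dir "/" name ".fa"
            print ">" name > out           # write the trimmed header
            next
        }
        out != "" { print > out }          # sequence line of the current record
    ' "$in"
}
```

Because the primary assembly also contains the unplaced scaffolds, this writes a file for every name that `bowtie-build` would put into an index built from the same file.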
The detailed MapSplice2 manual has been offline for days. Is there a problem in the MapSplice code when it checks whether the Bowtie index and the reference sequences are consistent? In both cases, my Bowtie index was built from the same reference that I use in my MapSplice command.
Is there an explanation of what "Bowtie Index not consistent with Reference Sequence" means? My index is built directly with bowtie-build from the primary assembly, a single concatenated file containing all chromosomes and scaffolds. Do I need to run bowtie-build on each chromosome's sequence separately?
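One possible reading of "not consistent", assumed here rather than confirmed from MapSplice's source, is that every reference name stored in the index must have a matching sequence file in the `-c` directory. Below is a minimal sketch that lists index names with no matching `<name>.fa`. It reads the names from a file, for example the output of `bowtie-inspect -n`, which prints the reference names of a Bowtie 1 index. The function name and the `.fa` extension are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical check: report every reference name (one per line in
# names_file) that has no <name>.fa file in chrom_dir.
# Returns 0 if every name has a file, 1 otherwise.
check_index_vs_chroms() {
    local names_file="$1" chrom_dir="$2" missing=0 name
    while IFS= read -r name || [ -n "$name" ]; do
        [ -z "$name" ] && continue               # skip blank lines
        if [ ! -f "$chrom_dir/$name.fa" ]; then
            echo "missing: $chrom_dir/$name.fa"
            missing=1
        fi
    done < "$names_file"
    return "$missing"
}
```

For example: `bowtie-inspect -n bt_mm10 > names.txt`, then `check_index_vs_chroms names.txt <chromosome directory>`. Under this assumption, index names like `1_dna:chromosome_...` would all be reported missing next to files named `1.fasta`.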
Here is my command line:

Mapsplice="/l/Yu/YuLab/Bioinformatics/projects/mhills_circRNA-ADAR/MapSplice-v2.2.1"
data="/l/Yu/YuLab/Bioinformatics/projects/mhills_circRNA-ADAR/data"
treatment="tetG2"
index="ACAGTG"
output="/l/Yu/YuLab/Bioinformatics/projects/mhills_circRNA-ADAR/00_Alignment/MapSplice"

python $Mapsplice/mapsplice.py -1 $data/$treatment/all_$index.fq \
    -c /l/Yu/YuLab/Bioinformatics/projects/mhills_circRNA-ADAR/mm10/chromosomes \
    -x /l/Yu/YuLab/Bioinformatics/projects/mhills_circRNA-ADAR/00_Alignment/Bowtie_1/bt_mm10 \
    -p 8 \
    -o $output/$index/alignment \
    --min-fusion-distance 200 \
    --gene-gtf /l/Yu/YuLab/Bioinformatics/projects/mhills_circRNA-ADAR/mm10/mm10.ensembl.gtf \
    --fusion > $output/$index/MapSplice.out 2> $output/$index/MapSplice.err