EDIT: The problem seems to be that 88% of reads are reported as 'too short', but I'm not sure how to resolve this; more details in my comment below.
Hi all, I've been trying to re-map some RNA-seq FASTQ files to mm39 using STAR. The sequencing facility that ran the sequencing for me told me that when they mapped the reads onto mm10 using BWA MEM, the mapping rate for each file was around 95%. However, when I do the alignment, my uniquely mapped read rate is a little under 10%. Judging by the mapping rate achieved with BWA MEM, I doubt that rRNA or other contamination is the problem. Do you have any suggestions as to why I'm running into this? Could I be using incorrect genome/annotation files for the genome indexing?
For reference, I used the following code to generate my genome index (with genome and annotation files from Ensembl):
STAR --runMode genomeGenerate --genomeDir mm39starindex/index/ \
    --genomeFastaFiles /fullpath/Mus_musculus.GRCm39.dna.primary_assembly.fasta \
    --sjdbGTFfile /fullpath/Mus_musculus.GRCm39.104.gtf \
    --runThreadN 2
And the format of the code I'm using for the alignment:
#!/bin/bash
#SBATCH --job-name=star
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --time=0-64:00:00
#SBATCH --output=outfile.%j
#SBATCH --error=errfile.%j

module load STAR/2.7.4

STAR --runThreadN 16 --genomeDir mm39starindex/mm39starindex/index \
    --readFilesIn filename_1.fastq.gz filename_2.fastq.gz --readFilesCommand zcat \
    --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts \
    --outFileNamePrefix alignments/filename
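The percentages I quote (uniquely mapped, 'too short') come from the Log.final.out file that STAR writes next to the BAM. Here is a minimal sketch of how to pull those lines out. The function name star_summary is just something I made up for illustration, and the default path assumes the --outFileNamePrefix from the alignment command above, so adjust it to your own prefix.

```shell
#!/bin/bash
# Hypothetical helper (not part of STAR): print the key mapping-rate
# lines from a STAR Log.final.out summary file.
# Assumption: the default path matches --outFileNamePrefix alignments/filename.
star_summary() {
    local log="${1:-alignments/filenameLog.final.out}"
    # Keep only the lines of interest, then strip the leading padding
    # and turn the " |<tab>" separator into " = " for readability.
    grep -E 'Uniquely mapped reads %|% of reads mapped to multiple loci|% of reads unmapped: too short' "$log" \
        | sed -E 's/^[[:space:]]+//; s/[[:space:]]*\|[[:space:]]*/ = /'
}

# Usage: star_summary alignments/filenameLog.final.out
```

Running it on each sample's log makes it easy to see whether the 'too short' fraction is consistently high across all files or only in some of them.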
Any advice will be much appreciated!