I have a query regarding STAR alignment. I used the following commands to convert the BAM files to FASTQ (as there were some issues while using Cufflinks):
samtools sort -n file.bam > file_sort.bam   # sort the BAM by read name
bedtools bamtofastq -i file_sort.bam -fq file_R1.fq -fq2 file_R2.fq   # convert BAM to FASTQ
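One thing that could be checked at this point is whether the two FASTQ files are still properly paired, i.e. record N in file_R1.fq and record N in file_R2.fq carry the same read name. The following is only a minimal sketch of such a check. It assumes plain 4-line FASTQ records and that mate names differ at most by a trailing /1 or /2. The function names read_names and count_mismatched_pairs are made up for illustration and are not part of any tool.

```python
from itertools import zip_longest


def read_names(path):
    """Yield the read name of each 4-line FASTQ record, without '@' or a /1, /2 suffix."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 != 0:
                continue  # only the first line of each record holds the name
            fields = line.split()
            if not fields:
                continue  # tolerate a trailing blank line
            name = fields[0][1:]  # drop the leading '@'
            if name.endswith(("/1", "/2")):
                name = name[:-2]  # assumption: mates are marked with /1 and /2
            yield name


def count_mismatched_pairs(r1_path, r2_path):
    """Count record positions where the R1 and R2 names differ or one file has run out."""
    mismatched = 0
    for n1, n2 in zip_longest(read_names(r1_path), read_names(r2_path)):
        if n1 != n2:
            mismatched += 1
    return mismatched
```

Calling count_mismatched_pairs("file_R1.fq", "file_R2.fq") should return 0 if every record lines up with its mate. Any other value would suggest that the pairing was lost somewhere in the conversion.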
I then ran the alignment with STAR using the following command:
STAR --genomeDir star-genome --readFilesIn file_R1.fq file_R2.fq --runThreadN 6 --outFileNamePrefix file
My main issue is that I am getting a very low unique alignment rate, between 8% and 15%. The output for one of the files looks like this:

Started job on | Apr 18 09:39:59
Started mapping on | Apr 18 09:43:53
Finished on | Apr 18 10:49:50
Mapping speed, Million of reads per hour | 125.36

Number of input reads | 137795751
Average input read length | 200

UNIQUE READS:
Uniquely mapped reads number | 20600304
Uniquely mapped reads % | 14.95%
Average mapped length | 196.65
Number of splices: Total | 7843335
Number of splices: Annotated (sjdb) | 0
Number of splices: GT/AG | 7766279
Number of splices: GC/AG | 35700
Number of splices: AT/AC | 3805
Number of splices: Non-canonical | 37551
Mismatch rate per base, % | 0.42%
Deletion rate per base | 0.02%
Deletion average length | 1.42
Insertion rate per base | 0.01%
Insertion average length | 1.58

MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 53570046
% of reads mapped to multiple loci | 38.88%
Number of reads mapped to too many loci | 3298118
% of reads mapped to too many loci | 2.39%

UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 43.76%
% of reads unmapped: other | 0.02%

CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
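Since the rate varies between samples, it may help to pull these numbers out of each log programmatically and compare them side by side. Below is a minimal sketch, assuming each log is in the "name | value" layout shown above. With the prefix used in my command, the file would presumably be called fileLog.final.out. The helper names parse_star_log and percent are hypothetical.

```python
def parse_star_log(text):
    """Return {field name: value string} for every 'name | value' line of the log."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as 'UNIQUE READS:' carry no value
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def percent(stats, key):
    """Convert a percentage field such as '14.95%' to a float."""
    return float(stats[key].rstrip("%"))
```

For example, after stats = parse_star_log(open("fileLog.final.out").read()), the calls percent(stats, "Uniquely mapped reads %") and percent(stats, "% of reads unmapped: too short") give the two figures I am most concerned about for that sample.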
Could anyone please suggest how I can improve my alignment quality, as most of my data shows reads as "unmapped: too short"?
It would be great to get some expert suggestions.