STAR alignment, RNA-seq
7.0 years ago
prasoonagar ▴ 10

I have a query regarding STAR alignment. I used the following commands to convert the BAM files to FASTQ (as there were some issues while using Cufflinks):

samtools sort -n file.bam > file_sort.bam (sorted the file)
bedtools bamtofastq -i file_sort.bam -fq file_R1.fq -fq2 file_R2.fq (converted bam to fastq)

I then ran the alignment with STAR using the following command:

STAR --genomeDir star-genome --readFilesIn file_R1.fq file_R2.fq  --runThreadN 6  --outFileNamePrefix file

My main issue is that I am getting a very low unique alignment rate, 8% to 15%. The output for one of the files looks like this:

                          Started job on | Apr 18 09:39:59
                      Started mapping on | Apr 18 09:43:53
                             Finished on | Apr 18 10:49:50
Mapping speed, Million of reads per hour | 125.36

                   Number of input reads | 137795751
               Average input read length | 200
                           UNIQUE READS:
            Uniquely mapped reads number | 20600304
                 Uniquely mapped reads % | 14.95%
                   Average mapped length | 196.65
                Number of splices: Total | 7843335
     Number of splices: Annotated (sjdb) | 0
                Number of splices: GT/AG | 7766279
                Number of splices: GC/AG | 35700
                Number of splices: AT/AC | 3805
        Number of splices: Non-canonical | 37551
               Mismatch rate per base, % | 0.42%
                  Deletion rate per base | 0.02%
                 Deletion average length | 1.42
                 Insertion rate per base | 0.01%
                Insertion average length | 1.58
                    MULTI-MAPPING READS:
 Number of reads mapped to multiple loci | 53570046
      % of reads mapped to multiple loci | 38.88%
 Number of reads mapped to too many loci | 3298118
      % of reads mapped to too many loci | 2.39%
                         UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
          % of reads unmapped: too short | 43.76%
              % of reads unmapped: other | 0.02%
                         CHIMERIC READS:
                Number of chimeric reads | 0
                     % of chimeric reads | 0.00%

Could anyone please suggest how I can improve my alignment rate, since the largest share of my reads (43.76%) is reported as "unmapped: too short"?

It would be great to get some expert suggestions.

Thanks

Prasoon

RNA-Seq • 7.2k views

% of reads unmapped: too short | 43.76%

Looks like you have a lot of short reads in your dataset.


Yes, I want to know whether these reads can still be aligned or whether they are all wasted. Is there a sequencing problem?


How short is short?
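
One way to answer that is to tabulate the read lengths in the FASTQ files that went into STAR. The sketch below is only an illustration: `read_length_hist` is a made-up helper name, the demo FASTQ is synthetic, and it assumes plain four-line FASTQ records (no wrapped sequence lines).

```bash
# Hypothetical helper: print "length count" pairs for a FASTQ file,
# assuming exactly 4 lines per record (sequence on every 2nd line of 4).
read_length_hist() {
    awk 'NR % 4 == 2 { n[length($0)]++ } END { for (l in n) print l, n[l] }' "$1" | sort -n
}

# Self-contained demo on a tiny synthetic FASTQ (not real data).
demo_fq=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n@r2\nACGTACGT\n+\nIIIIIIII\n@r3\nACGT\n+\nIIII\n' > "$demo_fq"
read_length_hist "$demo_fq"
```

On the real data this would be run as `read_length_hist file_R1.fq` (and again for `file_R2.fq`). If nearly all reads are full length, the input reads themselves are probably not the short ones.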


Try decreasing --outFilterMatchNminOverLread.
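
For context, STAR reports a read as "too short" when the aligned portion is too small relative to the read length, and the default for `--outFilterMatchNminOverLread` is 0.66 (for paired-end reads this is relative to the combined length of both mates). A rough sketch of a relaxed run is below. The value 0.3 is an arbitrary assumption for illustration, not a recommendation; `--outFilterScoreMinOverLread` (also 0.66 by default) is usually lowered together with it. `star_relaxed`, the `STAR_BIN` override, and the `file_relaxed_` prefix are hypothetical names, not STAR features.

```bash
# Hypothetical wrapper around the original STAR command with relaxed
# length filters. STAR_BIN is an illustrative override used only so the
# command line can be printed (dry run) without STAR installed.
star_relaxed() {
    "${STAR_BIN:-STAR}" \
        --genomeDir star-genome \
        --readFilesIn "$1" "$2" \
        --runThreadN 6 \
        --outFilterMatchNminOverLread 0.3 \
        --outFilterScoreMinOverLread 0.3 \
        --outFileNamePrefix "${3:-file_relaxed_}"
}

# Dry run: print the arguments instead of launching STAR.
STAR_BIN=echo star_relaxed file_R1.fq file_R2.fq
```

Dropping `STAR_BIN=echo` runs the actual alignment. Reads rescued this way align over only part of their length, so it is worth inspecting them before trusting the higher mapping rate.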


For posterity's sake, take a handful of the unmapped reads and blastn them.
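
A minimal sketch of how that could look, assuming STAR is rerun with `--outReadsUnmapped Fastx` so that unmapped reads are written to `Unmapped.out.mate1` and `Unmapped.out.mate2` under the output prefix. `fastq_head_to_fasta` is a made-up helper name, the demo FASTQ is synthetic, and four-line FASTQ records are assumed.

```bash
# Hypothetical helper: convert the first N reads of a FASTQ file to FASTA,
# assuming exactly 4 lines per record.
fastq_head_to_fasta() {
    awk -v n="$2" 'NR > 4 * n { exit }
                   NR % 4 == 1 { print ">" substr($0, 2) }
                   NR % 4 == 2 { print }' "$1"
}

# Self-contained demo on a tiny synthetic FASTQ (not real data).
blast_demo_fq=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n@r2\nACGTACGT\n+\nIIIIIIII\n@r3\nACGT\n+\nIIII\n' > "$blast_demo_fq"
fastq_head_to_fasta "$blast_demo_fq" 2

# On real data (file names follow the prefix used above, adjust as needed):
#   fastq_head_to_fasta fileUnmapped.out.mate1 20 > unmapped_sample.fa
#   blastn -query unmapped_sample.fa -db nt -remote -outfmt 6 -max_target_seqs 5
```

Hits to rRNA, another species, or adapter sequence would each point to a different cause, which is why a small sample is usually enough to start with.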
