Question

STAR alignment, RNA-seq

8.2 years ago

prasoonagar ▴ 10

I have a query regarding STAR alignment. I used the following commands to convert the BAM files to FASTQ (as there were some issues while using Cufflinks):

samtools sort -n file.bam > file_sort.bam (sorted the file)
bedtools bamtofastq -i file_sort.bam -fq file_R1.fq -fq2 file_R2.fq (converted bam to fastq)

I then did the alignment with STAR, using the following command:

STAR --genomeDir star-genome --readFilesIn file_R1.fq file_R2.fq  --runThreadN 6  --outFileNamePrefix file

My main issue is that I am getting a very low unique alignment rate, 8% to 15%. The output for one of the files looks like the following:

                             Started job on | Apr 18 09:39:59
                         Started mapping on | Apr 18 09:43:53
                                Finished on | Apr 18 10:49:50
   Mapping speed, Million of reads per hour | 125.36

                      Number of input reads | 137795751
                  Average input read length | 200
                                UNIQUE READS:
               Uniquely mapped reads number | 20600304
                    Uniquely mapped reads % | 14.95%
                      Average mapped length | 196.65
                   Number of splices: Total | 7843335
        Number of splices: Annotated (sjdb) | 0
                   Number of splices: GT/AG | 7766279
                   Number of splices: GC/AG | 35700
                   Number of splices: AT/AC | 3805
           Number of splices: Non-canonical | 37551
                  Mismatch rate per base, % | 0.42%
                     Deletion rate per base | 0.02%
                    Deletion average length | 1.42
                    Insertion rate per base | 0.01%
                   Insertion average length | 1.58
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci | 53570046
         % of reads mapped to multiple loci | 38.88%
    Number of reads mapped to too many loci | 3298118
         % of reads mapped to too many loci | 2.39%
                              UNMAPPED READS:
   % of reads unmapped: too many mismatches | 0.00%
             % of reads unmapped: too short | 43.76%
                 % of reads unmapped: other | 0.02%
                              CHIMERIC READS:
                   Number of chimeric reads | 0
                        % of chimeric reads | 0.00%

Could anyone please suggest how I can improve my alignment quality, as most of my reads are reported as "unmapped: too short"?

It would be great to get some expert suggestions.

Thanks

Prasoon

RNA-Seq • 7.7k views

ADD COMMENT • link 8.2 years ago by prasoonagar ▴ 10

% of reads unmapped: too short | 43.76%

Looks like you have a lot of short reads in your dataset.

ADD REPLY • link 8.2 years ago by WouterDeCoster 48k

Yes, I want to know whether we can align these reads or whether they are all wasted. Is there a sequencing problem?

ADD REPLY • link 8.2 years ago by prasoonagar ▴ 10

How short is short?

ADD REPLY • link 8.2 years ago by WouterDeCoster 48k

Try decreasing --outFilterMatchNminOverLread.
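
A rough sketch of what that could look like, reusing the command from the question. In STAR, "too short" means the best alignment covered less than a minimum fraction of the read length, which for paired-end data is the combined length of both mates; --outFilterMatchNminOverLread and its companion --outFilterScoreMinOverLread both default to 0.66. The value 0.3 below is only an illustrative assumption, not a recommendation, and a lower threshold will also admit more partial and possibly spurious alignments.

```shell
# Illustrative threshold (assumption): fraction of the read length that must align.
# STAR's default for both filters is 0.66.
MIN_FRAC=0.3

# Same command as in the question, plus the two relaxed length filters.
STAR_CMD="STAR --genomeDir star-genome --readFilesIn file_R1.fq file_R2.fq --runThreadN 6 --outFileNamePrefix file --outFilterMatchNminOverLread $MIN_FRAC --outFilterScoreMinOverLread $MIN_FRAC"

# Print the command for review; paste it into the shell to run it.
echo "$STAR_CMD"
```

Comparing the "Uniquely mapped reads %" and "% of reads unmapped: too short" lines before and after the change shows how many reads the filter was rejecting.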

ADD REPLY • link 8.2 years ago by Devon Ryan 105k

For posterity's sake, take a handful of the unmapped reads and blastn them.
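
One possible way to do that, as a sketch: rerun STAR with --outReadsUnmapped Fastx so the unmapped reads are written out (with the prefix from the question the mate 1 file would be fileUnmapped.out.mate1), convert the first few records to FASTA, and query them against nt. The helper name fastq_head_to_fasta, the sample size of 20, and the output file names are assumptions made for illustration; the blastn step needs BLAST+ installed and network access for -remote.

```shell
# Hypothetical helper: print the first N records of a FASTQ file as FASTA.
# Assumes standard 4-line FASTQ records. $1 = FASTQ file, $2 = number of records.
fastq_head_to_fasta() {
  head -n "$(( $2 * 4 ))" "$1" |
    awk 'NR % 4 == 1 { print ">" substr($0, 2) } NR % 4 == 2 { print }'
}

# Assumed file name: produced by STAR with --outReadsUnmapped Fastx and prefix "file".
UNMAPPED="fileUnmapped.out.mate1"

# Only run if the unmapped-read file exists and blastn is on the PATH.
if [ -f "$UNMAPPED" ] && command -v blastn >/dev/null 2>&1; then
  fastq_head_to_fasta "$UNMAPPED" 20 > unmapped_sample.fa
  # Tabular output (-outfmt 6), at most 5 hits per read, searched on NCBI's servers.
  blastn -db nt -remote -query unmapped_sample.fa \
    -outfmt 6 -max_target_seqs 5 > unmapped_sample.blast.tsv
fi
```

If most hits come from rRNA, another species, or adapter-like sequence, that points to a library or contamination issue rather than an alignment setting.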

ADD REPLY • link 8.2 years ago by mforde84 ★ 1.4k