Question: STAR alignment, RNA-seq
16 months ago, prasoonagar10 wrote:

I have a query regarding STAR alignment. I used the following commands to convert the BAM files to FASTQ (as there were some issues while using Cufflinks):

samtools sort -n file.bam > file_sort.bam (sorted the file)
bedtools bamtofastq -i file_sort.bam -fq file_R1.fq -fq2 file_R2.fq (converted bam to fastq)

I then ran the alignment with STAR, using the following command:

STAR --genomeDir star-genome --readFilesIn file_R1.fq file_R2.fq  --runThreadN 6  --outFileNamePrefix file

My main issue is that I am getting a very low unique alignment rate, between 8% and 15%. The output for one of the files looks like the following:

                             Started job on | Apr 18 09:39:59
                         Started mapping on | Apr 18 09:43:53
                                Finished on | Apr 18 10:49:50
   Mapping speed, Million of reads per hour | 125.36
                      Number of input reads | 137795751
                  Average input read length | 200
                                UNIQUE READS:
               Uniquely mapped reads number | 20600304
                    Uniquely mapped reads % | 14.95%
                      Average mapped length | 196.65
                   Number of splices: Total | 7843335
        Number of splices: Annotated (sjdb) | 0
                   Number of splices: GT/AG | 7766279
                   Number of splices: GC/AG | 35700
                   Number of splices: AT/AC | 3805
           Number of splices: Non-canonical | 37551
                  Mismatch rate per base, % | 0.42%
                     Deletion rate per base | 0.02%
                    Deletion average length | 1.42
                    Insertion rate per base | 0.01%
                   Insertion average length | 1.58
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci | 53570046
         % of reads mapped to multiple loci | 38.88%
    Number of reads mapped to too many loci | 3298118
         % of reads mapped to too many loci | 2.39%
                              UNMAPPED READS:
   % of reads unmapped: too many mismatches | 0.00%
             % of reads unmapped: too short | 43.76%
                 % of reads unmapped: other | 0.02%
                              CHIMERIC READS:
                   Number of chimeric reads | 0
                        % of chimeric reads | 0.00%

Could anyone please suggest how I can improve the alignment rate, given that most of my unmapped reads fall under "unmapped: too short"?

It would be great to get some expert suggestions.

Thanks

Prasoon


% of reads unmapped: too short | 43.76%

Looks like you have a lot of short reads in your dataset.
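To put that figure next to the other categories, the percentage fields of the log can be tallied with a few lines of Python. This is only a sketch: `parse_star_log` and `percent` are hypothetical helper names, and the excerpt is copied from the log in the question rather than read from a real `Log.final.out` file.

```python
def parse_star_log(text):
    """Return {field name: value string} for every 'name | value' line."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as 'UNIQUE READS:' carry no value
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def percent(stats, key):
    """Convert a percentage field such as '14.95%' to a float."""
    return float(stats[key].rstrip("%"))


# Percentage lines copied from the log posted in the question.
log_excerpt = """
                    Uniquely mapped reads % | 14.95%
         % of reads mapped to multiple loci | 38.88%
         % of reads mapped to too many loci | 2.39%
   % of reads unmapped: too many mismatches | 0.00%
             % of reads unmapped: too short | 43.76%
                 % of reads unmapped: other | 0.02%
"""

stats = parse_star_log(log_excerpt)
for key in stats:
    print(f"{key:45s} {percent(stats, key):6.2f}")
```

Seen this way, the multi-mapping fraction (38.88%) is almost as large as the "too short" fraction, so both are worth investigating.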

written 16 months ago by WouterDeCoster31k

Yes, I want to know whether we can align these reads or whether they are all wasted. Is there a sequencing problem?

written 16 months ago by prasoonagar10

How short is short?

written 16 months ago by WouterDeCoster31k

Try decreasing --outFilterMatchNminOverLread.
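As a rough illustration of what this threshold means, the arithmetic can be sketched as below. The sketch assumes STAR's documented default of 0.66 for `--outFilterMatchNminOverLread`, and assumes that the 200 bp average input length in the log is the sum of two 100 bp mates (STAR normalizes to the combined mate length for paired-end reads). `min_matched_bases` is a hypothetical helper, not part of STAR, and the lower fractions are arbitrary examples rather than recommended settings.

```python
def min_matched_bases(mate_lengths, min_over_lread=0.66):
    """Approximate number of matched bases a read (pair) needs to be kept.

    mate_lengths: lengths of the mates; one entry for single-end reads.
    min_over_lread: value given to --outFilterMatchNminOverLread.
    """
    total_length = sum(mate_lengths)
    return min_over_lread * total_length


# Assumed layout: paired-end, 2 x 100 bp (hypothetical; check your own data).
mates = (100, 100)
for fraction in (0.66, 0.5, 0.3):
    needed = min_matched_bases(mates, fraction)
    print(f"fraction {fraction}: about {needed:.0f} matched bases out of {sum(mates)}")
```

With the default, a pair in which only one 100 bp mate aligns cannot pass, and it is reported as "too short". `--outFilterScoreMinOverLread` has the same default and may need lowering together with it. Looser thresholds also let through more partial or spurious alignments, so the results should be checked rather than trusted blindly.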

written 16 months ago by Devon Ryan82k

For posterity's sake, take a handful of the unmapped reads and blastn them.
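One possible way to pull out that handful is to convert the first few unmapped reads to FASTA, which can then be pasted into blastn. The sketch below assumes the unmapped reads were written as standard four-line FASTQ records, for example by re-running STAR with `--outReadsUnmapped Fastx`; `fastq_to_fasta` is a hypothetical helper, and the demo uses an in-memory two-read file rather than real data.

```python
import io
from itertools import islice


def fastq_to_fasta(fastq_handle, fasta_handle, n_reads=10):
    """Copy the first n_reads FASTQ records to fasta_handle as FASTA.

    Assumes four lines per record (header, sequence, '+', qualities).
    Returns the number of records written.
    """
    written = 0
    while written < n_reads:
        record = list(islice(fastq_handle, 4))
        if len(record) < 4:
            break  # end of file or truncated record
        header = record[0].rstrip("\n")
        sequence = record[1].rstrip("\n")
        # Drop the leading '@' of the FASTQ header and use '>' instead.
        fasta_handle.write(">" + header[1:] + "\n" + sequence + "\n")
        written += 1
    return written


# Tiny in-memory demo; replace with open(...) on a real unmapped-reads file.
demo_fastq = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n")
demo_fasta = io.StringIO()
count = fastq_to_fasta(demo_fastq, demo_fasta, n_reads=1)
print(demo_fasta.getvalue(), end="")
```

If most of the sampled reads hit ribosomal RNA or another species, that points to a library or contamination issue rather than an alignment setting.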

written 16 months ago by mforde841.2k