Question

Is there a preferred RNA-seq aligner for (very) short reads and potential homologous regions?

6.2 years ago

O.rka ▴ 750

I'm running the Dropseq pipeline on a de novo assembled diatom genome with gene calls from Maker. The pipeline suggests using the STAR aligner for the read mapping. The only problem is that I'm getting a REALLY low mapping rate.

Does anybody know whether there are STAR parameters I can adjust for this particular dataset, or whether another aligner is designed specifically to address these types of issues?

Here is my command:

STAR --genomeDir star_index_extended --readFilesIn unaligned_mc_tagged_polyA_filtered.fastq --outFileNamePrefix star_ --runThreadN 16 --outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 --outFilterMatchNmin 50 --outReadsUnmapped Fastx

Here is the summary output:

bash-4.1$ cat star_Log.final.out
                                 Started job on |   May 10 14:58:11
                             Started mapping on |   May 10 14:58:16
                                    Finished on |   May 10 20:57:36
       Mapping speed, Million of reads per hour |   85.18

                          Number of input reads |   510139798
                      Average input read length |   57
                                    UNIQUE READS:
                   Uniquely mapped reads number |   9914109
                        Uniquely mapped reads % |   1.94%
                          Average mapped length |   58.89
                       Number of splices: Total |   689950
            Number of splices: Annotated (sjdb) |   0
                       Number of splices: GT/AG |   415996
                       Number of splices: GC/AG |   11503
                       Number of splices: AT/AC |   297
               Number of splices: Non-canonical |   262154
                      Mismatch rate per base, % |   4.32%
                         Deletion rate per base |   0.07%
                        Deletion average length |   1.54
                        Insertion rate per base |   0.03%
                       Insertion average length |   1.64
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |   11525318
             % of reads mapped to multiple loci |   2.26%
        Number of reads mapped to too many loci |   11342282
             % of reads mapped to too many loci |   2.22%
                                  UNMAPPED READS:
       % of reads unmapped: too many mismatches |   0.00%
                 % of reads unmapped: too short |   84.30%
                     % of reads unmapped: other |   9.27%
                                  CHIMERIC READS:
                       Number of chimeric reads |   0
                            % of chimeric reads |   0.00%
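For anyone who wants absolute numbers rather than percentages, here is a minimal sketch that parses a few fields of the summary above and converts the unmapped percentages into approximate read counts. The function names (`parse_star_log`, `summarize`) are made up for illustration, and the embedded `SAMPLE` text is just an excerpt of the log shown above; it assumes the `label | value` layout seen in that log.

```python
# Excerpt of the star_Log.final.out shown above (only the fields used here).
SAMPLE = """
                          Number of input reads |   510139798
                      Average input read length |   57
                                  UNMAPPED READS:
       % of reads unmapped: too many mismatches |   0.00%
                 % of reads unmapped: too short |   84.30%
                     % of reads unmapped: other |   9.27%
"""


def parse_star_log(text):
    """Parse 'label | value' lines into a {label: value-string} dict."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as "UNMAPPED READS:" carry no value
        label, value = line.split("|", 1)
        stats[label.strip()] = value.strip()
    return stats


def percent(stats, label):
    """Return a percentage field such as '84.30%' as a float."""
    return float(stats[label].rstrip("%"))


def summarize(stats):
    """Estimate read counts for the two main unmapped categories."""
    total = int(stats["Number of input reads"])
    return {
        "input_reads": total,
        "unmapped_too_short": round(total * percent(stats, "% of reads unmapped: too short") / 100),
        "unmapped_other": round(total * percent(stats, "% of reads unmapped: other") / 100),
    }


print(summarize(parse_star_log(SAMPLE)))
```

To run it on a real file, read `star_Log.final.out` into a string and pass that to `parse_star_log` instead of `SAMPLE`.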

RNA-Seq • 2.2k views

ADD COMMENT • link updated 6.2 years ago by swbarnes2 15k • written 6.2 years ago by O.rka ▴ 750

Can you provide some stats on the input FASTQ file you are using?

ADD REPLY • link 6.2 years ago by lieven.sterck 15k

What other stats would be useful to add here? I think the only one STAR outputs regarding this is "Average mapped length | 58.89", but I can run another tool as well.

ADD REPLY • link 6.2 years ago by O.rka ▴ 750

Yes, but I mean more along the lines of the length of your input reads, etc. (e.g. something that you would get from running FastQC or similar).
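If running FastQC is not convenient, a rough read-length summary can be computed directly. The sketch below is only an illustration, not a FastQC replacement: it assumes a plain, uncompressed FASTQ with exactly four lines per record, and the function names and the tiny example records are invented for the demonstration.

```python
import io
from collections import Counter


def read_length_histogram(handle):
    """Count read lengths in a 4-line-per-record FASTQ stream."""
    lengths = Counter()
    for index, line in enumerate(handle):
        if index % 4 == 1:  # the second line of each record is the sequence
            lengths[len(line.strip())] += 1
    return lengths


def summarize_lengths(lengths):
    """Return (number of reads, shortest, longest, mean length)."""
    n_reads = sum(lengths.values())
    total_bases = sum(length * count for length, count in lengths.items())
    return n_reads, min(lengths), max(lengths), total_bases / n_reads


# Tiny made-up FASTQ, used only to show the functions working.
example = io.StringIO(
    "@r1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@r2\nACGTAC\n+\nIIIIII\n"
)
print(summarize_lengths(read_length_histogram(example)))
```

For a real file, pass an open file handle (for example `open("unaligned_mc_tagged_polyA_filtered.fastq")`) instead of the `io.StringIO` example.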

ADD REPLY • link 6.2 years ago by lieven.sterck 15k

Try setting --outFilterMatchNmin to 20 to see if more reads map. However, that means only 20 bases need to match for a read to be reported as mapped, which is pretty low.
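To put the threshold in context, the sketch below works out what fraction of a read must match under an absolute --outFilterMatchNmin, using the average input read length of 57 from the log in the question. The helper name is made up. The 0.66 used for comparison is, as far as I recall from the STAR manual, the default of --outFilterMatchNminOverLread, which the command in the question sets to 0; please check it against the manual for your STAR version.

```python
def required_match_fraction(match_n_min, read_length):
    """Fraction of a read that must match to pass an absolute minimum of matched bases."""
    return match_n_min / read_length


READ_LENGTH = 57  # "Average input read length" reported in the log

for n_min in (50, 20):
    fraction = required_match_fraction(n_min, READ_LENGTH)
    print(f"--outFilterMatchNmin {n_min}: {fraction:.0%} of a {READ_LENGTH}-base read")

# Assumed default (see note above): a relative threshold of 0.66 of the read length.
print(f"0.66 x {READ_LENGTH} = {0.66 * READ_LENGTH:.1f} matched bases")
```

Note that with an absolute minimum of 50, any read shorter than 50 bases (for example after trimming) cannot pass the filter at all, whatever the reference.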

ADD REPLY • link 6.2 years ago by Damian Kao 16k

score 0 · Answer 1 · 2019-05-21

6.2 years ago

swbarnes2 15k

"Too short" usually means "didn't map". Are you sure this is an appropriate reference genome?

Have you eyeballed your reads to see if they make sense?
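One quick way to eyeball reads, sketched here under the assumption of a plain four-line-per-record FASTQ: pull the first few sequences and look at their base composition, since a strong skew (for example mostly A) can hint at poly-A or low-complexity reads. The function names and the two example records are hypothetical.

```python
import io
from collections import Counter
from itertools import islice


def peek_reads(handle, n_reads=1000):
    """Return the sequences of the first n_reads FASTQ records."""
    lines = islice(handle, 4 * n_reads)
    return [line.strip() for index, line in enumerate(lines) if index % 4 == 1]


def base_composition(reads):
    """Return the fraction of each base across the given reads."""
    counts = Counter("".join(reads))
    total = sum(counts.values())
    return {base: count / total for base, count in counts.items()}


# Tiny made-up FASTQ, used only to show the functions working.
example = io.StringIO(
    "@r1\nAAAA\n+\nIIII\n"
    "@r2\nACGT\n+\nIIII\n"
)
reads = peek_reads(example)
print(reads)
print(base_composition(reads))
```

Pasting a handful of the printed reads into a BLAST search is another simple check on whether they come from the expected organism.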

ADD COMMENT • link 6.2 years ago by swbarnes2 15k

Do you recommend a method for checking the quality of an assembly vs. transcriptome besides mapping the reads to the assembly and seeing the % that maps?

ADD REPLY • link 6.2 years ago by O.rka ▴ 750

I'm getting some interesting mapping results when using bwa to map against the reference genome. Will need to talk to the rest of the team to figure out the details between different experiments. Thanks!

Sample  Mapped      Unmapped    Total       Ratio Mapped
Fresh   50337077    459841575   510178652   0.098665589
13      21809       279292      301101      0.072430845
14      32791       554770      587561      0.055808673
15      5217        56826       62043       0.084086843
21      4750237     1375941     6126178     0.775399768
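For reference, the "Ratio Mapped" column is just mapped / (mapped + unmapped). A minimal sketch that recomputes it from the counts in the table (the dictionary layout and function name are only for illustration):

```python
# (mapped, unmapped) read counts per sample, copied from the table above.
COUNTS = {
    "Fresh": (50337077, 459841575),
    "13": (21809, 279292),
    "14": (32791, 554770),
    "15": (5217, 56826),
    "21": (4750237, 1375941),
}


def mapped_ratio(mapped, unmapped):
    """Fraction of all reads that mapped."""
    return mapped / (mapped + unmapped)


for sample, (mapped, unmapped) in COUNTS.items():
    print(f"{sample}\t{mapped + unmapped}\t{mapped_ratio(mapped, unmapped):.4f}")
```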

ADD REPLY • link 6.2 years ago by O.rka ▴ 750