Question: Is there a preferred RNA-seq aligner for (very) short reads and potential homologous regions?
28 days ago, O.rka (110) wrote:

I'm running the Drop-seq pipeline on a de novo assembled diatom genome with gene calls from MAKER. The pipeline suggests using the STAR aligner for read mapping. The only problem is that I'm getting a REALLY low mapping rate.

Does anybody know if there are STAR parameters I can adjust for this particular dataset, or whether another aligner is designed specifically for these kinds of issues?

Here is my command:

STAR --genomeDir star_index_extended --readFilesIn unaligned_mc_tagged_polyA_filtered.fastq --outFileNamePrefix star_ --runThreadN 16 --outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 --outFilterMatchNmin 50 --outReadsUnmapped Fastx

Here is the summary output:

bash-4.1$ cat
                                 Started job on |   May 10 14:58:11
                             Started mapping on |   May 10 14:58:16
                                    Finished on |   May 10 20:57:36
       Mapping speed, Million of reads per hour |   85.18

                          Number of input reads |   510139798
                      Average input read length |   57
                                    UNIQUE READS:
                   Uniquely mapped reads number |   9914109
                        Uniquely mapped reads % |   1.94%
                          Average mapped length |   58.89
                       Number of splices: Total |   689950
            Number of splices: Annotated (sjdb) |   0
                       Number of splices: GT/AG |   415996
                       Number of splices: GC/AG |   11503
                       Number of splices: AT/AC |   297
               Number of splices: Non-canonical |   262154
                      Mismatch rate per base, % |   4.32%
                         Deletion rate per base |   0.07%
                        Deletion average length |   1.54
                        Insertion rate per base |   0.03%
                       Insertion average length |   1.64
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |   11525318
             % of reads mapped to multiple loci |   2.26%
        Number of reads mapped to too many loci |   11342282
             % of reads mapped to too many loci |   2.22%
                                  UNMAPPED READS:
       % of reads unmapped: too many mismatches |   0.00%
                 % of reads unmapped: too short |   84.30%
                     % of reads unmapped: other |   9.27%
                                  CHIMERIC READS:
                       Number of chimeric reads |   0
                            % of chimeric reads |   0.00%
rna-seq • 113 views
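For reading the summary above at a glance, it can help to add up the three "% of reads unmapped" categories. The snippet below is only a sketch: it assumes the summary is saved in a file laid out like the output above (label, then `|`, then the value), and the function name `star_unmapped_total` is made up for illustration.

```shell
# Sum the "% of reads unmapped" categories from a STAR summary file.
# Assumes lines of the form "<label> | <value>%", as in the output above.
star_unmapped_total() {
    awk -F'|' '/% of reads unmapped/ {
                   gsub(/[ \t%]/, "", $2)   # strip whitespace and the percent sign
                   sum += $2
               }
               END { printf "%.2f\n", sum }' "$1"
}

# Usage (the file name is whatever your summary was written to):
# star_unmapped_total <summary file>
```

For the numbers posted here that is 0.00 + 84.30 + 9.27, so roughly 93.6% of reads end up unmapped, almost all of them in the "too short" category.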

Can you provide some stats on the input FASTQ file you are using?

Reply written 28 days ago by lieven.sterck (5.2k)

What other stats would be useful to add here? I think the only one STAR outputs regarding this is: Average mapped length | 58.89 but I can run another tool as well.

Reply written 28 days ago by O.rka (110)

Yes, but I mean more along the lines of the length of your input reads, etc. (e.g. something you would get from running FastQC or similar).

Reply written 27 days ago by lieven.sterck (5.2k)
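If FastQC is not at hand, a quick read-length summary can be pulled straight from the FASTQ with awk. This is a rough sketch rather than a FastQC replacement: it assumes an uncompressed FASTQ with exactly four lines per record, and the function name `read_length_summary` is made up for illustration.

```shell
# Summarize read lengths in an uncompressed, 4-line-per-record FASTQ file.
# Prints the number of reads plus the min, max and mean read length.
read_length_summary() {
    awk 'NR % 4 == 2 {                      # sequence line of each record
             len = length($0)
             n++; sum += len
             if (n == 1 || len < min) min = len
             if (n == 1 || len > max) max = len
         }
         END {
             if (n == 0) { print "reads=0"; exit }
             printf "reads=%d min=%d max=%d mean=%.2f\n", n, min, max, sum / n
         }' "$1"
}

# Usage, with the input file from the question:
# read_length_summary unaligned_mc_tagged_polyA_filtered.fastq
```

A wide gap between the minimum and the mean would suggest that trimming left many very short reads in the file.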

Try setting --outFilterMatchNmin to 20 to see if you can get more mapping. However, that means it only requires 20 bases to map, which is pretty low.

Reply written 28 days ago by Damian Kao (15k)
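For reference, the suggested change applied to the command from the question might look like the sketch below. Every flag and path is copied from the original command and only the `--outFilterMatchNmin` value changes. The wrapper function `run_star_min_match` and the output prefix `star_min<N>_` are made-up names, used so the earlier run is not overwritten.

```shell
# Re-run the STAR command from the question with a different minimum
# number of matched bases. The output prefix is a hypothetical name.
run_star_min_match() {
    min_match="$1"   # e.g. 20, as suggested above
    STAR --genomeDir star_index_extended \
         --readFilesIn unaligned_mc_tagged_polyA_filtered.fastq \
         --outFileNamePrefix "star_min${min_match}_" \
         --runThreadN 16 \
         --outFilterScoreMinOverLread 0 \
         --outFilterMatchNminOverLread 0 \
         --outFilterMatchNmin "$min_match" \
         --outReadsUnmapped Fastx
}

# Usage:
# run_star_min_match 20
```

Comparing the "too short" percentage between the two runs should show whether the 50-base minimum on reads averaging 57 bases was the main thing holding the mapping rate down.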
28 days ago, swbarnes2 (5.8k, United States) wrote:

"Too short" usually means "didn't map". Are you sure this is an appropriate reference genome?

Have you eyeballed your reads to see if they make sense?

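One way to eyeball the reads is to list the most frequent sequences among those that did not map and spot-check a few of them (for example by BLASTing them). The sketch below assumes an uncompressed 4-line FASTQ. The function name `top_sequences` is made up, and the file name in the usage line is an assumption about what `--outReadsUnmapped Fastx` writes with the `star_` prefix, so check your output directory for the actual name.

```shell
# List the N most frequent read sequences in a FASTQ file (default N = 20).
# Output: "<count> <sequence>", most frequent first.
top_sequences() {
    fastq="$1"
    n="${2:-20}"
    awk 'NR % 4 == 2' "$fastq" |   # keep only the sequence lines
        sort | uniq -c |           # count identical sequences
        sort -k1,1nr |             # highest counts first
        head -n "$n" |
        awk '{ print $1, $2 }'     # drop the padding added by uniq -c
}

# Usage (file name is an assumption, see above):
# top_sequences star_Unmapped.out.mate1 20
```

If the top hits are adapters, poly-A stretches, rRNA or sequence from another organism, that points to a library or contamination problem rather than an aligner setting.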

Do you recommend a method for checking the quality of an assembly vs. transcriptome besides mapping the reads to the assembly and seeing the % that maps?

Reply written 28 days ago by O.rka (110)

I'm getting some interesting mapping results when using bwa to map against the reference genome. Will need to talk to the rest of the team to figure out the details between different experiments. Thanks!

Sample  Mapped      Unmapped    Total       Ratio Mapped
Fresh   50337077    459841575   510178652   0.098665589
13      21809       279292      301101      0.072430845
14      32791       554770      587561      0.055808673
15      5217        56826       62043       0.084086843
21      4750237     1375941     6126178     0.775399768
Reply written 26 days ago by O.rka (110)
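The "Ratio Mapped" column is Mapped / (Mapped + Unmapped). For anyone repeating this across more samples, a small awk sketch follows. It assumes a headerless tab-separated file with sample, mapped and unmapped counts in that order. The function name `mapped_ratio` and the file name `counts.tsv` are made up for illustration.

```shell
# Compute total reads and the mapped fraction per sample.
# Input: tab-separated lines of "<sample> <mapped> <unmapped>", no header.
# Output: tab-separated "<sample> <total> <ratio mapped>".
mapped_ratio() {
    awk -F'\t' '{
        total = $2 + $3
        if (total > 0)
            printf "%s\t%d\t%.4f\n", $1, total, $2 / total
    }' "$1"
}

# Usage (counts.tsv is a hypothetical file name):
# mapped_ratio counts.tsv
```

Sample 21 works out to 4750237 / 6126178, about 0.775, while every other sample is below 0.10.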