Question: % of reads unmapped: too short - STAR 2.5.2b
2.3 years ago, quan.clement wrote:

I am trying to map different RNAseq data to my reference genome. The data were collected from the ddbj database using wget. I then performed a quality control of the reads and trimming using trimmomatic. The trimmed reads were then mapped against my reference genome using STAR.

find -name '*R1_paired.fastq' -execdir bash -c '\
  export B=$(basename {} R1_paired.fastq);\
  /home/annag/tools/STAR-2.5.2b/bin/Linux_x86_64_static/STAR \
    --runThreadN 48 \
    --runMode alignReads\
    --outSAMtype BAM Unsorted \
    --genomeDir /home/efe22/shared/STAR/RIPG \
    --readFilesIn ${B}R1_paired.fastq ${B}R2_paired.fastq \
    --outFileNamePrefix ${B}' \;

The problem is that for each sample almost all reads end up unmapped.

                             Started job on |   Nov 07 01:54:06
                         Started mapping on |   Nov 07 01:54:07
                                Finished on |   Nov 07 05:38:52
   Mapping speed, Million of reads per hour |   10.62

                      Number of input reads |   39768046
                  Average input read length |   202
                                UNIQUE READS:
               Uniquely mapped reads number |   11842
                    Uniquely mapped reads % |   0.03%
                      Average mapped length |   193.10
                   Number of splices: Total |   7874
        Number of splices: Annotated (sjdb) |   4543
                   Number of splices: GT/AG |   6275
                   Number of splices: GC/AG |   34
                   Number of splices: AT/AC |   1
           Number of splices: Non-canonical |   1564
                  Mismatch rate per base, % |   1.88%
                     Deletion rate per base |   0.06%
                    Deletion average length |   1.03
                    Insertion rate per base |   0.01%
                   Insertion average length |   1.60
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |   77721
         % of reads mapped to multiple loci |   0.20%
    Number of reads mapped to too many loci |   3605
         % of reads mapped to too many loci |   0.01%
                              UNMAPPED READS:
   % of reads unmapped: too many mismatches |   0.00%
             % of reads unmapped: too short |   99.75%
                 % of reads unmapped: other |   0.02%
                              CHIMERIC READS:
                   Number of chimeric reads |   0
                        % of chimeric reads |   0.00%

I then wondered whether the data contained any reads that map to my genome, so I ran a local BLAST with one of the generated fastq files and found reads that match my query perfectly. This seems to indicate that the problem comes from the mapping itself.
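The post does not show how the BLAST spot check was run. As a rough sketch, one way to prepare a small FASTA sample from a FASTQ file is shown below. The function name `fastq_to_fasta` and the default of 1000 reads are made up for illustration, and the code assumes plain 4-line FASTQ records.

```bash
#!/usr/bin/env bash
# Sketch: convert the first N reads of a FASTQ file to FASTA for a BLAST spot check.
# Assumes 4-line FASTQ records (header, sequence, '+', qualities).
fastq_to_fasta() {
    local fq=$1 n=${2:-1000}
    head -n "$((n * 4))" "$fq" |
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }   # "@id" becomes ">id"
         NR % 4 == 2 { print }'                     # sequence line
}
```

The output could then be searched against the genome, for example with `blastn -query sample.fa -subject genome.fa -outfmt 6`, where `sample.fa` and `genome.fa` are placeholder file names.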

I also tried mapping the fastq files before any trimming and got the same results. I also checked the integrity of the fastq files using a script from a colleague, and the files seem fine and the reads properly ordered.

/home/ls752/genomes/scripts/ DRR001921_1_R1_paired.fastq DRR001921_1_R2_paired.fastq \;

" total validated mates: 25097181 and 25097181 read-pairs are properly ordered "
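The colleague's script is not shown, so the following is not that script, only a rough bash sketch of a similar check. It assumes 4-line FASTQ records and read names that are identical in both files once everything after the first whitespace and a trailing /1 or /2 are removed. The function name `check_pairs` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: verify that two FASTQ files list their mates in the same order.
# Read IDs are taken from every 4th line (the header) and compared pairwise.
check_pairs() {
    local r1=$1 r2=$2
    paste <(awk 'NR % 4 == 1 { sub(/[ \t].*/, ""); sub(/\/[12]$/, ""); print }' "$r1") \
          <(awk 'NR % 4 == 1 { sub(/[ \t].*/, ""); sub(/\/[12]$/, ""); print }' "$r2") |
    awk -F'\t' '
        # A missing mate shows up as an empty field and counts as a mismatch.
        $1 != $2 || $1 == "" { bad++ }
        END { printf "pairs=%d mismatched=%d\n", NR, bad + 0; exit (bad > 0) }'
}
```

Called as `check_pairs DRR001921_1_R1_paired.fastq DRR001921_1_R2_paired.fastq`, it prints a one-line summary and returns a non-zero exit status if any pair of names disagrees.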

I would be very grateful for any input!


Tags: rna-seq, star, alignment

Try aligning the mates separately. Sometimes STAR gives up on a whole pair if one of the mates has a lot of issues.

written 2.3 years ago by Devon Ryan
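Aligning the mates separately, as suggested above, could look roughly like the sketch below. The genome directory and file naming are taken from the original post. The `STAR_BIN` and `GENOME_DIR` variables, the `DRY_RUN` switch, the `_single_` output prefix, the thread count of 8 and the function name `map_mates_separately` are assumptions added for illustration.

```bash
#!/usr/bin/env bash
# Sketch: map each mate of one sample as single-end reads.
# Set DRY_RUN=1 to print the STAR commands instead of running them.
STAR_BIN=${STAR_BIN:-STAR}
GENOME_DIR=${GENOME_DIR:-/home/efe22/shared/STAR/RIPG}

map_mates_separately() {
    local prefix=$1 mate cmd
    for mate in R1 R2; do
        cmd=("$STAR_BIN"
             --runThreadN 8
             --runMode alignReads
             --outSAMtype BAM Unsorted
             --genomeDir "$GENOME_DIR"
             --readFilesIn "${prefix}${mate}_paired.fastq"
             --outFileNamePrefix "${prefix}${mate}_single_")
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "${cmd[*]}"      # show the command only
        else
            "${cmd[@]}"           # run STAR on this mate alone
        fi
    done
}
```

For example, `DRY_RUN=1 map_mates_separately DRR001921_1_` prints one command per mate. Comparing the two resulting Log.final.out files shows whether one mate maps well on its own while the other does not.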

"% of reads unmapped: too short" is what you get when your reads just don't map. If a read doesn't map, STAR will clip bases off the end of the read until either the read maps or what remains is too short. So STAR is just saying that it can't map the reads. We recently had a similar problem where STAR wouldn't map our reads (0.01% mapping) even though BLAST would. We tested whether STAR was the problem by mapping with HISAT, and got 99% mapping. In the end we thought the problem must have been our STAR index.

written 2.3 years ago by i.sudbery
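To put a number on "too short": the STAR manual lists a default of 0.66 for --outFilterMatchNminOverLread and --outFilterScoreMinOverLread, relative to the read length, which for paired-end data is the sum of both mates. With the average length of 202 reported in the log above, that is roughly 133 matched bases per pair. The sketch below pulls both values out of a Log.final.out file. The function name `star_too_short_summary` is hypothetical, and the threshold is only an approximation of STAR's internal rounding.

```bash
#!/usr/bin/env bash
# Sketch: report the "too short" percentage and the approximate number of
# matched bases STAR requires, from a Log.final.out file.
# min_frac mirrors the documented default of --outFilterMatchNminOverLread;
# pass a second argument if STAR was run with a different value.
star_too_short_summary() {
    local log=$1 min_frac=${2:-0.66}
    awk -F'|' -v frac="$min_frac" '
        { key = $1; val = $2
          gsub(/^[ \t]+|[ \t]+$/, "", key)    # trim padding around the label
          gsub(/^[ \t]+|[ \t]+$/, "", val) }  # trim padding around the value
        key == "Average input read length"      { rlen = val }
        key == "% of reads unmapped: too short" { pct = val }
        END { printf "too_short=%s approx_min_matched_bases=%.0f\n", pct, rlen * frac }' "$log"
}
```

Running it over every sample, for instance `for f in *Log.final.out; do star_too_short_summary "$f"; done`, gives a quick overview of which samples are affected.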

Hi Clement,

I'd try two things:

1st, align the two read sets separately and check whether it works in principle.
2nd, if that works, you can (visually) inspect whether the R1 and R2 reads overlap. If the overlap is quite big, you might need to switch to STAR version 2.6 and set the --peOverlapNbasesMin parameter according to your observed overlap.

written 2.3 years ago by michael.ante

Thank you for your quick replies.

I'll try the different suggestions. Hopefully it will work.

written 2.3 years ago by quan.clement