Trouble with 'too short' reads using STAR: each mate maps fine, but the paired-end mapping does not
3.1 years ago

I'm trying to align mouse RNA-seq reads with STAR, but I'm running into the 'too short' problem: nearly all of the reads are being filtered out for this reason. The confusing part is that read 1 and read 2 each map just fine when I align them separately as single-end reads.

Here is the command for mapping the paired reads:

STAR \
    --runMode alignReads \
    --genomeDir $STAR_index \
    --readFilesIn $scratch/${sample}_tmp/${sample}_R1.fastq.gz $scratch/${sample}_tmp/${sample}_R2.fastq.gz \
    --readFilesCommand zcat \
    --runThreadN $THREADS \
    --outFileNamePrefix $BASEDIR/${sample}/${genome}/STAR/STAR_alignment/${sample}_ \
    --outReadsUnmapped Fastx \
    --outSAMtype BAM SortedByCoordinate \
    &> $BASEDIR/${sample}/${genome}/logs/${sample}_Star.log

And here's the log file:

                          Number of input reads |       51150550
                      Average input read length |       202
                                    UNIQUE READS:
                   Uniquely mapped reads number |       3907
                        Uniquely mapped reads % |       0.01%
                          Average mapped length |       175.69
                       Number of splices: Total |       690
            Number of splices: Annotated (sjdb) |       2
                       Number of splices: GT/AG |       515
                       Number of splices: GC/AG |       61
                       Number of splices: AT/AC |       0
               Number of splices: Non-canonical |       114
                      Mismatch rate per base, % |       5.56%
                         Deletion rate per base |       0.03%
                        Deletion average length |       1.91
                        Insertion rate per base |       0.01%
                       Insertion average length |       1.84
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |       126222
             % of reads mapped to multiple loci |       0.25%
        Number of reads mapped to too many loci |       6978
             % of reads mapped to too many loci |       0.01%
                                  UNMAPPED READS:
  Number of reads unmapped: too many mismatches |       0
       % of reads unmapped: too many mismatches |       0.00%
            Number of reads unmapped: too short |       51012111

Here is the command for the single end mapping:

STAR \
    --runMode alignReads \
    --genomeDir $STAR_index \
    --readFilesIn $scratch/${sample}_tmp/${sample}_R1.fastq.gz \
    --readFilesCommand zcat \
    --runThreadN $THREADS \
    --outFileNamePrefix $BASEDIR/${sample}/${genome}/STAR/STAR_alignment/${sample}_ \
    --outReadsUnmapped Fastx \
    --outSAMtype BAM SortedByCoordinate \
    &> $BASEDIR/${sample}/${genome}/logs/${sample}_Star.log

And the accompanying log file:

                          Number of input reads |       51150550
                      Average input read length |       101
                                    UNIQUE READS:
                   Uniquely mapped reads number |       44108161
                        Uniquely mapped reads % |       86.23%
                          Average mapped length |       100.16
                       Number of splices: Total |       19083172
            Number of splices: Annotated (sjdb) |       18953752
                       Number of splices: GT/AG |       18956436
                       Number of splices: GC/AG |       96098
                       Number of splices: AT/AC |       11062
               Number of splices: Non-canonical |       19576
                      Mismatch rate per base, % |       0.21%
                         Deletion rate per base |       0.01%
                        Deletion average length |       1.33
                        Insertion rate per base |       0.01%
                       Insertion average length |       1.24
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |       5988911
             % of reads mapped to multiple loci |       11.71%
        Number of reads mapped to too many loci |       280004
             % of reads mapped to too many loci |       0.55%
                                  UNMAPPED READS:
  Number of reads unmapped: too many mismatches |       0
       % of reads unmapped: too many mismatches |       0.00%
            Number of reads unmapped: too short |       749975
                 % of reads unmapped: too short |       1.47%
                Number of reads unmapped: other |       23499
                     % of reads unmapped: other |       0.05%
                                  CHIMERIC READS:
                       Number of chimeric reads |       0
                            % of chimeric reads |       0.00%

Does anyone know why the paired-end mapping treats nearly all of the reads as too short (i.e. more than 1/3 of the read does not map)? Thanks for the help.

RNA-Seq alignment

Yes, I've seen this one. FastQC suggests each mate is 100 bp. I realize it's not necessarily that the reads are short; it's that more than 1/3 of the total read length is not mapping, the default threshold being 0.66. I can't explain why read 1 and read 2 map just fine on their own but fail to map as paired-end reads.
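To put numbers on the 0.66 threshold mentioned above, here is a minimal sketch. It assumes the default `--outFilterMatchNminOverLread` of 0.66 and that, for paired-end input, STAR normalizes against the summed length of both mates (which is how I read the STAR manual). The function name `min_matched_bases` is made up for this example.

```shell
#!/bin/sh
# Sketch: minimum number of matched bases needed before a read (or read
# pair) avoids the "too short" category.
# Assumption: the default fraction of 0.66 is applied to the total read
# length, i.e. mate 1 + mate 2 for paired-end input.

# min_matched_bases FRACTION TOTAL_READ_LENGTH
min_matched_bases() {
    awk -v frac="$1" -v len="$2" 'BEGIN { printf "%.2f\n", frac * len }'
}

min_matched_bases 0.66 101   # single-end run, 101 bp reads
min_matched_bases 0.66 202   # paired-end run, 2 x 101 bp
```

Under these assumptions a pair needs roughly 133 matched bases out of 202, so a pair in which only one 101 bp mate can be placed would fall below the cutoff and be counted as too short, even though that same mate passes easily on its own.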

Did you trim the reads independently by any chance? Perhaps your R1/R2 files are out of sync. In that case you can try repair.sh (from BBMap) to re-sync them and remove any singletons.
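A quick way to test the out-of-sync hypothesis before running anything heavier is to compare read names across the two files. This is only a sketch: the function names are hypothetical, and it assumes four-line FASTQ records and mate suffixes of the form `/1` and `/2` (or a space-separated comment after the name).

```shell
#!/bin/sh
# Sketch: check whether two gzipped FASTQ files list the same read names
# in the same order. Assumes standard four-line FASTQ records.

# Print one read name per record: drop the leading "@", anything after the
# first whitespace, and a trailing /1 or /2 mate suffix.
fastq_names() {
    gzip -dc "$1" | awk 'NR % 4 == 1 {
        name = substr($1, 2)
        sub(/\/[12]$/, "", name)
        print name
    }'
}

# pairs_in_sync R1.fastq.gz R2.fastq.gz
# Prints a verdict; exit status is 0 when the names match, 1 otherwise.
# The body runs in a subshell so its variables do not leak.
pairs_in_sync() (
    names1=$(mktemp) || exit 2
    names2=$(mktemp) || exit 2
    fastq_names "$1" > "$names1"
    fastq_names "$2" > "$names2"
    if cmp -s "$names1" "$names2"; then
        status=0
    else
        status=1
    fi
    rm -f "$names1" "$names2"
    if [ "$status" -eq 0 ]; then
        echo "in sync"
    else
        echo "OUT OF SYNC"
    fi
    exit "$status"
)
```

Usage would look like `pairs_in_sync sample_R1.fastq.gz sample_R2.fastq.gz` (placeholder file names). If the files do turn out to be out of sync, the repair.sh call is, as I recall from the BBMap documentation, along the lines of `repair.sh in1=R1.fastq.gz in2=R2.fastq.gz out1=fixed_R1.fastq.gz out2=fixed_R2.fastq.gz outs=singletons.fastq.gz`; please check the tool's own help before relying on that.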

I don't generally trim reads when aligning with STAR. I haven't tried repair.sh though. I'll see how that goes and report back. Thank you.

If you did not trim then the reads would not be out of sync. There must be some other reason.

2.8 years ago
atchen3 • 0

Hi,

I ran into the same problem recently: my paired-end run (2 x 41 bp) was aligning at < 1%, but each mate on its own was aligning at 85%+.

It turns out my reads were reverse-complemented. After fixing this, I got 90%+ alignment on my paired-end reads (slightly higher than for each mate alone). I have Illumina data, so I was able to flip the "ReverseComplement" setting in the sample sheet, but bcl2fastq has an option to reverse complement as well.

I had always ignored forward vs. reverse complement since the other alignment tools I use (bowtie2, kallisto) don't seem to care, but it made a difference for me in STAR.

Hope this helps!
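If regenerating the FASTQ files from the sequencer output is not an option, the same fix can be approximated on the existing files. The sketch below is not the sample-sheet or bcl2fastq route described above; it is a plain awk filter (hypothetical name `revcomp_fastq`) that reverse-complements every sequence and reverses every quality string. Which mate needs flipping depends on the library, so treat that choice as an assumption to verify on a small subset first.

```shell
#!/bin/sh
# Sketch: reverse-complement a FASTQ stream.
# Reads four-line FASTQ records on stdin and writes them to stdout with
# each sequence reverse-complemented and each quality string reversed.
# Characters other than A, C, G, T (either case), such as N, pass through.
revcomp_fastq() {
    awk '
    function reverse(s,    i, out) {
        out = ""
        for (i = length(s); i >= 1; i--) out = out substr(s, i, 1)
        return out
    }
    function complement(s,    i, c, out) {
        out = ""
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            if (c in comp) out = out comp[c]
            else out = out c
        }
        return out
    }
    BEGIN {
        comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"
        comp["a"] = "t"; comp["c"] = "g"; comp["g"] = "c"; comp["t"] = "a"
    }
    NR % 4 == 2 { print complement(reverse($0)); next }  # sequence line
    NR % 4 == 0 { print reverse($0); next }              # quality line
    { print }                                            # header and "+" lines
    '
}
```

With placeholder file names, this could be used as `gzip -dc sample_R2.fastq.gz | revcomp_fastq | gzip -c > sample_R2.rc.fastq.gz`, followed by a paired-end STAR run on a few hundred thousand reads to see whether the mapping rate recovers.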

2.8 years ago

Try separating the fastqs with a comma. Then try a comma and a space.
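For reference on the comma suggestion: as far as I understand the STAR manual, a comma in `--readFilesIn` joins several files that belong to the same mate (for example, multiple lanes), while a space separates mate 1 from mate 2. If that reading is right, joining R1 and R2 with a comma would make STAR treat both files as single-end input. The sketch below only builds and prints the argument string; the lane file names and the helper `join_by_comma` are made up for illustration.

```shell
#!/bin/sh
# Sketch: build a --readFilesIn argument for a multi-lane paired-end sample.
# Assumption: commas (no spaces) join files of the same mate; a single
# space separates the mate 1 list from the mate 2 list.

# join_by_comma FILE...: print the arguments joined by commas.
join_by_comma() {
    ( IFS=,; printf '%s\n' "$*" )
}

# Hypothetical lane files for one sample.
r1_list=$(join_by_comma sample_L001_R1.fastq.gz sample_L002_R1.fastq.gz)
r2_list=$(join_by_comma sample_L001_R2.fastq.gz sample_L002_R2.fastq.gz)

echo "--readFilesIn $r1_list $r2_list"
```

The files must appear in the same lane order in both lists so that each mate 1 file lines up with its mate 2 counterpart.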
