2.9 years ago
caggtaagtat ★ 1.4k

Hi everybody,

I just aligned my samples, and for one of them STAR used only 1% of the reads in the trimmed FASTQ file for mapping. Does anyone know what the reason could be? The references I used worked fine for the rest of the data, and I have run out of ideas about where the error lies.

The FASTQ file contains around 300 million reads, but STAR only uses about 3 million. This is the command I used (it's the last alignment step of a 2-pass run):

STAR --outFilterType BySJout --outFilterMismatchNmax 10 --outFilterMismatchNoverLmax 0.04 --alignEndsType EndToEnd --runThreadN 8 --outSAMtype BAM SortedByCoordinate --alignSJDBoverhangMin 4 --alignIntronMax 300000 --alignSJoverhangMin 8 --alignIntronMin 20 --genomeDir /path/to/Genome/ --sjdbOverhang 149 --quantMode GeneCounts --sjdbGTFfile /path/to/hg91.gtf --readFilesIn /path/to/file.fq > STAR.log


This is the Final log of the STAR run:

Started job on |   May 14 16:56:28
Started mapping on |   May 14 16:59:07
Finished on |   May 14 17:02:06
Mapping speed, Million of reads per hour |   65.72

Number of input reads |   3267930
Average input read length |   134
Uniquely mapped reads number |   3111505
Uniquely mapped reads % |   95.21%
Average mapped length |   135.04
Number of splices: Total |   1497184
Number of splices: Annotated (sjdb) |   1497124
Number of splices: GT/AG |   1483304
Number of splices: GC/AG |   12329
Number of splices: AT/AC |   1093
Number of splices: Non-canonical |   458
Mismatch rate per base, % |   0.18%
Deletion rate per base |   0.01%
Deletion average length |   1.85
Insertion rate per base |   0.01%
Insertion average length |   1.51
Number of reads mapped to multiple loci |   116052
% of reads mapped to multiple loci |   3.55%
Number of reads mapped to too many loci |   492
% of reads mapped to too many loci |   0.02%
% of reads unmapped: too many mismatches |   0.75%
% of reads unmapped: too short |   0.41%
% of reads unmapped: other |   0.06%
Number of chimeric reads |   0
% of chimeric reads |   0.00%


Any help is gratefully appreciated!

RNA-Seq STAR mapping input • 1.4k views

Is there a chance of the input file being somehow corrupt? Do you see any errors anywhere?


Mapping with Salmon worked, and I don't see any errors from trimming.

Edit: with Salmon, 330 million reads were mapped.


There's likely to still be an error in the fastq file that salmon happens to work around. Don't use SortedByCoordinate and look to see if the last read in the output file is around the 3.2 millionth in the file.
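
One way to look for that kind of error directly in the input is to walk through the FASTQ file record by record and report the first one that looks broken. The following is only a minimal sketch: `first_bad_record` is a hypothetical helper name, and it assumes strict 4-line FASTQ records (no wrapped sequences) in an uncompressed file.

```python
from itertools import islice


def first_bad_record(lines):
    """Return (record_number, line_number, reason) for the first malformed
    record, or None if every record looks structurally sound.

    Assumption: strict 4-line FASTQ records (header, sequence, '+', quality).
    Record and line numbers are 1-based.
    """
    it = iter(lines)
    record = 0
    while True:
        # Read the next four lines and strip their line endings.
        chunk = [line.rstrip("\r\n") for line in islice(it, 4)]
        if not chunk:
            return None  # reached the end without finding a problem
        record += 1
        first_line = (record - 1) * 4 + 1

        header = chunk[0]
        if header == "":
            return record, first_line, "blank line where a header was expected"
        if not header.startswith("@"):
            return record, first_line, "header does not start with '@'"
        if len(chunk) < 4:
            return record, first_line, "truncated record"

        _, seq, plus, qual = chunk
        if not plus.startswith("+"):
            return record, first_line + 2, "separator does not start with '+'"
        if len(seq) != len(qual):
            return record, first_line + 3, "sequence/quality length mismatch"
```

It can be called as `first_bad_record(open("/path/to/file.fq"))`. If the reported record number is close to the "Number of input reads" value in the STAR log, that would be consistent with STAR stopping at the damaged position.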


I will try that, thank you.


3 million out of 300 million is 1%, not 10%. Do you really have one sample with 300 million reads for RNAseq?


Oh, you're right. I corrected it in the question.

2.9 years ago
caggtaagtat ★ 1.4k

OK, I found out that my FASTQ file was in fact corrupted after I used SortMeRNA to remove rRNA reads.

It seems an error can occur during the SortMeRNA run that inserts a blank line into the FASTQ file, and STAR then stops reading the file at that position. I hope everything will be fine once the line is removed.

Salmon did not have any problems with the extra blank line.

Edit: Removing the blank line did the trick
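
For anyone who runs into the same problem, removing the blank line can be scripted. This is a minimal sketch rather than the exact method used above: `clean_fastq` is a hypothetical helper name, and it assumes an uncompressed FASTQ file with no zero-length reads (their sequence and quality lines are legitimately empty and would be dropped too).

```python
def clean_fastq(src_path, dst_path):
    """Copy src_path to dst_path without blank lines.

    Returns the number of blank lines that were dropped.
    Assumption: the file holds no zero-length reads, so every blank
    line is spurious.
    """
    dropped = 0
    with open(src_path, "rt") as src, open(dst_path, "wt") as dst:
        for line in src:
            if line.strip():
                dst.write(line)  # keep every non-blank line unchanged
            else:
                dropped += 1     # skip whitespace-only lines
    return dropped
```

Writing to a new file instead of editing in place keeps the original around, so the two can be compared before the alignment is rerun.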