Question: Only 1% of reads are used as "input reads" in STAR
caggtaagtat wrote:

Hi everybody,

I just aligned my samples, and for one of them STAR used only 1% of the reads in the trimmed FASTQ file for mapping. Does anyone know what could cause this? The references I used worked fine for the rest of the data, and I have run out of ideas about where the error lies.

The FASTQ file contains around 300 million reads, but STAR only uses about 3 million. This is the command I used (it's the last alignment step of a 2-pass run):

STAR --outFilterType BySJout --outFilterMismatchNmax 10 --outFilterMismatchNoverLmax 0.04 --alignEndsType EndToEnd --runThreadN 8 --outSAMtype BAM SortedByCoordinate --alignSJDBoverhangMin 4 --alignIntronMax 300000 --alignSJoverhangMin 8 --alignIntronMin 20 --genomeDir /path/to/Genome/ --sjdbOverhang 149 --quantMode GeneCounts --sjdbGTFfile /path/to/hg91.gtf --readFilesIn /path/to/file.fq > STAR.log

This is the final log (Log.final.out) of the STAR run:

                             Started job on |   May 14 16:56:28
                         Started mapping on |   May 14 16:59:07
                                Finished on |   May 14 17:02:06
   Mapping speed, Million of reads per hour |   65.72

                      Number of input reads |   3267930
                  Average input read length |   134
                                UNIQUE READS:
               Uniquely mapped reads number |   3111505
                    Uniquely mapped reads % |   95.21%
                      Average mapped length |   135.04
                   Number of splices: Total |   1497184
        Number of splices: Annotated (sjdb) |   1497124
                   Number of splices: GT/AG |   1483304
                   Number of splices: GC/AG |   12329
                   Number of splices: AT/AC |   1093
           Number of splices: Non-canonical |   458
                  Mismatch rate per base, % |   0.18%
                     Deletion rate per base |   0.01%
                    Deletion average length |   1.85
                    Insertion rate per base |   0.01%
                   Insertion average length |   1.51
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |   116052
         % of reads mapped to multiple loci |   3.55%
    Number of reads mapped to too many loci |   492
         % of reads mapped to too many loci |   0.02%
                              UNMAPPED READS:
   % of reads unmapped: too many mismatches |   0.75%
             % of reads unmapped: too short |   0.41%
                 % of reads unmapped: other |   0.06%
                              CHIMERIC READS:
                   Number of chimeric reads |   0
                        % of chimeric reads |   0.00%

Any help is greatly appreciated!

rna-seq input star mapping • 340 views
modified 5 months ago • written 5 months ago by caggtaagtat

Is there a chance of the input file being somehow corrupt? Do you see any errors anywhere?

modified 5 months ago • written 5 months ago by genomax

Mapping with salmon worked, and I don't see any errors during trimming.

Edit: with salmon, 330 million reads were mapped.

modified 5 months ago • written 5 months ago by caggtaagtat

There's likely to still be an error in the fastq file that salmon happens to work around. Don't use SortedByCoordinate and look to see if the last read in the output file is around the 3.2 millionth in the file.

written 5 months ago by Devon Ryan
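
A rough sketch of the check suggested above, assuming the alignment is rerun with --outSAMtype BAM Unsorted (so that STAR writes Aligned.out.bam) and that samtools is installed. The helper name fastq_record_index is made up for illustration. It reports which record in the FASTQ file carries a given read name, so the position of the last read in the BAM can be compared with the roughly 3.2 million input reads that STAR reported. With several threads the BAM order may only approximately follow the input order, so treat the result as a rough position.

```shell
# Print the 1-based record number of the first FASTQ record whose header
# matches READ_NAME. Assumes four lines per record up to that point, which
# should hold for the part of the file that STAR was still able to read.
# Usage: fastq_record_index READ_NAME FILE.fq
fastq_record_index() {
    awk -v name="$1" '
        NR % 4 == 1 {
            id = substr($1, 2)        # first header field without the leading "@"
            sub(/\/[12]$/, "", id)    # drop a trailing /1 or /2, if present
            if (id == name) { printf "%d\n", (NR + 3) / 4; exit }
        }
    ' "$2"
}

# Example (paths are placeholders):
#   last=$(samtools view Aligned.out.bam | tail -n 1 | cut -f 1)
#   fastq_record_index "$last" /path/to/file.fq
```

If the printed record number is close to the "Number of input reads" value in the final log, STAR most likely stopped reading right after that record, and the lines following it in the FASTQ file are worth inspecting.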

I will try that, thank you.

written 5 months ago by caggtaagtat

3 million out of 300 million is 1%, not 10%. Do you really have one sample with 300 million reads for RNAseq?

written 5 months ago by h.mon
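
The percentage can be checked directly from the files rather than by eye. This is a minimal sketch, assuming an uncompressed FASTQ file with four lines per record and a Log.final.out in the format shown in the question. The function names are hypothetical helpers, not part of STAR.

```shell
# Count FASTQ records, assuming exactly four lines per record.
count_fastq_reads() {
    awk 'END { printf "%d\n", NR / 4 }' "$1"
}

# Extract the "Number of input reads" value from a STAR Log.final.out file.
star_input_reads() {
    awk -F '|' '/Number of input reads/ { gsub(/[ \t]/, "", $2); print $2; exit }' "$1"
}

# Percentage of FASTQ records that STAR counted as input reads.
# Usage: percent_used USED TOTAL
percent_used() {
    LC_ALL=C awk -v used="$1" -v total="$2" \
        'BEGIN { printf "%.2f\n", 100 * used / total }'
}

# Example (paths are placeholders):
#   total=$(count_fastq_reads /path/to/file.fq)
#   used=$(star_input_reads Log.final.out)
#   percent_used "$used" "$total"
```

For a gzipped file, the record count would have to be taken from the decompressed stream instead, for example by piping the output of zcat into the same awk command.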

Oh, you're right. I have corrected it in the question.

written 5 months ago by caggtaagtat
caggtaagtat wrote:

OK, I learned that my FASTQ file was in fact corrupted after using SortMeRNA to remove rRNA reads.

It seems that an error can occur during the SortMeRNA run in which a blank line is inserted into the FASTQ file, and STAR then stops reading the file at that position. I hope that everything will be fine once the line is removed.

Salmon did not have any problems with the extra blank line.

Here is the link to the answer of matt.shenton who knew about this error!

written 5 months ago by caggtaagtat
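
For anyone hitting the same problem, here is a minimal sketch of how the stray blank line could be located and removed with awk. The function names are hypothetical. It assumes an uncompressed FASTQ file with four lines per record and no zero-length reads; a read trimmed to length zero legitimately has empty sequence and quality lines, and dropping those would break the file.

```shell
# Print the line number of the first empty (or whitespace-only) line, if any.
first_blank_line() {
    awk 'NF == 0 { print NR; exit }' "$1"
}

# Write a copy of the FASTQ file with empty lines removed.
# Usage: drop_blank_lines IN.fq OUT.fq
drop_blank_lines() {
    awk 'NF > 0' "$1" > "$2"
}

# Basic structure check: the line count is a multiple of four, every first
# line of a record starts with "@", and every third line starts with "+".
# Returns 0 if the file passes and 1 otherwise.
fastq_looks_valid() {
    awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" { bad = 1; exit }
        NR % 4 == 3 && substr($0, 1, 1) != "+" { bad = 1; exit }
        END { if (bad || NR % 4 != 0) exit 1 }
    ' "$1"
}

# Example (paths are placeholders):
#   first_blank_line /path/to/file.fq
#   drop_blank_lines /path/to/file.fq /path/to/file.clean.fq
#   fastq_looks_valid /path/to/file.clean.fq && echo "structure looks OK"
```

The structure check is deliberately simple and does not compare sequence and quality lengths, so it only confirms that the four-line layout is intact before the file is handed back to STAR.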