Why do I observe a large number of “Unassigned_NoFeatures” reads in featureCounts after standard mouse RNA-seq workflow?
0
0
Entering edit mode
4 hours ago
城玮 • 0

Hi everyone,

I’m performing a standard RNA-seq analysis on mouse (Mus musculus) and encountered an issue at the counting step using featureCounts. Below is my workflow:

  1. Downloaded the mouse reference genome from NCBI (Assembly: GCF_000001635.27/GRCm39) and built the index.
  2. Performed quality control on raw sequencing data using Trimmomatic.
  3. Aligned the trimmed reads to the reference genome with STAR, producing sorted BAM files.
  4. Counted reads using featureCounts to obtain raw count matrix:

featureCounts -T 25 -s 2 \

-a mmu_hg38/ncbi_dataset/data/GCF_000001635.27/GCF_000001635.27_GRCm39_genomic.gtf \

-o BD-1-1_featurecounts.txt \

BD-1-1_counts_Aligned.sortedByCoord.out.bam

Here is the summary file from the result of above code:

Status                         BD-1-1_counts_Aligned.sortedByCoord.out.bam
Assigned                       25698967
Unassigned_Unmapped            2131357
Unassigned_Read_Type           0
Unassigned_Singleton           0
Unassigned_MappingQuality      0
Unassigned_Chimera             0
Unassigned_FragmentLength      0
Unassigned_Duplicate           0
Unassigned_MultiMapping        8512525
Unassigned_Secondary           0
Unassigned_NonSplit            0
Unassigned_NoFeatures          27869011
Unassigned_Overlapping_Length  0
Unassigned_Ambiguity           137274

As you can see, a large number of reads (~27.9 million) are labeled as Unassigned_NoFeatures, and only ~25.7 million reads are “Assigned”. Considering the total alignments reported (~64.3 million), the assigned proportion (~40%) is relatively low.

Questions:

  1. Is this assigned/unassigned read distribution reasonable for a mouse RNA-seq experiment?
  2. Could the -s 2 option (reverse-stranded) be causing or contributing to this high proportion of “NoFeatures” reads?
  3. What other common reasons exist for such a high “Unassigned_NoFeatures” count (for example, annotation mismatch, genome build/version inconsistency, un–annotated transcripts, etc.)?

Do you have suggestions for next diagnostic steps to identify the root cause?

Thank you in advance for any advice or insights!

RNA-SEQ SEQ NGS RNA • 97 views
ADD COMMENT
1
Entering edit mode

It's common to have some fraction of the reads to be unassigned but the amount (or ratio) you observe is too high I feel.

Can you tell a bit more on the RNA input data? was this for instance a rRNA depleted sample (or was it 'whole RNA' sample) ?

and yes all those reasons you list there can be the cause indeed, among even more other reasons ;)

ADD REPLY
0
Entering edit mode

Thank you very much for your thoughtful and detailed reply.

I successfully solved this problem by using the option -s 0 but not -s 2. The result of using option -s 0 is showing below:

Assigned        48431455
Unassigned_Unmapped     2131357
Unassigned_Read_Type    0
Unassigned_Singleton    0
Unassigned_MappingQuality       0
Unassigned_Chimera      0
Unassigned_FragmentLength       0
Unassigned_Duplicate    0
Unassigned_MultiMapping 8512525

As we could see here, the result is much more suitable compared with the result of option -s 2.

I guess this is because -s 2 specifies the reverse strand of the reference gene, which caused a large number of reads to fail to match.

Thanks again for your kind help!

ADD REPLY
0
Entering edit mode

I successfully solved this problem by using the option -s 0 but not -s 2

While that is great do you know for sure if the library you are working with was made in an "unstranded" manner i.e. not preserving strand information. Knowing details about the experimental setup is important since that guides the options used for analysis.

ADD REPLY

Login before adding your answer.

Traffic: 4317 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6