Question

RNA-seq assigned alignments

0

4 months ago

Manko47 ▴ 10

Hey everyone,

I have a rather technical question about the percentage of alignments that get assigned on the human genome (GRCh38.p14) - I'm using Gencode v47 along with its annotation.

My lab recently changed to a cheaper sequencing company. Before a final decision is made, I've got some RNA-seq data to play around with and determine its quality. Importantly, we were previously getting single-end data, whereas now it's paired-end RNA-seq. Obviously I updated the STAR mapping script to deal with paired-end data, and the featureCounts script to count reads on paired-end data.

I think I've mostly made the decision, as the new datasets look rather good (good base quality, per-sequence quality and sequence counts). In short - nothing looks wrong at first glance in the FastQC/MultiQC report. More importantly, I'm getting a very high unique mapping percentage (86-90%).

There is just one thing bothering me. Previously on the single-end data from old company I was getting around 50-60% of the uniquely aligned reads being assigned by featureCounts (for this purpose I was using Gencode basic gene annotation). Even samples with lower mapping percentages (60-70%) usually had over 50% of uniquely mapped reads being assigned.

Meanwhile, on the paired-end data from the new company, I'm getting only around 30-40% of uniquely mapped reads assigned by featureCounts (once again to the Gencode basic gene annotation). So despite a stellar unique mapping rate (about 90%), only around 30-40% of these alignments are getting assigned. I've tried more comprehensive annotations, but it didn't improve much.

And now I'm wondering why that is the case. Is it normal? Perhaps the alignment and assignment statistics are counted differently on paired-end data than on single-end data. Or maybe not, in which case this lower assignment percentage could indicate some issue. I'd be glad to get any feedback from someone more knowledgeable.

Oh, and just in case: my featureCounts script for paired-end reads has these options set:

-p -s 2 -B --countReadPairs -C, plus the annotation and input files, obviously.

For single-end I only used -s 2 to select the reverse strand.

EDIT: Just to give more info: the number of aligned reads in the new datasets is around 30-40M per sample, of which around 13-17M reads are being assigned. Of the rest, the majority are excluded because they overlap no features, regardless of how comprehensive an annotation I use.
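For anyone wanting to reproduce this kind of breakdown, here is a minimal Python sketch that reads a featureCounts `.summary` file (tab-delimited, one `Status` column followed by one column per BAM file) and reports each category as a fraction. The function name `assignment_rates`, the sample name and the numbers in `EXAMPLE` are illustrative assumptions, not values from the actual runs. Multi-mapping and unmapped rows are excluded by default so that the denominator roughly corresponds to uniquely mapped alignments.

```python
import csv
import io

# Categories left out of the denominator so that rates are roughly
# "per uniquely mapped alignment" (an assumption of this sketch).
DEFAULT_EXCLUDE = ("Unassigned_MultiMapping", "Unassigned_Unmapped")


def assignment_rates(summary_text, exclude=DEFAULT_EXCLUDE):
    """Return {sample: {status: fraction}} from featureCounts .summary text."""
    rows = [r for r in csv.reader(io.StringIO(summary_text), delimiter="\t") if r]
    header, body = rows[0], rows[1:]
    rates = {}
    for col, sample in enumerate(header[1:], start=1):
        counts = {row[0]: int(row[col]) for row in body if row[0] not in exclude}
        total = sum(counts.values())
        rates[sample] = (
            {status: n / total for status, n in counts.items()} if total else {}
        )
    return rates


# Illustrative numbers only, not taken from the real data.
EXAMPLE = (
    "Status\tsample.bam\n"
    "Assigned\t15000000\n"
    "Unassigned_MultiMapping\t5000000\n"
    "Unassigned_NoFeatures\t17000000\n"
    "Unassigned_Ambiguity\t3000000\n"
)

rates = assignment_rates(EXAMPLE)
for status, frac in sorted(rates["sample.bam"].items(), key=lambda kv: -kv[1]):
    print(f"{status}\t{frac:.1%}")
```

On real data, pass the contents of the `.summary` file that featureCounts writes next to its counts table, e.g. `assignment_rates(open("counts.txt.summary").read())`, where the file name is whatever was given to `-o` plus `.summary`.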

paired-end STAR RNA-seq mapping single-end • 884 views

updated 4 months ago by jaro.slamecka ▴ 270 • written 4 months ago by Manko47 ▴ 10

score 1 · Answer 1 · 2025-07-14

1

4 months ago

GenoMax 154k

Previously on the single-end data from old company I was getting around 50-60% of the uniquely aligned reads

Are you comparing the same exact library sequenced by two different companies? If not, this is not a fair comparison and nothing substantive can be concluded from the observation.

Are you still making the libraries, or is the new sequence provider making them? If the latter, that adds another variable.

As a test, you could use just the R1 reads from your new provider and see if the assignments go back to being closer to what you were seeing before. Otherwise, paired-end data provides an additional anchor at the 3' end of the fragment, so the added spatial information should give you better data. If you intend to compare the new results with the old ones in some form, you will need to account for the batch effect.
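The R1-only test could be set up roughly as follows. This is a minimal sketch that only builds the two command lines and does not run them. The helper name `r1_only_commands`, all paths and the thread count are hypothetical. It assumes gzipped FASTQ input (hence `--readFilesCommand zcat`) and a reverse-stranded library, matching the `-s 2` setting from the question. The paired-end options (`-p`, `--countReadPairs`, `-B`, `-C`) are left out on purpose, since R1 is treated as single-end data.

```python
import shlex


def r1_only_commands(r1_fastq, genome_dir, gtf, prefix, threads=8):
    """Build STAR and featureCounts argument lists for an R1-only rerun."""
    star = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", r1_fastq,           # R1 only, R2 is deliberately omitted
        "--readFilesCommand", "zcat",        # assumes gzipped FASTQ
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", prefix,
    ]
    # STAR appends this fixed suffix to the prefix for coordinate-sorted BAM output.
    bam = f"{prefix}Aligned.sortedByCoord.out.bam"
    feature_counts = [
        "featureCounts",
        "-T", str(threads),
        "-s", "2",                           # reverse-stranded, as in the single-end runs
        "-a", gtf,
        "-o", f"{prefix}counts.txt",
        bam,
    ]
    return star, feature_counts


# Hypothetical paths, for illustration only.
star_cmd, fc_cmd = r1_only_commands(
    "sample_R1.fastq.gz", "star_index/", "annotation.gtf", "sample_R1only_"
)
print(shlex.join(star_cmd))
print(shlex.join(fc_cmd))
```

If the assigned fraction from this run lands close to the old single-end numbers, the drop is more likely related to how pairs are counted; if it stays at 30-40%, the library itself is the more likely explanation.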

4 months ago by GenoMax 154k

0

Are you comparing the same exact library sequenced by two different companies?

Nah, it's just that the majority of my old datasets oscillated around these kinds of numbers, so once I saw this drop in the new data, it immediately caught my attention. So this isn't a direct 1:1 comparison.

As a test, you could use just the R1 reads from your new provider and see if the assignments go back to being closer to what you were seeing before.

Thank you, this seems like a great idea. Haven't thought of that, will definitely check that!

4 months ago by Manko47 ▴ 10

0

Since you have decided to move to a new sequence provider, and that decision is for economic rather than scientific reasons, there is not much to think about as long as you are making libraries using your normal SOP. Be cautious about any cross-provider data comparisons (batch effect).

4 months ago by GenoMax 154k

0

To reiterate one of GenoMax's questions, is the new sequencing provider making the libraries for you? If so, have they by any chance used ribosomal RNA depletion instead of polyA selection? For ribo-depleted libraries, numbers like 30-40% assigned are not totally unusual.
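One rough way to probe this, sketched below in Python, is to run featureCounts twice on the same BAM, once counting over `exon` features (the default) and once over whole `gene` features, and compare the assigned totals. A large jump at the gene level suggests many fragments fall in introns, which would be consistent with a ribo-depleted (total RNA) library. The helper names, paths and example numbers are hypothetical, and the estimate is only approximate because overlapping genes produce more ambiguous assignments at the gene level.

```python
def feature_type_commands(bam, gtf, out_prefix):
    """Build exon-level and gene-level featureCounts argument lists."""
    # Paired-end options copied from the question; adjust to the real script.
    base = ["featureCounts", "-p", "--countReadPairs", "-B", "-C", "-s", "2", "-a", gtf]
    exon_cmd = base + ["-t", "exon", "-g", "gene_id", "-o", f"{out_prefix}.exon.txt", bam]
    gene_cmd = base + ["-t", "gene", "-g", "gene_id", "-o", f"{out_prefix}.gene.txt", bam]
    return exon_cmd, gene_cmd


def intronic_share(assigned_exon, assigned_gene):
    """Fraction of gene-level assignments not captured by exon-level counting."""
    if assigned_gene <= 0:
        raise ValueError("assigned_gene must be positive")
    return max(0.0, (assigned_gene - assigned_exon) / assigned_gene)


# Hypothetical paths and counts, for illustration only.
exon_cmd, gene_cmd = feature_type_commands("sample.bam", "annotation.gtf", "sample")
print(" ".join(exon_cmd))
print(" ".join(gene_cmd))
print(f"{intronic_share(15_000_000, 30_000_000):.0%} of gene-level assignments look intronic")
```

The two `Assigned` totals come from the `.summary` files that each run writes alongside its output table.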

4 months ago by jaro.slamecka ▴ 270