featureCounts: rate of successfully assigned alignments is low
11 weeks ago
HarperReed • 0

Hello, I hope you can help me!

I'm analyzing RNA-seq data from dogs. I used fastp to automatically detect and remove adapters, then aligned the reads with HISAT2, and finally performed quantification using featureCounts.

With featureCounts, when using only uniquely mapped reads, I get a low rate of Successfully assigned alignments: 23%. When including both uniquely and multi-mapped reads, the rate improves to 32%.

I used the GTF file mentioned in the original publication.

Is it normal to have such a low assignment rate? Is this considered low?

I've also added my MultiQC reports from before and after using fastp.

I am using the FASTA and GTF files from the same Ensembl version. For strandedness, I tested -s 0, -s 1, and -s 2, and the highest assignment rate (32%) was with -s 0.
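A quick way to see where the unassigned alignments end up is the `.summary` file that featureCounts writes next to its output table. The sketch below is only an illustration: it assumes the usual tab-separated layout (a `Status` column followed by one column per BAM file), and the counts in `example` are hypothetical, not taken from this thread.

```python
import csv
import io


def summarize_featurecounts(summary_text):
    """Return {status: (count, percent_of_total)} for the first BAM column.

    Assumes a tab-separated featureCounts .summary file whose first line
    is a header ("Status", then one column per BAM file).
    """
    rows = list(csv.reader(io.StringIO(summary_text), delimiter="\t"))
    counts = {row[0]: int(row[1]) for row in rows[1:] if len(row) >= 2}
    total = sum(counts.values())
    if total == 0:
        return {}
    # Keep only the categories that actually received alignments.
    return {s: (n, 100.0 * n / total) for s, n in counts.items() if n > 0}


# Hypothetical numbers, for illustration only (NOT from this thread).
example = (
    "Status\tsample.bam\n"
    "Assigned\t230\n"
    "Unassigned_MultiMapping\t100\n"
    "Unassigned_NoFeatures\t600\n"
    "Unassigned_Ambiguity\t70\n"
)

for status, (n, pct) in summarize_featurecounts(example).items():
    print(f"{status:<28}{n:>8d}{pct:>7.1f}%")
```

If most alignments fall under `Unassigned_NoFeatures`, the reads map to the genome but outside the features annotated in the GTF, which points to the library or the annotation rather than to the `-s` setting.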


1. Before fastp: [MultiQC report images not available]

2. After fastp: [MultiQC report images not available]

Thank you in advance.

rnaseq featurecount • 949 views

20-30% does sound low indeed.

However, to give a more definitive answer you'll need to provide us with much more info: numbers such as the alignment rate and the number of input reads. Other useful info relates to the library you sequenced (both biological and technical): what kind of samples, how rRNA depletion was done, and so on.

At first sight you do seem to have quite a substantial number of duplicated reads. Any explanation for that observation?


Thank you for replying.

I used RNA-seq data from the following article:

"RNA-seq of serial kidney biopsies obtained during progression of chronic kidney disease from dogs with X-linked hereditary nephropathy" DOI: 10.1038/s41598-017-16603-y

To download the FASTQ files from the SRA database (BioProject: PRJNA378728), I used the following command:

fasterq-dump --split-files --threads

Below is an example of the HISAT2 alignment summary for one of the samples: t1_control-2

33509610 reads; of these:
  33509610 (100.00%) were paired; of these:
    9152321 (27.31%) aligned concordantly 0 times
    22450775 (67.00%) aligned concordantly exactly 1 time
    1906514 (5.69%) aligned concordantly >1 times
    ----
    9152321 pairs aligned concordantly 0 times; of these:
      2020759 (22.08%) aligned discordantly 1 time
    ----
    7131562 pairs aligned 0 times concordantly or discordantly; of these:
      14263124 mates make up the pairs; of these:
        11949808 (83.78%) aligned 0 times
        1932120 (13.55%) aligned exactly 1 time
        381196 (2.67%) aligned >1 times
82.17% overall alignment rate


**t1_control-1**

34168955 reads; of these:
  34168955 (100.00%) were paired; of these:
    10002648 (29.27%) aligned concordantly 0 times
    22860924 (66.91%) aligned concordantly exactly 1 time
    1305383 (3.82%) aligned concordantly >1 times
    ----
    10002648 pairs aligned concordantly 0 times; of these:
      1929055 (19.29%) aligned discordantly 1 time
    ----
    8073593 pairs aligned 0 times concordantly or discordantly; of these:
      16147186 mates make up the pairs; of these:
        13874194 (85.92%) aligned 0 times
        1966097 (12.18%) aligned exactly 1 time
        306895 (1.90%) aligned >1 times
79.70% overall alignment rate

Based on the article: the rRNA depletion method was chosen to remove both cytoplasmic and mitochondrial rRNA and for its compatibility with canine samples. Samples were then sequenced using the Illumina Genome Analyzer (HiSeq 2500v4 High Output).
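
The overall alignment rates in the two HISAT2 summaries above can be reproduced by hand, which is a handy sanity check when a log has been copied piecemeal. This is a small sketch, assuming (as the numbers here suggest) that for paired-end input the rate is the share of individual mates that aligned at least once; the function name is my own.

```python
def overall_alignment_rate(total_pairs, unaligned_mates):
    """Percent of mates that aligned at least once (paired-end input).

    total_pairs:      the "reads; of these:" figure at the top of the summary
    unaligned_mates:  the final "aligned 0 times" figure in the mates block
    """
    total_mates = 2 * total_pairs
    return 100.0 * (total_mates - unaligned_mates) / total_mates


# Figures copied from the two HISAT2 summaries in this thread.
samples = {
    "t1_control-2": (33509610, 11949808),
    "t1_control-1": (34168955, 13874194),
}

for name, (pairs, unaligned) in samples.items():
    print(f"{name}: {overall_alignment_rate(pairs, unaligned):.2f}%")
```

Both values match the reported 82.17% and 79.70%, so the summaries are internally consistent. The low assignment rate therefore arises after alignment, at the counting step.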

I don't know whether this is important, but I used the same tools and configuration (fastp, HISAT2, and featureCounts with the multi-mapped option) on data from the following article: Mechanism of Growth Regulation of Yeast Involving Hydrogen Sulfide From S-Propargyl-Cysteine Catalyzed by Cystathionine-γ-Lyase.

In this case, I obtained the following results:

Process BAM file SRR13978642.bam...
   Paired-end reads are included.
   The reads are assigned on the single-end mode.
   Total alignments : 2382029
   Successfully assigned alignments : 2237656 (93.9%)
   Running time : 0.01 minutes

The HISAT2 output for this second dataset is:

3941180 reads; of these:
  3941180 (100.00%) were paired; of these:
    112279 (2.85%) aligned concordantly 0 times
    2849955 (72.31%) aligned concordantly exactly 1 time
    978946 (24.84%) aligned concordantly >1 times
    ----
    112279 pairs aligned concordantly 0 times; of these:
      47682 (42.47%) aligned discordantly 1 time
    ----
    64597 pairs aligned 0 times concordantly or discordantly; of these:
      129194 mates make up the pairs; of these:
        77666 (60.12%) aligned 0 times
        41774 (32.33%) aligned exactly 1 time
        9754 (7.55%) aligned >1 times
99.01% overall alignment rate
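
One detail in the featureCounts log above: it says "The reads are assigned on the single-end mode" even though the BAM holds paired-end reads, so each mate is counted separately. As far as I know, recent Subread releases (2.0.2 and later) need both `-p` and `--countReadPairs` to count fragments instead. Below is a rough sketch of a helper that assembles such a command for each strandedness setting. The helper name and all file names are placeholders, and it only prints the commands rather than running them.

```python
import shlex


def featurecounts_cmd(bam, gtf, out, strand=0, paired=True,
                      multimapped=False, threads=4):
    """Build a featureCounts argument list (hypothetical helper).

    strand: 0 = unstranded, 1 = stranded, 2 = reversely stranded (-s)
    """
    if strand not in (0, 1, 2):
        raise ValueError("strand must be 0, 1 or 2")
    cmd = ["featureCounts", "-a", gtf, "-o", out,
           "-s", str(strand), "-T", str(threads)]
    if paired:
        # -p marks the input as paired-end; --countReadPairs counts
        # fragments rather than individual mates (Subread >= 2.0.2).
        cmd += ["-p", "--countReadPairs"]
    if multimapped:
        cmd.append("-M")  # also count multi-mapping reads
    cmd.append(bam)
    return cmd


# Placeholder file names; print one command per strandedness setting.
for s in (0, 1, 2):
    args = featurecounts_cmd("sample.bam", "annotation.gtf",
                             f"counts_s{s}.txt", strand=s)
    print(shlex.join(args))
```

Counting fragments will not rescue a degraded library, but it makes the "Total alignments" figure directly comparable with the number of pairs HISAT2 reports.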

Looks like you used the "non-chromosomal DNA" file instead of using the actual genome file. Use https://ftp.ensembl.org/pub/current/fasta/canis_lupus_familiaris/dna/Canis_lupus_familiaris.ROS_Cfam_1.0.dna.toplevel.fa.gz instead.


Thank you for your answer.

I made a mistake when copying the link in my previous comment. The file I actually used is https://ftp.ensembl.org/pub/current/fasta/canis_lupus_familiaris/dna/Canis_lupus_familiaris.ROS_Cfam_1.0.dna.toplevel.fa.gz, so I believe this is the correct one.


Have you tried changing the -s option? Try -s 1 or -s 2 and check the total counts.


Yes, I tried both; the assignment rate is still low.

10 weeks ago
ATpoint 89k

From the methods text:

The average RIN was 3.4

There is your answer. The RNA was of very poor quality, and the libraries mostly consist of (let's use scientifically elaborate English) uttermost crap. It's published data, so there is nothing you can do about the quality. Use the successfully assigned counts and see whether you can infer anything meaningful for your story; otherwise move on. No in silico magic will save the dataset.


Thank you very much for your response.

Would you be able to recommend a well-regarded publication or dataset with high-quality RNA-seq data that I can use to further test and validate my pipeline, please? I've already tested it on the S. cerevisiae dataset, but I'd like to assess its performance on an additional dataset to ensure robustness.

Thank you again.


There are any number of publications that would fit the bill (I'm choosing a random one that does some evaluation of pipelines). You can use the data from https://www.nature.com/articles/s41598-020-76881-x#Abs1, available at NCBI here: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE95077 and https://www.ncbi.nlm.nih.gov/Traces/study/?query_key=1&WebEnv=MCID_685c0bd7054961505629611f&o=acc_s%3Aa


thank you very much
