Hello, I hope you can help me!
I'm analyzing RNA-seq data from dogs. I used fastp to automatically detect and remove adapters, then aligned the reads with HISAT2, and finally performed quantification using featureCounts.
With featureCounts, when using only uniquely mapped reads, I get a low rate of Successfully assigned alignments: 23%. When including both uniquely and multi-mapped reads, the rate improves to 32%.
I used the GTF file mentioned in the original publication.
Is it normal to have such a low assignment rate? Is this considered low?
I've also added my MultiQC reports from before and after using fastp.
I am using the FASTA and GTF files from the same Ensembl version. For strandedness, I tested -s 0, -s 1, and -s 2, but the highest assignment rate (32%) was with -s 0.
1. BEFORE FASTP:
2. AFTER FASTP:
Thank you in advance.
20-30% does sound low indeed.
However, to give a more definitive answer you'll need to provide us with much more info: numbers such as the alignment rate and the number of input reads. Other useful info relates to the library you sequenced (both biological and technical): which kind of samples, how rRNA depletion was done, and things like that.
At first sight you do seem to have quite a substantial number of duplicated reads. Any explanation for that observation?
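One number that would help here is the per-category breakdown in the `.summary` file that featureCounts writes next to the counts table, since it shows whether the unassigned alignments are mostly `Unassigned_NoFeatures`, `Unassigned_MultiMapping`, `Unassigned_Ambiguity`, and so on. Below is a minimal Python sketch for turning that file into fractions. The function name `assignment_fractions`, the sample name, and the numbers are made up for illustration; they are not taken from this dataset.

```python
import csv
import io


def assignment_fractions(summary_text):
    """Return {sample: {status: fraction}} from featureCounts .summary text.

    The .summary file is tab-separated: a header row starting with "Status",
    then one row per category (Assigned, Unassigned_NoFeatures, ...).
    """
    rows = [r for r in csv.reader(io.StringIO(summary_text), delimiter="\t") if r]
    header, body = rows[0], rows[1:]
    result = {}
    for col, sample in enumerate(header[1:], start=1):
        counts = {row[0]: int(row[col]) for row in body}
        total = sum(counts.values())
        # Keep only non-empty categories so the report stays short.
        result[sample] = {
            status: n / total
            for status, n in counts.items()
            if total and n > 0
        }
    return result


# Made-up numbers, for illustration only.
example = (
    "Status\tsample.bam\n"
    "Assigned\t230\n"
    "Unassigned_MultiMapping\t150\n"
    "Unassigned_NoFeatures\t600\n"
    "Unassigned_Ambiguity\t20\n"
)

for sample, fractions in assignment_fractions(example).items():
    for status, frac in sorted(fractions.items(), key=lambda kv: -kv[1]):
        print(f"{sample}\t{status}\t{frac:.1%}")
```

For a real run, read the file with `open("counts.txt.summary").read()` (the file name here is hypothetical) and pass the text in. A large `Unassigned_NoFeatures` share usually points at the annotation or the genome/annotation pairing rather than at the aligner.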
Thank you for replying.
I used RNA-seq data from the following article:
"RNA-seq of serial kidney biopsies obtained during progression of chronic kidney disease from dogs with X-linked hereditary nephropathy" DOI: 10.1038/s41598-017-16603-y
To download the FASTQ files from the SRA database (BioProject: PRJNA378728), I used the following command:
Below is an example of the HISAT2 alignment summary for one of the samples: t1_control-2
I don't know whether this is important, but I used the same tools and configuration (fastp, HISAT2, and featureCounts with the multi-mapping option) on data from the following article: Mechanism of Growth Regulation of Yeast Involving Hydrogen Sulfide From S-Propargyl-Cysteine Catalyzed by Cystathionine-γ-Lyase.
In this case, I obtained the following results:
Process BAM file SRR13978642.bam...
The HISAT2 output for this second article is:
One final detail about the first article:
In the article, the authors mention this FASTA file:
ftp://ftp.ensembl.org/pub/current_fasta/canis_familiaris/dna/Canis_familiaris.CanFam3.1.dna.toplevel.fa.gz
That link is no longer available, so I used this one: https://ftp.ensembl.org/pub/current/fasta/canis_lupus_familiaris/dna/Canis_lupus_familiaris.ROS_Cfam_1.0.dna.nonchromosomal.fa.gz
Looks like you used the "non-chromosomal DNA" file instead of using the actual genome file. Use https://ftp.ensembl.org/pub/current/fasta/canis_lupus_familiaris/dna/Canis_lupus_familiaris.ROS_Cfam_1.0.dna.toplevel.fa.gz instead.
Thank you for your answer.
I made a mistake when copying the link in my previous comment. The file I actually used is https://ftp.ensembl.org/pub/current/fasta/canis_lupus_familiaris/dna/Canis_lupus_familiaris.ROS_Cfam_1.0.dna.toplevel.fa.gz
so I believe this is the correct one.
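Since the article refers to CanFam3.1 while the genome used here is ROS_Cfam_1.0, it may be worth confirming that the GTF was built for the same assembly as the FASTA. The sketch below assumes an Ensembl-style GTF that carries a `#!genome-build` header line (an assumption; other sources may not include it) and also compares sequence names between the two files. The function names and the toy lines are hypothetical.

```python
def gtf_genome_build(gtf_lines):
    """Return the value of the '#!genome-build' header line, or None."""
    for line in gtf_lines:
        if not line.startswith("#"):
            break  # header comments are over
        if line.startswith("#!genome-build "):
            return line.split(None, 1)[1].strip()
    return None


def sequence_name_overlap(fasta_lines, gtf_lines):
    """Return (shared names, names found only in the GTF)."""
    fasta = {
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].split()
    }
    gtf = {
        line.split("\t", 1)[0]
        for line in gtf_lines
        if line.strip() and not line.startswith("#")
    }
    return fasta & gtf, gtf - fasta


# Toy input, for illustration only.
fasta_demo = [">1 dna:chromosome", "ACGT", ">X dna:chromosome", "ACGT"]
gtf_demo = [
    "#!genome-build ROS_Cfam_1.0",
    '1\tensembl\tgene\t100\t200\t.\t+\t.\tgene_id "G1";',
    'MT\tensembl\tgene\t1\t50\t.\t+\t.\tgene_id "G2";',
]
fasta_name = "Canis_lupus_familiaris.ROS_Cfam_1.0.dna.toplevel.fa.gz"

build = gtf_genome_build(gtf_demo)
shared, gtf_only = sequence_name_overlap(fasta_demo, gtf_demo)
print("GTF genome build:", build)
print("build appears in FASTA file name:", build is not None and build in fasta_name)
print("shared sequence names:", sorted(shared))
print("names only in GTF:", sorted(gtf_only))
```

For the real files, the lines can be streamed with `gzip.open(path, "rt")`. Note that matching sequence names alone do not prove the assemblies match, because different assemblies of the same species can share chromosome names; the build string is the more telling check.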
Have you tried changing the -s option? Try -s 1 or -s 2 and check the total counts.
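To compare the three settings side by side, one option is to generate the same featureCounts call for each `-s` value and then compare the `Assigned` totals in the resulting `.summary` files. This is only a sketch: the helper `featurecounts_cmd` and all file names are hypothetical, and whether `-p` applies depends on whether the library is paired-end, which is not stated in this thread.

```python
import shlex


def featurecounts_cmd(bam, gtf, out, strand, paired=False,
                      count_multimapped=False, threads=4):
    """Build a featureCounts argument list for one strandedness setting."""
    if strand not in (0, 1, 2):
        raise ValueError("strand must be 0 (unstranded), 1 (forward) or 2 (reverse)")
    cmd = ["featureCounts", "-T", str(threads), "-s", str(strand),
           "-a", gtf, "-o", out]
    if paired:
        # Recent Subread releases also need --countReadPairs to count
        # fragments rather than reads; check the version in use.
        cmd.append("-p")
    if count_multimapped:
        cmd.append("-M")  # also count multi-mapping reads
    cmd.append(bam)
    return cmd


# Hypothetical file names, for illustration only.
for s in (0, 1, 2):
    cmd = featurecounts_cmd("sample.bam", "annotation.gtf", f"counts_s{s}.txt", s)
    print(shlex.join(cmd))
```

The printed lines can be pasted into a shell, or each list can be passed to `subprocess.run(cmd, check=True)`. For a stranded library, one of `-s 1` or `-s 2` typically gives a clearly higher assigned count than the other; similar counts for both, each about half of `-s 0`, suggest an unstranded library.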
Yes, I tried that, and the assignment rate is still low.