Question: Multi-mapping High with featureCounts but not STAR
garbuzov20 wrote:

Hi everyone, I am sequencing neurons isolated from rat. I'm working with a limited amount of material, so I am using Nugen Ovation Universal library kits. The FastQC results look OK. My library is paired-end with 75 bp reads. I ran STAR using this command:

STAR --runThreadN 3 \
--genomeDir ../genomes/rn6/Ensembl/star2 \
--readFilesCommand gunzip -c \
--readFilesIn ${R1} ${R2} \
--outFileNamePrefix starMapped/${job_name} \
--outSAMtype BAM Unsorted \
--seedSearchStartLmax 40 \
--outFilterScoreMinOverLread 0.5 \
--outFilterMatchNminOverLread 0.5

Uniquely mapped reads are ~70-90% for all samples, with multi-mapped reads at 7-15%.

I then used the same GTF file to make the count table with featureCounts, using this command:

featureCounts -T 6 -p -t exon -g gene_name -a ../genomes/rn6/Ensembl/Rattus_norvegicus.Rnor_6.0.93.gtf -o combined_counts.txt *.bam

But only 33% of my reads are assigned; Unassigned_MultiMapping reads are ~30% of my reads and Unassigned_NoFeature reads are the other third. I don't understand how the proportions of multi-mapping and mapped reads can be so different between the two packages, given that I am using the same GTF file. What's going on here? And why is the number of unassigned reads so high with featureCounts?
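
To put exact numbers on the categories described above, the featureCounts summary file can be converted to percentages. This is a minimal sketch, assuming the usual tab-separated `.summary` layout (a `Status` header row, then one row per category and one column per BAM file); the function name `fc_percentages` is hypothetical and not part of featureCounts.

```shell
# Print each non-zero featureCounts category as a percentage of the column total.
# $1: path to the .summary file (e.g. combined_counts.txt.summary)
# $2: column to report (default 2, i.e. the first BAM file)
fc_percentages() {
    summary_file=$1
    column=${2:-2}
    LC_ALL=C awk -F'\t' -v col="$column" '
        NR == 1 { next }                       # skip the "Status <bam> ..." header row
        { name[NR] = $1; value[NR] = $col; total += $col }
        END {
            if (total == 0) exit 1             # nothing to report
            for (i = 2; i <= NR; i++)
                if (value[i] > 0)
                    printf "%s\t%.1f\n", name[i], 100 * value[i] / total
        }
    ' "$summary_file"
}

# Usage (file name assumed from the -o option used above):
# fc_percentages combined_counts.txt.summary
```

Note that these percentages are relative to what featureCounts tallied, which may not be the same denominator as the per-read percentages STAR reports.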

sequencing rna-seq alignment • 698 views
ADD COMMENTlink modified 12 months ago • written 12 months ago by garbuzov20

I have a similar problem. I also used the Rnor6 GTF annotation from Ensembl for alignment and quantification. The report from read_distribution.py in the RSeQC package indicates that ~20-30% of reads were aligned to intronic or intergenic regions. You can check your own results the same way:

read_distribution.py -r /path/to/hg38_gencode_v29.bed -i /path/to/bam

read_distribution.py: http://rseqc.sourceforge.net/#read-distribution-py

I suspect this is because the rat annotation file is still incomplete compared with human and mouse.

ADD REPLYlink modified 11 months ago • written 11 months ago by biock20

You can try StringTie, it will assemble novel transcripts based on the genome, and quantify known and novel transcripts.

ADD REPLYlink written 11 months ago by h.mon29k

Thank you! I have used StringTie. But I still want to know why so many reads were aligned to intronic regions. Is this normal for human data? Is there any way to determine whether this is due to the sequencing data or to my analysis pipeline?

(I just analyzed a set of human data today. The uniquely mapping rate looks good (~90%), but featureCounts can only assign ~40% of reads successfully. Report from read_distribution.py shows that there are about ~20% reads.)

Thanks!

ADD REPLYlink written 11 months ago by biock20

Can you confirm that you are using the BAM files generated by STAR for counting with featureCounts? If you can post detailed STAR stats that would be great.
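
One way to collect the STAR stats for posting is to pull the mapping-rate lines out of each `Log.final.out`. This is only a sketch: the function name `star_mapping_stats` is hypothetical, and the line labels it searches for are assumed to match the ones STAR writes in that file.

```shell
# Print the unique / multi-mapping / unmapped percentage lines from a
# STAR Log.final.out file, trimmed to "label: value".
# $1: path to Log.final.out
star_mapping_stats() {
    grep -E 'Uniquely mapped reads %|% of reads (mapped to (multiple|too many) loci|unmapped)' "$1" |
        sed -E 's/^[[:space:]]+//; s/[[:space:]]*\|[[:space:]]*/: /'
}

# Usage (with --outFileNamePrefix starMapped/${job_name}, STAR appends
# "Log.final.out" directly to the prefix):
# star_mapping_stats "starMapped/${job_name}Log.final.out"
```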

Since you did not change the --outFilterMultimapNmax parameter for STAR, it is set to 10 (the default), so STAR reports up to 10 alignments for each multi-mapping read. Those will not be counted by featureCounts unless you turn on the -M option.
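
A likely reason the multi-mapping share looks larger in featureCounts than in STAR is that every reported alignment of a multi-mapping read is a separate BAM record, whereas STAR's percentages are per read. The sketch below counts both from SAM text using the `NH:i:` tag that STAR writes by default. The function name `nh_summary` is hypothetical, and it counts SAM records rather than paired-end fragments, so treat the output as a rough check only.

```shell
# Read SAM records on stdin and report how many alignment records belong to
# uniquely mapped reads (NH = 1) versus multi-mapping reads (NH > 1), and how
# many distinct read names the multi-mapping records come from.
nh_summary() {
    awk -F'\t' '
        /^@/ { next }                              # skip SAM header lines
        {
            nh = 1                                 # assume unique if NH tag is absent
            for (i = 12; i <= NF; i++)
                if ($i ~ /^NH:i:/) nh = substr($i, 6) + 0
            if (nh > 1) { multi_aln++; multi_reads[$1] = 1 } else { uniq_aln++ }
        }
        END {
            n = 0
            for (r in multi_reads) n++
            printf "unique_alignments\t%d\n", uniq_aln
            printf "multi_alignments\t%d\n", multi_aln
            printf "multi_read_names\t%d\n", n
        }
    '
}

# Usage (BAM path is a placeholder):
# samtools view starMapped/sample.bam | nh_summary
```

If multi_alignments is several times larger than multi_read_names, the extra alignments per read would account for much of the gap between the two reports.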

ADD REPLYlink modified 12 months ago • written 12 months ago by genomax83k

You can obtain counts directly from STAR with --quantMode GeneCounts; this quantification should be similar to that obtained with featureCounts. You didn't use -s in your featureCounts command. Are you sure the Nugen Ovation kit produces an unstranded library?
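
The `ReadsPerGene.out.tab` file written by --quantMode GeneCounts can also help answer the strandedness question. This is a sketch, assuming the documented four-column layout (gene ID, unstranded counts, counts with read 1 on the sense strand, counts with read 2 on the sense strand) with four leading `N_` summary rows; the function name `strand_check` is hypothetical.

```shell
# Sum the three count columns of a STAR ReadsPerGene.out.tab file over all genes.
# $1: path to ReadsPerGene.out.tab
strand_check() {
    awk -F'\t' '
        $1 ~ /^N_/ { next }                    # skip N_unmapped, N_multimapping, ...
        { unstr += $2; sense1 += $3; sense2 += $4 }
        END {
            printf "unstranded\t%d\n", unstr
            printf "read1_sense\t%d\n", sense1
            printf "read2_sense\t%d\n", sense2
        }
    ' "$1"
}

# Usage (file name assumes the --outFileNamePrefix used above):
# strand_check "starMapped/${job_name}ReadsPerGene.out.tab"
```

If read1_sense is far larger than read2_sense, the library behaves as forward-stranded, which would correspond to -s 1 in featureCounts; the opposite pattern would suggest -s 2, and roughly equal totals an unstranded library.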

ADD REPLYlink written 12 months ago by h.mon29k

Good point (see this, under Data analysis) :

Note that in libraries generated by the Ovation Universal RNA-Seq System, the forward read corresponds to the sense strand.

ADD REPLYlink modified 12 months ago • written 12 months ago by genomax83k