Question

Multi-mapping High with featureCounts but not STAR

6

4.9 years ago

garbuzov ▴ 70

Hi everyone, I am sequencing neurons isolated from rat. I'm working with a limited amount of material, so I am using Nugen Ovation Universal library kits. The FastQC results look OK. My library is paired-end with 75 bp reads. I ran STAR using this command:

STAR --runThreadN 3 \
--genomeDir ../genomes/rn6/Ensembl/star2 \
--readFilesCommand gunzip -c \
--readFilesIn ${R1} ${R2} \
--outFileNamePrefix starMapped/${job_name} \
--outSAMtype BAM Unsorted \
--seedSearchStartLmax 40 \
--outFilterScoreMinOverLread 0.5 \
--outFilterMatchNminOverLread 0.5

Uniquely mapped reads are ~70-90% for all samples, with multi-mapped reads at 7-15%.

I then used the same GTF file to build the count table with featureCounts, using this command:

featureCounts -T 6 -p -t exon -g gene_name -a ../genomes/rn6/Ensembl/Rattus_norvegicus.Rnor_6.0.93.gtf -o combined_counts.txt *.bam

But only 33% of my reads are assigned; Unassigned_MultiMapping reads are ~30% of my reads and Unassigned_NoFeature reads are the other third. I don't understand how the number of multi-mapping reads and the mapping rate can be so different between the two packages, given that I am using the same GTF file. What's going on here? And why is the number of unassigned reads so high with featureCounts?
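For anyone reproducing percentages like these: featureCounts writes a tab-separated `.summary` file next to its output (here that would be `combined_counts.txt.summary`), with one row per status and one column per BAM file. A minimal shell sketch for turning one column into percentages follows. The function name `fc_summary_percent` is made up for illustration, and the percentages are relative to whatever featureCounts tallied from the BAM, which is not necessarily the same unit as STAR's per-read percentages.

```shell
# Hypothetical helper: print each non-zero status in a featureCounts
# .summary file as a percentage of that sample's column total.
fc_summary_percent() {
  # $1: path to the .summary file
  # $2: 1-based column of the sample to report (default 2, the first BAM)
  awk -F'\t' -v col="${2:-2}" '
    NR == 1 { next }                       # skip the header row
    { status[NR] = $1; count[NR] = $col; total += $col }
    END {
      for (i = 2; i <= NR; i++)
        if (count[i] > 0)
          printf "%s\t%.1f\n", status[i], 100 * count[i] / total
    }
  ' "$1"
}

# Usage (file name follows from the -o option in the command above):
# fc_summary_percent combined_counts.txt.summary 2
```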

RNA-Seq sequencing alignment • 5.1k views

ADD COMMENT • link updated 4 weeks ago by GenoMax 141k • written 4.9 years ago by garbuzov ▴ 70

1

I have a similar problem. I also used the Rnor6 GTF annotation from Ensembl for alignment and quantification. The report from read_distribution.py in the RSeQC package indicates that ~20-30% of reads were aligned to intronic or intergenic regions. You can check your own results with:

read_distribution.py -r /path/to/hg38_gencode_v29.bed -i /path/to/bam

read_distribution.py: http://rseqc.sourceforge.net/#read-distribution-py

I suspect this is because the rat annotation is still incomplete compared with human and mouse.

ADD REPLY • link 4.8 years ago by biock ▴ 60

0

You can try StringTie; it will assemble novel transcripts based on the genome and quantify both known and novel transcripts.

ADD REPLY • link 4.8 years ago by h.mon 35k

0

Thank you! I have used StringTie, but I still want to know why so many reads were aligned to intronic regions. Is this normal for human data? Is there any way to determine whether this is due to the sequencing data or to my analysis pipeline?

(I just analyzed a set of human data today. The uniquely mapped rate looks good (~90%), but featureCounts successfully assigns only ~40% of reads. Report from read_distribution.py shows that there are about ~20% reads.)

Thanks!

ADD REPLY • link 4.8 years ago by biock ▴ 60

0

You can obtain counts directly from STAR with --quantMode GeneCounts; this quantification should be similar to that obtained with featureCounts. Also, you didn't use -s in your featureCounts command. Are you sure the Nugen Ovation kit produces an unstranded library?

ADD REPLY • link 4.9 years ago by h.mon 35k

0

Good point (see this, under Data analysis):

Note that in libraries generated by the Ovation Universal RNA-Seq System, the forward read corresponds to the sense strand.

ADD REPLY • link 4.9 years ago by GenoMax 141k
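One way to check strandedness empirically, sketched under assumptions: when STAR is run with `--quantMode GeneCounts` it writes a `ReadsPerGene.out.tab` file whose columns 2, 3 and 4 hold per-gene counts for unstranded, forward-stranded and reverse-stranded counting, after four summary lines that start with `N_`. Summing each column shows which setting captures the most reads. The helper name `strand_totals` and the file path in the usage comment are hypothetical.

```shell
# Hypothetical helper: total the three count columns of a STAR
# ReadsPerGene.out.tab file, ignoring the N_* summary lines at the top.
strand_totals() {
  # $1: path to a ReadsPerGene.out.tab file
  awk -F'\t' '
    $1 ~ /^N_/ { next }                    # N_unmapped, N_multimapping, ...
    { unstranded += $2; forward += $3; reverse += $4 }
    END {
      printf "unstranded\t%d\nforward\t%d\nreverse\t%d\n",
             unstranded, forward, reverse
    }
  ' "$1"
}

# Usage (hypothetical path):
# strand_totals starMapped/sample1ReadsPerGene.out.tab
```

If the forward total is far larger than the reverse total, the library behaves as forward-stranded, which would correspond to `-s 1` in featureCounts; similar forward and reverse totals suggest an unstranded library (`-s 0`).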

score 4 · Answer 1 · 2019-06-03

4

4.9 years ago

GenoMax 141k

Can you confirm that you are using the BAM files generated by STAR for counting with featureCounts? If you can post detailed STAR stats that would be great.

Since you did not change the --outFilterMultimapNmax parameter for STAR, it is set to 10 (the default), so STAR reports up to 10 alignments for each multi-mapping read. Those alignments will not be counted by featureCounts unless you turn on the -M option.

ADD COMMENT • link 4.9 years ago by GenoMax 141k
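To see how much of a BAM file consists of multi-mapper alignments, one option is to tally the `NH` tag, which STAR writes by default and which records how many alignments were reported for the read. The sketch below reads SAM text on standard input and counts both distinct read names and alignment records in each class. The function name `summarize_nh` is hypothetical, and the sketch assumes a read name is never unique in one record and multi-mapped in another.

```shell
# Hypothetical helper: classify SAM records by their NH tag and report
# distinct read names and raw record counts for each class.
summarize_nh() {
  awk -F'\t' '
    /^@/ { next }                          # skip SAM header lines
    int($2 / 4) % 2 == 1 { next }          # skip unmapped records (FLAG bit 0x4)
    {
      nh = 1                               # treat a missing NH tag as unique
      for (i = 12; i <= NF; i++)
        if ($i ~ /^NH:i:/) nh = substr($i, 6) + 0
      if (nh > 1) {
        multi_records++
        if (!($1 in multi_seen)) { multi_seen[$1] = 1; multi_reads++ }
      } else {
        unique_records++
        if (!($1 in unique_seen)) { unique_seen[$1] = 1; unique_reads++ }
      }
    }
    END {
      printf "unique_reads\t%d\n", unique_reads
      printf "multi_reads\t%d\n", multi_reads
      printf "unique_records\t%d\n", unique_records
      printf "multi_records\t%d\n", multi_records
    }
  '
}

# Usage (requires samtools; the BAM name is hypothetical):
# samtools view starMapped/sample1Aligned.out.bam | summarize_nh
```

Comparing `multi_reads` with `multi_records` shows how much larger the multi-mapping share becomes when alignments rather than reads are counted.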

0

But that still doesn't explain why the percentages of multi-mapped reads differ between featureCounts and STAR, does it? With --outFilterMultimapNmax 10, STAR will still classify those reads as multi-mapped (even though they are aligned). I am having the same issue: my STAR multi-mapping is 20, but featureCounts has it at 80%.

ADD REPLY • link 5 weeks ago by Daniel ▴ 30

0

In one case we are referring to the number of locations where a read maps, and in the other to a percentage.

ADD REPLY • link 4 weeks ago by GenoMax 141k
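A small worked example with made-up numbers may make the reads-versus-locations distinction concrete. It assumes that each reported alignment of a multi-mapping read is tallied separately by the counting tool, so a read that maps to k locations contributes k entries. The function name and the input values (85 unique reads, 15 multi-mapped reads, 4 locations each on average) are purely illustrative.

```shell
# Hypothetical helper: percentage of all alignments that come from
# multi-mapped reads, given read counts and a mean number of locations.
multimap_alignment_share() {
  # $1: uniquely mapped reads
  # $2: multi-mapped reads
  # $3: mean number of reported alignments per multi-mapped read
  awk -v u="$1" -v m="$2" -v k="$3" 'BEGIN {
    printf "%.1f\n", 100 * m * k / (u + m * k)
  }'
}

# 15% of reads are multi-mapped, but they supply 60 of 145 alignments.
multimap_alignment_share 85 15 4   # prints 41.4
```

Under these assumptions, a sample that is 15% multi-mapped by read can look roughly 41% multi-mapped by alignment, and the gap widens as the mean number of locations grows.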