Question: Issues with Rhesus Monkey RNASeq Analysis -- featureCounts has very low assignment rates
0
gravatar for harithaa
10 months ago by
harithaa0
harithaa0 wrote:

Hello,

I am working with non-human RNA-seq for the first time and this is with the Rhesus Monkey. I am having very low alignment rates after featureCounts. The out put is like so:

> Load annotation file rheMac10refGene.gtf.sorted ...                     
   Features : 47357                                              
    Meta-features : 6331                                  
Chromosomes/contigs : 46                                                

> Process BAM file *****.bam...              
Single-end reads are included.                                          
    Assign alignments to features...                                        
    Total alignments : 84767064                                             
    Successfully assigned alignments : 8615189 (10.2%)                      
    Running time : 4.04 minutes

So the steps I've done so far:

  1. Downloaded the entire assembly from ftp://hgdownload.soe.ucsc.edu/goldenPath/rheMac10/bigZips/*

  2. Converted the 2bit format to .fa using twoBitToFa tool from rsync://hgdownload.soe.ucsc.edu/genome/admin/exe/linux.x86_64/

-- this is the file, which after indexing was the reference file for HISAT2

  1. Next, for the GTF file I downloaded the rheMac10.refGene.txt and converted it to a GTF file using genePredToGtf tool. Which after cleaning, supplied as the gtf file for featureCounts.

Is this the way to go? My mapping rates after HISAT2 are high and comparable to the human genome ones. What am I doing wrong?

rna-seq • 257 views
ADD COMMENTlink modified 10 months ago • written 10 months ago by harithaa0

featureCounts has very low alignment rate

You have low assignment rates.

Can you provide actual featureCounts command you are using? Did you actually use a matched reference and annotation when you did all your analysis? You can't mix and match those from different sources.

ADD REPLYlink written 10 months ago by GenoMax94k
for sortedbam in $(ls *.bam)
do
name=${sortedbam##*/} #removes the filepath name up until the / of the file name
base=${name%.bam}

echo $base && featureCounts -T 12  -a /ref/rheMac10refGene.gtf.sorted \
-o /counts/"${base}".txt $sortedbam

done
ADD REPLYlink modified 10 months ago by GenoMax94k • written 10 months ago by harithaa0

Yes, my reference was also from rheMac10 and the annotation (derived from rheMac10.refGene.txt) is also from the same. -- The reference genome (.fa) is only 2.9G and the GTF file 17M. This is also a part that confuses me. Is something wrong with this? This is the reference file I used

rheMac10.2bit - contains the complete rhesus/rheMac10 genome sequence in the 2bit file format. Repeats from RepeatMasker and Tandem Repeats Finder (with period of 12 or less) are shown in lower case; non-repeating sequence is shown in upper case. The utility program, twoBitToFa (available from the kent src tree), can be used to extract .fa file(s) from this file. A pre-compiled version of the command line tool can be found at: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/

ADD REPLYlink written 10 months ago by harithaa0

A couple of things to check first. If you have rRNA contamination in the data. It is possible that a lot of your reads are multi-mapping. These are not counted by featureCounts by default. You should also add -p (if this was a paired end dataset). You would want to count at exon level, but summarize at gene level (something like -t exon -g gene_id ). I don't recollect immediately if that is the default.

ADD REPLYlink modified 10 months ago • written 10 months ago by GenoMax94k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2080 users visited in the last hour