lncRNA quantification using featureCounts
8 months ago
kuntalasb ▴ 10

Hello, I am trying to obtain read counts for differential expression analysis of lncRNAs using featureCounts, but every time I get a low rate of successfully assigned alignments (0.2%, 1.6%, etc.). Since I am new to this field, I do not know whether I should worry about these rates or proceed anyway. I am using SAM files as the input alignment files. Is there any way to increase the rate, or am I making a mistake? My library is paired-end and not strand-specific, and the code I used is:

fc <- featureCounts(files, annot.ext = "annot.gtf", isGTFAnnotationFile = TRUE, isPairedEnd = TRUE, GTF.attrType = "transcript_id")


The output summary:

|| Process SAM file EU1sorted.sam...                                          ||
||    Paired-end reads are included.                                          ||
||    Total alignments : 26777689                                             ||
||    Successfully assigned alignments : 547535 (2.0%)                        ||
||    Running time : 29.05 minutes                                            ||
|| Process SAM file EU2sorted.sam...                                          ||
||    Paired-end reads are included.                                          ||
||    Total alignments : 23341158                                             ||
||    Successfully assigned alignments : 428156 (1.8%)                        ||
||    Running time : 24.80 minutes                                            ||
||                                                                            ||
|| Process SAM file EU3sorted.sam...                                          ||
||    Paired-end reads are included.                                          ||
||    Total alignments : 30994812                                             ||
||    Successfully assigned alignments : 535538 (1.7%)                        ||
||    Running time : 32.14 minutes                                            ||


Eagerly waiting for assistance. Thank you!


Were your input alignments generated from mRNA-Seq (poly-A selected) reads, and were they aligned against a genome or a transcriptome? Does annot.gtf contain only lncRNAs, and how was it obtained? Edit: do you only have two samples (one biological replicate per treatment)?


My input SAM files were generated by aligning the RNA-seq reads to the reference genome, and the library is not poly-A selected. And yes, annot.gtf contains only lncRNAs. It was derived from the final merged.gtf (which contains all coding and noncoding transcripts) after several lncRNA filtering steps. I have three replicates for each sample.


If this is total RNA-Seq, then the remaining alignments (~98%) probably come from ribosomal RNAs and coding RNAs, which are not in your lncRNA-only annotation.
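One way to check where the unassigned alignments went is the `stat` table that featureCounts returns alongside the counts. It lists one row per assignment category (for example `Assigned` and `Unassigned_NoFeatures`) and one column per input file. The helper below is only a sketch: the name `stat_percent` is made up for illustration, and it assumes `fc` is the object from the question.

```r
# Hypothetical helper (not part of Rsubread): convert a featureCounts
# 'stat' table into per-sample percentages for each assignment category.
# Assumes the first column is 'Status' and the remaining columns are
# per-file alignment counts, as in fc$stat.
stat_percent <- function(stat) {
  counts <- as.matrix(stat[, -1, drop = FALSE])
  rownames(counts) <- stat$Status
  # Divide each column by its total and express as a percentage.
  round(100 * sweep(counts, 2, colSums(counts), "/"), 1)
}

# Usage with the object from the question (requires the real 'fc'):
# stat_percent(fc$stat)
```

If most alignments fall under `Unassigned_NoFeatures`, that would be consistent with reads coming from genes that are absent from the lncRNA-only annot.gtf.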


OK, so I can probably carry out the downstream analysis without worrying too much about the assignment rate? Thank you so much!


I would not say "without much worrying": as with any experiment, problems in the wet lab can have strange effects on the data. I was merely offering a suggestion. The number of assigned counts per sample looks similar across the samples you have shown. You can search the forum for more suggestions on RNA-seq QC, and you might also check out RSeQC (http://rseqc.sourceforge.net/), especially for 5' to 3' coverage bias (edit: see geneBody_coverage.py, http://rseqc.sourceforge.net/#genebody-coverage-py).
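As a quick sanity check of the "similar across samples" observation, the totals and assigned counts from the posted summary can be compared directly. This is a minimal sketch using only the three samples shown in the question; the column names are made up for illustration.

```r
# Numbers copied from the featureCounts summary posted in the question.
summary_df <- data.frame(
  sample   = c("EU1sorted.sam", "EU2sorted.sam", "EU3sorted.sam"),
  total    = c(26777689, 23341158, 30994812),
  assigned = c(547535, 428156, 535538)
)

# Percentage of alignments assigned to a feature in annot.gtf.
summary_df$pct_assigned <- round(100 * summary_df$assigned / summary_df$total, 1)

# Ratio of the largest to the smallest assigned count across samples;
# a value close to 1 means the samples have comparable counts.
assigned_spread <- max(summary_df$assigned) / min(summary_df$assigned)

print(summary_df)
print(assigned_spread)
```

The percentages reproduce the 2.0%, 1.8%, and 1.7% that featureCounts reported. This only checks that the samples are comparable in depth and does not replace the QC steps suggested above.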


Many thanks for your valuable suggestion!


You are welcome, no problem.