ENCODE data for RNA-seq: type of library
8.3 years ago
tonja.r ▴ 600

I need to run featureCounts on the publicly available ENCODE RNA-seq data for mm9 from the Bing Ren lab. As far as I understand, the RNA-seq is strand-specific, but I need to know the library type (ff-firststrand, ff-secondstrand, ff-unstranded, fr-firststrand, fr-secondstrand, fr-unstranded) to set the -s parameter of featureCounts correctly:

-s <int>      Indicate if strand-specific read counting should be performed.
                  It has three possible values:  0 (unstranded), 1 (stranded) and
                  2 (reversely stranded). 0 by default.
RNA-Seq • 2.8k views
8.3 years ago
iraun 6.2k

There are a lot of posts related to this issue. I'd recommend trying the infer_experiment.py script from the RSeQC package. Read the manual, and if you run into trouble, you can always come back here.
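As a rough illustration of what to look for in the infer_experiment.py report, the sketch below pulls out the "Fraction of reads explained by" lines and picks the dominant strand rule. The report layout is assumed from typical RSeQC output, the numbers in `sample_report` are invented for illustration (they are not results from the ENCODE data), and `parse_fractions` is a hypothetical helper name, not part of RSeQC.

```python
import re

# Hypothetical report text: layout assumed to match infer_experiment.py output,
# numbers invented for illustration only.
sample_report = '''This is SingleEnd Data
Fraction of reads failed to determine: 0.0170
Fraction of reads explained by "++,--": 0.0161
Fraction of reads explained by "+-,-+": 0.9669
'''

def parse_fractions(report):
    """Return {strand_rule: fraction} for each 'explained by' line."""
    pattern = re.compile(r'explained by "([^"]+)":\s*([0-9.]+)')
    return {rule: float(value) for rule, value in pattern.findall(report)}

fractions = parse_fractions(sample_report)
dominant_rule = max(fractions, key=fractions.get)  # rule with the largest fraction
print(dominant_rule, fractions[dominant_rule])
```

If one rule clearly dominates, the library is strand-specific in that orientation; if the two fractions are close to each other, the library is probably unstranded.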


I found out that I have single-end reads with the configuration listed below.

Does this mean that I need to use -s 2 for featureCounts and fr-secondstrand for cuffdiff?

  1. +-,-+

    • read mapped to '+' strand indicates parental gene on '-' strand
    • read mapped to '-' strand indicates parental gene on '+' strand
    • ...
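A minimal sketch of how such a strand rule could be turned into a featureCounts -s value. The correspondence is an assumption based on the help text quoted in the question (1 = read on the same strand as the gene, 2 = read on the opposite strand), and both `feature_counts_strand` and its `threshold` are hypothetical choices made for this example. For Cufflinks, the "+-,-+" pattern is usually described as fr-firststrand rather than fr-secondstrand, so it is worth checking the Cufflinks manual before running cuffdiff.

```python
# Strand rules in infer_experiment.py notation, grouped by orientation.
# Assumed correspondence; verify on your own data before relying on it.
SENSE_RULES = {"++,--", "1++,1--,2+-,2-+"}      # read (or read 1) on the gene's strand
ANTISENSE_RULES = {"+-,-+", "1+-,1-+,2++,2--"}  # read (or read 1) on the opposite strand

def feature_counts_strand(fractions, threshold=0.8):
    """Pick a featureCounts -s value from a {rule: fraction} mapping.

    `threshold` is an arbitrary cut-off chosen for this sketch.
    """
    sense = sum(v for k, v in fractions.items() if k in SENSE_RULES)
    antisense = sum(v for k, v in fractions.items() if k in ANTISENSE_RULES)
    if sense >= threshold:
        return 1  # stranded
    if antisense >= threshold:
        return 2  # reversely stranded
    return 0      # no clear orientation: treat as unstranded

# Hypothetical fractions for a single-end library dominated by "+-,-+".
print(feature_counts_strand({"++,--": 0.02, "+-,-+": 0.97}))
```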
8.2 years ago
microbe77 ▴ 30

I had the same question in mind. I use the Kapa RNA-seq kit, which is based on the dUTP method, and we ran it on bacterial RNA. As I understand from this post, we should use the -s 2 option. Honestly, I don't have a good understanding of why exactly these options are the right ones. I wish the documentation were more specific.

These are the stats from different -s options:

-s 0 option:

Total fragments : 106454
Successfully assigned fragments : 87545 (82.2%)

-s 1 option:

Total fragments : 106454
Successfully assigned fragments : 15582 (14.6%)

-s 2 option:

Total fragments : 106454
Successfully assigned fragments : 72078 (67.7%)

So -s 0 gives the highest count because it accepts alignments on either strand, and it roughly equals the sum of -s 1 and -s 2 (15582 + 72078 = 87660, against 87545).
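The arithmetic behind that comparison can be checked quickly. This is only a sketch using the counts reported above; the variable names are made up for the example.

```python
# Assigned-fragment counts reported above for each -s setting.
total = 106454
assigned = {0: 87545, 1: 15582, 2: 72078}

# Compare the sum of the two stranded runs against the unstranded run.
stranded_sum = assigned[1] + assigned[2]
print(stranded_sum, assigned[0])

# Share of strand-assigned fragments that match the reverse orientation.
reverse_share = assigned[2] / stranded_sum
print(f"reverse share: {reverse_share:.1%}")

# Recompute the assignment rate for each setting.
for s, n in assigned.items():
    print(f"-s {s}: {n / total:.1%}")
```

Most of the strand-assigned fragments fall under -s 2, which is consistent with a dUTP-based (reversely stranded) library.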

This is the command line used:

featureCounts -g gene_id -t exon -f -O -T 1 -p -d 100 -D 1000 -s 2 -P -R -a file.gtf -o fileout file.bam

Please let me know if you think there is something wrong with these options. Many thanks.
