Question

ENCODE data for RNA-seq, type of library


8.3 years ago

tonja.r ▴ 600

I need to run featureCounts on the publicly available ENCODE RNA-seq data for mm9 from the Bing Ren lab. As far as I understand, the RNA-seq is strand-specific, but I need to know the library type (ff-firststrand, ff-secondstrand, ff-unstranded, fr-firststrand, fr-secondstrand, fr-unstranded) to set the -s parameter of featureCounts correctly:

-s <int>      Indicate if strand-specific read counting should be performed.
                  It has three possible values:  0 (unstranded), 1 (stranded) and
                  2 (reversely stranded). 0 by default.
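For single-end reads, the three values boil down to a strand comparison between the aligned read and the annotated feature. A minimal Python sketch of that rule follows; the function name `read_is_counted` is made up for illustration and is not part of featureCounts, and paired-end handling is deliberately left out.

```python
def read_is_counted(read_strand, gene_strand, s):
    """Strand check for a single-end read that overlaps a gene.

    s follows the featureCounts -s convention:
    0 = unstranded, 1 = stranded, 2 = reversely stranded.
    """
    if s == 0:
        # Unstranded: the read counts regardless of strand.
        return True
    if s == 1:
        # Stranded: the read must map to the same strand as the gene.
        return read_strand == gene_strand
    if s == 2:
        # Reversely stranded: the read must map to the opposite strand.
        return read_strand != gene_strand
    raise ValueError("s must be 0, 1 or 2")
```

So a read on '+' overlapping a gene on '-' is counted with -s 0 and -s 2, but not with -s 1.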

RNA-Seq • 2.8k views


Ram · Answer 1 · 2015-12-23


8.3 years ago

iraun 6.2k

There are a lot of posts on this issue. I'd recommend trying the infer_experiment.py script from the RSeQC package. Read the manual, and if you run into trouble, you can always come back here.



I found out that I have single-end reads with the following configuration:

+-,-+
- read mapped to '+' strand indicates parental gene on '-' strand
- read mapped to '-' strand indicates parental gene on '+' strand
- ...

Does that mean I need to use -s 2 for featureCounts and fr-secondstrand for cuffdiff?

Reply written 8.3 years ago by tonja.r ▴ 600 • updated 4.4 years ago by Ram 43k
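As a rough guide, the single-end classes reported by infer_experiment.py can be mapped onto the featureCounts and Cufflinks/cuffdiff settings as sketched below. This is only a sketch: the function name `suggest_settings` and the 0.8 cutoff are assumptions made for illustration, and the library-type names follow the Cufflinks manual as I understand it (fr-firststrand for dUTP-style libraries where a single-end read is antisense to the transcript).

```python
# Class labels as printed by RSeQC infer_experiment.py for single-end data.
SENSE = "++,--"      # read strand equals gene strand
ANTISENSE = "+-,-+"  # read strand is opposite to gene strand


def suggest_settings(fractions, min_fraction=0.8):
    """Suggest strand settings from the fraction of reads in each class.

    fractions: dict mapping a class label to the fraction of reads it explains.
    min_fraction: arbitrary cutoff chosen for this sketch, not an RSeQC default.
    """
    sense = fractions.get(SENSE, 0.0)
    antisense = fractions.get(ANTISENSE, 0.0)
    if sense >= min_fraction:
        return {"featureCounts_s": 1, "cufflinks_library_type": "fr-secondstrand"}
    if antisense >= min_fraction:
        return {"featureCounts_s": 2, "cufflinks_library_type": "fr-firststrand"}
    # Neither class dominates: treat the library as unstranded.
    return {"featureCounts_s": 0, "cufflinks_library_type": "fr-unstranded"}
```

If most reads fall in the "+-,-+" class, as in the configuration quoted in the reply, the sketch returns -s 2 for featureCounts together with fr-firststrand, not fr-secondstrand, for cuffdiff.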

Ram · Answer 2 · 2016-01-28

I had the same question. I use the Kapa RNA-seq kit, which is based on the dUTP method, and we ran it on bacterial RNA. As I understand from this post, we should use the -s 2 option. Honestly, I don't have a good understanding of why exactly that option applies, and I wish the documentation were more specific.

These are the stats from different -s options:

-s 0 option:

Total fragments : 106454
Successfully assigned fragments : 87545 (82.2%)

-s 1 option:

Total fragments : 106454
Successfully assigned fragments : 15582 (14.6%)

-s 2 option:

Total fragments : 106454
Successfully assigned fragments : 72078 (67.7%)

So -s 0 gives the highest count because it accepts reads on either strand; it roughly equals the sum of the -s 1 and -s 2 counts (15582 + 72078 = 87660 versus 87545).
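The same comparison can be written down as a small heuristic: run featureCounts with -s 1 and -s 2, then pick the setting that assigns clearly more fragments. The Python sketch below does only that; `guess_strandness` and the 3x ratio are assumptions of mine for illustration, not something featureCounts provides.

```python
def guess_strandness(assigned_s1, assigned_s2, min_ratio=3.0):
    """Guess the featureCounts -s value from assigned-fragment counts.

    assigned_s1, assigned_s2: fragments assigned with -s 1 and -s 2.
    min_ratio: arbitrary threshold for calling one strand dominant.
    """
    if assigned_s1 == 0 and assigned_s2 == 0:
        return 0  # nothing assigned either way, no basis for a call
    if assigned_s1 >= min_ratio * assigned_s2:
        return 1
    if assigned_s2 >= min_ratio * assigned_s1:
        return 2
    return 0  # no clear winner: treat as unstranded


# Counts reported above for -s 1 and -s 2
print(guess_strandness(15582, 72078))  # prints 2
```

With the counts above, -s 2 assigns about 4.6 times as many fragments as -s 1, which fits a reversely stranded (dUTP) library.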

This is the command line used:

featureCounts -g gene_id -t exon -f -O -T 1 -p -d 100 -D 1000 -s 2 -P -R -a file.gtf -o fileout file.bam

Please let me know if you think there is something wrong with these options. Many thanks