Question: encode data for RNAseq, type of library
3.2 years ago by
tonja.r450
UK
tonja.r450 wrote:

I need to run featureCounts on the publicly available ENCODE project data for mm9 from the Bing Ren lab. As far as I understand, the RNA-seq is strand-specific, but I need to know the library type (ff-firststrand, ff-secondstrand, ff-unstranded, fr-firststrand, fr-secondstrand, fr-unstranded) to set the -s parameter of featureCounts correctly:
 

-s <int>      Indicate if strand-specific read counting should be performed.
                  It has three possible values:  0 (unstranded), 1 (stranded) and
                  2 (reversely stranded). 0 by default.
rna-seq • 1.2k views
modified 3.1 years ago by microbe7730 • written 3.2 years ago by tonja.r450
3.2 years ago by
iraun3.5k
Norway
iraun3.5k wrote:

There are a lot of posts related to this issue. I'd recommend trying the infer_experiment.py script from the RSeQC package: http://rseqc.sourceforge.net/#infer-experiment-py. Read the manual, and if you run into trouble, you can always come back here.

written 3.2 years ago by iraun3.5k

I found out that I have single-end reads with the following configuration. 

Does this mean that I need to use -s 2 for featureCounts and fr-secondstrand for cuffdiff?


2.     +-,-+

  • read mapped to '+' strand indicates parental gene on '-' strand

  • read mapped to '-' strand indicates parental gene on '+' strand
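
As a rough illustration of how the infer_experiment.py result relates to the featureCounts -s values quoted in the question, here is a minimal Python sketch. It assumes you have already read off the two fractions that infer_experiment.py reports for single-end data (reads explained by "++,--" and by "+-,-+"). The function name, the 0.8 cutoff, and the example fractions are all hypothetical and are not defaults of RSeQC or featureCounts. For what it is worth, the "+-,-+" orientation (read on the opposite strand to the gene) is the one usually described as fr-firststrand in TopHat/Cufflinks terms, but please check that against the Cufflinks manual for your version.

```python
def suggest_strand_flag(frac_same, frac_opposite, threshold=0.8):
    """Suggest a featureCounts -s value from infer_experiment.py fractions.

    frac_same:     fraction of reads explained by "++,--"
                   (read strand matches the gene strand)
    frac_opposite: fraction of reads explained by "+-,-+"
                   (read strand is opposite to the gene strand)
    threshold:     hypothetical cutoff chosen for this sketch only
    """
    total = frac_same + frac_opposite
    if total <= 0:
        raise ValueError("no reads were explained by either configuration")
    if frac_same / total >= threshold:
        return 1  # stranded
    if frac_opposite / total >= threshold:
        return 2  # reversely stranded
    return 0      # no clear strand bias: treat as unstranded


# Made-up fractions for a library dominated by the "+-,-+" configuration.
print(suggest_strand_flag(frac_same=0.02, frac_opposite=0.95))
```

With fractions like these the sketch suggests -s 2, which matches the "reversely stranded" description in the featureCounts help text.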
modified 3.1 years ago • written 3.1 years ago by tonja.r450
3.1 years ago by
microbe7730
USA
microbe7730 wrote:

I had the same question. I use the Kapa RNA-seq kit, which is based on the dUTP method, and we ran it on bacterial RNA. As I understand from this post: https://support.bioconductor.org/p/66733/, we should use the -s 2 option. Honestly, I don't have a good understanding of why exactly we should use these options, and I wish the documentation were more specific.

These are the stats from different -s options:

-s 0 option:

Total fragments : 106454
Successfully assigned fragments : 87545 (82.2%)

-s 1 option:

Total fragments : 106454
Successfully assigned fragments : 15582 (14.6%)

-s 2 option:

Total fragments : 106454
Successfully assigned fragments : 72078 (67.7%)

So -s 0 assigns the most fragments because it counts fragments on either strand, and its total roughly equals the sum of the -s 1 and -s 2 totals (15582 + 72078 = 87660, versus 87545).
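
To make the comparison explicit, here is a small Python sketch that recomputes the percentages from the numbers posted above and checks how the stranded assignments split between -s 1 and -s 2. The counts are copied from the three summaries; the variable names are only for illustration.

```python
total_fragments = 106454
# Successfully assigned fragments for each -s value, copied from the summaries.
assigned = {0: 87545, 1: 15582, 2: 72078}

for s_value, count in assigned.items():
    print(f"-s {s_value}: {count} / {total_fragments} = {count / total_fragments:.1%}")

# Compare the two stranded runs with each other.
stranded_sum = assigned[1] + assigned[2]
reverse_share = assigned[2] / stranded_sum
print(f"-s 1 + -s 2 = {stranded_sum}")
print(f"share of stranded assignments obtained with -s 2: {reverse_share:.1%}")
```

About 82% of the fragments assigned in a strand-aware run fall under -s 2, which points towards a reversely stranded library. That fits what you would expect from a dUTP-based kit.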

This is the command line used:

featureCounts -g gene_id -t exon -f -O -T 1 -p -d 100 -D 1000 -s 2 -P -R -a file.gtf -o fileout file.bam
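
If it helps anyone repeat the comparison, here is a hedged Python sketch that builds one featureCounts command per -s value so that the three summaries can be compared side by side. It only uses a subset of the flags from the command above (-s, -g, -t, -T, -a, -o). The helper name and the output naming scheme are hypothetical, and the sketch prints the commands rather than running them.

```python
import shlex


def feature_counts_cmd(strand, gtf, bam, out_prefix):
    """Build a featureCounts argument list for one -s value (0, 1 or 2)."""
    if strand not in (0, 1, 2):
        raise ValueError("featureCounts -s accepts 0, 1 or 2")
    return [
        "featureCounts",
        "-s", str(strand),                   # strand-specific counting mode
        "-g", "gene_id",                     # attribute used to group features
        "-t", "exon",                        # feature type to count
        "-T", "1",                           # number of threads
        "-a", gtf,                           # annotation file
        "-o", f"{out_prefix}.s{strand}.txt", # hypothetical output naming
        bam,
    ]


for strand in (0, 1, 2):
    print(shlex.join(feature_counts_cmd(strand, "file.gtf", "file.bam", "fileout")))
```

To actually run a command, pass the list to subprocess.run(cmd, check=True); any paired-end or fragment-length options from the original command would need to be added back in.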

 

Please let me know if you think there is something wrong with these options. Many thanks.

 

written 3.1 years ago by microbe7730