Question

Why is the T content higher than that of the other bases in NSR RNA-seq data?

7.5 years ago

wm ▴ 560

I found an unusual base composition in an NSR RNA-seq data set (SE60): the T% is higher than that of all the other bases (A, C, G). Can anyone tell me what is wrong with the data set?

The FastQC output (default parameters) is shown below.

Added: this data set was generated on an Illumina HiSeq 2000.

I clipped the reads (TruSeq adapter index) and discarded the first 5 bases at the 5' end:

cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -q 20 -m 20 --cut=5 -o out.fq in.fq

[Figure: FastQC output for the raw reads]

[Figure: FastQC output after clipping the adapter from the 3' end]
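For anyone who wants to check the per-position base percentages outside FastQC, here is a minimal Python sketch. It assumes an uncompressed 4-line FASTQ file; the function names `read_fastq_sequences` and `per_position_base_percent` are made up for illustration and do not belong to any tool mentioned in this thread.

```python
from collections import Counter
import io


def read_fastq_sequences(handle):
    """Yield the sequence line of each 4-line FASTQ record (assumed layout)."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip().upper()


def per_position_base_percent(sequences, bases="ACGTN"):
    """Return one dict per read position with the percentage of each base."""
    counts = []
    for seq in sequences:
        # Grow the per-position table to fit the longest read seen so far.
        while len(counts) < len(seq):
            counts.append(Counter())
        for pos, base in enumerate(seq):
            counts[pos][base] += 1
    result = []
    for c in counts:
        total = sum(c.values())
        result.append({b: 100.0 * c[b] / total for b in bases})
    return result


if __name__ == "__main__":
    # Tiny in-memory FASTQ used only as a demonstration input.
    demo = io.StringIO("@r1\nTTTTAC\n+\nIIIIII\n@r2\nTTGTAC\n+\nIIIIII\n")
    table = per_position_base_percent(read_fastq_sequences(demo))
    for pos, pct in enumerate(table, start=1):
        print(pos, {b: round(p, 1) for b, p in pct.items()})
```

Running it on `in.fq` and on `out.fq` from the cutadapt command above (by passing `open("in.fq")` instead of the demo handle) would show whether the T excess is spread across the whole read or confined to particular positions.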

RNA-Seq • 1.6k views

updated 12 months ago by Ram 43k • written 7.5 years ago by wm ▴ 560

Could you elaborate on the experimental procedure to generate this library?

7.5 years ago by WouterDeCoster 47k

It is an NSR-primed whole-transcriptome cDNA library; you can find the details here: http://www.nature.com/nmeth/journal/v6/n9/fig_tab/nmeth.1360_F1.html

7.5 years ago by wm ▴ 560

Is your data from Illumina? Illumina has known issues at the 5' end that result in biased nucleotide composition. Depending on how the sample was processed, this could be the result of Nextera tagmentation bias (e.g. Fig. 4 in DOI: 10.1186/s12859-016-0976-y). Alternatively, it can be caused by "random" hexamers that are not so random (http://seqanswers.com/forums/showthread.php?t=11843). Your first figure looks rather extreme, though. As WouterDeCoster asked: how was the library generated?
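One rough way to see whether the priming site itself is skewed is to tally the k-mers at the very start of the raw reads. The sketch below is only an illustration: `start_kmer_counts` and `base_fraction_in_kmers` are hypothetical helper names, and k=6 is an assumption chosen to match hexamer primers. It should be run on reads that have not had their 5' bases trimmed, since `--cut=5` removes most of the positions being inspected.

```python
from collections import Counter


def start_kmer_counts(sequences, k=6):
    """Count the first k bases of every read that is at least k long."""
    counts = Counter()
    for seq in sequences:
        if len(seq) >= k:
            counts[seq[:k].upper()] += 1
    return counts


def base_fraction_in_kmers(counts, base="T"):
    """Fraction of all bases in the counted k-mers that equal `base`."""
    total_bases = sum(len(kmer) * n for kmer, n in counts.items())
    if total_bases == 0:
        return 0.0
    hits = sum(kmer.count(base) * n for kmer, n in counts.items())
    return hits / total_bases


if __name__ == "__main__":
    # Made-up reads, for demonstration only.
    reads = ["TTTTTTAC", "TTTTTTGG", "ACGTACGT", "ACG"]
    counts = start_kmer_counts(reads)
    print(counts.most_common(5))
    print(round(base_fraction_in_kmers(counts, "T"), 3))
```

A handful of k-mers dominating `most_common`, or a T fraction far from what the rest of the read shows, would point toward priming bias rather than a problem further along the read.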

7.5 years ago by Tonor ▴ 480

score 2 · Answer 1 · 2016-11-01

7.5 years ago

harold.smith.tarheel ★ 4.9k

Strand-specific mRNA-Seq that contains a substantial amount of poly(A) contamination can produce plots like this one. It's often indicative of degradation of the RNA sample.
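A quick way to test this idea is to estimate what fraction of reads carries a long homopolymer run. The following is a sketch under stated assumptions: the helper names are hypothetical, and the 15-base threshold is an arbitrary choice that should be adjusted to the read length. In a strand-specific library the poly(A) tail may show up as poly(T) or as poly(A) depending on the protocol, so both are worth checking.

```python
import re


def has_homopolymer(seq, base="T", min_run=15):
    """True if seq contains at least min_run consecutive copies of base."""
    pattern = f"{re.escape(base)}{{{min_run},}}"  # e.g. "T{15,}"
    return re.search(pattern, seq.upper()) is not None


def homopolymer_fraction(sequences, base="T", min_run=15):
    """Fraction of reads containing a homopolymer run of the given base."""
    total = 0
    hits = 0
    for seq in sequences:
        total += 1
        if has_homopolymer(seq, base, min_run):
            hits += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Made-up reads, for demonstration only.
    reads = ["ACGT" * 5, "AC" + "T" * 20, "T" * 14 + "G"]
    print(homopolymer_fraction(reads, base="T"))
    print(homopolymer_fraction(reads, base="A"))
```

If a sizeable share of reads is flagged for T and that share roughly accounts for the T excess in the FastQC plot, poly(A)-derived reads would be a plausible explanation.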

7.5 years ago by harold.smith.tarheel ★ 4.9k