Question

Quality control issues in mRNA sequencing FASTQ files based on FastQC's "Per Base Sequence Content" module

5.8 years ago

svlachavas ▴ 790

Dear Community,

I would like to ask for comments and suggestions on the interpretation of some initial quality control results for FASTQ files. In a current RNA-Seq gene expression project on 2 different cancer cell lines, mRNA sequencing was performed for 12 samples (HiSeq 4000, 2 × 75 bases). Briefly, the library preparation involved:

a) purification of poly(A)-containing mRNA molecules from 1 µg of total RNA, using poly-T oligo-attached magnetic beads

b) fragmentation using divalent cations under elevated temperature, to obtain pieces of approximately 300 bp

c) double-stranded cDNA synthesis

d) Illumina adapter ligation and cDNA library amplification by PCR for sequencing.

Initial exploratory quality control with FastQC showed that most samples failed or got a warning in the "Per Base Sequence Content" module. This is evident in the attached figures for one sample (the other samples look similar). No other quality issues were evident, and the RIN values were also fine:

[Attached figures: fastq.overall, example1, example2, data.quality.extra]
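For reference, the plot behind this module is the percentage of A, C, G and T at each read position. Below is a simplified sketch (not FastQC's own code; it ignores FastQC's position binning for longer reads) of that table plus the pass/warn/fail rule described in the FastQC documentation: a warning if the A/T or G/C difference exceeds 10 percentage points at any position, a failure above 20. The function names `per_base_content` and `fastqc_like_status` are hypothetical.

```python
from collections import Counter

def per_base_content(reads):
    """Percentage of A, C, G and T at each read position (N and other calls ignored)."""
    max_len = max(len(r) for r in reads)
    table = []
    for pos in range(max_len):
        # Only reads long enough to reach this position contribute to it.
        counts = Counter(r[pos] for r in reads if pos < len(r) and r[pos] in "ACGT")
        total = sum(counts.values())
        table.append({b: 100.0 * counts[b] / total if total else 0.0 for b in "ACGT"})
    return table

def fastqc_like_status(table, warn=10.0, fail=20.0):
    """Simplified FastQC rule: the worst |A-T| or |G-C| difference at any position decides."""
    worst = max(max(abs(p["A"] - p["T"]), abs(p["G"] - p["C"])) for p in table)
    if worst > fail:
        return "fail"
    if worst > warn:
        return "warn"
    return "pass"

# Toy reads; real input would be the sequence lines parsed from a FASTQ file.
uniform = ["ACGT", "CGTA", "GTAC", "TACG"]
print(fastqc_like_status(per_base_content(uniform)))  # prints "pass": 25 % per base everywhere
```

Because the rule looks only at the worst single position, a skew confined to the first few bases of every read is enough to fail the module even if the rest of the read is perfectly balanced.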

From a brief look at the plots, could one argue that these failures/warnings indicate overrepresented sequences, which might have contaminated the library construction?
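The contamination hypothesis can be checked separately from the composition plot: contaminating or over-amplified sequences show up as many identical reads, which is what FastQC's "Overrepresented sequences" module reports (a warning when one sequence exceeds 0.1 % of all reads, a failure above 1 %). A positional composition skew, by contrast, does not require any sequence to be repeated. A minimal sketch of that counting, assuming reads are already loaded as strings; `overrepresented` is a hypothetical helper, and FastQC's own sampling and truncation details are omitted.

```python
from collections import Counter

def overrepresented(reads, warn_frac=0.001, fail_frac=0.01):
    """Return (sequence, count, level) for exact sequences above the warn fraction."""
    counts = Counter(reads)
    n = len(reads)
    flagged = []
    for seq, c in counts.most_common():
        frac = c / n
        if frac <= warn_frac:
            break  # most_common() is sorted by count, so nothing later can qualify
        flagged.append((seq, c, "fail" if frac > fail_frac else "warn"))
    return flagged

# Toy data: 980 distinct 5-mers plus one 8-mer repeated 20 times (2 % of 1000 reads).
unique = ["".join("ACGT"[(i >> (2 * k)) & 3] for k in range(5)) for i in range(980)]
print(overrepresented(unique + ["ACGTACGT"] * 20))  # prints [('ACGTACGT', 20, 'fail')]
```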

Moreover, is there a way to adjust for this, or is downstream analysis still possible?

Any suggestions or ideas would be greatly appreciated!

Tags: fastqc, mRNAsequencing, RNA-Seq, sequence

How to add images to a Biostars post

Reply by WouterDeCoster, 5.8 years ago

Dear WouterDeCoster, is it still not OK?

Reply by svlachavas, 5.8 years ago

I don't think a Dropbox link will work. Try hosting the image as described in the linked post.

Reply by WouterDeCoster, 5.8 years ago

Moreover, is there a way to adjust for this, or still downstream analysis is possible ?

What are the downstream analyses you intend to perform?

Reply by h.mon, 5.8 years ago

score 2 · Answer 1 · 2018-07-09

5.8 years ago

WouterDeCoster 47k

Random priming generates a "non-random" sequence composition at the start of reads; see also this QCfail blog post: Positional sequence bias in random primed libraries
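To see why such a skew stays at the start of reads, here is a toy simulation, not a model of any real kit: the template is uniformly random, but priming sites are sampled with a hypothetical preference for GC-rich hexamers (relative weight 4 to the power of the number of G/C bases in the primer footprint). Only the bases under the primer inherit that preference; downstream positions stay at about 50 % GC. The function names, the weighting and the read length are assumptions for illustration; in real data the width and direction of the skew are whatever your own Per Base Sequence Content plot shows.

```python
import random

def simulate_primed_reads(n_reads, read_len=30, k=6, seed=1):
    """Toy model: reads start at priming sites chosen with a GC-dependent weight.

    Hypothetical assumption: a k-mer primer footprint with g G/C bases primes
    with relative efficiency 4**g. The template itself is uniformly random.
    """
    rng = random.Random(seed)
    template = "".join(rng.choice("ACGT") for _ in range(200_000))
    starts = range(len(template) - read_len)
    weights = [4 ** sum(b in "GC" for b in template[s:s + k]) for s in starts]
    picks = rng.choices(starts, weights=weights, k=n_reads)
    return [template[s:s + read_len] for s in picks]

def gc_by_position(reads):
    """Fraction of G/C calls at each position (assumes equal-length reads)."""
    return [sum(r[i] in "GC" for r in reads) / len(reads) for i in range(len(reads[0]))]

gc = gc_by_position(simulate_primed_reads(5000))
# Expect roughly 0.8 at positions 0-5 under this toy weighting, then about 0.5.
print([round(x, 2) for x in gc[:8]])
```

The design point is that the bias comes from where reads start, not from sequencing errors, which is consistent with the QCfail post's view that it cannot simply be trimmed or corrected away.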

Thank you very much for the useful link. So, do you think that downstream analysis is still feasible, or would it be biased?

Reply by svlachavas, 5.8 years ago

Since it seems you didn't spend time reading the post carefully (I agree it's easier to just ask me again about it), I'll give you a quote:

Whilst the warnings generated by this problem reflect a real issue it’s not something which can be fixed, and doesn’t seem to have any serious consequences for downstream analysis.

Reply by WouterDeCoster, 5.8 years ago

Dear WouterDeCoster,

Of course I have read your very useful post, and I also found this specific part. My mistake was not stating the specific goal of our downstream analysis, which would have made your answer more helpful:

Our goal is essentially to test for over- or under-representation of a small RNA motif in specific target genes between the groups of samples. These motifs were created by a previous computational pipeline and have been initially tested with in vitro assays.

That is why I asked the extra question, as our goal is not primarily a differential expression (DE) analysis.
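If the motif analysis works on read sequences, one hypothetical sensitivity check is to count motif occurrences with and without the first bases of each read, since that is where the positional composition skew sits. The sketch assumes reads have already been assigned to the target genes of interest and that `skip` is read off your own Per Base Sequence Content plot (12 below is only a placeholder); `count_motif` and `motif_rate` are illustrative names, and this is descriptive counting only, not a statistical test between groups.

```python
def count_motif(reads, motif, skip=0):
    """Count overlapping occurrences of `motif`, ignoring the first `skip` bases of each read."""
    total = 0
    for read in reads:
        seq = read[skip:]
        hit = seq.find(motif)
        while hit != -1:
            total += 1
            hit = seq.find(motif, hit + 1)  # step by 1 so overlapping hits are counted
    return total

def motif_rate(reads, motif, skip=0):
    """Occurrences per possible start position, so rates with different `skip` are comparable."""
    positions = sum(max(len(r) - skip - len(motif) + 1, 0) for r in reads)
    return count_motif(reads, motif, skip) / positions if positions else 0.0

# Hypothetical reads from one sample; compare full reads with the first 12 bases removed.
sample_reads = ["T" * 12 + "GATTACA" * 2, "GATTACA" + "T" * 19]
print(count_motif(sample_reads, "GATTACA"), count_motif(sample_reads, "GATTACA", skip=12))  # prints 3 2
```

If the group-level conclusions hold both with `skip=0` and with a `skip` covering the biased region, the start-of-read composition is unlikely to be driving them; if they change, that is worth investigating before any formal test.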

Reply by svlachavas, 5.8 years ago