Question: Quality control issues for mRNA sequencing FASTQ files: FastQC "Per Base Sequence Content" failures
svlachavas (Greece) wrote 5 months ago:

Dear Community,

I would like to ask for comments and suggestions on interpreting some initial quality control results for FASTQ files. In a current RNA-Seq gene expression project based on 2 different cancer cell lines, mRNA sequencing was performed for 12 samples (HiSeq 4000, 2 x 75 bases). Very briefly, the library preparation was:

a) purification of poly-A-containing mRNA molecules from 1 µg total RNA using poly-T oligo-attached magnetic beads

b) fragmentation using divalent cations under elevated temperature to obtain pieces of approximately 300 bp

c) double-stranded cDNA synthesis

d) Illumina adapter ligation and cDNA library amplification by PCR for sequencing.

In the initial exploratory quality control with FastQC, most samples failed or got a warning in the "Per Base Sequence Content" section. This is evident in the attached figures for one sample (the others look similar). No other quality issues were evident, and the RIN numbers were fine:

[Attached figures: fastq.overall, example1, example2, data.quality.extra]

From a quick reading of the plots, could one argue that these failures/warnings indicate overrepresented sequences that might have contaminated the library during construction?

Moreover, is there a way to adjust for this, or is downstream analysis still possible?

Any suggestions or ideas would be greatly appreciated!

modified 5 months ago by WouterDeCoster • written 5 months ago by svlachavas

How to add images to a Biostars post

written 5 months ago by WouterDeCoster

Dear WouterDeCoster, is it still not OK?

written 5 months ago by svlachavas

I don't think a Dropbox link will work. Try hosting the image as described in the linked post.

written 5 months ago by WouterDeCoster

"Moreover, is there a way to adjust for this, or is downstream analysis still possible?"

What are the downstream analyses you intend to perform?

written 5 months ago by h.mon
WouterDeCoster (Belgium) wrote 5 months ago:

Random priming generates a "non-random" sequence composition at the start of reads. See also this QCfail blog post: Positional sequence bias in random primed libraries
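
To look at this bias outside FastQC, the per-position base composition can be tallied directly from the reads. The following is a minimal Python sketch, not FastQC's own implementation. The function names `read_fastq_sequences`, `per_base_content` and `max_skew` are hypothetical, and the reader assumes an uncompressed FASTQ file with exactly four lines per record.

```python
from collections import Counter


def read_fastq_sequences(path):
    """Yield sequence lines from an uncompressed FASTQ file.

    Assumption: every record spans exactly four lines, so the
    sequence is the second line of each group of four.
    """
    with open(path) as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 == 1:
                yield line.strip()


def per_base_content(reads, max_len=None):
    """Return one dict per read position with the A/C/G/T fractions."""
    counts = []
    for seq in reads:
        seq = seq.upper()
        if max_len is not None:
            seq = seq[:max_len]
        for i, base in enumerate(seq):
            if i == len(counts):
                counts.append(Counter())
            counts[i][base] += 1
    fractions = []
    for c in counts:
        # N and other ambiguity codes are ignored in the denominator.
        total = sum(c[b] for b in "ACGT")
        fractions.append({b: (c[b] / total if total else 0.0) for b in "ACGT"})
    return fractions


def max_skew(fractions):
    """Largest |A - T| or |G - C| difference over all positions."""
    return max(
        (max(abs(f["A"] - f["T"]), abs(f["G"] - f["C"])) for f in fractions),
        default=0.0,
    )
```

For example, `per_base_content(read_fastq_sequences("sample_R1.fastq"), max_len=20)` (the file name is a placeholder) would give the composition of the first 20 positions. In a random-primed library the first positions typically deviate from the flatter profile seen further into the read. The FastQC documentation describes a warning when A/T or G/C differ by more than 10% at any position and a failure above 20%, which is what `max_skew` roughly mirrors.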

written 5 months ago by WouterDeCoster

Thank you very much for the useful link. So, do you think downstream analysis is still feasible, or would it be biased?

written 5 months ago by svlachavas

Since it seems you didn't spend time reading the post carefully (I agree it's easier to just ask me again about it) I'll give you a quote:

Whilst the warnings generated by this problem reflect a real issue it’s not something which can be fixed, and doesn’t seem to have any serious consequences for downstream analysis.

written 5 months ago by WouterDeCoster

Dear WouterDeCoster,

Of course I have read your very useful post, and I also found this specific part. The mistake is mine: I did not state the goal of my downstream analysis, which would have let you give a more helpful answer.

Our goal is essentially to test for over- or under-representation of a small RNA motif in specific target genes, compared between the groups of samples. The motifs were generated by a previous computational pipeline and have been initially tested with in vitro assays.

That is why I asked the extra question: the data are not directly intended for DE analysis.
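
One way to gauge whether the biased read starts would matter for such a motif test is to count motif-containing reads with and without the first few bases. The sketch below is only an illustration under stated assumptions: `count_motif` and the `skip` parameter are hypothetical names, the motif is matched on the read strand only (no reverse complement), and U is converted to T because the reads are cDNA.

```python
def count_motif(reads, motif, skip=0):
    """Count reads that contain `motif` at or after 0-based position `skip`.

    Returns (reads_with_motif, total_reads).
    Assumption: plain substring matching, read strand only.
    """
    # Reads are cDNA, so an RNA motif is searched with T in place of U.
    motif = motif.upper().replace("U", "T")
    hits = 0
    total = 0
    for seq in reads:
        total += 1
        if motif in seq.upper()[skip:]:
            hits += 1
    return hits, total
```

If the fraction of hits barely changes between `skip=0` and, say, `skip=12`, the positional bias is unlikely to drive the result. The value 12 is an arbitrary choice for illustration, not a recommendation.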

written 5 months ago by svlachavas