Question: How to improve FASTQ quality based on FastQC output?
0
20 months ago by
Angelique10
Angelique10 wrote:

Good morning,

I ran FastQC on a FASTQ file (results: https://drive.google.com/open?id=1SzMSjaKuOdFL62r-ouJeCwslHN7Ndp_Q ). Most modules look fine, but some do not: the duplication level is very high (81%), some k-mers are enriched, and the GC content is high too. I don't know how to improve the quality of the file. I think I should trim, but I don't know where. Thank you for your advice.

rna-seq • 690 views
ADD COMMENTlink modified 20 months ago • written 20 months ago by Angelique10
1

I see that you tagged the topic rna-seq. Which sequencing kit did you use?

Your reads (single-end, I presume) are 50 bp long, is that correct? Or did you cut all the graphs?

The nucleotide composition across the first 13 bases of your reads is uneven. You could try removing them with Trimmomatic and re-running FastQC on the output.

ADD REPLYlink modified 20 months ago • written 20 months ago by Bastien Hervé4.5k
2

This is RNAseq data. Nucleotide distribution at the beginning of the reads is characteristic and does not require trimming.
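
To look at this bias directly rather than through the FastQC plot, the per-position base composition can be tabulated from the reads. The following is a minimal Python sketch, assuming an uncompressed FASTQ with exactly four lines per record; the function names `parse_fastq_sequences` and `per_position_composition` are illustrative and not part of any tool mentioned here.

```python
from collections import Counter


def parse_fastq_sequences(lines):
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def per_position_composition(sequences, n_positions=15):
    """Return, for each of the first n_positions, the fraction of A/C/G/T/N."""
    counts = [Counter() for _ in range(n_positions)]
    for seq in sequences:
        for pos, base in enumerate(seq[:n_positions]):
            counts[pos][base] += 1
    fractions = []
    for c in counts:
        total = sum(c.values())
        # Positions with no observed bases get an empty dict.
        fractions.append({b: c[b] / total for b in "ACGTN"} if total else {})
    return fractions
```

With a real file, something like `per_position_composition(parse_fastq_sequences(open(path)))` would give one dictionary per position. In RNA-seq libraries the first dozen or so positions typically deviate from the rest of the read, which matches the pattern FastQC flags.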

ADD REPLYlink written 20 months ago by genomax76k
1

OK! When I saw reads only 50 bases long, I wasn't sure this was RNA-seq data. Thanks for the info.

ADD REPLYlink written 20 months ago by Bastien Hervé4.5k
1

First, you need to clarify what you sequenced on the NGS platform and, second, what the aim of your project is, because each of these QC metrics has to be interpreted according to your experiment. For instance, RNA-seq data have a high duplication rate, amplicon sequencing can have abnormal GC content, etc.
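
As a rough illustration of what a duplication rate measures, here is a minimal Python sketch that counts exact duplicate sequences. It is a simplification under stated assumptions: it treats two reads as duplicates only if their sequences are identical, it holds all sequences in memory, and it will not reproduce FastQC's own estimate exactly. The function name `duplication_fraction` is hypothetical.

```python
from collections import Counter


def duplication_fraction(sequences):
    """Fraction of reads that are exact copies of another read already counted."""
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Every distinct sequence contributes one "original"; the rest are duplicates.
    return 1.0 - len(counts) / total
```

In RNA-seq, highly expressed transcripts are sampled many times, so a high value from a calculation like this does not by itself indicate a technical problem.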

ADD REPLYlink written 20 months ago by toralmanvar830

I am working with a public RNA-seq data set (from https://trace.ncbi.nlm.nih.gov/Traces/study/?acc=SRP091947 ; the reads are already all cut to 50 bp) and I want to perform a differential expression analysis with these data. The samples are human hepatocytes, sequenced paired-end on an Illumina HiSeq 2000.

ADD REPLYlink modified 20 months ago • written 20 months ago by Angelique10
1

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

I am going to suggest that you proceed with alignment and downstream analysis as is. Manipulating this data is likely not going to lead to "improvement". STAR/DESeq2 (or salmon) would be the way to go.

ADD REPLYlink modified 20 months ago • written 20 months ago by genomax76k

Sorry, I am new to the forum and to RNA-seq analysis... Thank you for all your answers. So the FASTQ file is OK for an RNA-seq experiment, even if the first eleven bases look odd?

ADD REPLYlink modified 20 months ago • written 20 months ago by Angelique10
4

Yes it should be fine. Please see this blog post by Dr. Simon Andrews (Author of FastQC). You may also want to read some of the other FastQC related posts to understand other tests it does.

ADD REPLYlink written 20 months ago by genomax76k

OK, thanks a lot for your help!

ADD REPLYlink written 20 months ago by Angelique10