Question: Quality Control On Publicly Available Datasets
5.9 years ago, skm770 wrote:

Hi all,

I am analyzing a number of publicly available RNA-seq and methylation datasets from SRA and GEO, but I am stuck at the first step: quality control.

This is what I did:

  • Downloaded all the fastq files for a particular experiment and combined them into one file for that experiment
  • Ran FastQC on the combined file: the results were pretty bad

For quality control I have tried two tools, "cutadapt" and "trimmomatic" (for the datasets that were sequenced on Illumina). Running the following cutadapt command does not remove any adapter sequences from the file:

/u1/tools/public/cutadapt/bin/cutadapt -q 10 -a GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG -b ACACTCTTTCCCTACACGACGCTCTTCCGATCT GSM602252.fastq > GSM602252_trim.fastq

The FastQC results for the trimmed file are the same as for the untrimmed one. Kindly let me know where I am going wrong and what I should do to correct it.
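One way to see why trimming changes nothing is to check whether the adapter occurs in the reads at all. The sketch below is only an illustration: it assumes plain 4-line FASTQ records (no wrapped sequence lines) and exact matching on a short adapter prefix, and the names `fastq_sequences`, `adapter_hit_fraction` and `seed_len` are made up for this example. The two demo reads are invented, not taken from GSM602252.

```python
from typing import Iterable, Iterator


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each record, assuming 4 lines per record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def adapter_hit_fraction(lines: Iterable[str], adapter: str, seed_len: int = 12) -> float:
    """Fraction of reads containing the first `seed_len` bases of `adapter`.

    Exact substring matching only; mismatches and partial adapters at the
    read end are ignored, so this underestimates real contamination.
    """
    seed = adapter[:seed_len]
    total = 0
    hits = 0
    for seq in fastq_sequences(lines):
        total += 1
        if seed in seq:
            hits += 1
    return hits / total if total else 0.0


# Invented demo records: the first read carries the start of the adapter.
demo = [
    "@r1", "ACGTACGTGATCGGAAGAGCTCGTAT", "+", "I" * 26,
    "@r2", "TTTTGGGGCCCCAAAATTTTGGGGCC", "+", "I" * 26,
]
fraction = adapter_hit_fraction(demo, "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG")
print(f"{fraction:.0%} of reads contain the adapter seed")
```

For a real file, the same function can be given an open file handle, for example `with open("GSM602252.fastq") as fh: adapter_hit_fraction(fh, adapter)`. A fraction near zero suggests the adapter sequence, or its orientation, is not the one present in the data.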

The datasets I am trying to analyse also include ABI SOLiD RNA-seq data, which is provided as fastq files. I have never analysed ABI SOLiD data, but from what I have heard and read it normally comes as two files, a .csqual file and a .csfasta file. In SRA, however, it is a single fastq file, and that fastq file contains numbers in place of the sequence. I would really appreciate it if anybody could provide pointers on what to do in cases like this.
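The numbers in such reads are SOLiD color-space calls: each digit 0 to 3 encodes the transition between two adjacent bases, and the read starts with a known primer base. A minimal sketch of the decoding follows, assuming the standard two-base color code and a read written as one primer base followed by digits; `decode_colorspace` is a name invented here. Because every base depends on the previous one, a single miscalled color corrupts everything after it, so this is meant for inspection only; color-space-aware aligners work on the colors directly.

```python
# For each previous base, the string index is the color (0-3) and the
# character at that index is the next base.
COLOR_TO_NEXT = {
    "A": "ACGT",
    "C": "CATG",
    "G": "GTAC",
    "T": "TGCA",
}


def decode_colorspace(read: str) -> str:
    """Translate a color-space read (primer base + digits) to bases.

    The primer base itself is not part of the returned sequence. After an
    unreadable color (for example '.'), the rest is reported as 'N'
    because the chain of transitions is broken.
    """
    prev = read[0].upper()
    if prev not in COLOR_TO_NEXT:
        raise ValueError("read must start with a primer base A, C, G or T")
    bases = []
    for color in read[1:]:
        if prev is None or color not in "0123":
            bases.append("N")
            prev = None
            continue
        prev = COLOR_TO_NEXT[prev][int(color)]
        bases.append(prev)
    return "".join(bases)


print(decode_colorspace("T0123"))
```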


geo sra ngs methylation rna-seq • 2.8k views

No answer Anybody!!

written 5.9 years ago by skm770
5.6 years ago, Abhi (United States) wrote:

A quick note: different technologies (Illumina, ABI SOLiD, etc.) will not necessarily use the same adapter sequences.

A few things to note:

  1. If you mix fastq files from ABI and Illumina, they are likely to use different quality scoring schemes.
  2. It would be helpful if you could explain what those bad QC results were. That may shed more light on what is going on here.
  3. Check the ABI adapter sequences and see whether they match the ones you are using.
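As a rough illustration of the first point, the quality encoding of a fastq file can be guessed from the lowest quality character it contains. This is a heuristic sketch only: it assumes 4-line FASTQ records, the function name and the returned labels are invented for this example, and a file that happens to contain only high-quality Phred+33 reads would be misreported as Phred+64.

```python
from typing import Iterable


def guess_quality_offset(lines: Iterable[str]) -> str:
    """Guess the quality encoding from the smallest quality character.

    Heuristic: characters below ';' (ASCII 59) can only occur with a +33
    offset; a minimum of '@' (ASCII 64) or higher is typical of a +64
    offset; anything in between is left as ambiguous.
    """
    lowest = None
    for i, line in enumerate(lines):
        if i % 4 == 3:  # fourth line of each record holds the qualities
            for ch in line.strip():
                if lowest is None or ord(ch) < lowest:
                    lowest = ord(ch)
    if lowest is None:
        return "unknown"
    if lowest < 59:
        return "phred33"
    if lowest >= 64:
        return "phred64"
    return "ambiguous"


# Invented demo records with obviously different quality strings.
new_style = ["@r1", "ACGT", "+", "!5II"]
old_style = ["@r1", "ACGT", "+", "ffhh"]
print(guess_quality_offset(new_style), guess_quality_offset(old_style))
```

Running this per downloaded file, before concatenating anything, would show whether files with different encodings are being merged into one.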

hth, -Abhi

5.6 years ago, Prakki Rama wrote:

We had a similar case when dealing with Illumina data. We had to use not only the exact adapter sequence but also its reverse and its reverse complement. Moreover, we observed cases where the adapters occurred at both the beginning and the end of the sequencing read. So we had to use multiple '-b' options to perform the adapter trimming.
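The orientations mentioned above can be generated rather than typed by hand. The sketch below derives the reverse and reverse complement of each adapter and assembles a cutadapt argument list with one '-b' per distinct sequence. The helper names `adapter_variants` and `cutadapt_args` are invented for this illustration, the '-q 10' threshold and file names are copied from the question, and the command is only printed, not executed.

```python
from typing import Dict, Iterable, List

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def adapter_variants(adapter: str) -> Dict[str, str]:
    """Return the adapter as given, reversed, and reverse-complemented."""
    reverse = adapter[::-1]
    return {
        "forward": adapter,
        "reverse": reverse,
        "reverse_complement": reverse.translate(_COMPLEMENT),
    }


def cutadapt_args(adapters: Iterable[str], fastq_in: str, fastq_out: str) -> List[str]:
    """Build a cutadapt command line with one '-b' per distinct variant."""
    args = ["cutadapt", "-q", "10"]
    seen = set()
    for adapter in adapters:
        for seq in adapter_variants(adapter).values():
            if seq not in seen:  # palindromic adapters yield duplicates
                seen.add(seq)
                args += ["-b", seq]
    return args + ["-o", fastq_out, fastq_in]


adapters = [
    "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG",
    "ACACTCTTTCCCTACACGACGCTCTTCCGATCT",
]
print(" ".join(cutadapt_args(adapters, "GSM602252.fastq", "GSM602252_trim.fastq")))
```

If adapters really do occur at both ends of a read, note that cutadapt by default removes at most one adapter per read; its `--times` option raises that limit.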
