Quality Control On Publicly Available Datasets
10.2 years ago
skm770 ▴ 150

Hi all,

I am in the process of analyzing a number of publicly available RNA-seq and methylation datasets from SRA and GEO, but I am stuck at the first step: quality control.

This is what I did:

  • Downloaded all the FASTQ files for a particular experiment and combined them into one file for that experiment
  • Ran FastQC on that file: the results were pretty bad
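To make "pretty bad" concrete, it helps to know what FastQC's per-base quality view summarises. Below is a minimal sketch of a rough analogue of that calculation: the mean Phred score at each read position. It assumes well-formed 4-line FASTQ records and Phred+33 qualities (the offset is a parameter); the function names are hypothetical, not part of FastQC.

```python
# Minimal sketch: mean Phred score per read position, a rough analogue of
# FastQC's "Per base sequence quality" plot. Assumes well-formed 4-line
# FASTQ records and Phred+33 qualities unless another offset is passed.
from typing import Iterable, Iterator, List, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) tuples from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it).rstrip("\n")
        next(it)  # skip the '+' separator line
        qual = next(it).rstrip("\n")
        yield header.rstrip("\n"), seq, qual


def mean_quality_per_position(records, offset: int = 33) -> List[float]:
    """Mean Phred score at each position; reads may differ in length."""
    totals: List[int] = []
    counts: List[int] = []
    for _, _, qual in records:
        for i, ch in enumerate(qual):
            if i == len(totals):  # first read that reaches this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


if __name__ == "__main__":
    example = ["@r1\n", "ACGT\n", "+\n", "IIII\n",
               "@r2\n", "ACG\n", "+\n", "+++\n"]
    print(mean_quality_per_position(read_fastq(example)))
    # → [25.0, 25.0, 25.0, 40.0]
```

On a real file this would be called as `mean_quality_per_position(read_fastq(open("GSM602252.fastq")))`. If the combined file mixes runs with different quality encodings, a single fixed offset will give misleading numbers.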

For quality control I have tried a couple of tools, cutadapt and Trimmomatic (for the datasets sequenced on Illumina). Running this cutadapt command does not remove any adapter sequences from the file:

/u1/tools/public/cutadapt/bin/cutadapt -q 10 -a GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG -b ACACTCTTTCCCTACACGACGCTCTTCCGATCT GSM602252.fastq > GSM602252_trim.fastq

The FastQC results on the trimmed file are the same as before. Please let me know where I am going wrong and what I should do to correct it.
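The `-q 10` option in the command above only trims low-quality bases from the 3' end. The sketch below follows the algorithm cutadapt's documentation describes for `-q` (the one used by BWA): subtract the cutoff from each quality, accumulate from the 3' end, and cut where the running sum is most negative. It takes already-decoded Phred integers, and the function name is hypothetical. One possible reason the FastQC report barely changes is that reads whose 3' tails sit mostly above Q10 are left untouched.

```python
# Minimal sketch of BWA-style 3' quality trimming, as cutadapt documents
# for -q. Qualities are already-decoded Phred integers.
from typing import Sequence


def quality_trim_index(quals: Sequence[int], cutoff: int) -> int:
    """Return how many bases to keep after trimming the 3' end."""
    running = 0  # running sum of (cutoff - q), accumulated from the 3' end
    best = 0     # largest running sum seen so far
    keep = len(quals)
    for i in range(len(quals) - 1, -1, -1):
        running += cutoff - quals[i]
        if running < 0:
            break  # bases so far are, on balance, above the cutoff
        if running > best:
            best = running
            keep = i
    return keep


if __name__ == "__main__":
    print(quality_trim_index([30, 30, 30, 5, 5], cutoff=10))  # → 3
    print(quality_trim_index([30, 30, 30, 30], cutoff=10))    # → 4
```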

The dataset I am trying to analyse also includes ABI SOLiD RNA-seq data, provided as FASTQ files. I have never analysed SOLiD data, but from what I have heard/read it usually comes as two files, a .csfasta file and a .qual file. In SRA, however, it is stored as FASTQ, and the sequence lines contain numbers instead of bases. I would really appreciate any pointers on what to do in cases like this.
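The numbers in those SOLiD reads are colour calls: the read starts with a known base (the last base of the sequencing primer, not part of the read itself), followed by one colour (0-3) per pair of adjacent bases. With the 2-bit encoding A=0, C=1, G=2, T=3, each colour equals the XOR of the two bases it spans, so decoding is a running XOR. The sketch below is for illustration only and the function name is hypothetical: a single wrong colour call corrupts every base after it, which is why colour-space-aware aligners are generally preferred over decoding reads before mapping.

```python
# Minimal sketch of SOLiD colour-space decoding (illustration only).
# Encoding A=0, C=1, G=2, T=3; colour = base_i XOR base_{i+1}.
BASES = "ACGT"


def decode_colorspace(read: str) -> str:
    """Decode e.g. 'T0123' to base space; stops at a missing call such as '.'."""
    prev = BASES.index(read[0])  # leading primer base, not part of the read
    out = []
    for colour in read[1:]:
        if colour not in "0123":
            break  # no-call: later bases cannot be recovered reliably
        prev ^= int(colour)
        out.append(BASES[prev])
    return "".join(out)


if __name__ == "__main__":
    print(decode_colorspace("T0123"))  # → TGAT
```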


sra geo rna-seq methylation ngs • 3.7k views

No answers, anybody?

9.9 years ago
Abhi ★ 1.6k

On a quick note: different technologies (Illumina, ABI SOLiD, etc.) will not necessarily use the same adapter sequences.
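Before tuning cutadapt, it can be worth checking whether the adapter is in the reads at all. The sketch below reports the fraction of reads that contain the first k bases of an adapter as an exact match. The 12-base probe length and the exact-match rule are arbitrary assumptions, and the helper names are hypothetical. A fraction near zero would suggest the wrong adapter for that platform. It can also mean colour-space reads, made of digits, which a base-space probe can never match.

```python
# Minimal sketch: fraction of reads containing the start of an adapter.
# Probe length k=12 and exact matching are arbitrary assumptions.
from typing import Iterable, Iterator


def fastq_sequences(path: str) -> Iterator[str]:
    """Yield the sequence line (2nd of every 4) from a plain FASTQ file."""
    with open(path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:
                yield line.strip()


def adapter_hit_fraction(seqs: Iterable[str], adapter: str, k: int = 12) -> float:
    """Share of reads containing adapter[:k]; 0.0 for an empty input."""
    probe = adapter[:k]
    total = hits = 0
    for s in seqs:
        total += 1
        if probe in s:
            hits += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    reads = ["AAAGATCGGAAGAGCTC", "ACGTACGTACGT", "GATCGGAAGAGC"]
    print(adapter_hit_fraction(reads, "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG"))
    # → 0.6666666666666666
```

Applied to the file from the question, this would be `adapter_hit_fraction(fastq_sequences("GSM602252.fastq"), "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG")`. Exact matching misses adapters with sequencing errors or short partial adapters at read ends, so treat a low number as a hint, not proof.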

Few things to note:

  1. If you mix FASTQ files from ABI and Illumina, they are likely to use different quality-scoring schemes, etc.
  2. It would be helpful if you could explain what those bad QC results were. Maybe that would shed more light on what's going on here.
  3. Check the ABI adapter sequences and see whether they match the ones you are using.
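On point 1, a quick way to see which quality scheme a FASTQ file uses is to look at the range of quality characters. The sketch below is a heuristic, and its thresholds are assumptions. Phred+33 can go as low as '!' (ASCII 33), while the +64 schemes (Solexa and Illumina 1.3-1.7) never go below ';' (59). Characters above 'J' (Q41 in Phred+33) suggest an offset of 64. Anything in between is reported as ambiguous rather than guessed. The function name is hypothetical.

```python
# Heuristic sketch for guessing the FASTQ quality offset (33 or 64).
# Thresholds are assumptions based on the usual ASCII ranges of each scheme.
from typing import Iterable, Optional


def guess_phred_offset(quality_strings: Iterable[str]) -> Optional[int]:
    lo, hi = 255, 0
    for q in quality_strings:
        for ch in q:
            code = ord(ch)
            lo = min(lo, code)
            hi = max(hi, code)
    if lo < 59:
        return 33   # characters below ';' never occur in the +64 schemes
    if hi > 74:
        return 64   # above 'J' is unusual for Phred+33 Illumina data
    return None     # ambiguous: both offsets fit the characters seen


if __name__ == "__main__":
    print(guess_phred_offset(["II#I"]))  # → 33
    print(guess_phred_offset(["hhhB"]))  # → 64
    print(guess_phred_offset(["IIII"]))  # → None
```

Running it separately on each downloaded run, before combining files, would show whether any runs disagree.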

hth, -Abhi

9.9 years ago
Prakki Rama ★ 2.7k

We had a similar case when dealing with Illumina data. We had to use not only the exact adapter sequence, but also its reverse as well as its reverse complement. Moreover, we observed cases where adapters occurred both at the beginning and at the end of the sequencing read. So we had to pass multiple '-b' options to perform the adapter trimming.
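The variants described above can be generated rather than typed by hand. The sketch below builds a list of cutadapt `-b` arguments for an adapter, its reverse, and its reverse complement, following that description. Duplicates are dropped for palindromic adapters. The helper names are hypothetical. Only the `-b` option itself comes from cutadapt.

```python
# Sketch: cutadapt -b arguments for an adapter, its reverse, and its
# reverse complement. Helper names are hypothetical.
from typing import Dict, List

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def adapter_variants(adapter: str) -> Dict[str, str]:
    return {
        "forward": adapter,
        "reverse": adapter[::-1],
        "reverse_complement": adapter.translate(_COMPLEMENT)[::-1],
    }


def cutadapt_b_args(adapter: str) -> List[str]:
    args: List[str] = []
    # dict.fromkeys drops duplicate sequences while keeping their order
    for seq in dict.fromkeys(adapter_variants(adapter).values()):
        args += ["-b", seq]
    return args


if __name__ == "__main__":
    print(cutadapt_b_args("AACG"))
    # → ['-b', 'AACG', '-b', 'GCAA', '-b', 'CGTT']
```

The resulting list can be spliced into a cutadapt command line (for example with `subprocess`) alongside the other options from the question.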

