I am analyzing a number of publicly available RNA-seq and methylation datasets from SRA and GEO, but I am stuck at the first step, quality control.
This is what I did:
- Downloaded all the fastq files for a particular experiment and combined them into one file for that experiment
- Ran FastQC on the combined file: the results were pretty bad
For quality control I have tried two tools, "cutadapt" and "trimmomatic" (for the datasets sequenced on an Illumina platform). Running the following cutadapt command does not remove any adapter sequences from the file:
/u1/tools/public/cutadapt/bin/cutadapt -q 10 -a GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG -b ACACTCTTTCCCTACACGACGCTCTTCCGATCT GSM602252.fastq > GSM602252_trim.fastq
The FastQC results for the trimmed file are the same as for the untrimmed one. Kindly let me know where I am going wrong and what I should do to correct it.
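One sanity check that might help narrow this down is to count how many reads contain the start of the adapter at all, independently of cutadapt. The following is only a minimal sketch: the function name `count_adapter_hits`, the 12-base seed length, and the toy reads are assumptions made up for illustration, and it assumes a plain 4-line-per-record fastq.

```python
import io

def count_adapter_hits(handle, adapter, seed_len=12):
    """Return (reads_containing_seed, total_reads) for a 4-line-per-record FASTQ.

    Only the first `seed_len` bases of the adapter are searched for, as an
    exact substring, so this is a rough presence check and not a trimmer.
    """
    seed = adapter[:seed_len].upper()
    hits = 0
    total = 0
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of every record is the sequence
            total += 1
            if seed in line.strip().upper():
                hits += 1
    return hits, total

# Toy records invented for illustration; a real run would open the fastq file.
read_with_adapter = "ACGTACGT" + "GATCGGAAGAGCTCGTATG"
read_without_adapter = "ACGTACGTACGTACGTACGTACGTACG"
toy_fastq = io.StringIO(
    "@r1\n" + read_with_adapter + "\n+\n" + "I" * len(read_with_adapter) + "\n"
    "@r2\n" + read_without_adapter + "\n+\n" + "I" * len(read_without_adapter) + "\n"
)

hits, total = count_adapter_hits(toy_fastq, "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG")
print(hits, "of", total, "reads contain the adapter seed")
```

If almost no reads contain the seed, the adapter sequences passed to cutadapt may simply not be the ones used for this library, which would be consistent with the trimmed and untrimmed reports looking identical.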
The dataset that I am trying to analyse also includes ABI SOLiD RNA-seq data, provided as fastq files. I have never analysed ABI SOLiD data, but from what I have heard and read it normally comes as two files, a .csqual file and a .csfasta file. In SRA, however, it is a single fastq file, and the fastq file has numbers in place of the base sequences. I would really appreciate it if anybody could provide pointers on what to do in cases like this.
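To tell which of the downloaded fastq files are in this numeric (color-space) form, a quick check on the sequence lines could look like the sketch below. It assumes the usual color-space layout of one leading primer base followed by the digits 0 to 3, with "." for a missing call; the function name `looks_like_colorspace` and the example reads are hypothetical.

```python
import re

# Assumed layout: one primer base, then color calls 0-3 ("." = missing call).
COLORSPACE_READ = re.compile(r"^[ACGT][0-3.]+$")

def looks_like_colorspace(seq):
    """Return True if a read looks color-space encoded rather than base-space."""
    return bool(COLORSPACE_READ.match(seq.strip().upper()))

# Example reads invented for illustration.
print(looks_like_colorspace("T0120312.31"))  # numeric, color-space style read
print(looks_like_colorspace("ACGTACGTAC"))   # ordinary base-space read
```

Files flagged this way would need a color-space-aware workflow rather than the adapter sequences and settings used for the Illumina data.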