Illumina-Fastq To Sanger-Fastq Conversion
9.1 years ago
peris ▴ 120

Hi All,

I have some old sequence files generated by CASAVA 1.7 and 1.8. I believe they are in Illumina FASTQ format. I now want to align them with BWA and then call variants with GATK. Do I need to convert them to Sanger FASTQ format for this? What is the best way to do it?

Thanks and regards.

BWA has an "-I" parameter (for bwa aln), which tells BWA that the input is in the Illumina 1.3+ read format (quality equals ASCII-64). I am not sure whether this parameter solves your problem. I assume the SAM/BAM output will use Sanger (Phred+33) quality encoding, but I may be wrong.
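To make "quality equals ASCII-64" concrete, here is a minimal Python sketch (not part of BWA; decode_qualities is a hypothetical helper) showing that the Phred score is the character's ASCII code minus an offset: 64 for Illumina 1.3+ and 33 for Sanger. The same score is therefore written with different characters in the two formats.

```python
def decode_qualities(qual: str, offset: int) -> list[int]:
    """Convert a FASTQ quality string to Phred scores: Q = ord(char) - offset."""
    return [ord(c) - offset for c in qual]


# Phred 40 in Illumina 1.3+ (Phred+64): 'h' is ASCII 104, 104 - 64 = 40
print(decode_qualities("h", 64))  # [40]
# Phred 40 in Sanger (Phred+33): 'I' is ASCII 73, 73 - 33 = 40
print(decode_qualities("I", 33))  # [40]
```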

9.1 years ago
rtliu ★ 2.2k

Besides FastQC, you can use DetermineFastqQualityEncoding.pl from the Macdonald lab to determine the quality value encoding of a given FASTQ file.

perl DetermineFastqQualityEncoding.pl Read1.fastq
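If that script is not at hand, the idea behind such tools can be sketched in a few lines of Python. This is a rough heuristic of my own, not the logic of DetermineFastqQualityEncoding.pl: it looks at the lowest and highest quality characters, assumes unwrapped 4-line records, and assumes Illumina 1.8+ Phred+33 data does not go above 'J' (ASCII 74). The function names are hypothetical.

```python
from typing import Iterable, Iterator


def quality_lines(path: str, max_records: int = 100_000) -> Iterator[str]:
    """Yield the quality line (every 4th line) of the first max_records records.

    Assumes unwrapped 4-line FASTQ records.
    """
    with open(path) as fh:
        for i, line in enumerate(fh):
            if i // 4 >= max_records:
                break
            if i % 4 == 3:
                yield line.rstrip("\r\n")


def guess_encoding(quals: Iterable[str]) -> str:
    """Guess the quality offset from the observed ASCII range (heuristic only)."""
    lo, hi = 255, 0
    for q in quals:
        for c in q:
            lo = min(lo, ord(c))
            hi = max(hi, ord(c))
    if lo < 59:
        return "phred33"    # ASCII below ';' cannot occur in Phred+64 or Solexa+64 files
    if hi <= 74:
        return "ambiguous"  # fits Phred+33 with high scores or Phred+64 with low scores
    if lo < 64:
        return "solexa64"   # ';' to '?' together with high characters: old Solexa scores
    return "phred64"        # '@' and above, reaching past 'J': Illumina 1.3+ to 1.7 style


# Example: guess_encoding(quality_lines("Read1.fastq"))
```

An "ambiguous" result just means the sampled reads do not settle the question; sampling more records or checking with FastQC is the next step.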


It is usually safer to convert the FASTQ files to Sanger encoding first. seqtk is one such fast conversion tool (how to install seqtk):

seqtk seq -Q64 -V Read1.fastq > Read1.sanger.fastq
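The seqtk command reads qualities with an offset of 64 (-Q64) and re-emits them with offset 33 (-V), i.e. every quality character moves down by 31 while the Phred score stays the same. The following Python sketch does the same shift so the arithmetic is visible. It is an illustration only, not seqtk's implementation: it assumes unwrapped 4-line records, the function names are hypothetical, and it raises an error on characters below '@' rather than guessing what they mean (seqtk's behaviour there may differ).

```python
from typing import Iterable, Iterator

PHRED64_OFFSET = 64
PHRED33_OFFSET = 33


def phred64_to_phred33(qual: str) -> str:
    """Re-encode one quality string; the Phred scores themselves do not change."""
    out = []
    for c in qual:
        score = ord(c) - PHRED64_OFFSET
        if score < 0:
            raise ValueError(
                f"{c!r} is below the Phred+64 range; input may be Solexa or already Phred+33"
            )
        out.append(chr(score + PHRED33_OFFSET))
    return "".join(out)


def convert_fastq(lines: Iterable[str]) -> Iterator[str]:
    """Rewrite every 4th line (quality) of unwrapped 4-line FASTQ records."""
    for i, line in enumerate(lines):
        if i % 4 == 3:
            yield phred64_to_phred33(line.rstrip("\r\n")) + "\n"
        else:
            yield line


def convert_file(src: str, dst: str) -> None:
    """Stream src (Phred+64) into dst (Phred+33) without loading it into memory."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(convert_fastq(fin))
```

Usage would be convert_file("Read1.fastq", "Read1.sanger.fastq"). For real data seqtk is the better choice since it is written in C and much faster.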

9.1 years ago
Ying W ★ 4.2k

According to the wiki page, Illumina 1.8+ (I assume it's referring to CASAVA) uses the Sanger format (Phred+33). You could check this either by plotting the quality value distribution or by using a tool like FastQC. I believe most aligners expect Sanger FASTQ encoding by default. The wiki page also lists some software for converting between encodings.
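For the "plot the quality value distribution" check, a text histogram is often enough. This Python sketch (function names are hypothetical; pass it the quality lines, i.e. every fourth line of an unwrapped FASTQ file) counts each quality character and prints its score under both offsets. Values below -5 in the Q(+64) column rule out a 64-offset encoding, since even Solexa scores start at -5.

```python
from collections import Counter
from typing import Iterable


def quality_char_histogram(quals: Iterable[str]) -> Counter:
    """Count how often each ASCII code appears across quality strings."""
    counts: Counter = Counter()
    for q in quals:
        counts.update(ord(c) for c in q.rstrip("\r\n"))
    return counts


def print_histogram(counts: Counter, width: int = 40) -> None:
    """Print one bar per ASCII code with its score under both offsets."""
    if not counts:
        return
    peak = max(counts.values())
    for code in sorted(counts):
        bar = "#" * max(1, round(width * counts[code] / peak))
        print(f"{chr(code)} ascii={code:3d} Q(+33)={code - 33:3d} Q(+64)={code - 64:3d} {bar}")


# Example: print_histogram(quality_char_histogram(["IIIIIHHG5", "IIHG#"]))
```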

