Hello, beginner in HT sequencing here!
I am analysing this GEO dataset which has been produced on an AB SOLiD platform. I downloaded the samples in .sra format and the reference genome in .fasta format. My project involves aligning the reads against the reference genome and calling the enriched peaks. However, I have run into a few problems and have confused myself along the way. Can a user who has had experience with AB SOLiD colorspace reads explain to me the correct way to analyse such data?
I first tried converting the .sra files to .csfasta and .qual files using abi-dump, then aligning them with bowtie against a colorspace index. This seemed to work fine until I tried to convert from .sam to .bam and got an error about sequence lengths. I also tried using fastq-dump to convert from .sra to .fastq, but I have read that this isn't correct because converting from colorspace to basespace is error-prone. I've also followed the advice on this website, which seemed promising. However, the final fastq output gave dreadful results when assessed with FastQC, so I assumed something had gone wrong in the conversion.
Ultimately, I have confused myself and would really appreciate any advice.
Thanks,
James
Hey James, have a look at this post, it may be able to help you: Error In Converting Sam To Bam By Samtools
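On the point in the question about colorspace-to-basespace conversion being error-prone, here is a minimal Python sketch of why. Each SOLiD color encodes the transition between two adjacent bases, so decoding depends on every previous call, and a single miscalled color changes every base after it. The function name `decode_colorspace` and the example reads are made up for illustration, and the sketch assumes reads contain only the colors 0-3 after the leading primer base (no `.` missing calls).

```python
BASES = "ACGT"  # index of each base: A=0, C=1, G=2, T=3


def decode_colorspace(read):
    """Decode a colorspace read such as 'T01230' into bases.

    The first character is the known primer base; each following digit
    is the color for the transition to the next base. With the indexing
    above, color = index(previous base) XOR index(next base).
    Assumes every color is one of 0, 1, 2, 3.
    """
    prev = BASES.index(read[0])
    decoded = []
    for color in read[1:]:
        prev ^= int(color)          # apply the transition to get the next base
        decoded.append(BASES[prev])
    return "".join(decoded)


clean = decode_colorspace("T01230")
miscalled = decode_colorspace("T01330")  # third color read as 3 instead of 2

print(clean)      # TGATT
print(miscalled)  # TGCGG
```

The two reads differ by one color, but the decoded sequences differ at every base from that point on. This is why aligning in colorspace against a colorspace index is usually preferred over decoding the reads first.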