Hi Everyone,
I have genomic data for a specific strain of mouse. The data consist of short reads generated on both the SOLiD (~20X) and Illumina (~20X) platforms. I have already aligned the reads to the reference genome using BWA (Illumina) and Lifescope (SOLiD), respectively. My questions are:
1) What should my approach to variant calling be? Should I call variants separately on the BAM files from the two platforms, or can I merge the two BAM files and then call variants?
Many tools, such as Dindel, are platform dependent. But tools such as the samtools SNP/indel caller or GATK are platform independent, so in that case the two BAM files (SOLiD and Illumina) could be merged and used to call variants. The colour-space reads have already been converted to base space in the BAM file, so I don't think this should be a problem.
2) If Dindel calls short indels from BAM files, why can't it be used on SOLiD BAM files that contain the converted nucleotide sequence? Or why is it not recommended?
3) Which approach should I take? Run a platform-specific analysis, including variant calling, and then keep only the variants common to both? Or create a single high-coverage BAM file by merging the two BAM files from the different platforms and then use GATK/samtools?
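To make the "common variants" option in (3) concrete, here is a minimal sketch in Python. It assumes each platform-specific caller writes standard tab-separated VCF, and it treats two calls as the same variant only when chromosome, position, REF and ALT all match exactly (no indel normalisation or left-alignment, which real indel comparisons would likely need). The function names and the example records are made up for illustration and do not come from any particular tool.

```python
def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from an iterable of VCF text lines."""
    keys = set()
    for line in vcf_lines:
        line = line.rstrip("\n")
        # Skip blank lines and VCF header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        # Multi-allelic records list ALT alleles comma-separated; key each one.
        for alt in alts.split(","):
            keys.add((chrom, pos, ref, alt))
    return keys


def common_variants(vcf_a, vcf_b):
    """Return the sorted variants present in both call sets."""
    return sorted(variant_keys(vcf_a) & variant_keys(vcf_b))


# Hypothetical records standing in for the Illumina and SOLiD call sets.
illumina_calls = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tC\tT\t40\tPASS\t.",
    "chr2\t300\t.\tG\tGA\t30\tPASS\t.",
]
solid_calls = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t45\tPASS\t.",
    "chr2\t300\t.\tG\tGA\t20\tPASS\t.",
    "chr2\t400\t.\tT\tC\t35\tPASS\t.",
]

shared = common_variants(illumina_calls, solid_calls)
for chrom, pos, ref, alt in shared:
    print(chrom, pos, ref, alt)
```

With real files, the lists could be replaced by open file handles, e.g. `common_variants(open("illumina.vcf"), open("solid.vcf"))`, where both file names are placeholders.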
Thanks,
Ashutosh
Thanks JC. Your suggestion makes sense. Thanks for the paper too.
You're welcome, good luck.