Question

Snps From Genome-2-Genome Alignment

0

Entering edit mode

11.1 years ago

Darked89 4.6k

Hello,

I have mapped Illumina reads from two strains of a small eukaryote (say S1 and S2) plus and two reference genomes: G1 (closest to S1 and S2) and sister species G2. It seems that some chromosomes/chromosome parts in almost identical S1 and S2 are "borrowed" from G2, whereas other parts are more G1-like.

Using GATK (or VarScan) I am able call SNPs in S1 and S2 genomic data. Since G1 and G2 are quite close, I want to get G2 SNPs after performing G2 to G1 alignment.

I have MAF file which I can convert to SAM/BAM, but so far did not managed to get SNPs using VarScan or GATK. Parsing MAF and writing VCF would be a big time investment for me, so I am looking for other tools which can do the job. Another, quite likely much easier method would be to get some genomic reads from G2, map them and call SNPs the conventional way. Still, if I already have a G2 genome it should be possible to extract differences (SNPs) in a default (VCF) format.

Thanks for your help.

snp maf vcf • 3.5k views

ADD COMMENT • link updated 11.1 years ago by matted 7.8k • written 11.1 years ago by Darked89 4.6k

score 1 · Answer 1 · 2013-03-19

1

Entering edit mode

11.1 years ago

matted 7.8k

You can align the G2 chromosomes/contigs to G1 using a long-read compatible aligner like bwa-sw or the latest and greatest new tool bwa-mem (both here). You can call SNPs with samtools mpileup from the aligned bams. You should probably use the -B option and think carefully about other parameter choices. I've had reasonable success using this workflow for assembled contigs, but it may depend on the size of the chromosomes of G2. You'll also have to be careful with variant calling quality control thresholds, since this is obviously no longer "common" Illumina data where e.g. default coverage minimums should apply.

ADD COMMENT • link 11.1 years ago by matted 7.8k

2

Entering edit mode

<del>Don't</del> Do use -B for long query sequences. You can either parse raw mpileup or use bcftools just like what you do with short reads. bwa-mem will be better than bwa-sw, but right now bwa-sw is more stable. EDIT: mistake - please use -B (disable BAQ) for long sequences.

ADD REPLY • link 11.1 years ago by lh3 33k

0

Entering edit mode

Okay good, I was confused because I was only repeating your advice on another list.

ADD REPLY • link 11.1 years ago by matted 7.8k