Question: Snps From Genome-2-Genome Alignment
0
gravatar for Darked89
6.3 years ago by
Darked894.2k
Barcelona, Spain
Darked894.2k wrote:

Hello,

I have mapped Illumina reads from two strains of a small eukaryote (say S1 and S2) plus and two reference genomes: G1 (closest to S1 and S2) and sister species G2. It seems that some chromosomes/chromosome parts in almost identical S1 and S2 are "borrowed" from G2, whereas other parts are more G1-like.

Using GATK (or VarScan) I am able call SNPs in S1 and S2 genomic data. Since G1 and G2 are quite close, I want to get G2 SNPs after performing G2 to G1 alignment.

I have MAF file which I can convert to SAM/BAM, but so far did not managed to get SNPs using VarScan or GATK. Parsing MAF and writing VCF would be a big time investment for me, so I am looking for other tools which can do the job. Another, quite likely much easier method would be to get some genomic reads from G2, map them and call SNPs the conventional way. Still, if I already have a G2 genome it should be possible to extract differences (SNPs) in a default (VCF) format.

Thanks for your help.

maf vcf snp • 2.4k views
ADD COMMENTlink modified 6.3 years ago by matted7.0k • written 6.3 years ago by Darked894.2k
1
gravatar for matted
6.3 years ago by
matted7.0k
Boston, United States
matted7.0k wrote:

You can align the G2 chromosomes/contigs to G1 using a long-read compatible aligner like bwa-sw or the latest and greatest new tool bwa-mem (both here). You can call SNPs with samtools mpileup from the aligned bams. You should probably use the -B option and think carefully about other parameter choices. I've had reasonable success using this workflow for assembled contigs, but it may depend on the size of the chromosomes of G2. You'll also have to be careful with variant calling quality control thresholds, since this is obviously no longer "common" Illumina data where e.g. default coverage minimums should apply.

ADD COMMENTlink modified 6.3 years ago • written 6.3 years ago by matted7.0k
2

<del>Don't</del> Do use -B for long query sequences. You can either parse raw mpileup or use bcftools just like what you do with short reads. bwa-mem will be better than bwa-sw, but right now bwa-sw is more stable. EDIT: mistake - please use -B (disable BAQ) for long sequences.

ADD REPLYlink modified 6.3 years ago • written 6.3 years ago by lh331k

Okay good, I was confused because I was only repeating your advice on another list.

ADD REPLYlink written 6.3 years ago by matted7.0k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1160 users visited in the last hour