I have a list of SNPs identified using RADseq and mapped to the 6th Rattus norvegicus genome build (rn6). I would like to extract these SNPs from 12 whole genomes which were mapped to the 5th genome build (rn5). I was planning on: first using mpileup in samtools to convert the whole genome BAM to a VCF using the rn5 genome
samtools1.2 mpileup -v -o rat.vcf -f ./Rnor_5/rn5.fa rat.bam
Second, use LiftoverVCF in picard to convert the coordinates
java -jar /usr/local/picard-tools-2.1.0/picard.jar LiftoverVcf I=grat.vcf O=rat_lift6.vcf CHAIN=./Rnor_5/rn5ToRn6.over.chain.gz REJECT=rat_reject.vcf R=./Rnor_6/rn6.dict
Then use mpileup again with a BED list (-l) of the SNPs I want to extract after the files are in the correct genome annotation.
If there are better ways to solve this problem, I am open to suggestions.
However, the trouble I think I am running in to is that the headers for the rn5, rn6, and over.chain files are all different. For example chromosome 1 is alternately called:
User$ head rn5.dict @HD VN:1.5 SO:unsorted @SQ SN:chr1 LN:290094216 M5:9397441152f18c800456aa682b0935c6 UR:file:/Volumes/urbanevol/Rnor_5/rn5.fa User$ head rn6.dict @HD VN:1.5 SO:unsorted @SQ SN:CM000072.5 LN:282763074 M5:67cb6bbed9d7e4434bd6d277e3f883c2 UR:file:/Volumes/urbanevol/Rnor_6/GCA_000001895.4_Rnor_6.0_genomic.fna
In the over.chain file chr1 for rn6 is called chr1 and not CM000072.5; example:
chain 23677882575 chr1 290094216 + 0 290094216 chr1 282763074 + 0 282761901 1
Finally, in the BAM file for the downloaded genomes, chr 1 is called 1:
User$ samtools1.2 view -H rat.bam | head @HD VN:1.4 SO:coordinate @SQ SN:1 LN:290094216 UR:file:/nfs/srpipe_references/references/Rattus_norvegicus/Rnor_5.0/all/fasta/Rattus_norvegicus.Rnor_5.0.71.fasta AS:Rnor 5.0 M5:9397441152f18c800456aa682b0935c6 SP:Rattus norvegicus
My question is: how can I reconcile these different header names coming from different sources? samtools repeats errors that chromosomes from the input BAM cannot be identified, although it outputs a VCF. When I run the Picard command above, I recieve this error:
Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file
Kind thanks for any advice.