Question: Contig name difference due to reference genome
9 days ago by
United Kingdom
nuketbilgen (30) wrote:

Hi everyone,

I have VCF files for 4 feline genomes, but the VCF headers contain different contig names. I checked the reference genome line in each header; you can see them below.


Two of my genomes were aligned to the first reference, and the other two were aligned to the second. I want to merge these VCFs and run an LD analysis, but I cannot.

How can I solve this? Thanks...

alignment next-gen genome • 92 views

Are they the same genome builds?

written 9 days ago by genomax (69k)

A quick Google search turned up felCat9.fa (UCSC Genome Browser) and GCF_000181335.3_Felis_catus_9.0_genomic.fa (NCBI).

modified 9 days ago • written 9 days ago by jean.elbers (1.1k)

Yes, exactly. When I split the VCF files by chromosome with the SnpSift split command, I got 40 files for the felCat9.fa-aligned samples and 426 files for the NCBI-aligned ones. I am worried about losing important variants...
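
To see how many variants actually sit on each contig before deciding what to keep, a per-contig tally can help. The following is only a minimal Python sketch and is not part of SnpSift or any other tool mentioned in this thread; the function name `records_per_contig` and the file name in the usage comment are illustrative assumptions.

```python
from collections import Counter


def records_per_contig(lines):
    """Count VCF data records per contig (value of column 1, CHROM)."""
    counts = Counter()
    for line in lines:
        # Skip header lines (## meta lines and the #CHROM line) and blanks.
        if line.startswith("#") or not line.strip():
            continue
        chrom = line.split("\t", 1)[0]
        counts[chrom] += 1
    return counts


# Example usage (the file name is a placeholder):
# with open("input.vcf") as handle:
#     for contig, n in records_per_contig(handle).most_common():
#         print(contig, n)
```

Running this on one file from each pair would show how many records fall on the unplaced or `_random` contigs compared with the main chromosomes.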

written 9 days ago by nuketbilgen (30)

I think the Biostar community needs more information to help you, such as how the VCF files were produced. If the only difference is in naming, a quick regular expression or search-and-replace command can change the column 1 value from the old, undesired name to the new, desired name.

perl -pe "s/oldname/newname/g" input.vcf > output.vcf

Note that the command above assumes that oldname occurs only in column 1 of the VCF file.
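
If oldname could also appear elsewhere on a line (for example in the ID or INFO columns), a column-aware rename avoids accidental replacements. The sketch below is one possible way to do that in Python, under the assumption that only the `##contig` header IDs and the CHROM column should change; `rename_contigs` and `name_map` are hypothetical names introduced here for illustration.

```python
import re

# Matches the ID at the start of a ##contig header line.
CONTIG_ID = re.compile(r"^(##contig=<ID=)([^,>]+)")


def rename_contigs(lines, name_map):
    """Yield VCF lines with contig names replaced according to name_map.

    Only ##contig header IDs and column 1 (CHROM) of data lines are changed;
    names missing from name_map are left as they are.
    """
    for line in lines:
        if line.startswith("##contig="):
            yield CONTIG_ID.sub(
                lambda m: m.group(1) + name_map.get(m.group(2), m.group(2)),
                line,
            )
        elif line.startswith("#"):
            # Other header lines pass through unchanged.
            yield line
        else:
            chrom, sep, rest = line.partition("\t")
            yield name_map.get(chrom, chrom) + sep + rest


# Example usage (file names as in the perl command above):
# with open("input.vcf") as src, open("output.vcf", "w") as dst:
#     dst.writelines(rename_contigs(src, {"oldname": "newname"}))
```

If bcftools is available, its `annotate` command has a `--rename-chrs` option that takes a two-column old/new mapping file and may be a more robust choice. Either way, renaming only helps when the two references have identical sequences and coordinates under the different names.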

modified 9 days ago • written 9 days ago by jean.elbers (1.1k)

Hi again. The VCF files were generated with the GATK HaplotypeCaller walker. Haplotype calling:

java -jar GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar -T HaplotypeCaller -R all.chrs.con.fa -L TEST_Chr01 -I aligned_reads.sorted.dedup.bam --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 -o TEST_Chr01.gvcf

You can find examples of the contig lines below. These contigs also carry variants: where one file has a variant on "contig=<ID=chrA1_NW_019365239v1_random,length=46965>", the same variant is located on "contig=<ID=chrA1_random,length=415283>" in the other two files. So the chromosome names for SNPs at the same position differ as well...

Contig example from the first two files:





Contig example from the other two files:












contig=<ID=chrA2,length=171471747> ...
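
Whether two sets of files differ only in naming can be checked from the headers themselves: a contig that has a different length under each scheme is not the same sequence, so renaming alone would not put its variants on matching coordinates. The following is a rough Python sketch of such a check, assuming standard `##contig=<ID=...,length=...>` header lines; the function names `contig_lengths` and `compare_contigs` are made up for this example.

```python
import re

# Captures the contig ID and its length from a ##contig header line.
CONTIG = re.compile(r"^##contig=<ID=([^,>]+).*?length=(\d+)")


def contig_lengths(header_lines):
    """Return {contig ID: length} parsed from ##contig header lines."""
    lengths = {}
    for line in header_lines:
        match = CONTIG.match(line)
        if match:
            lengths[match.group(1)] = int(match.group(2))
    return lengths


def compare_contigs(a, b):
    """Compare two {ID: length} dicts built by contig_lengths()."""
    shared = sorted(set(a) & set(b))
    return {
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
        "length_mismatch": [name for name in shared if a[name] != b[name]],
    }
```

Contigs listed under `only_in_a` or `only_in_b` are the ones that would need a name mapping, and comparing their lengths by hand is a quick first test of whether a one-to-one mapping is plausible.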

modified 6 days ago • written 6 days ago by nuketbilgen (30)

I know it's a long shot, but would you suggest merging the files chromosome by chromosome, like this?

I=PasaHardFiltered.chrA1_NW_019365239v1_random.vcf I=PasaHardFiltered.chrA1_NW_019365240v1_random.vcf I=PasaHardFiltered.chrA1_NW_019365241v1_random.vcf I=PasaHardFiltered.chrA1_NW_019365243v1_random.vcf I=PasaHardFiltered.chrA1_NW_019365244v1_random.vcf I=PasaHardFiltered.chrA1_NW_019365246v1_random.vcf I=PasaHardFiltered.chrA1_NW_019365247v1_random.vcf I=PasaHardFiltered.chrA1_NW_019365248v1_random.vcf O=PasaHardFilteredchrA1random.vcf
modified 9 days ago by genomax (69k) • written 9 days ago by nuketbilgen (30)