Question: Contig name difference due to reference genome
9 days ago by nuketbilgen30 (United Kingdom):

Hi everyone,

I have VCF files for 4 feline genomes, but the VCF headers contain different contig names. I checked the reference genome line in each header; the two values are shown below.

reference=file:///ifswh1/BC_COM_P1/F18FTSEUHT0898/CATsxlR/analysis/index/GCF_000181335.3_Felis_catus_9.0_genomic.fa
reference=file:///ifshk5/BC_AS/BC_COM_P0/F19FTSEUHT0354/CATbelR/2016/result/index/felCat9.fa

Two of my genomes were aligned to the first reference and the other two to the second. I want to merge these VCFs and run an LD analysis, but I cannot because the contig names do not match.

How can I solve this? Thanks...

alignment next-gen genome • 92 views
modified 9 days ago • written 9 days ago by nuketbilgen30

Are they the same genome builds?

written 9 days ago by genomax69k

A quick Google-search yielded: felCat9.fa (UCSC Genome Browser) and GCF_000181335.3_Felis_catus_9.0_genomic.fa (NCBI)

modified 9 days ago • written 9 days ago by jean.elbers1.1k

Yes, exactly. When I split the VCF files by chromosome with the SnpSift split command, I got 40 files for the felCat9.fa-aligned samples and 426 files for the NCBI-aligned ones. I am worried about losing important variants...

written 9 days ago by nuketbilgen30

I think the Biostar community needs more information to help you, such as how the VCF files were produced. If the only difference is in naming, then a quick regular expression or search-and-replace command can change the column 1 value from the old, undesired name to the new, desired name.

perl -pe "s/oldname/newname/g" input.vcf > output.vcf

Note that the command above replaces oldname wherever it appears on a line, so it assumes that oldname occurs only in column 1 of the VCF file.
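If oldname could also appear elsewhere on a line, a more targeted rename is possible. The following is only a minimal Python sketch, assuming an uncompressed VCF and assuming the two references differ in naming only, not in sequence. The RENAME mapping, the function names and the file names are hypothetical placeholders; the real old-to-new mapping has to come from comparing the two assemblies. It rewrites only the ID value of ##contig header lines and the first tab-separated column of data lines.

```python
import re

# Hypothetical mapping for illustration only; the real old -> new names
# must be worked out by comparing the two reference assemblies.
RENAME = {"oldname": "newname"}

# Captures the ID value of a VCF contig header line such as
# ##contig=<ID=chrA1,length=242100913>
CONTIG_ID = re.compile(r"^(##contig=<ID=)([^,>]+)")


def rename_line(line, mapping):
    """Rename the contig in one VCF line, touching only the contig field."""
    if line.startswith("##contig=<ID="):
        # Header line: swap only the ID value, keep length and other keys.
        return CONTIG_ID.sub(
            lambda m: m.group(1) + mapping.get(m.group(2), m.group(2)), line
        )
    if line.startswith("#"):
        return line  # all other header lines are left unchanged
    # Data line: column 1 (CHROM) is everything before the first tab.
    chrom, sep, rest = line.partition("\t")
    return mapping.get(chrom, chrom) + sep + rest


def rename_vcf(in_path, out_path, mapping):
    """Stream an uncompressed VCF, writing a copy with renamed contigs."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(rename_line(line, mapping))
```

A call such as rename_vcf("input.vcf", "output.vcf", RENAME) would then play the same role as the perl one-liner, but names that are absent from the mapping, or that appear outside the contig field, are left untouched.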

modified 9 days ago • written 9 days ago by jean.elbers1.1k

Hi again. The VCF files were generated with the GATK HaplotypeCaller walker. The haplotype calling command was:

java -jar GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar -T HaplotypeCaller -R all.chrs.con.fa -L TEST_Chr01 -I aligned_reads.sorted.dedup.bam --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 -o TEST_Chr01.gvcf

You can find examples of the contig lines below. These contigs also carry variants: where one file has a variant on "contig=<ID=chrA1_NW_019365239v1_random,length=46965>", the same variant is located on "contig=<ID=chrA1_random,length=415283>" in the other two files. So the chromosome names for the same SNPs differ as well...

First two files contig example:

contig=<ID=chrA1,length=242100913>
contig=<ID=chrA1_random,length=415283>
contig=<ID=chrA2,length=171471747>
contig=<ID=chrA2_random,length=1187422>

Other two files contig example:

contig=<ID=chrA1,length=242100913>
contig=<ID=chrA1_NW_019365239v1_random,length=46965>
contig=<ID=chrA1_NW_019365240v1_random,length=58068>
contig=<ID=chrA1_NW_019365241v1_random,length=50743>
contig=<ID=chrA1_NW_019365242v1_random,length=22574>
contig=<ID=chrA1_NW_019365243v1_random,length=50951>
contig=<ID=chrA1_NW_019365244v1_random,length=50765>
contig=<ID=chrA1_NW_019365245v1_random,length=14920>
contig=<ID=chrA1_NW_019365246v1_random,length=45003>
contig=<ID=chrA1_NW_019365247v1_random,length=40320>
contig=<ID=chrA1_NW_019365248v1_random,length=25974>
contig=<ID=chrA2,length=171471747>
. . .
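To see exactly which contigs differ between two headers, the contig lines can be compared programmatically. This is a rough Python sketch, not a definitive tool: it assumes the header lines are in the standard VCF form ##contig=<ID=...,length=...> and that no value contains a comma. The function names contig_lengths and compare_contigs are made up for illustration, and the example input uses only contig lines quoted in this thread.

```python
def contig_lengths(header_lines):
    """Map contig ID -> length (or None) from ##contig header lines."""
    contigs = {}
    prefix = "##contig=<"
    for line in header_lines:
        line = line.strip()
        if not (line.startswith(prefix) and line.endswith(">")):
            continue  # not a contig header line
        # Split the key=value pairs between "<" and ">".
        # Assumption: values contain no commas.
        pairs = line[len(prefix):-1].split(",")
        fields = dict(p.split("=", 1) for p in pairs if "=" in p)
        if "ID" in fields:
            length = fields.get("length")
            contigs[fields["ID"]] = int(length) if length is not None else None
    return contigs


def compare_contigs(a, b):
    """Report contigs unique to each header and shared names whose lengths differ."""
    shared = a.keys() & b.keys()
    return {
        "only_a": sorted(a.keys() - b.keys()),
        "only_b": sorted(b.keys() - a.keys()),
        "length_mismatch": sorted(n for n in shared if a[n] != b[n]),
    }


# Example with contig lines quoted in this thread.
first = contig_lengths([
    "##contig=<ID=chrA1,length=242100913>",
    "##contig=<ID=chrA1_random,length=415283>",
])
second = contig_lengths([
    "##contig=<ID=chrA1,length=242100913>",
    "##contig=<ID=chrA1_NW_019365239v1_random,length=46965>",
])
report = compare_contigs(first, second)
print(report)
```

Contigs that share both name and length (chrA1 here) are the ones most likely to be directly comparable; names listed under only_a or only_b are the ones that need a decision before merging.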

modified 6 days ago • written 6 days ago by nuketbilgen30

I know it's a long shot, but would you suggest merging the files by chromosome, like this?

I=PasaHardFiltered.chrA1_NW_019365239v1_random.vcf I=PasaHardFiltered.chrA1_NW_019365240v1_random.vcf I=PasaHardFiltered.chrA1_NW_019365241v1_random.vcf I=PasaHardFiltered.chrA1_NW_019365243v1_random.vcf I=PasaHardFiltered.chrA1_NW_019365244v1_random.vcf I=PasaHardFiltered.chrA1_NW_019365246v1_random.vcf I=PasaHardFiltered.chrA1_NW_019365247v1_random.vcf I=PasaHardFiltered.chrA1_NW_019365248v1_random.vcf O=PasaHardFilteredchrA1random.vcf
modified 9 days ago by genomax69k • written 9 days ago by nuketbilgen30