Question: Parsing VCF Data
gcooper1245 wrote, 16 months ago:

So I have about 150 of these VCF files, and I forgot to clean up the sequence names in the reference before running all 150. For downstream analysis with snpEff, the chromosome ID column needs to contain only JTAI01000001 through JTAI01000053, not the surrounding ENA| prefix and version suffix. Does anyone know a way to strip out everything but the middle JTAI01000001 part of these gVCFs so I can proceed with my analysis?

##contig=<ID=ENA|JTAI01000001|JTAI01000001.1,length=360176>
##contig=<ID=ENA|JTAI01000002|JTAI01000002.1,length=959544>
##contig=<ID=ENA|JTAI01000003|JTAI01000003.1,length=208220>
##contig=<ID=ENA|JTAI01000004|JTAI01000004.1,length=470636>
##contig=<ID=ENA|JTAI01000005|JTAI01000005.1,length=225370>
##contig=<ID=ENA|JTAI01000006|JTAI01000006.1,length=364413>
##contig=<ID=ENA|JTAI01000007|JTAI01000007.1,length=1279890>
##contig=<ID=ENA|JTAI01000008|JTAI01000008.1,length=18993>
##contig=<ID=ENA|JTAI01000009|JTAI01000009.1,length=291696>
##contig=<ID=ENA|JTAI01000010|JTAI01000010.1,length=821>
##contig=<ID=ENA|JTAI01000011|JTAI01000011.1,length=128648>
##contig=<ID=ENA|JTAI01000012|JTAI01000012.1,length=66483>
##contig=<ID=ENA|JTAI01000013|JTAI01000013.1,length=592675>
##contig=<ID=ENA|JTAI01000014|JTAI01000014.1,length=1554>
##contig=<ID=ENA|JTAI01000015|JTAI01000015.1,length=3499>
##contig=<ID=ENA|JTAI01000016|JTAI01000016.1,length=5436>
##contig=<ID=ENA|JTAI01000017|JTAI01000017.1,length=1198>
##contig=<ID=ENA|JTAI01000018|JTAI01000018.1,length=6108>
##contig=<ID=ENA|JTAI01000019|JTAI01000019.1,length=9709>
##contig=<ID=ENA|JTAI01000020|JTAI01000020.1,length=523589>
##contig=<ID=ENA|JTAI01000021|JTAI01000021.1,length=97817>
##contig=<ID=ENA|JTAI01000022|JTAI01000022.1,length=268453>
##contig=<ID=ENA|JTAI01000023|JTAI01000023.1,length=215216>
##contig=<ID=ENA|JTAI01000024|JTAI01000024.1,length=79716>
##contig=<ID=ENA|JTAI01000025|JTAI01000025.1,length=121647>
##contig=<ID=ENA|JTAI01000026|JTAI01000026.1,length=31279>
##contig=<ID=ENA|JTAI01000027|JTAI01000027.1,length=3130>
##contig=<ID=ENA|JTAI01000028|JTAI01000028.1,length=340737>
##contig=<ID=ENA|JTAI01000029|JTAI01000029.1,length=5801>
##contig=<ID=ENA|JTAI01000030|JTAI01000030.1,length=4981>
##contig=<ID=ENA|JTAI01000031|JTAI01000031.1,length=318753>
##contig=<ID=ENA|JTAI01000032|JTAI01000032.1,length=45350>
##contig=<ID=ENA|JTAI01000033|JTAI01000033.1,length=114418>
##contig=<ID=ENA|JTAI01000034|JTAI01000034.1,length=1682>
##contig=<ID=ENA|JTAI01000035|JTAI01000035.1,length=28211>
##contig=<ID=ENA|JTAI01000036|JTAI01000036.1,length=117188>
##contig=<ID=ENA|JTAI01000037|JTAI01000037.1,length=188157>
##contig=<ID=ENA|JTAI01000038|JTAI01000038.1,length=3440>
##contig=<ID=ENA|JTAI01000039|JTAI01000039.1,length=373676>
##contig=<ID=ENA|JTAI01000040|JTAI01000040.1,length=996>
##contig=<ID=ENA|JTAI01000041|JTAI01000041.1,length=618>
##contig=<ID=ENA|JTAI01000042|JTAI01000042.1,length=211284>
##contig=<ID=ENA|JTAI01000043|JTAI01000043.1,length=87165>
##contig=<ID=ENA|JTAI01000044|JTAI01000044.1,length=873289>
##contig=<ID=ENA|JTAI01000045|JTAI01000045.1,length=795>
##contig=<ID=ENA|JTAI01000046|JTAI01000046.1,length=590>
##contig=<ID=ENA|JTAI01000047|JTAI01000047.1,length=705>
##contig=<ID=ENA|JTAI01000048|JTAI01000048.1,length=1262>
##contig=<ID=ENA|JTAI01000049|JTAI01000049.1,length=1307>
##contig=<ID=ENA|JTAI01000050|JTAI01000050.1,length=766>
##contig=<ID=ENA|JTAI01000051|JTAI01000051.1,length=795>
##contig=<ID=ENA|JTAI01000052|JTAI01000052.1,length=724>
##contig=<ID=ENA|JTAI01000053|JTAI01000053.1,length=619>
##reference=file:///scratch/gwc32007/crypto_genomes/30976_hominis_genome.fasta
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  ERR1305010
ENA|JTAI01000001|JTAI01000001.1 1   .   A   <NON_REF>   .   .   END=2   GT:DP:GQ:MIN_DP:PL  0:7:0:7:0,0
ENA|JTAI01000001|JTAI01000001.1 3   .   A   <NON_REF>   .   .   END=3   GT:DP:GQ:MIN_DP:PL  0:7:99:7:0,298
ENA|JTAI01000001|JTAI01000001.1 4   .   C   <NON_REF>   .   .   END=8   GT:DP:GQ:MIN_DP:PL  0:7:0:7:0,0
ENA|JTAI01000001|JTAI01000001.1 9   .   A   <NON_REF>   .   .   END=9   GT:DP:GQ:MIN_DP:PL  0:7:99:7:0,300
ENA|JTAI01000001|JTAI01000001.1 10  .   C   <NON_REF>   .   .   END=11  GT:DP:GQ:MIN_DP:PL  0:7:0:7:0,0
ENA|JTAI01000001|JTAI01000001.1 12  .   C   <NON_REF>   .   .   END=12  GT:DP:GQ:MIN_DP:PL  0:7:99:7:0,284
ENA|JTAI01000001|JTAI01000001.1 13  .   T   <NON_REF>   .   .   END=14  GT:DP:GQ:MIN_DP:PL  0:7:0:7:0,0
ENA|JTAI01000001|JTAI01000001.1 15  .   A   <NON_REF>   .   .   END=18  GT:DP:GQ:MIN_DP:PL  0:7:99:7:0,276

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 16 months ago:

bcftools annotate

Usage:   bcftools annotate [options] <in.vcf.gz>
(...)
       --rename-chrs <file>       rename sequences according to map file: from\tto

Pierre,

In which version do you see the from\tto description? The manual (both of them, actually) says whitespace-separated.

(comment by RamRS, 16 months ago)

Version: 1.9 (using htslib 1.9)
(reply by Pierre Lindenbaum, 16 months ago)

That's odd. The manual on htslib.org still says whitespace-separated, but when I run bcftools annotate on my local machine, I see the from\tto. Could it be that the online manual is not being kept up to date?

(reply by RamRS, 16 months ago)

RamRS (Baylor College of Medicine, Houston, TX) wrote, 16 months ago:

This has been addressed multiple times on the forum. The best way to do this is to create a whitespace-separated file that maps each old contig name to its new name, one pair per line with the old name first, like so:

ENA|JTAI01000001|JTAI01000001.1 JTAI01000001
ENA|JTAI01000002|JTAI01000002.1 JTAI01000002
..
..
ENA|JTAI01000053|JTAI01000053.1 JTAI01000053

and use that file with bcftools annotate --rename-chrs. See the bcftools manual to understand the exact syntax.
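
As a rough sketch of how that could look in practice: the map file can probably be generated from the ##contig header lines rather than typed by hand. The helper name make_rename_map, the file names, and the .g.vcf extension are all assumptions made for illustration, and the sketch assumes uncompressed VCFs whose contig IDs all follow the ENA|accession|accession.version pattern shown in the question.

```shell
# Hypothetical helper: print "old_name<TAB>new_name" for every ##contig
# header line of an uncompressed VCF. Assumes IDs of the form
# ENA|<accession>|<accession>.<version>; the new name is the middle field.
make_rename_map() {
    grep '^##contig=<ID=' "$1" |
        sed -E 's/^##contig=<ID=([^,>]+).*/\1/' |
        awk -F'|' '{ print $0 "\t" $2 }'
}

# Possible usage (file names are assumptions, adjust to your own):
#   make_rename_map sample1.g.vcf > rename_chrs.txt
#   for f in *.g.vcf; do
#       bcftools annotate --rename-chrs rename_chrs.txt \
#           -o "${f%.g.vcf}.renamed.g.vcf" "$f"
#   done
```

Since all 150 files were called against the same reference, one map file built from any single header should cover the rest. It is worth checking one renamed file (for example with grep '^##contig' on the output) before running snpEff on the whole batch.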
