Filtering info column
5.8 years ago
taijc06 • 0

Hi all,

I have a vcf file with an info column like this:

##fileformat=VCFv4.3
##fileDate=20180421
##source=PLINKv2.00
##filedate=20180410
##contig=<ID=10,length=135524727>
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND|FLAGS|VARIANT_CLASS|SYMBOL_SOURCE|HGNC_ID|CANONICAL|TSL|APPRIS|CCDS|ENSP|SWISSPROT|TREMBL|UNIPARC|REFSEQ_MATCH|SOURCE|GENE_PHENO|SIFT|PolyPhen|DOMAINS|miRNA|AF|AFR_AF|AMR_AF|EAS_AF|EUR_AF|SAS_AF|AA_AF|EA_AF|gnomAD_AF|gnomAD_AFR_AF|gnomAD_AMR_AF|gnomAD_ASJ_AF|gnomAD_EAS_AF|gnomAD_FIN_AF|gnomAD_NFE_AF|gnomAD_OTH_AF|gnomAD_SAS_AF|MAX_AF|MAX_AF_POPS|CLIN_SIG|SOMATIC|PHENO|PUBMED|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE">
#CHROM  POS     ID      REF     ALT

I would like to extract only the allele frequency (AF) data from this column. However, I am finding it difficult because all the annotations are packed into a single column. Is there any way to do this? Thank you.

sequencing vcf • 2.2k views
5.8 years ago
NB ▴ 960

Hello. You can use bcftools to extract the AF info. The command is something like this:

bcftools query -f '%CHROM %POS %AF\n' input.vcf > output.txt

In case the AF tag is not present in your VCF INFO column, you can read the manual for more info.


The OP wants the information contained in the VEP INFO/CSQ field, not INFO/AF.


Sorry, I misunderstood the question; please ignore my answer. I am not sure then. Maybe generate the VEP output in tab format to avoid the clustering, and then extract the AF column.

5.8 years ago

Using bioalcidaejdk: http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html

The code has access to a variable 'tools', which contains a parser for the VEP output. One line is printed per VEP prediction, so a variant appears on several identical lines when it has more than one transcript.

 java -jar dist/bioalcidaejdk.jar -e 'println("CHROM\tPOS\tREF\tAF");stream().forEach(V->tools.getVepPredictions(V).stream().forEach(P->{println(V.getContig()+"\t"+V.getStart()+"\t"+V.getReference().getDisplayString()+"\t"+P.getByCol("AF"));}));'

CHROM   POS REF AF
21  26960070    G   0.0014
21  26960070    G   0.0014
21  26960070    G   0.0014
21  26965148    G   0.7324
21  26965148    G   0.7324
21  26965148    G   0.7324
21  26965172    T   0.0106
21  26965172    T   0.0106
21  26965172    T   0.0106
21  26965205    T   0.7324
21  26965205    T   0.7324
21  26965205    T   0.7324
21  26976144    A   0.0004
21  26976144    A   0.0004
21  26976144    A   0.0004
(...)
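
For anyone without jvarkit at hand, the same idea can be sketched in plain Python: read the subfield order from the "Format:" part of the CSQ header line, then pull the AF slot out of every CSQ entry. This is only a minimal sketch, not a replacement for a proper VEP parser. The function names (csq_fields, csq_af_rows) and the demo records are made up for illustration, and it assumes an uncompressed, tab-separated VCF with the usual eight fixed columns.

```python
import re


def csq_fields(header_lines):
    """Return the CSQ subfield names declared in the VCF header."""
    for line in header_lines:
        if line.startswith("##INFO=") and "ID=CSQ" in line:
            # The subfield order is listed after "Format: " in the description.
            match = re.search(r'Format: ([^">]+)', line)
            if match:
                return match.group(1).strip().split("|")
    raise ValueError("no CSQ header line with a 'Format:' description found")


def csq_af_rows(lines):
    """Yield (chrom, pos, ref, af) once per CSQ entry, i.e. once per transcript."""
    header = []
    af_index = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            header.append(line)
            continue
        if line.startswith("#") or not line:
            continue  # skip the #CHROM line and blank lines
        if af_index is None:
            af_index = csq_fields(header).index("AF")
        chrom, pos, _id, ref, _alt, _qual, _filter, info = line.split("\t")[:8]
        for entry in info.split(";"):
            if entry.startswith("CSQ="):
                # Entries for different transcripts are separated by commas.
                for annotation in entry[len("CSQ="):].split(","):
                    yield chrom, pos, ref, annotation.split("|")[af_index]


# Hypothetical demo input with a shortened CSQ format (made-up annotations).
demo = [
    '##fileformat=VCFv4.3',
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|AF">',
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO',
    '21\t26960070\t.\tG\tA\t.\t.\tCSQ=A|missense_variant|0.0014,A|intron_variant|0.0014',
]

print("CHROM\tPOS\tREF\tAF")
for row in csq_af_rows(demo):
    print("\t".join(row))
```

On a real file, pass an open handle instead of the demo list, e.g. csq_af_rows(open("input.vcf")). As with the bioalcidaejdk output above, a variant is repeated once per transcript, and the AF slot is an empty string when VEP reported no frequency.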