Sort a VCF by a subfield within a column while keeping each record intact (Linux)
yash_verma, 2.5 years ago

I have a VCF file with these column headers:

#CHROM  POS     ID  REF   ALT   QUAL    FILTER  INFO    FORMAT     BS_25YES2E3  BS_G5B6AD28 BS_QCGPE1ZX

A sample record from that VCF file:

chr1    10450   .   T   C   27.94   VQSRTrancheSNP99.90to100.00+    AC=1;AF=0.167;AN=6;BaseQRankSum=-1.676e+00;ClippingRankSum=0.789;DP=102;ExcessHet=4.7712;FS=4.868;MLEAC=1;MLEAF=0.167;MQ=34.67;MQRankSum=-1.084e+00;PG=0,0,0;QD=1.55;ReadPosRankSum=-2.169e+00;SOR=0.707;VQSLOD=-1.050e+01;culprit=MQ;ANN=C|upstream_gene_variant|MODIFIER|**DDX11L1**|ENSG00000223972|Transcript|ENST00000450305|transcribed_unprocessed_pseudogene|||||||||||1560|1||SNV|HGNC|HGNC:37102||||chr1:g.10450T>C,C|upstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000456328|processed_transcript|||||||||||1419|1||SNV|HGNC|HGNC:37102|YES|||chr1:g.10450T>C,C|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000488147|unprocessed_pseudogene|||||||||||3954|-1||SNV|HGNC|HGNC:38034|YES|||chr1:g.10450T>C GT:AD:DP:FT:GQ:JL:JP:PL:PP  0/0:28,0:28:lowGQ:0:1:1:0,0,663:0,0,666 0/1:13,5:18:PASS:35:1:1:34,0,342:35,0,345   0/0:44,0:44:lowGQ:0:1:1:0,0,802:0,0,805

The portion in bold (DDX11L1) is the value I want. It is the SYMBOL subfield of the ANN annotation inside the INFO column, and I want to sort the VCF file by it. The header definition for that INFO field is:

##INFO=<ID=ANN,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND|FLAGS|VARIANT_CLASS|SYMBOL_SOURCE|HGNC_ID|CANONICAL|SIFT|HGVS_OFFSET|HGVSg">
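If you do not want to hard-code where SYMBOL sits in each ANN entry, its position can be read from the "Format:" list in this header line. A minimal Python sketch, assuming the header is written like the one above; the function name ann_field_index is just illustrative:

```python
import re


def ann_field_index(header_line: str, field: str = "SYMBOL") -> int:
    """Return the 0-based position of `field` in each ANN entry, read from
    the 'Format: ...' list in the ##INFO=<ID=ANN,...> header line."""
    match = re.search(r'Format: ([^"]+)"', header_line)
    if match is None:
        raise ValueError("no 'Format:' list found in the ANN header line")
    fields = match.group(1).strip().split("|")
    return fields.index(field)  # raises ValueError if the field is absent


header = ('##INFO=<ID=ANN,Number=.,Type=String,Description="Consequence annotations '
          'from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type">')
print(ann_field_index(header))  # 3
```

With the Format list shown in this thread, SYMBOL is the fourth field, i.e. index 3.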

Any help would be appreciated. My ultimate goal is to collapse variants by gene, so if there is a simpler way of doing that, it would be welcome too.

Answer, 2.5 years ago:

The task you describe is a narrow, specialized application, so you are unlikely to find an existing tool that already does it.

Your best bet would be to write a simple parser in a programming language and do it yourself.

If you know a little programming it should be fairly straightforward.
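As a rough illustration of such a parser, here is a minimal Python sketch. It assumes a tab-delimited VCF and takes SYMBOL as the fourth |-separated field of each ANN entry, as in the Format list in the header above. It sorts on the symbol of the first ANN entry only, so a variant like the sample record, which touches both DDX11L1 and WASH7P, is filed under whichever gene is listed first. The names first_symbol and sort_vcf_by_symbol are hypothetical.

```python
import sys
from typing import Iterable, List

SYMBOL_INDEX = 3  # SYMBOL is the 4th field in the ANN Format list of the header


def first_symbol(line: str) -> str:
    """Return SYMBOL from the first ANN entry of a VCF data line ('' if absent)."""
    info = line.rstrip("\n").split("\t")[7]  # INFO is the 8th column
    for item in info.split(";"):
        if item.startswith("ANN="):
            first_entry = item[len("ANN="):].split(",")[0]
            parts = first_entry.split("|")
            return parts[SYMBOL_INDEX] if len(parts) > SYMBOL_INDEX else ""
    return ""


def sort_vcf_by_symbol(lines: Iterable[str]) -> List[str]:
    """Keep header lines on top and sort data lines by gene symbol."""
    header: List[str] = []
    records: List[str] = []
    for line in lines:
        (header if line.startswith("#") else records).append(line)
    # Primary key: gene symbol; ties broken by chromosome name (as text) and position.
    records.sort(key=lambda l: (first_symbol(l), l.split("\t")[0], int(l.split("\t")[1])))
    return header + records


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as vcf:
        sys.stdout.writelines(sort_vcf_by_symbol(vcf))
```

Usage, assuming the script is saved under a hypothetical name such as sort_by_symbol.py: python sort_by_symbol.py input.vcf > sorted.vcf. Records without an ANN entry, or with an empty SYMBOL, sort to the top because their key is an empty string. A bgzipped VCF would need to be opened with gzip first.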

GokalpC, 2.5 years ago:

You may want to use the bcftools +split-vep plugin, or something similar, to convert your VCF to a tabular format. You can then use the Linux sort command to sort by any column you want.
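If you would rather stay in Python, the same idea (one row per consequence, then group by gene) can be sketched without bcftools. This is not split-vep itself, only a minimal stand-in, assuming a plain tab-delimited VCF with SYMBOL as the fourth ANN subfield; ann_rows and collapse_by_gene are hypothetical names. Because every ANN entry becomes its own row, a variant that overlaps several genes is listed under each of them.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

SYMBOL_INDEX = 3  # SYMBOL is the 4th field of each ANN entry (see the ##INFO header)


def ann_rows(lines: Iterable[str]) -> Iterator[Tuple[str, str, str, str, str]]:
    """Yield one (CHROM, POS, REF, ALT, SYMBOL) row per ANN entry that has a symbol."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and column header lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt, info = cols[0], cols[1], cols[3], cols[4], cols[7]
        for item in info.split(";"):
            if item.startswith("ANN="):
                for entry in item[len("ANN="):].split(","):
                    parts = entry.split("|")
                    if len(parts) > SYMBOL_INDEX and parts[SYMBOL_INDEX]:
                        yield chrom, pos, ref, alt, parts[SYMBOL_INDEX]


def collapse_by_gene(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each gene symbol to the distinct variants (CHROM:POS:REF>ALT) that hit it."""
    genes: Dict[str, List[str]] = defaultdict(list)
    for chrom, pos, ref, alt, symbol in ann_rows(lines):
        variant = f"{chrom}:{pos}:{ref}>{alt}"
        # The same variant can hit one gene through several transcripts; keep it once.
        if variant not in genes[symbol]:
            genes[symbol].append(variant)
    return dict(sorted(genes.items()))  # genes in alphabetical order
```

For a tab-delimited version of the sample record in the question, collapse_by_gene would map both DDX11L1 and WASH7P to chr1:10450:T>C, with DDX11L1 appearing only once even though two of its transcripts are annotated. To print a per-gene count, loop over the result and print each gene with len(variants).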
