Question: VCF file processing
Asked 2.0 years ago by neeraj4biotech (India):

I have a VCF file that I have annotated with SnpEff. I need to group the SNPs by the gene they belong to, for example SNPs x, y and z belong to gene w, and so on for all genes.

Tags: snp, next-gen, gene
(Written 2.0 years ago by neeraj4biotech; modified 2.0 years ago by genomax)

Are you trying to extract them into separate files per gene, or are you trying to run a burden test or something more sophisticated?

(Vivek, 2.0 years ago)

Thanks, Vivek, for the quick response. My input files are the VCF file and a BED/GFF file of the genes. What I actually want is a separate file per gene.

(neeraj4biotech, 2.0 years ago)
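Because the VCF has already been annotated with SnpEff, per-gene files can also be made without the BED/GFF file, by reading the gene name from the ANN tag that SnpEff writes into the INFO column (Gene_Name is the fourth |-separated subfield of each annotation). This is a minimal sketch, assuming ANN-style annotations (not the older EFF format) and an uncompressed VCF; the function name split_vcf_by_gene and the <gene>.vcf output naming are hypothetical. A variant annotated against several genes is written to each of those genes' files.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is illustrative): write one VCF per gene, using the
# Gene_Name subfield (4th |-separated field) of the SnpEff ANN INFO tag.
# Assumes ANN annotations (not EFF) and an uncompressed, tab-separated VCF.
# Usage: split_vcf_by_gene annotated.vcf outdir
split_vcf_by_gene() {
    local vcf=$1 outdir=$2
    mkdir -p "$outdir"
    awk -F'\t' -v outdir="$outdir" '
        # Collect all header lines so each per-gene file is a valid VCF.
        /^#/ { header = header $0 "\n"; next }
        {
            # Find the ANN=... entry among the ;-separated INFO fields.
            n = split($8, info, ";")
            ann = ""
            for (i = 1; i <= n; i++)
                if (info[i] ~ /^ANN=/) { ann = substr(info[i], 5); break }
            if (ann == "") next
            # A variant can carry several comma-separated annotations;
            # write the record once per distinct gene name.
            m = split(ann, entries, ",")
            delete seen
            for (j = 1; j <= m; j++) {
                split(entries[j], f, "|")
                gene = f[4]
                if (gene == "" || gene in seen) continue
                seen[gene] = 1
                out = outdir "/" gene ".vcf"
                # First write for this gene: truncate the file and add the header.
                if (!(out in started)) { printf "%s", header > out; close(out); started[out] = 1 }
                print >> out
                close(out)    # avoid the open-file limit when there are many genes
            }
        }' "$vcf"
}
```

For example, split_vcf_by_gene variants.ann.vcf per_gene/ would produce per_gene/GENE1.vcf, per_gene/GENE2.vcf and so on. Closing each file after writing is slower but keeps the sketch working with any number of genes.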

There are more elegant solutions if you can do some scripting, but here's a crude workflow:

If your BED file has one line per gene, you can first split it into one file per gene like this:

split -l 1 Genes.bed Genes-

Depending on the number of genes, you might produce a lot of files here.

Then give each file a .bed extension:

for file in Genes-*; do mv "$file" "$file.bed"; done
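If the BED file has a name column (column 4) holding the gene name, the split and rename steps can be done in one awk pass, so each file is named after its gene instead of Genes-aa, Genes-ab, and so on. This is a sketch assuming the gene name is in column 4 and is safe to use in a file name; the function name split_bed_by_gene is hypothetical. Lines that share a name (for example several exons of one gene) end up in the same file, and the Genes-<name>.bed names still match the Genes-*.bed pattern used in the tabix step.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is illustrative): write one BED file per gene,
# named Genes-<name>.bed, taking the gene name from column 4.
# Usage: split_bed_by_gene Genes.bed
split_bed_by_gene() {
    awk -F'\t' '
        # Skip comment, track and browser lines.
        /^(#|track|browser)/ { next }
        NF >= 4 {
            out = "Genes-" $4 ".bed"
            # Truncate on the first line for a gene, append afterwards,
            # so rerunning the function does not duplicate lines.
            if (out in seen) print >> out
            else { print > out; seen[out] = 1 }
            close(out)    # avoid the open-file limit with many genes
        }' "$1"
}
```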

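Then use tabix to pull each gene's variants out of your VCF. tabix needs a bgzip-compressed, indexed VCF (bgzip variants.vcf; tabix -p vcf variants.vcf.gz), and current htslib versions of tabix take the regions file with -R (older tabix releases used -B):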

for bed in Genes-*.bed; do tabix -h -R "$bed" variants.vcf.gz > "variants-${bed%.bed}.vcf"; done
(Vivek, 2.0 years ago; modified 2.0 years ago)

It always helps if you can post some example data. Use datamash to group by gene and collapse each gene's SNPs into a comma-separated list.

output:

$ datamash -H -g 1 collapse 2 < snps.txt 
GroupBy(gene)   collapse(snp)
x   a,b,c
y   d,e
z   f,g,h

input:

$ cat snps.txt 
gene    snp
x   a
x   b
x   c
y   d
y   e
z   f
z   g
z   h
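Where datamash is not available, the same group-and-collapse can be sketched in plain awk. Like datamash -g, it merges consecutive lines that share a gene, so the input must already be sorted by gene (datamash's -s option does this sorting for you). This assumes a tab-separated file with a header line; the function name collapse_by_gene is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical awk equivalent of: datamash -H -g 1 collapse 2
# Input: tab-separated "gene<TAB>snp" table with a header line, sorted by gene.
# Output: one line per gene with its SNPs joined by commas.
collapse_by_gene() {
    awk -F'\t' -v OFS='\t' '
        # Rewrite the header the way datamash -H does.
        NR == 1 { print "GroupBy(" $1 ")", "collapse(" $2 ")"; next }
        $1 != gene {
            if (NR > 2) print gene, snps   # flush the previous group
            gene = $1; snps = $2; next
        }
        { snps = snps "," $2 }
        END { if (NR > 1) print gene, snps }' "$@"
}
```

Usage: collapse_by_gene < snps.txt. On the input shown above it gives the same three lines as the datamash output.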

Install datamash either from its website or from your distribution's repositories (on Debian-based systems: sudo apt install datamash -y; with conda: conda install datamash -y).
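The two-column gene/SNP table that datamash reads can be extracted from the SnpEff-annotated VCF itself. A sketch, assuming ANN-style annotations and using the ID column as the SNP label, falling back to CHROM:POS:REF>ALT when the ID is "."; the function name vcf_to_gene_snp is hypothetical. The rows are sorted by gene so the table can go straight into datamash -H -g 1 collapse 2.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is illustrative): print a tab-separated
# "gene<TAB>snp" table from a SnpEff-annotated VCF (ANN tag assumed),
# one row per distinct (gene, variant) pair, sorted by gene.
# Usage: vcf_to_gene_snp annotated.vcf > snps.txt
vcf_to_gene_snp() {
    printf 'gene\tsnp\n'
    awk -F'\t' -v OFS='\t' '
        /^#/ { next }
        {
            # Label the variant by its ID, or CHROM:POS:REF>ALT if ID is ".".
            snp = ($3 != ".") ? $3 : $1 ":" $2 ":" $4 ">" $5
            n = split($8, info, ";")
            for (i = 1; i <= n; i++) {
                if (info[i] !~ /^ANN=/) continue
                m = split(substr(info[i], 5), entries, ",")
                delete seen
                for (j = 1; j <= m; j++) {
                    split(entries[j], f, "|")
                    # Gene_Name is the 4th |-separated ANN subfield.
                    if (f[4] != "" && !(f[4] in seen)) { seen[f[4]] = 1; print f[4], snp }
                }
            }
        }' "$1" | LC_ALL=C sort -t $'\t' -k1,1 -k2,2
}
```

For example: vcf_to_gene_snp variants.ann.vcf > snps.txt, then datamash -H -g 1 collapse 2 < snps.txt.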

(cpad0112, 2.0 years ago; modified 2.0 years ago)

Neeraj, can you post a few lines of your data? I know it should be a standard VCF, but it still helps!

(lakhujanivijay, 2.0 years ago)