Question: Bedtools intersect/bcftools annotate dublicate the SNPs
0
gravatar for User000
7 months ago by
User000390
User000390 wrote:

Hello,

I have a VCF file with SNPs and BED file with genes. I would like to extract only SNPs that reside on the genes. With the script below I could achieve this goal. However, in the output file some SNPs get doubled then and some are not:

chr1B   812345  .   G   A   214.635
chr1B   812345  .   G   A   214.635
chr1B   812413  .   A   T   601.948
chr1B   812413  .   A   T   601.948
chr1B   824119  .   T   C   349.381
chr1B   824169  .   T   C   241.202
chr1B   824175  .   C   T   225.089
chr1B   824186  .   C   G   1480.12
chr1B   824591  .   T   C   164.335

This is very strange, do you think there is some bug there? How to solve this? I am using bcftools version: 1.10.2-27-g9d66868 and bedtools version: v2.29.2

(SAMPLES,) =glob_wildcards('/path/to/{sample}.vcf.gz')
rule all:
    input: 
        expand("{sample}.flanking.vcf", sample=SAMPLES)

rule bedtools:
    input:
        invcf="/path/to/{sample}.vcf.gz",
        bedgz="/path/to/input.bed.gz"
    output:
        outvcf="{sample}.flanking.vcf"
    shell:
        """
        /Tools/bedtools2/bin/bedtools intersect -a {input.invcf} -b {input.bedgz} -header -wa | /Tools/bcftools/bcftools annotate -a {input.bedgz} -c CHROM,FROM,TO,GENE -h <(echo '##INFO=<ID=GENE,Number=1,Type=String,Description="Gene name">') > {output.outvcf}
        """
snp bcftools bedtools • 235 views
ADD COMMENTlink modified 7 months ago • written 7 months ago by User000390
1
gravatar for User000
7 months ago by
User000390
User000390 wrote:

I found a solution here: link. Hope could be useful for the community. So I modified my shell like this:

/Tools/bedtools2/bin/bedtools intersect -a {input.invcf} -b {input.bedgz} -u -header -wa | /Tools/bcftools/bcftools annotate -a {input.bedgz} -c CHROM,FROM,TO,GENE -l GENE:append -h <(echo '##INFO=<ID=GENE,Number=.,Type=String,Description="Gene name">') > {output.outvcf}

This would give just one single snp with all the genes that overlap for that snp.

ADD COMMENTlink written 7 months ago by User000390
1
gravatar for Carlo Yague
7 months ago by
Carlo Yague5.0k
Canada
Carlo Yague5.0k wrote:

Could it be that some SNP fall onto multiple genes ? Because of overlapping genes or sense/antisense gene pairs. To check this, I would just look at the output of

intersect -a {input.invcf} -b {input.bedgz} -header -wa -wb > out.bed

where the option -wb ensure that the full gene entry of input.bedgz is outputed for every match. Btw I'm not sure to see the point of using bcftools annotate here, when you already matched the SNP with the genes using bedtools.

ADD COMMENTlink written 7 months ago by Carlo Yague5.0k

oh, I use annotate only to append the gene name in the INFO field...

ADD REPLYlink written 7 months ago by User000390

The gene is exactly the same, the scores everything

ADD REPLYlink written 7 months ago by User000390
1

I can see that the gene is the same after you use bcftools annotate, but is it the same using bedtools interesct -wa -wb ? It shouldn't be the same then.

ADD REPLYlink written 7 months ago by Carlo Yague5.0k

you are right, there are overlapping genes, basically -wa -wb did not show anything, but I checked the positions on duplicated SNPs manually and see that the genes are overlapping (because I am using flanking +/- 1kb). Any ideas how to solve this? :/

ADD REPLYlink written 7 months ago by User000390

Sorry, I just realized it is the last column showing the genes! and it is showing the right genes, but I need this information (the name of the gene) to be inside the INFO field......how can I do that?

ADD REPLYlink modified 7 months ago • written 7 months ago by User000390
0
gravatar for User000
7 months ago by
User000390
User000390 wrote:

Found the solution: https://groups.google.com/forum/#!topic/bedtools-discuss/Vi1-PVGqH18

ADD COMMENTlink written 7 months ago by User000390
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1146 users visited in the last hour