SNP,INDEL counting per chromosomes in vcf
2
1
Entering edit mode
6.3 years ago

Hi all, how can I counting SNPs and INDELs per each chromosome in a vcf file?

snp next-gen SNP vcf • 5.3k views
ADD COMMENT
2
Entering edit mode
6.3 years ago

Using BEDOPS vcf2bed and standard Unix tools like wc to count lines:

$ vcf2bed --snvs < variants.vcf | wc -l
$ vcf2bed --insertions < variants.vcf | wc -l
$ vcf2bed --deletions < variants.vcf | wc -l

Cf. http://bedops.readthedocs.io/en/latest/content/reference/file-management/conversion/vcf2bed.html

The above does counts over all chromosomes. You need to do an extra step to count per chromosome.

To perform this count exercise quickly per-chromosome, per-variant class:

$ vcf2bed --snvs < variants.vcf > variants.snvs.bed
$ for chr in `bedextract --list-chr variants.snvs.bed`; do echo $chr; bedextract $chr variants.snvs.bed | wc -l; done

The bedextract tool is part of BEDOPS. If you install BEDOPS to get vcf2bed, you get bedextract, as well.

Repeat this procedure for --insertions and --deletions, replacing snvs accordingly, to count those classes.

ADD COMMENT
0
Entering edit mode

it is correct. I humbly thank you,

ADD REPLY
0
Entering edit mode

I run these commands and I have number of SNP per chromosomes furthermore I have many 'Un' (I think this is unknown chromosomes SNP). how can I split or cut mt,x and unknown chromosomes from vcf?

ADD REPLY
0
Entering edit mode

Use grep on the output of bedextract --list-chr. You can pass grep a file containing chromosome names you want to include (or, conversely, exclude) via grep -f.

$ for chr in `bedextract --list-chr variants.snvs.bed | grep -f wanted-chromosomes.txt -`; do ... ; done

Run man grep to see a full list of options.

ADD REPLY
0
Entering edit mode

many thanks but I want cut mt, unknown and X chromosomes SNPs from a vcf file.

I means, my prior vcf have many SNPs that these SNPs are in mitochondrial DNA, X chromosome and unknown chromosomes. I want to have a vcf file without mitochondrial , X chromosome and unknown chromosomes SNPs information.

ADD REPLY
0
Entering edit mode

You could use grep in the same way. It may help to review some examples of how it works so you can understand Unix streams and how grep can be used here.

ADD REPLY
0
Entering edit mode

Hi, I also tried using this script but gave me only the number of snvs for each chromosome and did not count the insertions and deletions??

In fact, the outputs related to insertions and deletions are empty and there is no data in it?

ADD REPLY
0
Entering edit mode

Are you sure you have insertions/deletions?

ADD REPLY
0
Entering edit mode

yes, Because when I use this command, i find 182456 deletions and 0 insertions.

insertions:

awk '! /\#/' variants.vcf | awk '{if(length($5) > 1) print}' | wc -l

insertions:

awk '! /\#/' variants.vcf | awk '{if(length($4) > 1 ) print}' | wc -l
ADD REPLY
0
Entering edit mode
6.3 years ago

using bioalcidaejdk: http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html

$ java -jar dist/bioalcidaejdk.jar -e 'stream().map(V->V.getType().name()+" "+V.getContig()).collect(Collectors.groupingBy(Function.identity(), Collectors.counting())).forEach((K,V)->{println(K+" : "+V);});'  input.vcf

SNP 1 : 44
INDEL 1 : 6
ADD COMMENT
0
Entering edit mode

thanks but Did not respond

ADD REPLY
0
Entering edit mode

Hi, i have been trying to do your scripts. But counted the only number of SNPs and did not count the InDels. ? Please see the photo in the attachment.

enter image description here

ADD REPLY
0
Entering edit mode

Read the part that begins with [WARN]

ADD REPLY
0
Entering edit mode

the warning is meaningless here.

ADD REPLY
0
Entering edit mode

Ah I see. It's probably my second guess then - no InDels to pick.

ADD REPLY

Login before adding your answer.

Traffic: 3146 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6