Question: Counting SNPs and INDELs per chromosome in a VCF file
5 months ago, siyavash_damdar wrote:

Hi all, how can I count SNPs and INDELs per chromosome in a VCF file?

snp next-gen vcf • 431 views
5 months ago, Alex Reynolds (Seattle, WA USA) wrote:

Using BEDOPS vcf2bed and standard Unix tools like wc to count lines:

$ vcf2bed --snvs < variants.vcf | wc -l
$ vcf2bed --insertions < variants.vcf | wc -l
$ vcf2bed --deletions < variants.vcf | wc -l

Cf. http://bedops.readthedocs.io/en/latest/content/reference/file-management/conversion/vcf2bed.html

The commands above count variants over all chromosomes. An extra step is needed to count per chromosome.

To perform this count exercise quickly per-chromosome, per-variant class:

$ vcf2bed --snvs < variants.vcf > variants.snvs.bed
$ for chr in $(bedextract --list-chr variants.snvs.bed); do echo "$chr"; bedextract "$chr" variants.snvs.bed | wc -l; done

The bedextract tool is part of BEDOPS. If you install BEDOPS to get vcf2bed, you get bedextract, as well.

Repeat this procedure for --insertions and --deletions, replacing snvs accordingly, to count those classes.
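
If BEDOPS is not installed, roughly the same per-chromosome tallies can be sketched with awk alone. This is not how vcf2bed classifies variants. It assumes a simple length rule (REF and ALT both one base is an SNV, ALT longer than REF is an insertion, ALT shorter than REF is a deletion), counts each ALT allele separately, and skips symbolic alleles. The function name count_by_chrom and the demo records are made up for illustration.

```shell
# count_by_chrom: hypothetical helper, not part of BEDOPS.
# Prints "chromosome <TAB> class <TAB> count" for each chromosome/class pair.
count_by_chrom() {
  awk -F'\t' '
    /^#/ { next }                        # skip header lines
    {
      n = split($5, alts, ",")           # ALT may list several alleles
      for (i = 1; i <= n; i++) {
        a = alts[i]
        if (a !~ /^[ACGTNacgtn]+$/) continue   # skip symbolic alleles such as <DEL>, "*" or "."
        reflen = length($4); altlen = length(a)
        if (reflen == 1 && altlen == 1) cls = "snv"
        else if (altlen > reflen)       cls = "insertion"
        else if (altlen < reflen)       cls = "deletion"
        else                            cls = "other"   # same-length multi-base substitutions
        count[$1 "\t" cls]++
      }
    }
    END { for (k in count) print k "\t" count[k] }
  ' "$1" | sort
}

# Tiny made-up VCF, only to show the output format.
demo=$(mktemp)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$demo"
printf 'chr1\t100\t.\tA\tG\t.\t.\t.\n' >> "$demo"
printf 'chr1\t200\t.\tA\tAT\t.\t.\t.\n' >> "$demo"
printf 'chr2\t300\t.\tCAG\tC\t.\t.\t.\n' >> "$demo"
count_by_chrom "$demo"
rm -f "$demo"
```

On a real file the call would be count_by_chrom variants.vcf. Counts can differ from the vcf2bed numbers, for example at multi-allelic sites, so treat this as a quick cross-check only.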


It works correctly. Thank you very much.

Reply written 5 months ago by siyavash_damdar

I ran these commands and now have the number of SNPs per chromosome. However, I also have many 'Un' entries (I think these are SNPs on unknown chromosomes). How can I split off or remove the MT, X and unknown chromosomes from the VCF?

Reply written 5 months ago by siyavash_damdar

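Use grep on the output of bedextract --list-chr. You can pass grep a file containing the chromosome names you want to include via grep -f (or, conversely, exclude them by adding -v). Adding -F and -x makes grep match each name literally and against the whole line, so that chr1 does not also match chr10.

$ for chr in `bedextract --list-chr variants.snvs.bed | grep -Fx -f wanted-chromosomes.txt -`; do ... ; done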

Run man grep to see a full list of options.
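
As a small illustration of the include and exclude cases, here is a sketch that runs grep against a made-up list of chromosome names standing in for the bedextract --list-chr output. The names and the temporary file are assumptions for the demo. -F treats each pattern as a fixed string and -x requires the whole line to match, which stops chr1 from also selecting chr10.

```shell
# Hypothetical chromosome list, standing in for `bedextract --list-chr` output.
names=$(printf 'chr1\nchr10\nchrX\nchrUn\n')

# File of chromosome names to select, one per line.
wanted=$(mktemp)
printf 'chr1\nchrX\n' > "$wanted"

# Keep only the listed names (-F: fixed strings, -x: whole-line match).
keep=$(printf '%s\n' "$names" | grep -Fx -f "$wanted")

# Add -v to invert the match and drop the listed names instead.
drop=$(printf '%s\n' "$names" | grep -Fxv -f "$wanted")

echo "kept:"
echo "$keep"
echo "dropped:"
echo "$drop"
rm -f "$wanted"
```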

Reply written 5 months ago by Alex Reynolds

Many thanks, but I want to remove the MT, unknown and X chromosome SNPs from the VCF file itself.

I mean, my original VCF has many SNPs located in mitochondrial DNA, on the X chromosome and on unknown chromosomes. I want a VCF file without the SNP information for the mitochondrial, X and unknown chromosomes.

Reply written 5 months ago by siyavash_damdar

You could use grep in the same way. It may help to review some examples of how it works so you can understand Unix streams and how grep can be used here.
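
One possible way to apply grep directly to the VCF is sketched below. It relies on VCF header lines starting with "#", so a pattern anchored on the chromosome column never removes them. The chromosome names are assumptions (X, M or MT, and anything starting with Un, with or without a chr prefix, case-sensitive); check the names actually used in your file and adjust the pattern. The function name drop_chroms and the demo records are hypothetical.

```shell
# drop_chroms: hypothetical helper. Keeps header lines (they start with "#")
# and removes records whose CHROM column is X, M/MT, or starts with "Un",
# with or without a "chr" prefix. The names are assumptions; adjust as needed.
drop_chroms() {
  grep -Ev '^(chr)?(X|MT?|Un[^[:space:]]*)[[:space:]]' "$1"
}

# Tiny made-up VCF, only to show the effect.
demo=$(mktemp)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\n' > "$demo"
printf '1\t100\t.\tA\tG\nX\t200\t.\tC\tT\nMT\t300\t.\tG\tA\nUn\t400\t.\tT\tC\n' >> "$demo"
drop_chroms "$demo"
rm -f "$demo"
```

On a real file this would be drop_chroms variants.vcf > variants.filtered.vcf. Any ##contig header lines for the removed chromosomes are left in place, since only the records are filtered.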

Reply written 5 months ago by Alex Reynolds
5 months ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

Using bioalcidaejdk: http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html

$ java -jar dist/bioalcidaejdk.jar -e 'stream().map(V->V.getType().name()+" "+V.getContig()).collect(Collectors.groupingBy(Function.identity(), Collectors.counting())).forEach((K,V)->{println(K+" : "+V);});'  input.vcf

SNP 1 : 44
INDEL 1 : 6

Thanks, but it did not respond.

Reply written 5 months ago by siyavash_damdar