Question: Count the number of SNPs per chromosome in a VCF file
1
2.3 years ago by
mostafarafiepour80 wrote:

Hi all,

I want to count the number of SNPs on each chromosome in a raw VCF file. What is the best way to do this?

Best regards,

Mostafa

snp • 2.5k views
modified 2.3 years ago by WouterDeCoster44k • written 2.3 years ago by mostafarafiepour80

Normalize your VCF first, then run one of the commands below. With datamash (available in most Linux distribution repositories):

$ grep -v '^#' test.vcf | datamash -sg 1 count 1

Alternatively, with awk:

$ awk '!/^#/ { a[$1]++ } END {for (i in a) print i,a[i]}' test.vcf
modified 2.3 years ago • written 2.3 years ago by cpad011214k
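
Note that the commands above count every data line, so indels and other variant types are included along with SNPs. A minimal sketch of a SNP-only count follows, assuming that a SNP is a record whose REF and every ALT allele are a single base (A, C, G or T). The function name count_snps_per_chrom is made up for illustration.

```shell
# Count records per chromosome where REF and all ALT alleles are single bases.
# Reads the files given as arguments, or standard input if there are none.
count_snps_per_chrom() {
  awk -F '\t' '
    /^#/ { next }                              # skip header lines
    $4 ~ /^[ACGTacgt]$/ {                      # REF must be a single base
      n = split($5, alt, ",")                  # ALT may list several alleles
      ok = 1
      for (i = 1; i <= n; i++)
        if (alt[i] !~ /^[ACGTacgt]$/) ok = 0   # any longer or symbolic ALT disqualifies
      if (ok) count[$1]++
    }
    END { for (c in count) print c, count[c] }
  ' "$@" | sort -k1,1
}
```

Usage would be: count_snps_per_chrom test.vcf. If bcftools is installed, selecting records with "bcftools view -v snps" before counting is an established alternative to the awk filter.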

I have adapted your title to make it more descriptive of what you are asking.

written 2.3 years ago by WouterDeCoster44k
6
2.3 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum131k wrote:
grep -v "^#" in.vcf | cut -f 1 | sort | uniq -c
written 2.3 years ago by Pierre Lindenbaum131k

Many thanks for your reply.

Does -f 1 mean the number of chromosomes?

written 2.3 years ago by mostafarafiepour80

https://linux.die.net/man/1/cut

written 2.3 years ago by Pierre Lindenbaum131k
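
In short, -f 1 does not refer to a number of chromosomes: it tells cut to print only the first tab-separated field of each line, which in a VCF data line is the CHROM column. A small illustration with a made-up record (the values are not from any real file):

```shell
# One made-up, tab-separated VCF data line (CHROM, POS, ID, REF, ALT).
record=$(printf '1\t12345\trs0\tA\tG')

# cut splits on tabs by default; -f 1 prints only the first field (CHROM).
chrom=$(printf '%s\n' "$record" | cut -f 1)
echo "$chrom"
```

The pipeline in the answer then sorts these chromosome names and lets uniq -c count how often each one occurs.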

I apologize, but this script does not work for me.

written 2.3 years ago by mostafarafiepour80

https://meta.stackexchange.com/questions/147616

written 2.3 years ago by Pierre Lindenbaum131k
1

Pierre's script works for me, Mostafa:

grep -v "^#" test.vcf | cut -f 1 | sort | uniq -c
  16011 1
   7308 10
   9565 11
   9149 12
   3311 13
   5881 14
   5360 15
   7016 16
   8611 17
   2896 18
   9895 19
  11621 2
   3881 20
   2472 21
   3881 22
   9215 3
   7464 4
   7805 5
  10110 6
   7991 7
   6023 8
   6898 9
     37 MT
   3218 X
     21 Y

Chromosome 1 has 16011 variants... chromosome 9 has 6898, et cetera.

Your input VCF must be properly formatted and uncompressed.

modified 2.3 years ago • written 2.3 years ago by Kevin Blighe67k
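
If the VCF is gzip- or bgzip-compressed, it can be decompressed on the fly instead of being unpacked first. The following is a minimal sketch, assuming compressed files carry a .gz extension. The function name count_per_chrom is made up for illustration.

```shell
# Count records per chromosome from a plain or gzip-compressed VCF.
count_per_chrom() {
  case "$1" in
    *.gz) gzip -dc -- "$1" ;;   # decompress to stdout, leaving the file untouched
    *)    cat -- "$1" ;;
  esac | grep -v '^#' | cut -f 1 | sort | uniq -c
}
```

Usage would be: count_per_chrom in.vcf.gz. As in the output above, chromosomes are listed in plain lexical order (1, 10, 11, ...).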

Yes, I understand now. Thank you very much for the explanation.

written 2.3 years ago by mostafarafiepour80

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted.

Upvote|Bookmark|Accept

written 2.3 years ago by Pierre Lindenbaum131k

Now, if I want to count the number of SNPs for each breed, what is the best approach? I have 5 breeds in my raw VCF.

written 2.2 years ago by mostafarafiepour80

mostafarafiepour, try vcfstats from RTG Tools. Note that it reports statistics per sample, not per chromosome. If you want counts per chromosome and per sample, you may have to write a script.

modified 2.2 years ago • written 2.2 years ago by cpad011214k
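
One possible shape for such a script, offered as a sketch rather than a tested pipeline: it assumes GT is the first FORMAT subfield and that a sample "has" a variant when its genotype contains at least one non-reference allele. It counts all variant records, not only SNPs, and the function name count_per_chrom_per_sample is made up for illustration. Grouping samples by breed would additionally need a sample-to-breed table, which is not shown here.

```shell
# Print chromosome, sample name and the number of records at which that
# sample carries at least one non-reference allele (tab-separated).
count_per_chrom_per_sample() {
  awk -F '\t' '
    /^##/ { next }                                   # meta-information lines
    /^#CHROM/ {                                      # column header: sample names start at column 10
      for (i = 10; i <= NF; i++) name[i] = $i
      next
    }
    {
      for (i = 10; i <= NF; i++) {
        split($i, fld, ":")                          # GT is assumed to be the first subfield
        gt = fld[1]
        gsub("[0.|/]", "", gt)                       # drop ref alleles, missing calls, separators
        if (gt != "") count[$1 "\t" name[i]]++       # anything left is a non-reference allele
      }
    }
    END { for (k in count) print k "\t" count[k] }
  ' "$@" | sort
}
```

Usage would be: count_per_chrom_per_sample test.vcf. Chromosome and sample pairs with no non-reference genotype are simply absent from the output.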