getting chromosome lengths from a vcf file
7.9 years ago
outlier95

I have a VCF file that includes both variant and invariant sites for every locus. Is there an easy way to obtain the length of each of these loci?


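Does it really contain every invariant site? If so, just use this logic: SELECT chr, MAX(pos) - MIN(pos) + 1 FROM sites GROUP BY chr (where sites holds the CHROM and POS of each record; the +1 is needed because both end positions are part of the span).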
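A minimal Python sketch of the same MAX/MIN logic, assuming an uncompressed VCF read line by line, with CHROM in column 1 and a 1-based POS in column 2. The function name `span_per_chrom` is hypothetical. The result only equals the chromosome length if the records really run from position 1 to the last base; otherwise it is just the span the records cover.

```python
def span_per_chrom(vcf_lines):
    """Return {chrom: MAX(pos) - MIN(pos) + 1} over the data lines of a VCF.

    Assumes tab-separated data lines with CHROM in column 1 and a 1-based
    POS in column 2; '#' header lines and blank lines are skipped.
    """
    bounds = {}  # chrom -> (smallest POS seen, largest POS seen)
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos = line.split("\t", 2)[:2]
        pos = int(pos)
        lo, hi = bounds.get(chrom, (pos, pos))
        bounds[chrom] = (min(lo, pos), max(hi, pos))
    # +1 because both end positions are included in the span
    return {chrom: hi - lo + 1 for chrom, (lo, hi) in bounds.items()}


# Usage on a file (path is a placeholder):
#     with open("calls.vcf") as fh:
#         print(span_per_chrom(fh))
demo = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t1\t.\tA\t.",
    "chr1\t5\t.\tC\tT",
    "chr2\t3\t.\tG\t.",
]
print(span_per_chrom(demo))  # {'chr1': 5, 'chr2': 1}
```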

7.9 years ago

Check the VCF header. Many of my .vcf files list the length of each contig there, in ##contig=<ID=...,length=...> lines.
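As a sketch, assuming the header follows the VCF 4.x convention of `##contig=<ID=...,length=...>` lines, the lengths can be pulled out like this. `contig_lengths_from_header` is a hypothetical helper, and contigs declared without a `length` key are simply skipped.

```python
import re

# Matches e.g. ##contig=<ID=chr1,length=248956422,assembly=...>
# Key order inside <...> is not fixed, so ID and length are searched separately.
_ID_RE = re.compile(r"[<,]ID=([^,>]+)")
_LEN_RE = re.compile(r"[<,]length=(\d+)")


def contig_lengths_from_header(vcf_lines):
    """Return {contig_id: length} from the ##contig meta-information lines."""
    lengths = {}
    for line in vcf_lines:
        if not line.startswith("#"):
            break  # the header ends where the data lines begin
        if not line.startswith("##contig=<"):
            continue
        id_match = _ID_RE.search(line)
        len_match = _LEN_RE.search(line)
        if id_match and len_match:
            lengths[id_match.group(1)] = int(len_match.group(1))
    return lengths
```

For a bgzipped file, `gzip.open(path, "rt")` from the standard library yields text lines that can be passed in directly.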


Is there a way to get the lengths only for the contigs with reads mapped? I can edit my initial question if necessary.


A VCF doesn't really know about your read mapping; it is produced well past that stage. If you want information about the reference genome, look at the reference itself, or maybe at the SAM header.
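If the SAM/BAM is still around, its header lists every reference sequence in `@SQ` lines, with the name in the `SN` tag and the length in the `LN` tag. A minimal sketch, assuming the header text is available as lines, for example from `samtools view -H aln.bam` (`aln.bam` and the function name `ref_lengths_from_sam_header` are placeholders):

```python
def ref_lengths_from_sam_header(sam_lines):
    """Return {reference_name: length} from the @SQ lines of a SAM header.

    @SQ lines carry tab-separated TAG:VALUE fields; SN is the reference
    sequence name and LN is its length.
    """
    lengths = {}
    for line in sam_lines:
        if not line.startswith("@"):
            break  # alignment records follow the header
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "@SQ":
            continue  # skip @HD, @RG, @PG, @CO lines
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        if "SN" in tags and "LN" in tags:
            lengths[tags["SN"]] = int(tags["LN"])
    return lengths
```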


Well, the contig lengths in the VCF header that swbarnes2 is referring to cover all the reference contigs, regardless of whether reads supported a given contig (at least in my VCF file). I just want the lengths of the contigs that appear in the CHROM column.


You're saying there are more contigs in the VCF header than in your CHROM column? Then you can trust the VCF header and just use the subset you're interested in. Either way, going back to an earlier stage is less error-prone: look at the reference genome itself, or at the SAM header, for this kind of information. The VCF probably copied the contig lengths from there.
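Taking that subset can be sketched in a single pass over the file: collect lengths from the `##contig` lines, then keep only the contigs that actually occur in the CHROM column. This assumes an uncompressed, tab-separated VCF; the helper name `lengths_of_used_contigs` is hypothetical, and a CHROM value with no length in the header maps to `None` rather than being guessed.

```python
import re

# ID and length may appear in any order inside ##contig=<...>
_ID_RE = re.compile(r"[<,]ID=([^,>]+)")
_LEN_RE = re.compile(r"[<,]length=(\d+)")


def lengths_of_used_contigs(vcf_lines):
    """Return {contig: length} only for contigs present in the CHROM column.

    Contigs used in the body but lacking a header length map to None.
    Keys are in order of first appearance in the data lines.
    """
    header_lengths = {}
    used = []     # distinct CHROM values, in order of first appearance
    seen = set()
    for line in vcf_lines:
        if line.startswith("##contig=<"):
            id_m, len_m = _ID_RE.search(line), _LEN_RE.search(line)
            if id_m and len_m:
                header_lengths[id_m.group(1)] = int(len_m.group(1))
        elif line.startswith("#") or not line.strip():
            continue  # other header lines and blank lines
        else:
            chrom = line.split("\t", 1)[0]
            if chrom not in seen:
                seen.add(chrom)
                used.append(chrom)
    return {chrom: header_lengths.get(chrom) for chrom in used}
```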
