I have a VCF file that includes variant and invariant sites for every locus. Is there an easy way to obtain the length of each of these?
Every invariant site? If that's the case, just use this logic (the + 1 counts both the first and the last position):
SELECT chr, (MAX(pos) - MIN(pos) + 1) GROUP BY chr
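If you don't have the records in a database, the same MAX/MIN logic can be sketched in plain Python. This is only a rough sketch: the function name `contig_spans` and the example records are made up for illustration, it assumes an uncompressed, tab-delimited VCF, and it reports the span from the first to the last recorded position (adding 1 so both ends are counted), which equals the locus length only if every site really is in the file.

```python
def contig_spans(vcf_lines):
    """Return {chrom: max(POS) - min(POS) + 1} over all data records."""
    lowest, highest = {}, {}
    for line in vcf_lines:
        # Skip blank lines and all header lines ("##..." and "#CHROM").
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos = line.split("\t", 2)[:2]
        pos = int(pos)
        lowest[chrom] = min(pos, lowest.get(chrom, pos))
        highest[chrom] = max(pos, highest.get(chrom, pos))
    return {chrom: highest[chrom] - lowest[chrom] + 1 for chrom in lowest}


# Hypothetical example records, not taken from a real file.
EXAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "locus1\t1\t.\tA\t.\t.\t.\t.\n"
    "locus1\t2\t.\tC\tT\t.\t.\t.\n"
    "locus1\t3\t.\tG\t.\t.\t.\t.\n"
    "locus2\t10\t.\tT\t.\t.\t.\t.\n"
    "locus2\t13\t.\tA\tG\t.\t.\t.\n"
)

spans = contig_spans(EXAMPLE_VCF.splitlines())
print(spans)
```

For a real file you would pass an open file handle instead of the example string, e.g. `contig_spans(open("my.vcf"))`, where `my.vcf` is a placeholder name.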
Check the VCF header. I have lots of .vcf files with the length of each contig recorded there.
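When the header does carry contig lengths, they sit in `##contig=<ID=...,length=...>` lines. A minimal sketch for pulling them out is below; `header_contig_lengths` and the example header are hypothetical, and the comma split assumes no quoted values containing commas inside the contig line.

```python
import re

# Matches a contig meta line and captures everything between < and >.
CONTIG_RE = re.compile(r"^##contig=<(.*)>\s*$")


def header_contig_lengths(vcf_lines):
    """Return {contig ID: length} from the ##contig header lines."""
    lengths = {}
    for line in vcf_lines:
        if not line.startswith("##"):
            break  # meta lines come first; stop at #CHROM or the first record
        match = CONTIG_RE.match(line)
        if not match:
            continue
        # Naive key=value split; assumes no commas inside quoted values.
        fields = dict(
            item.split("=", 1) for item in match.group(1).split(",") if "=" in item
        )
        if "ID" in fields and "length" in fields:
            lengths[fields["ID"]] = int(fields["length"])
    return lengths


# Hypothetical example header, not taken from a real file.
EXAMPLE_HEADER = (
    "##fileformat=VCFv4.2\n"
    "##contig=<ID=locus1,length=500>\n"
    "##contig=<ID=locus2,length=300,assembly=demo>\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
)

lengths = header_contig_lengths(EXAMPLE_HEADER.splitlines())
print(lengths)
```

Note that the length field is optional in a contig line, so a file may list contigs without lengths; the sketch simply skips those.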
Is there a way to get the lengths only for the contigs with reads mapped? I can edit my initial question if necessary.
A VCF doesn't really care about your read mapping; it's well past that point. If you want information about the reference genome, look there, or maybe at the SAM header.
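For the SAM header route, reference sequence names and lengths are in the `@SQ` lines, under the `SN` and `LN` tags. Here is a rough sketch that reads them from header text (for example, what `samtools view -H` prints); the function name `sam_contig_lengths` and the example header are made up for illustration.

```python
def sam_contig_lengths(sam_header_lines):
    """Return {sequence name: length} from the @SQ lines of a SAM header."""
    lengths = {}
    for line in sam_header_lines:
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type looks like TAG:value.
        tags = dict(
            field.split(":", 1)
            for field in line.rstrip("\n").split("\t")[1:]
            if ":" in field
        )
        if "SN" in tags and "LN" in tags:
            lengths[tags["SN"]] = int(tags["LN"])
    return lengths


# Hypothetical example header, not taken from a real file.
EXAMPLE_SAM_HEADER = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:locus1\tLN:500\n"
    "@SQ\tSN:locus2\tLN:300\n"
)

sam_lengths = sam_contig_lengths(EXAMPLE_SAM_HEADER.splitlines())
print(sam_lengths)
```

Like the VCF header, this lists every reference sequence, whether or not any reads mapped to it.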
Well, the contig lengths in the VCF header that swbarnes2 is referring to cover all the reference contigs, regardless of whether reads supported a given contig (at least in my VCF file). I just want the lengths of the contigs that appear in the CHROM column.
You're saying there are more contigs in the VCF header than in your CHROM column? Then you can trust the VCF header and just use the subset you're interested in. Either way, going to an earlier stage is less error-prone: look at the reference genome itself, or the SAM header, for information related to those things. The VCF probably copied the data from there.
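Taking the subset can be done in one pass over the file: collect the header lengths, note which contigs actually show up in the CHROM column, and keep only those. The sketch below is one possible way, not a definitive tool; `lengths_for_observed_contigs` and the example data are hypothetical, and it assumes an uncompressed, tab-delimited VCF.

```python
import re


def lengths_for_observed_contigs(vcf_lines):
    """Return header lengths only for contigs that appear in the CHROM column.

    Contigs seen in records but missing from the header map to None.
    """
    header_lengths = {}
    observed = {}  # dict used as an ordered set of CHROM values
    for line in vcf_lines:
        if line.startswith("##contig=<"):
            id_match = re.search(r"[<,]ID=([^,>]+)", line)
            len_match = re.search(r"[<,]length=(\d+)", line)
            if id_match and len_match:
                header_lengths[id_match.group(1)] = int(len_match.group(1))
        elif line.startswith("#") or not line.strip():
            continue  # other header lines and blank lines
        else:
            observed.setdefault(line.split("\t", 1)[0], None)
    return {chrom: header_lengths.get(chrom) for chrom in observed}


# Hypothetical example: three contigs in the header, two with records.
EXAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "##contig=<ID=locus1,length=500>\n"
    "##contig=<ID=locus2,length=300>\n"
    "##contig=<ID=locus3,length=800>\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "locus1\t1\t.\tA\t.\t.\t.\t.\n"
    "locus1\t2\t.\tC\tT\t.\t.\t.\n"
    "locus3\t7\t.\tG\t.\t.\t.\t.\n"
)

subset = lengths_for_observed_contigs(EXAMPLE_VCF.splitlines())
print(subset)
```

Returning None for a contig that has records but no header length makes a mismatch between header and body easy to spot.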