I'm working with a VCF file, and this is what it looks like:
##info1
##info2
##info3
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT ID01 ID02 ID03 etc...
3 66894 rs9681213 0 1 . PASS . GT 0|1 0|1 0|1 etc...
3 95973 rs1400176 0 1 . PASS . GT 1|1 1|1 1|1 etc...
3 104972 rs990284 0 1 . PASS . GT 0|1 0|1 0|0 etc...
3 114133 rs954824 0 1 . PASS . GT 1|1 1|1 1|1 etc...
and so on...
As you can see, the general format is:
- Each line holds information about my main targets: SNPs and their alleles;
- Each column after the 9th one holds an individual, with its respective alleles for each SNP.
So, for each ID column, the lines hold the info I'm looking for.
I have a Perl script for extracting specific individuals (IDs), and I just realized that some IDs have missing values, which is impairing my later analysis. The script prints the empty values as they are (empty). Even if I use vcftools, it prints a dot (.) in place of the empty value, which doesn't help me either. So I want to know which IDs have no coded values.
Basically, I want to print only the columns whose IDs have missing ("null" or "undefined") values, i.e. the ones that are blank.
Since my files are huge in both directions, it's not easy to see by eye which IDs have no coded values in their lines. From my limited knowledge, I believe the easiest way is to check for undefined values in each column and somehow print only the lines from that column.
So what I have to do is:
1) split the lines in order to get the columns (OK);
2) print the first nine columns anyway, through a loop (OK);
3) check whether there are any missing values in the columns (partially OK);
4) print only the columns that have missing data in some line (I got stuck here).
My problem with this last part is that, even once the program has found the columns with an undefined value, I do not know which ones they are, so I cannot print only them. I have to find a way to tell the program to print just the columns with missing data, and I don't know how to specify those columns.
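To make the goal concrete, the four steps could be sketched roughly as below. This is only an illustrative Python sketch, not my actual Perl script. It assumes the fields are tab-separated, that a missing genotype shows up as an empty field or as `.`, `./.` or `.|.`, and the function names `columns_with_missing` and `keep_columns` are made up for the example. The idea is to remember the index of every sample column in which a missing value was seen, and then print the first nine columns plus only those remembered indices.

```python
# Values treated as "missing" (an assumption; adjust to the actual file).
MISSING = {"", ".", "./.", ".|."}


def columns_with_missing(lines, n_fixed=9, sep="\t"):
    """Return the sorted indices of sample columns that contain
    at least one missing value in any data line."""
    flagged = set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip ## meta lines and the #CHROM header
        fields = line.rstrip("\n").split(sep)
        # Only the sample columns (after the first n_fixed) are checked.
        for i, value in enumerate(fields[n_fixed:], start=n_fixed):
            if value.strip() in MISSING:
                flagged.add(i)
    return sorted(flagged)


def keep_columns(lines, flagged, n_fixed=9, sep="\t"):
    """Return the lines reduced to the first n_fixed columns plus
    the flagged sample columns; ## meta lines are kept unchanged."""
    wanted = list(range(n_fixed)) + list(flagged)
    out = []
    for line in lines:
        text = line.rstrip("\n")
        if text.startswith("##"):
            out.append(text)
            continue
        fields = text.split(sep)
        out.append(sep.join(fields[i] for i in wanted if i < len(fields)))
    return out
```

Because the columns to keep are only known after every line has been seen, this needs two passes over the data. For a huge file that would presumably mean reading the file twice instead of holding all lines in memory.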
Could someone please help me out?