I have a merged VCF file with 28 samples, and I'm looking for a way to add a flag to the FILTER column so that I can later remove any SNP that does not have coverage of at least 1 read in at least 24 of the 28 samples.
Does anybody know of a way to do this? I think I have found a post about how to filter so that all samples must have at least 1 read, but I don't want all of them: for each SNP, any 24 samples must have coverage of at least 1 read. I would also prefer to add a flag to the FILTER column, but I could live with just removing the SNPs.
cat test.vcf | java -jar SnpSift.jar filter "`seq 0 27 | awk '{ printf("%s (GEN[%d].DP>=1) ",(NR==1?"":" & "), $1);}'`"
I'm trying to mimic the method used here.
We then used the MAPS pipeline to select bases in the reference covered by at least one read at quality higher than 20 in a minimum number of samples. This number is determined by the MinLib parameter, which was set equal to the total number of samples in the batch minus four. For example, we used MinLib = 20 for batches of 24 samples and MinLib = 28 for batches of 32 samples. This number was selected to ensure that at least half of the lines in each capture including eight individuals had a minimum coverage of one read at quality higher than 20. This threshold showed a low number of false positives and was adopted for the complete project.
I've tried it and it seems to work. Shouldn't it be as below for coverage of one read or more in 24 or more samples?
Yes, you're right.
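For what it's worth, the "any 24 of 28 samples" rule from the question cannot be written as a plain AND chain over all samples, so one option is to count the covered samples per record. Below is a minimal awk sketch of that idea, not a tested pipeline. The function name flag_low_coverage and the filter ID LowCov are made up for illustration. It assumes every record carries a per-sample DP value in the FORMAT column, and it does not apply the "quality higher than 20" read cutoff from the quoted method, because DP alone says nothing about read quality.

```shell
# Hypothetical helper: reads a VCF on stdin, writes the flagged VCF to stdout.
# $1 = minimum number of samples that must be covered (default 24)
# $2 = minimum per-sample DP to count a sample as covered (default 1)
flag_low_coverage() {
  awk -F'\t' -v OFS='\t' -v min_samples="${1:-24}" -v min_dp="${2:-1}" '
    # Meta lines pass through unchanged.
    /^##/ { print; next }
    # Declare the new filter (ID is an assumption) just before the column header.
    /^#CHROM/ {
      print "##FILTER=<ID=LowCov,Description=\"Fewer than " min_samples " samples with DP>=" min_dp "\">"
      print; next
    }
    {
      # Locate DP within the colon-separated FORMAT column (column 9).
      n = split($9, fmt, ":"); dp_idx = 0
      for (i = 1; i <= n; i++) if (fmt[i] == "DP") dp_idx = i
      # Count samples (columns 10 onward) whose DP reaches the threshold.
      covered = 0
      for (s = 10; s <= NF; s++) {
        m = split($s, val, ":")
        # Missing values (".") or truncated sample fields count as uncovered.
        if (dp_idx && dp_idx <= m && val[dp_idx] != "." && val[dp_idx] + 0 >= min_dp) covered++
      }
      # Flag the record: replace "." or "PASS", otherwise append to existing filters.
      if (covered < min_samples) {
        $7 = ($7 == "." || $7 == "PASS") ? "LowCov" : $7 ";LowCov"
      }
      print
    }'
}
```

With the numbers from the question it would be called as: flag_low_coverage 24 1 < test.vcf > test.flagged.vcf. Records that pass keep their original FILTER value, so the flagged SNPs can be removed later by any tool that selects on the FILTER column.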