Hello everyone!
I've been doing some variant analysis that involves manual curation and stepwise filtering on specific scores.
For example, in a decision tree:
Filter rare (1)
  |
filter (impact is high) or (impact is moderate) (2)
  |-- filter (clinvar_sig matches pathogenic) (3)
  |-- filter (clinvar_sig matches benign) (4)
And so on. I've been doing this manually: I generate a filtered file at each step and then refilter from that new file, which speeds up filtering because the number of samples and variants shrinks with each step. For example, the VCF file at step 3 is named 1_2_3_VCF because it has been through those steps. Accordingly, the file at step 4 is named 1_2_4_VCF.
The problem is that the resulting files contain overlapping but not identical sets of samples, with the same or different variants. As a result, I can't merge files with 400k samples using bcftools or, in fact, most tools.
Now I'm trying to find a way to flag the variants with step names instead of basing each step on the output of the previous ones.
So my question is:
Is there a way to make the Ensembl filter flag variants in a custom field, and then run another filter round on the same VCF file? The result would be: filter rare = tag variants with 1, filter impact = tag variants with 2. A variant that passes both steps would be tagged 1,2, for example.
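To make the desired result concrete, here is a minimal Python sketch of the tagging I have in mind. It is only an illustration, not something the Ensembl tools provide as far as I know. The INFO key FILTER_STEPS is a name I made up, and the sketch assumes the annotations are already available as plain, single-valued INFO keys (AF and IMPACT here are hypothetical), whereas real VEP output usually nests them inside the CSQ field.

```python
from typing import Callable, Dict, Iterable, Iterator

# Hypothetical INFO key used to store the step tags; not a standard VCF field.
TAG_KEY = "FILTER_STEPS"
TAG_HEADER = (
    f'##INFO=<ID={TAG_KEY},Number=.,Type=String,'
    'Description="Filter steps this variant passes">'
)

Info = Dict[str, str]
Predicate = Callable[[Info], bool]


def parse_info(info_field: str) -> Info:
    """Split a VCF INFO column into a dict; flag-style keys map to ''."""
    info: Info = {}
    if info_field in (".", ""):
        return info
    for item in info_field.split(";"):
        key, _, value = item.partition("=")
        info[key] = value
    return info


def tag_variants(lines: Iterable[str], steps: Dict[str, Predicate]) -> Iterator[str]:
    """Yield VCF lines with the name of every passing step appended to INFO."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line
        elif line.startswith("#CHROM"):
            yield TAG_HEADER  # declare the new INFO key before the column header
            yield line
        elif line:
            cols = line.split("\t")
            info = parse_info(cols[7])
            passed = [name for name, test in steps.items() if test(info)]
            if passed:
                tag = f"{TAG_KEY}={','.join(passed)}"
                cols[7] = tag if cols[7] in (".", "") else f"{cols[7]};{tag}"
            yield "\t".join(cols)


# Assumed annotation keys (AF, IMPACT) with a single value per variant.
STEPS: Dict[str, Predicate] = {
    "1": lambda info: float(info.get("AF", "1")) < 0.01,           # rare
    "2": lambda info: info.get("IMPACT") in ("HIGH", "MODERATE"),  # impact
}

EXAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t.\t.\tAF=0.001;IMPACT=HIGH",
    "1\t200\t.\tC\tT\t.\t.\tAF=0.2;IMPACT=MODERATE",
    "1\t300\t.\tG\tA\t.\t.\tAF=0.3;IMPACT=LOW",
]

if __name__ == "__main__":
    for out_line in tag_variants(EXAMPLE_VCF, STEPS):
        print(out_line)
```

Here the first variant would be tagged 1,2, the second only 2, and the third would stay untagged. No variants or samples are dropped, so every branch of the tree could be selected later from the same file by its tag.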
Or maybe there is a better way of doing this that I haven't thought of?