I have VEP-annotated output in both VCF and tab-delimited format. The tab-delimited file has no genotype information. In the VCF file, each annotation is stored in INFO/CSQ. I used bcftools:
bcftools +split-vep crispr_pass_rsid_variant_annotated.vcf -f '%CHROM:%POS %CSQ\n' -d -A tab | head -n 2
This command splits the consequences, but only a few values come through; the rest are replaced with dots. It also does not include the genotypes.
I tried another way as well: extract the genotypes from the VCF file and merge them with the VEP text file in R. The commands are below. The problem is that the text file has many annotations for one variant. I filtered for Canonical "Yes" and separated SNVs and indels, but there are still many annotations per variant. When I merge, only the SNV variants are matched; indels and sequence alterations are skipped.
#Previous Steps
GATK VariantFiltration: hard filtering without QUAL, which labelled the variants
bcftools view -f 'PASS' all.filtered.vcf.gz -Oz -o all_PASS_filtered.vcf.gz
VEP annotation, generating both a TXT file and a VCF file
bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%SAMPLE=%GT]\n' all_PASS_vep_annotated.vcf > pass_genotype.txt
grep -v "^##" all_PASS_vep_annotated.txt > all_PASS_vep_no_header.txt
library(data.table)  # provides fread() and the := operator used below
vep_text <- read.delim("cell_line_annotated_TXT_no_header.txt", sep = "\t", header = TRUE)
genotypes <- fread("pass_genotype.txt", sep = "\t", header = FALSE)
colnames(genotypes) <- c("CHROM", "POS", "REF", "ALT", paste0("Sample", 1:(ncol(genotypes) - 4)))
colnames(vep_annotated2)[colnames(vep_annotated2) == "X.Uploaded_variation"] <- "Uploaded_variation"
genotypes[, Uploaded_variation := paste0(CHROM, "_", POS, "_", REF, "/", ALT)]
genotypes[, Location := paste0(CHROM, ":", POS)]
merged_data2 <- merge(vep_annotated2, genotypes, by = c("Uploaded_variation", "Location"))
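One possible reason the merge keeps only SNVs is that the key built above uses the raw VCF POS/REF/ALT, while VEP (for VCF input without an ID, as far as I understand) trims the shared leading base of an indel and shifts the start by one, and reports a range in Location. A minimal sketch of rebuilding keys that way is below. It assumes biallelic records (for example after `bcftools norm -m -any`), and the helper name `vep_keys` is hypothetical; the exact identifier format is an assumption that should be checked against a few rows of the VEP text file.

```shell
# Sketch: turn VCF-style CHROM, POS, REF, ALT (first four tab-separated columns)
# into VEP-style Uploaded_variation and Location keys.
# Assumptions: biallelic records; VEP default identifiers (chr_start_ref/alt).
vep_keys() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    chrom = $1; pos = $2; ref = $3; alt = $4
    # Indel whose REF and ALT share the first base: drop that base, shift start by one.
    if (length(ref) != length(alt) && substr(ref, 1, 1) == substr(alt, 1, 1)) {
      ref = substr(ref, 2); alt = substr(alt, 2); pos = pos + 1
    }
    if (ref == "") ref = "-"   # insertion
    if (alt == "") alt = "-"   # deletion
    # Location: insertion spans the two flanking bases, single base is chr:pos,
    # anything longer is a start-end range.
    if (ref == "-")            loc = chrom ":" (pos - 1) "-" pos
    else if (length(ref) == 1) loc = chrom ":" pos
    else                       loc = chrom ":" pos "-" (pos + length(ref) - 1)
    # Print the two keys, then the original line unchanged.
    print chrom "_" pos "_" ref "/" alt, loc, $0
  }' "$@"
}

# Toy input: one SNV, one deletion, one insertion.
printf '1\t100\tA\tG\n1\t200\tAT\tA\n1\t300\tA\tAT\n' | vep_keys
```

Running `vep_keys pass_genotype.txt > pass_genotype_keyed.txt` would put the two keys in front of each genotype row, so they could be read in R and used in the `merge()` instead of the `paste0()` keys.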
I would be very grateful for any help.