I have a list of genes, and I want to extract the SNPs and indels that fall within those genes' coordinates from my VCF file (which I generated using the GATK pipeline). The list of gene coordinates:
Gene Name   Accession_no.   Start_Position   End_Position   Strand
Rv0194      NC_000962.3     226878           230462         +
I was looking at bedtools, but it asks for the genes in .bed format and also a .bed version of the BAM files. How can I do this? Are there any other options, tools, or scripts?
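To make the .bed part concrete, here is a minimal sketch of how I think the gene table could be converted, assuming the table is whitespace-separated with 1-based inclusive coordinates (BED uses a 0-based start). The function name `genes_to_bed` and the `chrom_override` parameter are my own inventions for illustration; the override is there because the chromosome name in the regions file has to match the CHROM column of the VCF.

```python
def genes_to_bed(lines, chrom_override=None):
    """Convert whitespace-separated gene rows into BED6 lines.

    Assumed row layout (hypothetical, taken from the table above):
    name, accession, start, end, strand, with 1-based inclusive coordinates.
    """
    bed = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and the header row (its start column is not a number).
        if len(fields) < 5 or not (fields[2].isdigit() and fields[3].isdigit()):
            continue
        name, chrom, start, end, strand = fields[:5]
        if chrom_override is not None:
            chrom = chrom_override  # use the contig name found in the VCF
        # BED is 0-based, half-open: subtract 1 from the start, keep the end.
        bed.append(f"{chrom}\t{int(start) - 1}\t{end}\t{name}\t0\t{strand}")
    return bed


if __name__ == "__main__":
    table = [
        "Gene Name Accession_no. Start_Position End_Position Strand",
        "Rv0194 NC_000962.3 226878 230462 +",
    ]
    print("\n".join(genes_to_bed(table)))
```

If I understand the documentation correctly, bedtools intersect also accepts a VCF directly as the -a file (with -header to keep the VCF header), so a .bed of the BAM files may not be needed at all.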
For example, I tried tabix:
    bgzip ERR038736_UnifiedGenotyper_variants_raw_snp.vcf
    tabix ERR038736_UnifiedGenotyper_variants_raw_snp.vcf.gz
    tabix ERR038736_UnifiedGenotyper_variants_raw_snp.vcf.gz AL123456.3:226878-230462 > Rv0194
This gave me variants like these:
AL123456.3 227098 . T C 6730.77 . AC=2;AF=1.00;AN=2;DP=172;Dels=0.
AL123456.3 228069 . G A 7132.77 . AC=2;AF=1.00;AN=2;BaseQRankSum=-
AL123456.3 228168 . G C 6682.77 . AC=2;AF=1.00;AN=2;DP=171;Dels=0.
But this output is not a valid VCF file (the header lines are missing), and I can only extract one region at a time. I want to extract all variants for a whole list of coordinates and store them in a single VCF output file.
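To show what I am after, here is a rough pure-Python sketch, assuming an uncompressed, tab-delimited VCF and regions given as 1-based inclusive (chrom, start, end) tuples. The name `filter_vcf` is hypothetical, and the check is simplified: it only tests the POS column, so a deletion that starts before a gene and runs into it would be missed.

```python
def filter_vcf(vcf_lines, regions):
    """Yield VCF header lines plus records whose POS falls in any region.

    regions: iterable of (chrom, start, end), 1-based inclusive (assumed).
    Simplification: only POS is tested, not the full span of an indel.
    """
    regions = list(regions)
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep every header line so the output stays a VCF
            continue
        if not line.strip():
            continue  # ignore blank lines
        chrom, pos = line.split("\t", 2)[:2]
        pos = int(pos)
        if any(c == chrom and s <= pos <= e for c, s, e in regions):
            yield line


if __name__ == "__main__":
    # Illustrative input only; the records are abbreviated, not real output.
    vcf = [
        "##fileformat=VCFv4.1",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "AL123456.3\t227098\t.\tT\tC\t6730.77\t.\tAC=2",
        "AL123456.3\t500000\t.\tG\tA\t100.00\t.\tAC=2",
    ]
    for out in filter_vcf(vcf, [("AL123456.3", 226878, 230462)]):
        print(out)
```

I have also read that recent tabix versions have -h to print the header and -R to take a file of regions, which might do the same thing on the bgzipped file, but I have not confirmed that this works with my version.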
Can anyone help me with this?