3 months ago
arya.sagittarius
These are the steps I'm following. The first step extracts the sample using a BED file (here the BED file is the input BED file converted to hg38):
tabix -h -R Hg19_to_Hg38_sorted.bed.gz gnomad.genomes.v{g_version}.hgdp_tgp.chr{chr}.vcf.bgz | perl {vcftools} -c {sample_name} > {sample_name}_out.vcf
Output ({sample_name}_out.vcf):
chr2 113982416 rs56177103 TATAAAATAAAATAAA T . PASS . GT:AAD:DAD:DAF:ADF 0/1:25519,4077:25519,4077:0.13776:0.13776
chr2 113982416 rs56177103 TATAAAATAAAATAAA T . PASS . GT:AAD:DAD:DAF:ADF 0/1:25519,4077:25519,4077:0.13776:0.13776
chr2 113982416 rs56177103 TATAAAATAAAATAAA T . PASS . GT:AAD:DAD:DAF:ADF 0/1:25519,4077:25519,4077:0.13776:0.13776
Because my output file contains repeated regions, I tried to extract the unique regions by running intersectBed with the same input BED file, but I am unable to get unique records: it returns the same repeated results. Why is that? This is the command I used:
bedtools/intersectBed -u -a {sample_name}_out.vcf -b bed_filename > output.vcf
I was also wondering whether running sort | uniq on the output would give the same result?
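
For reference, a minimal sketch of the sort | uniq idea. It assumes the repeated lines are byte-for-byte identical, as in the output shown above, which can happen when overlapping intervals in the BED file each return the same record from tabix -R. Note that intersectBed -u reports each entry of -a once if it overlaps anything in -b, but it does not collapse entries that are already duplicated within -a. The function name dedup_vcf and the file names are hypothetical, used only for illustration.

```shell
# Hypothetical helper: drop exact duplicate records from a VCF while keeping
# the header lines. Assumes duplicates are identical whole lines.
dedup_vcf() {
    in_vcf="$1"
    out_vcf="$2"
    # Header lines (starting with '#') are copied unchanged, in original order.
    # '|| true' covers the case where the file has no header at all.
    grep '^#' "$in_vcf" > "$out_vcf" || true
    # Body lines: sort by chromosome, then numerically by position, so that
    # identical lines end up adjacent; uniq then keeps one copy of each.
    grep -v '^#' "$in_vcf" | LC_ALL=C sort -k1,1 -k2,2n | uniq >> "$out_vcf"
}
```

It could be called as dedup_vcf {sample_name}_out.vcf {sample_name}_dedup.vcf. A plain sort | uniq on the whole file would also sort the header lines in with the records, which is why the header is handled separately here. Another option, not tested here, is to merge overlapping intervals in the BED file (for example with bedtools merge) before running tabix, so the duplicates are not produced in the first place.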