Multisample Variant Processing
8.2 years ago
ruavol.bb • 0

I have a VCF file with 928 unrelated samples. It has been annotated with the appropriate RSIDs. How do I query the file by sample along with the corresponding ID? The analysis requires the sample name, the ref/alt alleles, and the RSID. If I use vcf-subset, I get the sample but not the ID. Thanks for any directions.

snp • 1.4k views

Thanks for the awk command. The problem I ran into is that the RSID information is in the INFO field. I need it in the ID column so that when I query by sample I have the rs number associated with that position. I also tried GATK, and the output was 0 records processed. I also tried bcftools query, which comes close but does not give one line per sample.
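
When the rs number sits in INFO rather than in the ID column, one option is to copy it over with awk and print one line per sample in the same pass. The following is only a minimal sketch, assuming the annotation uses a dbSNP-style `RS=<digits>` key in INFO (check the file header for the actual key) and that GT is the first FORMAT field. The file `demo.vcf` and the sample names S1/S2 are made up for illustration.

```shell
# Build a tiny demo VCF (hypothetical data; columns are tab-separated).
printf '##fileformat=VCFv4.2\n' > demo.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n' >> demo.vcf
printf '1\t100\t.\tA\tG\t.\tPASS\tDP=10;RS=1234\tGT\t0/1\t1/1\n' >> demo.vcf
printf '1\t200\t.\tC\tT\t.\tPASS\tDP=8\tGT\t0/0\t0/1\n' >> demo.vcf

# Emit one line per sample: SAMPLE, ID, REF, ALT, first FORMAT value.
# Assumption: the rs number is stored as RS=<digits> in INFO (column 8).
per_sample=$(awk -F '\t' '
  /^##/     { next }                                   # skip meta lines
  /^#CHROM/ { for (i = 10; i <= NF; i++) name[i] = $i; next }
  {
    id = $3
    info = ";" $8                                      # leading ";" so that only a whole RS key matches
    if (id == "." && match(info, /;RS=[0-9]+/))
      id = "rs" substr(info, RSTART + 4, RLENGTH - 4)  # keep only the digits after ";RS="
    for (i = 10; i <= NF; i++) {
      split($i, f, ":")                                # f[1] is GT when GT is the first FORMAT key
      print name[i] "\t" id "\t" $4 "\t" $5 "\t" f[1]
    }
  }' demo.vcf)
printf '%s\n' "$per_sample"
```

Records without an `RS=` entry keep `.` as their ID. If bcftools is available, the per-sample loop in its query format may already be enough once the ID column is populated, for example `bcftools query -f '[%SAMPLE\t%ID\t%REF\t%ALT\t%GT\n]' input.vcf.gz`.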

8.2 years ago
gunzip -c input.vcf.gz |  awk -F '\t' '($0 ~ /^#/ || $3=="rs1234")'

?

or GATK SelectVariants with --keepIDs: https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_variantutils_SelectVariants.php#--keepIDs

--keepIDs / -IDs

List of variant IDs to select

If a file containing a list of IDs is provided to this argument, the tool will only select variants whose ID field is present in this list of IDs. The matching is done by exact string matching. The expected file format is simply plain text with one ID per line.
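
For comparison, the same kind of exact-match selection from a plain-text ID list can be sketched in awk. This is not GATK itself, only an analogue of the behaviour described above; `ids.txt` and `demo2.vcf` are hypothetical files with made-up contents.

```shell
# Hypothetical ID list (one ID per line) and a tiny demo VCF.
printf 'rs1234\nrs9999\n' > ids.txt
printf '##fileformat=VCFv4.2\n' > demo2.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> demo2.vcf
printf '1\t100\trs1234\tA\tG\t.\tPASS\t.\n' >> demo2.vcf
printf '1\t200\trs12345\tC\tT\t.\tPASS\t.\n' >> demo2.vcf

# Keep header lines plus records whose ID column exactly matches a listed ID.
selected=$(awk -F '\t' '
  NR == FNR { ids[$1]; next }   # first file: load the IDs into an array
  /^#/ || ($3 in ids)           # second file: print headers and exact ID matches
' ids.txt demo2.vcf)
printf '%s\n' "$selected"
```

Because the lookup is an exact string match, rs12345 is not selected by the entry rs1234.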
