Question: Multisample variant Processing
2.9 years ago by
ruavol.bb0 wrote:

I have a VCF file containing 928 unrelated samples. It has been annotated with the appropriate rsIDs. How do I query the file for each sample along with the corresponding ID? The analysis requires the sample name, the ref/alt alleles, and the rsID. If I use vcf-subset, I get the sample but not the ID. Thanks for any directions.


Thanks for the awk command. The problem I ran into is that the rsID information is in the INFO field. I need it in the ID column so that when I query by sample I have the rs number associated with that position. I also tried GATK, and the output was 0 records processed. I also tried bcftools query, which comes close but does not give one line per sample.

written 2.9 years ago by ruavol.bb0
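
For the situation described in the comment above, where the rs number sits in INFO rather than in the ID column, a minimal awk sketch might look like the following. It assumes the annotation uses a dbSNP-style `RS=<digits>` key in INFO; the thread does not say which key the file actually uses, so adjust the pattern accordingly. The file names and the two demo records are hypothetical.

```shell
# Build a tiny two-sample VCF for illustration (hypothetical data).
# Assumption: the rs number is stored in INFO as RS=<digits>.
printf '##fileformat=VCFv4.2\n' > demo.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n' >> demo.vcf
printf '1\t100\t.\tA\tG\t.\tPASS\tDP=10;RS=1234\tGT\t0/1\t1/1\n' >> demo.vcf
printf '1\t200\t.\tC\tT\t.\tPASS\tDP=12\tGT\t0/0\t0/1\n' >> demo.vcf

# Copy RS=<n> from INFO (column 8) into the ID column (column 3) as rs<n>.
# Header lines and records without an RS key pass through unchanged.
awk -F '\t' -v OFS='\t' '
  /^#/ { print; next }
  {
    n = split($8, kv, ";")
    for (i = 1; i <= n; i++)
      if (kv[i] ~ /^RS=[0-9]+$/) { $3 = "rs" substr(kv[i], 4); break }
    print
  }' demo.vcf > demo.withid.vcf
```

Once the ID column is filled in, ID-based filters such as the awk one-liner or an ID list should be able to match on it.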
2.9 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum (116k) wrote:
gunzip -c input.vcf.gz |  awk -F '\t' '($0 ~ /^#/ || $3=="rs1234")'


or GATK SelectVariants with --keepIDs

--keepIDs / -IDs

List of variant IDs to select

If a file containing a list of IDs is provided to this argument, the tool will only select variants whose ID field is present in this list of IDs. The matching is done by exact string matching. The expected file format is simply plain text with one ID per line.
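
The same ID-list idea can be sketched in plain awk, which also makes it possible to print one line per sample as the question asks. This is only an illustration under stated assumptions, not a replacement for the GATK tool: the file names (`demo2.vcf`, `ids.txt`, `per_sample.tsv`) and the demo records are hypothetical, the rsID is assumed to already be in the ID column, and the genotype is taken as the first colon-separated FORMAT value (GT comes first whenever it is present).

```shell
# Hypothetical demo data: a two-sample VCF and an ID list, one ID per line.
printf '##fileformat=VCFv4.2\n' > demo2.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n' >> demo2.vcf
printf '1\t100\trs1234\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1\n' >> demo2.vcf
printf '1\t200\trs12345\tC\tT\t.\tPASS\t.\tGT\t0/0\t0/1\n' >> demo2.vcf
printf 'rs1234\n' > ids.txt

# First file (ids.txt): remember each ID.
# Second file (the VCF): skip ## lines, take sample names from the #CHROM line,
# then for each record whose ID is in the list (exact string match) print
# one line per sample: sample name, ID, REF, ALT, genotype.
awk -F '\t' -v OFS='\t' '
  NR == FNR    { keep[$1]; next }
  /^##/        { next }
  /^#CHROM/    { for (i = 10; i <= NF; i++) name[i] = $i; next }
  ($3 in keep) {
    for (i = 10; i <= NF; i++) {
      split($i, f, ":")          # f[1] is GT when GT is present
      print name[i], $3, $4, $5, f[1]
    }
  }' ids.txt demo2.vcf > per_sample.tsv
```

Because the match is exact, `rs1234` in the list does not select `rs12345`. For a compressed file, the VCF can be streamed with `gunzip -c input.vcf.gz` and passed to awk as `-` in place of the file name. If bcftools is available, its `query -f` format with a square-bracket block such as `'[%SAMPLE\t%ID\t%REF\t%ALT\t%GT\n]'` should give a similar one-line-per-sample table.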

modified 4 months ago by RamRS (20k) • written 2.9 years ago by Pierre Lindenbaum (116k)