Question: Multisample variant Processing
2.7 years ago, ruavol.bb wrote:

I have a VCF file with 928 unrelated samples. It has been annotated with the appropriate rsIDs. How do I query the file for a sample along with the corresponding ID? The analysis requires the sample name, the ref/alt alleles, and the rsID. If I use vcf-subset, I get the sample but not the ID. Thanks for any directions.


Thanks for the awk command. The problem I ran into is that the rsID information is in the INFO field. I need it in the ID column so that when I query by sample I have the rs number associated with that position. I also tried GATK, but the output reported 0 records processed. I also tried bcftools query, which comes close but does not give one line per sample.

Comment written 2.7 years ago by ruavol.bb
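For the case described in the comment, where the rs number sits in INFO rather than in the ID column, something along these lines might work as a starting point. It is only a sketch: the thread does not say which INFO key holds the identifier, so `RSID` is a hypothetical key name, and `info_to_id` is a helper name made up for illustration.

```shell
# Copy an rs identifier from the INFO column into the ID column (column 3).
# Assumption: INFO carries it as KEY=value; "RSID" is a hypothetical key name,
# pass the real key as the first argument.
info_to_id() {
  awk -F '\t' -v OFS='\t' -v key="${1:-RSID}" '
    /^#/ { print; next }                    # header lines pass through unchanged
    {
      n = split($8, kv, ";")                # INFO is column 8, entries separated by ";"
      for (i = 1; i <= n; i++) {
        if (index(kv[i], key "=") == 1) {   # entry starts with KEY=
          $3 = substr(kv[i], length(key) + 2)
          break
        }
      }
      print                                 # records without the key are printed as-is
    }'
}
```

Usage would look like `gunzip -c input.vcf.gz | info_to_id RSID > with_ids.vcf`. Records that lack the key keep whatever was already in their ID column.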
2.7 years ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:
gunzip -c input.vcf.gz |  awk -F '\t' '($0 ~ /^#/ || $3=="rs1234")'

?
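If the goal is one line per sample with the sample name, ID, and ref/alt alleles, the same awk approach can be stretched a little. The following is a rough sketch rather than a tested pipeline: `per_sample` is a made-up helper name, and it assumes the ID column is already populated and that GT, when present, is the first FORMAT subfield (as the VCF specification requires).

```shell
# Print one line per (variant, sample): sample name, ID, REF, ALT, genotype.
per_sample() {
  awk -F '\t' -v OFS='\t' '
    /^##/ { next }                          # skip meta-information lines
    /^#CHROM/ {                             # sample names start at column 10
      for (i = 10; i <= NF; i++) name[i] = $i
      next
    }
    {
      for (i = 10; i <= NF; i++) {
        split($i, f, ":")                   # first subfield is GT when GT is present
        print name[i], $3, $4, $5, f[1]
      }
    }'
}
```

Usage: `gunzip -c input.vcf.gz | per_sample > long_format.tsv`. With bcftools, a per-sample format string such as `bcftools query -f '[%SAMPLE\t%ID\t%REF\t%ALT\t%GT\n]' input.vcf.gz` should give a similar layout, since the square brackets loop over samples.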

Or use GATK SelectVariants with --keepIDs: https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_variantutils_SelectVariants.php#--keepIDs

--keepIDs / -IDs

List of variant IDs to select

If a file containing a list of IDs is provided to this argument, the tool will only select variants whose ID field is present in this list of IDs. The matching is done by exact string matching. The expected file format is simply plain text with one ID per line.
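The behaviour described above (a plain-text list with one ID per line, exact string matching on the ID field) can be imitated with awk when GATK is not at hand. This is an awk stand-in and not GATK's implementation; `keep_ids` is a hypothetical helper name.

```shell
# Keep header lines plus records whose ID column exactly matches a listed ID.
# $1: plain-text file with one ID per line; the VCF is read from stdin.
keep_ids() {
  awk -F '\t' -v list="$1" '
    BEGIN {
      while ((getline line < list) > 0) keep[line] = 1   # load the ID list
      close(list)
    }
    /^#/ || ($3 in keep)                                 # exact match on column 3
  '
}
```

Usage: `gunzip -c input.vcf.gz | keep_ids ids.txt > selected.vcf`. A record whose ID column holds several semicolon-separated IDs would not match in this sketch, because the whole column is compared as one string.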
