Question: What Formats Are People Using For Pooled Sequencing Data?
5.4 years ago, Zev.Kronenberg (United States) wrote:

Hi everyone,

My lab is developing tools to identify genomic regions under selection in pooled data. Obviously we want to support many input formats. What format is your pooled data in?

5.4 years ago, Aaronquinlan (United States) wrote:

I've used FreeBayes to call variants in pooled data (20 pools of 10 individuals) in the past. The input data are in BAM format, with each pool labelled by its RG (read group) tag, and the FreeBayes calls are in VCF. FreeBayes uses a custom genotype string to describe the alleles in each pool. Each pool has 10 individuals and thus 20 chromosomes, so the GT string has 20 slash-separated entries, each reflecting the presence (1) or absence (0) of the alternate allele on one of those chromosomes. Is this what you are after?

##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality, the Phred-scaled marginal (or unconditional) probability of the called genotype">
##FORMAT=<ID=GL,Number=1,Type=Float,Description="Genotype Likelihood, log-scaled likeilhood of the data given the called genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=RA,Number=1,Type=Integer,Description="Reference allele observations">
##FORMAT=<ID=AA,Number=1,Type=Integer,Description="Alternate allele observations">
##FORMAT=<ID=SR,Number=1,Type=Integer,Description="Number of reference observations by strand, delimited by |: [forward]|[reverse]">
##FORMAT=<ID=SA,Number=1,Type=Integer,Description="Number of alternate observations by strand, delimited by |: [forward]|[reverse]">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  pool1-63    pool2-43    pool2-63    pool3-63    pool4-63    pool5-63    pool6-63    pool7-63
chr1    113735151   .   T   G   552.93  .   NS=8;DP=281;AC=8;AN=160;AF=0.05;RA=258;AA=18;SRF=257;SRR=1;SAF=18;SAR=0;SRB=0.99612;SAB=1;SRP=554.6;SAP=42.097;ABR=228;ABA=18;AB=0.912;ABP=400.79;RUN=5;MQM=59.444;BPL=848;BPR=268;RPL=16;RPR=2;RPP=26.655;LRB=0.51971;BVAR;SNP;TV  GT:GL:DP:RA:AA:SR:SA    1/1/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0:-9.3854:64:57:6:57|0:6|0    0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0:0:10:10:0:10|0:0|0  1/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0:-1.0058:15:14:1:14|0:1|0    1/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0:-1.6621:41:38:3:38|0:3|0    1/1/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0:-9.0539:28:24:3:24|0:3|0    1/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0:-1.5989:44:41:3:41|0:3|0    1/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0:-15.859:58:54:2:53|1:2|0    0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0:0:20:20:0:20|0:0|0
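
For anyone wanting to read these pooled genotype fields programmatically, here is a minimal Python sketch. It is not part of FreeBayes; the function names `parse_pool_field` and `pool_allele_summary` are made up for illustration, and it assumes a biallelic site with the FORMAT keys shown above (GT, DP, AA). The example values are copied from the pool1-63 column of the variant line.

```python
def parse_pool_field(format_keys, sample_field):
    """Map FORMAT keys to the raw string values of one pool column."""
    return dict(zip(format_keys.split(":"), sample_field.split(":")))


def pool_allele_summary(format_keys, sample_field):
    """Summarise one pool: called alternate chromosomes and read-level support.

    Assumes a biallelic site, so GT entries are only "0" or "1".
    """
    fields = parse_pool_field(format_keys, sample_field)
    alleles = fields["GT"].split("/")            # one entry per chromosome in the pool
    alt_chromosomes = sum(1 for a in alleles if a == "1")
    depth = int(fields["DP"])                    # total read depth in this pool
    alt_reads = int(fields["AA"])                # alternate allele observations
    return {
        "n_chromosomes": len(alleles),
        "alt_chromosomes": alt_chromosomes,
        "called_alt_freq": alt_chromosomes / len(alleles),
        "read_alt_freq": alt_reads / depth if depth else None,
    }


# FORMAT column and pool1-63 values from the variant line above.
# The GT string is built programmatically: 2 alternate + 18 reference chromosomes.
fmt = "GT:GL:DP:RA:AA:SR:SA"
gt = "/".join(["1"] * 2 + ["0"] * 18)
pool1 = gt + ":-9.3854:64:57:6:57|0:6|0"

summary = pool_allele_summary(fmt, pool1)
print(summary)
```

The called frequency comes from the GT string (alternate chromosomes out of 20), while the read-level frequency comes from AA/DP, so comparing the two gives a quick sanity check on each pool's call.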

Would you be willing to share a line and the header? This is what I am after.

(comment written 5.4 years ago by Zev.Kronenberg)

Updated post with a snip of the header and the first variant line.

(comment written 5.4 years ago by Aaronquinlan)