What Formats Are People Using For Pooled Sequencing Data?
12.0 years ago

Hi everyone,

My lab is developing tools to identify genomic regions under selection in pooled data. Obviously we want to support many input formats. What format is your pooled data in?

sequencing • 2.6k views
12.0 years ago

I've used FreeBayes to call variants in pooled data (20 pools of 10 individuals each) in the past. The input data are in BAM format, with each pool labelled by an RG tag, and the FreeBayes calls are in VCF. FreeBayes uses an extended genotype string to describe the alleles in each pool. Each pool has 10 diploid individuals and thus 20 chromosomes, so the GT field has 20 slash-separated entries, each marking the alternate allele as present (1) or absent (0) on one of those chromosomes. Is this what you are after?

##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality, the Phred-scaled marginal (or unconditional) probability of the called genotype">
##FORMAT=<ID=GL,Number=1,Type=Float,Description="Genotype Likelihood, log-scaled likelihood of the data given the called genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=RA,Number=1,Type=Integer,Description="Reference allele observations">
##FORMAT=<ID=AA,Number=1,Type=Integer,Description="Alternate allele observations">
##FORMAT=<ID=SR,Number=1,Type=Integer,Description="Number of reference observations by strand, delimited by |: [forward]|[reverse]">
##FORMAT=<ID=SA,Number=1,Type=Integer,Description="Number of alternate observations by strand, delimited by |: [forward]|[reverse]">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  pool1-63    pool2-43    pool2-63    pool3-63    pool4-63    pool5-63    pool6-63    pool7-63
chr1    113735151   .   T   G   552.93  .   NS=8;DP=281;AC=8;AN=160;AF=0.05;RA=258;AA=18;SRF=257;SRR=1;SAF=18;SAR=0;SRB=0.99612;SAB=1;SRP=554.6;SAP=42.097;ABR=228;ABA=18;AB=0.912;ABP=400.79;RUN=5;MQM=59.444;BPL=848;BPR=268;RPL=16;RPR=2;RPP=26.655;LRB=0.51971;BVAR;SNP;TV  GT:GL:DP:RA:AA:SR:SA    1/1/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0:-9.3854:64:57:6:57|0:6|0    0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0:0:10:10:0:10|0:0|0  1/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0:-1.0058:15:14:1:14|0:1|0    1/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0:-1.6621:41:38:3:38|0:3|0    1/1/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0:-9.0539:28:24:3:24|0:3|0    1/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0:-1.5989:44:41:3:41|0:3|0    1/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0:-15.859:58:54:2:53|1:2|0    0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0:0:20:20:0:20|0:0|0
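
For anyone wanting to read these per-pool columns programmatically, here is a minimal Python sketch of how the GT string and the read counts might be pulled apart. The function name `summarize_pool` and the returned keys are made up for illustration, and it assumes the FORMAT layout shown above (GT:GL:DP:RA:AA:SR:SA) with "/"-separated alleles; it is not part of FreeBayes.

```python
def summarize_pool(format_keys, sample_field):
    """Summarize one pool column of a pooled VCF line (hypothetical helper).

    format_keys  -- the FORMAT column, e.g. "GT:GL:DP:RA:AA:SR:SA"
    sample_field -- the matching per-pool column
    """
    fields = dict(zip(format_keys.split(":"), sample_field.split(":")))

    # One GT entry per chromosome in the pool; skip missing calls (".").
    alleles = [a for a in fields["GT"].split("/") if a != "."]
    alt_called = sum(1 for a in alleles if a != "0")

    depth = int(fields["DP"])      # total read depth in this pool
    alt_reads = int(fields["AA"])  # reads supporting the alternate allele

    return {
        "chromosomes": len(alleles),
        "alt_chromosomes": alt_called,
        # Frequency implied by the called pooled genotype.
        "called_alt_freq": alt_called / len(alleles) if alleles else None,
        # Frequency estimated directly from read counts.
        "read_alt_freq": alt_reads / depth if depth else None,
    }


# The pool1-63 column from the variant line above: 2 alternate calls
# followed by 18 reference calls, then GL, DP, RA, AA, SR, SA.
gt = "/".join(["1"] * 2 + ["0"] * 18)
pool1 = summarize_pool("GT:GL:DP:RA:AA:SR:SA", gt + ":-9.3854:64:57:6:57|0:6|0")
print(pool1)
```

For that pool the called genotype implies 2 alternate chromosomes out of 20, while the raw reads give 6 alternate observations out of a depth of 64. A selection-scan tool could reasonably take either number as its input allele frequency.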

Would you be willing to share a line and the header? This is what I am after.

Updated post with a snip of the header and the first variant line.
