Question: Freebayes outputs only one individual out of many
0
gravatar for Fedster
27 days ago by
Fedster20
Fedster20 wrote:

Hi All,

I am running freebayes as:

freebayes -f Gasterosteus_aculeatus.BROADS1.dna.toplevel.fa -C 5 -L p1list.txt --populations p1pops.txt> plate1.vcf

with p1list.txt (the path is the same for all files):

/path/to/S-176.sorted.bam
/path/to/S-177.sorted.bam
/path/to/S-178.sorted.bam
/path/to/S-179.sorted.bam
/path/to/S-180.sorted.bam
/path/to/S-181.sorted.bam
/path/to/S-182.sorted.bam
/path/to/S-184.sorted.bam
/path/to/S-185.sorted.bam
/path/to/S-186.sorted.bam
...

and p1pops.txt:

S-176.sorted    pop1
S-177.sorted    pop1
S-178.sorted    pop1
S-179.sorted    pop1
S-180.sorted    pop1
S-181.sorted    pop2
S-182.sorted    pop2
S-184.sorted    pop2
S-185.sorted    pop2
S-186.sorted    pop2
...

yet if I do

awk '{if ($1 == "#CHROM"){print NF-9; exit}}' plate1.vcf
1

which means most of my samples have been ignored or discarded. Freebayes is version: v1.2.0-2-g29c4002.

I'd be grateful for any idea of what is going on.

freebayes • 121 views
ADD COMMENTlink modified 25 days ago • written 27 days ago by Fedster20
1

With your awk command, you just print out the content of the 9th column before the last column, in the line where the first column is #CHROM. Saying this you will print out the sample name of the first sample in your vcf. I you havent't passed a sample name via ReadGroup during alignment, freebayes will just enumerate them. I guess this is not what you like to do?

fin swimmer

EDIT: Sorry, your command should print the number of samples and not the content of the column. Therefore there had to in $.

ADD REPLYlink modified 27 days ago • written 27 days ago by finswimmer7.9k

are you saying I should have set a ReadGroup tag to each bam file (corresponding to the IDs in the populations file?) for freebayes to correctly ID the bam files as belonging to different IDs?

ADD REPLYlink written 27 days ago by Fedster20

This would be best-practice, yes. But it isn't neccessary.

What's the output of:

$ grep "^#CHROM" plate1.vcf

fin swimmer

ADD REPLYlink written 27 days ago by finswimmer7.9k
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  unknown
ADD REPLYlink written 27 days ago by Fedster20

Ok, I tried to add RG tags and it is just basically a fail. I can add tags, but freebayes still does not like them. Given a file for ID x

samtools addreplacerg -r ID:x -r SM:x -o x.bam  x.sorted.bam

adds a tag (at least, I can see a tag in the file and I get no error), but to no avail:

ERROR(freebayes):  could not find SM: in @RG tag

so either there is no pleasing freebayes or samtools is not adding all the tags I am specifying.

ADD REPLYlink written 25 days ago by Fedster20

What's the output from:

$ samtools view -H x.bam

and

$ samtools view x.bam|head -n10

?

Have you used x.bam in your freebayes command?

fin swimmer

ADD REPLYlink written 25 days ago by finswimmer7.9k

technically I used not just x.bam but w whole lot of bams (sorted) generated by bowtie2. Using a real sample name:

samtools view -H S-177.sorted.bam |head
@HD VN:1.0  SO:coordinate
@SQ SN:MT   LN:15742
@SQ SN:groupI   LN:28185914
@SQ SN:groupII  LN:23295652
@SQ SN:groupIII LN:16798506
@SQ SN:groupIV  LN:32632948
@SQ SN:groupIX  LN:20249479
@SQ SN:groupV   LN:12251397
@SQ SN:groupVI  LN:17083675
@SQ SN:groupVII LN:27937443

samtools view -H S-177.sorted.bam |tail
@SQ SN:scaffold_1927    LN:175
@SQ SN:scaffold_1928    LN:149
@SQ SN:scaffold_1929    LN:138
@SQ SN:scaffold_1930    LN:134
@SQ SN:scaffold_1931    LN:132
@SQ SN:scaffold_1932    LN:107
@SQ SN:scaffold_1933    LN:67
@SQ SN:scaffold_1934    LN:60
@PG ID:bowtie2  PN:bowtie2  VN:2.3.4    CL:"/user/leuven/319/vsc31924/appl_taito/bowtie2-2.3.4-linux-x86_64/bowtie2-align-s --wrapper basic-0 -x /user/leuven/319/vsc31924/data/BOWTIE/tss -S S-177.sam -U S-177.flash.extendedFrags.fastq.gz"
@RG ID:S-177
ADD REPLYlink written 25 days ago by Fedster20
@RG ID:S-177

Here the sample name is missing. If you really used the command you used above this line should look like this:

@RG ID:x    SM:x

(Or whatever you take for x)

Try using ' around the the values in the samtools command (even if it wasn't necessary in my test)

fin swimmer

ADD REPLYlink written 25 days ago by finswimmer7.9k

and

samtools view S-177.sorted.bam |head -n5
K00335:147:HL3KJBBXX:7:1227:25286:38627 0   MT  6894    1   64M *   0   0   CTGCTAAACGTGAAGTTTTAGCAGTTGAAATAACAACAACAAACGTCGAGTGACTTCACGGCTG    JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ>JJJ    AS:i:0  XS:i:0  XN:i:0  XM:i:0  XO:i:0XG:i:0    NM:i:0  MD:Z:64 YT:Z:UU RG:Z:S-177
 K00335:147:HL3KJBBXX:7:2225:26332:25668    0   MT  6894    1   64M *   0   0   CTGCTAAACGTGAAGTTTTAGCAGTTGAAATAACAACAACAAACGTCGAGTGACTTCACGGCTG    JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJ    AS:i:0  XS:i:0  XN:i:0  XM:i:0  XO:i:0XG:i:0    NM:i:0  MD:Z:64 YT:Z:UU RG:Z:S-177
 K00335:147:HL3KJBBXX:7:2217:5852:2352  0   MT  8155    0   153M    *   0   0   CAGCCATCGCTATTGCCCTCCCCTGAGTCCTTCTTCCCACCCCTACTGCCCGATGAACAAGCAACCGATTTCTAGGTCTCCAGGGTTGATTTATTAATCGTTTCACACAACAGCTTCTCCTACCAGTAAACCTTGGAGGACATAAATGAGCAG   JJJJJJJJJJJJJJJJJJJAJJJFAJJJJJFFFFJFJ>FJJJJAFFA-FFJJJJFJJJJJJAJJJJJJJJFFFJAFAFFFJJJJJJJJJJFJJJJFJJAJFJJJJJJJJJJJFF<FA0FJ+%<0F<AJJJ5FAA0FJJJJJJJJJJJ7<-A--   AS:i:-37    XS:i:-37    XN:i:0  XM:i:7  XO:i:0  XG:i:0  NM:i:7  MD:Z:7T5C26T2C29G29T17T31   YT:Z:UU RG:Z:S-177
 K00335:147:HL3KJBBXX:7:1101:3315:43357 0   MT  14083   1   132M    *   0   0   CTGCTGAATATGCAAAAACAACTAATATTCCTCCTAAATAGATAAGAAACAAGACCAGGGATAAAAATGGACCCCCATGACTTACTAGCACGCCACAACCCATACCTGCTACCATCACCAACCCTAGAGCAG    JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJFJJJJFF<<A:JJ    AS:i:-22    XS:i:-22    XN:i:0  XM:i:4  XO:i:0XG:i:0    NM:i:4  MD:Z:25C26A56C16A5  YT:Z:UU RG:Z:S-177
 K00335:147:HL3KJBBXX:7:1103:13240:24261    0   MT  14083   1   132M    *   0   0   CTGCTGAATATGCAAAAACAACTAATATTCCTCCTAAATAGATAAGAAACAAGACCAGGGATAAAAATGGACCCCCATGACTTACTAGCACGCCACAACCCATACCTGCTACCATCACCAACCCTAGAGCAG    JJJJJJJJJJJJFFAFFJJJJJJJJJJJJJFJJA<JJJJAFFFFJFFJFJJJJFFFFFAJJAJJFAJJAJJFJJFAAAA<F<FF<<FJFFJJAJJJJJJJAJFJJJFJJJFFFAF<AFFF<FAAF<FA:*::    AS:i:-23    XS:i:-23    XN:i:0  XM:i:4  XO:i:0XG:i:0    NM:i:4  MD:Z:25C26A56C16A5  YT:Z:UU RG:Z:S-177
ADD REPLYlink modified 25 days ago • written 25 days ago by Fedster20
2
gravatar for h.mon
27 days ago by
h.mon21k
Brazil
h.mon21k wrote:

As finswimmer said, you need to add read group tags to each bam:

To call variants in a population of samples, each alignment must have a read group identifier attached to it (RG tag), and the header of the BAM file in which it resides must map the RG tags to sample names (SM). Furthermore, read group IDs must be unique across all the files used in the analysis. One read group cannot map to multiple samples. The reason this is required is that freebayes operates on a virtually merged BAM stream provided by the BamTools API. If merging the files in your analysis using bamtools merge would generate a file in which multiple samples map to the same RG, the files are not suitable for use in population calling, and they must be modified.

The freebayes readme also suggests a solution in case your bam don't have RG tags: you can stream your bam through bamaddrg and pipe directly to freebayes.

ADD COMMENTlink written 27 days ago by h.mon21k

finswimmer pointed out adding RG tags is best practice but technically not necessary. In any case I will give it a go as it is not working for me as it is.

ADD REPLYlink written 27 days ago by Fedster20

Technically it is not neccessary because even without a ReadGroup it is a valid bam file. But there are many downstream analyse programs that need this information, e.g. for joint variant calling, like you like to do, that need this information. So it is best practice to include this information from the beginning regardless what your final goal is.

Saying this you can of course use bamaddrg as a workaround. But I would recommend to fix your bam file by adding the readgroup with the sample name using samtools addreplacerg or picard AddOrReplaceReadGroups.

fin swimmer

ADD REPLYlink modified 27 days ago • written 27 days ago by finswimmer7.9k

One thing that is not clear from bamaddrg is whether the option -L (list of files) is then needed. Is it?

ADD REPLYlink written 27 days ago by Fedster20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1312 users visited in the last hour