                    4.2 years ago
        Michal Nevo
        
    
Hi, I am looking for a way to add sample ID names after the FORMAT column in my VCF file.
I have 10 sorted BAM files. I used freebayes to create VCF files, and my next step is to merge all 10 files for VcfSampleCompare. For that I need to define groups that match the sample IDs in the VCF file, but here is one of my VCF files:
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Number of observation for each allele">
##FORMAT=<ID=RO,Number=1,Type=Integer,Description="Reference allele observation count">
##FORMAT=<ID=QR,Number=1,Type=Integer,Description="Sum of quality of the reference observations">
##FORMAT=<ID=AO,Number=A,Type=Integer,Description="Alternate allele observation count">
##FORMAT=<ID=QA,Number=A,Type=Integer,Description="Sum of quality of the alternate observations">
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum depth in gVCF output block.">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  **unknown** 
NC_048323.1     461     .       G       T       28.0886 .       AB=0;ABP=0;AC=2;AF=1;AN=2;AO=2;CIGAR=1X;DP=2;DPB=2;DPRA=0;EPP=3.0103;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=29;MQMR=0;NS=1;NUMALT=1;ODDS=6.46546;PAIRED=1;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=51;QR=0;RO=0;RPL=0;RPP=7.35324;RPPR=0;RPR=2;RUN=1;SAF=1;SAP=3.0103;SAR=1;SRF=0;SRP=0;SRR=0;TYPE=snp        GT:DP:AD:RO:QR:AO:QA:GL       1/1:2:0,2:0:0:2:51:-4.01203,-0.60206,0
I want to fix it so that it looks like this:
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT /storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/M1.sorted.bam   /storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/M3.sorted.bam   /storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/M5.sorted.bam  /storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/M7.sorted.bam      /storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/M9.sorted.bam   /storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/F1.sorted.bam   /storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/F2.sorted.bam   /storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/F4.sorted.bam   /storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/F6.sorted.bam     /storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/F8.sorted.bam
                    
                
                
bcftools merge http://samtools.github.io/bcftools/bcftools.html#merge (?)
Interesting.. Did you mean
--use-header FILE
use the VCF header in the provided text FILE ?
I mean that you can look up this tool and use it to combine VCF files.
Best way to merge multiple VCF files
Merging is not my problem; I did use bcftools merge. My problem is that the sample ID is unknown:
The sample ID is probably already missing in the original BAM or VCF file; you could check that. There are some tools to add the sample ID to those files (I don't know them off the top of my head).
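One tool that can do this on the VCF side is `bcftools reheader`, whose `-s` option accepts "old_name new_name" pairs. The following is only a sketch under assumptions: `M1.vcf`, `M1.renamed.vcf`, `rename.txt` and the new name `M1` are hypothetical and not taken from the thread.

```shell
#!/bin/sh
# Hypothetical names: M1.vcf is one single-sample VCF whose sample column
# is called "unknown"; M1 is the sample ID we want instead.

# One "old_name new_name" pair per line, separated by whitespace.
printf 'unknown M1\n' > rename.txt

if command -v bcftools >/dev/null 2>&1 && [ -f M1.vcf ]; then
    # Rewrite only the header; the records themselves are left as they are.
    bcftools reheader -s rename.txt -o M1.renamed.vcf M1.vcf
    # List the sample names to confirm the change.
    bcftools query -l M1.renamed.vcf
else
    echo "skipped: bcftools or M1.vcf not available" >&2
fi
```

Renaming each per-sample VCF this way before `bcftools merge` would give every column a distinct name in the merged file.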
I think at this stage this could be a quick solution: bcftools merge; retaining sample names. Found a one-liner to replace **unknown**.
My command:
I get:
sed: -e expression #1, char 21: unknown option to `s'
Using '\' before every '/' fixed it.
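For anyone hitting the same error: it appears when the replacement text contains `/`, because sed reads the first unescaped `/` as the end of the `s` command. Besides escaping every slash, sed also accepts a different delimiter. Below is a minimal sketch, assuming a stand-in file `demo.vcf` and a shortened, hypothetical sample path (neither is from the original command, which was not shown).

```shell
#!/bin/sh
# Build a tiny stand-in VCF (hypothetical file name demo.vcf).
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tunknown\nNC_048323.1\t461\t.\tG\tT\t28.0886\t.\tDP=2\tGT\t1/1\n' > demo.vcf

# Hypothetical replacement name containing '/' characters.
sample="Bam_sorted/M1.sorted.bam"

# '|' is used as the s-command delimiter, so the '/' in $sample needs no
# escaping. The /^#CHROM/ address limits the edit to the column-header line,
# and the trailing $ anchors the match to the last column.
sed "/^#CHROM/ s|unknown\$|${sample}|" demo.vcf > demo.renamed.vcf

# Show the rewritten header line.
grep '^#CHROM' demo.renamed.vcf
```

This only works as long as the replacement text does not itself contain the chosen delimiter or an unescaped `&`.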