bcftools merge; retaining sample names
7.7 years ago
Lee Katz ★ 3.1k

When I do bcftools merge, the headers do not retain the filenames. How can I specify filenames?

This is my command

bcftools merge vcf/unfiltered/*.vcf.gz -O z > msa/pooled.vcf.gz

However this is the relevant part of my header, despite the filenames I gave it. Is it just up to me to parse the mergeCommand line? Or is there a way to use bcftools query to get the right headers after the fact?

##bcftools_mergeVersion=0.2.0-rc7-47-g02a1fb3+htslib-0.2.0-rc7-36-g6e2ebc4
##bcftools_mergeCommand=merge -O z vcf/unfiltered/lambda_virus.fasta.wgsim.fastq.gz-lambda_virus.vcf.gz vcf/unfiltered/lambda_virus.fasta.wgsim.fastq.gz-reference.vcf.gz vcf/unfiltered/sample1.fastq.gz-lambda_virus.vcf.gz vcf/unfiltered/sample1.fastq.gz-reference.vcf.gz vcf/unfiltered/sample2.fastq.gz-lambda_virus.vcf.gz vcf/unfiltered/sample2.fastq.gz-reference.vcf.gz vcf/unfiltered/sample3.fastq.gz-lambda_virus.vcf.gz vcf/unfiltered/sample3.fastq.gz-reference.vcf.gz vcf/unfiltered/sample4.fastq.gz-lambda_virus.vcf.gz vcf/unfiltered/sample4.fastq.gz-reference.vcf.gz 
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  Sample1 2:Sample1       3:Sample1       4:Sample1       5:Sample1       6:Sample1       7:Sample1       8:Sample1       9:Sample1       10:Sample1
7.7 years ago

How about renaming the samples in all your input *.vcf files before calling bcftools?


How do you do that? Just change the header from

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  Sample1

to

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  lambda_virus.fasta

?

Or is there a Vcf.pm method? Or bcftools/vcftools method?

sed '/^#CHROM/s/Sample1/lambda_virus.fasta/' in.vcf > out.vcf

I can't figure out your sed magic but it essentially works, thanks! This is my full system call.

varscan.sh mpileup2cns $pileup --min-coverage $$settings{coverage} --min-coverage 10 --min-var-freq 0.75 --output-vcf 1 |\
    perl -lane 's/Sample1/\Q$vcf\E/; print;' |\
    bgzip -c > $vcf

Read the sed command like this:

From the file in.vcf

In lines that begin with "#CHROM" (/^#CHROM/)

substitute "Sample1" with "lambda_virus.fasta" (s/Sample1/lambda_virus.fasta/)

and write the output to "out.vcf" (> out.vcf)

Put together,

sed '/^#CHROM/s/Sample1/lambda_virus.fasta/' in.vcf > out.vcf
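
To see that the address really limits the substitution to the header line, here is a small sketch on a toy VCF. The file names toy.vcf and renamed.vcf and the record contents are made up for illustration; the data line deliberately carries the text Sample1 in its ID column to show that it is left alone.

```shell
# Build a toy VCF (hypothetical content) in a temporary directory.
workdir=$(mktemp -d)
printf '##fileformat=VCFv4.1\n' > "$workdir/toy.vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSample1\n' >> "$workdir/toy.vcf"
printf 'chr1\t10\tSample1\tA\tG\t.\tPASS\t.\tGT\t1/1\n' >> "$workdir/toy.vcf"

# The /^#CHROM/ address means the s command only runs on the column header line.
sed '/^#CHROM/s/Sample1/lambda_virus.fasta/' "$workdir/toy.vcf" > "$workdir/renamed.vcf"

# Pull out the header line and the data line to compare them.
header=$(grep '^#CHROM' "$workdir/renamed.vcf")
record=$(grep -v '^#' "$workdir/renamed.vcf")
printf '%s\n%s\n' "$header" "$record"
```

Without the address, a plain s/Sample1/.../ would also rewrite the ID column of the data line.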

My command:

sed '/^#CHROM/s/unknown//storage/users/IsanaRNA/FISH_DATA/MappingToAcipenserRuthenusGenome/results/Bam_sorted/F1.sorted.bam/' five_contigs_cp.vcf > out.vcf

I get:

sed: -e expression #1, char 21: unknown option to `s'

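The / characters in your replacement text are the problem: sed reads the first of them as the end of the s command and then chokes on what follows. If you have / characters in your expressions, don't use / as the separator of the s command. Use something like | instead:

sed '/^#CHROM/s|...'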
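
A minimal sketch of the delimiter change, using a made-up path (/data/run1/F1.sorted.bam) rather than the real one and a one-line header in place of a full VCF. It assumes the new name contains neither | nor &, both of which are special inside the s command here. The address can keep its slashes; if you want | there as well, the opening delimiter of the address has to be escaped with a backslash.

```shell
# Hypothetical header line whose sample column is named "unknown".
header=$(printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tunknown')

# Hypothetical replacement that contains slashes.
new_name='/data/run1/F1.sorted.bam'

# The address keeps its slashes; only the s command switches to | as delimiter.
renamed=$(printf '%s\n' "$header" | sed "/^#CHROM/s|unknown|$new_name|")

# Same edit with | for the address too: note the backslash before the first |.
renamed_alt=$(printf '%s\n' "$header" | sed "\|^#CHROM|s|unknown|$new_name|")

printf '%s\n' "$renamed"
```

Both forms give the same header line; the first is usually easier to read.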
7.7 years ago
Lee Katz ★ 3.1k

The answer was essentially Pierre's: find and replace Sample1 with the correct name in each corresponding VCF file.
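
For several files at once, something like the loop below should do it. This is only a sketch under assumptions: the inputs are uncompressed toy VCFs created on the spot (sampleA.vcf and sampleB.vcf are made-up names), every file uses Sample1 as its sample name, and the new name is taken from the file name. For bgzipped inputs you would have to decompress and recompress around the sed step.

```shell
# Create two toy VCFs (hypothetical names) that both use the sample name Sample1.
workdir=$(mktemp -d)
mkdir "$workdir/renamed"
for name in sampleA sampleB; do
    printf '##fileformat=VCFv4.1\n' > "$workdir/$name.vcf"
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSample1\n' >> "$workdir/$name.vcf"
done

# Rename Sample1 in each file to that file's base name (without .vcf).
for vcf in "$workdir"/*.vcf; do
    name=$(basename "$vcf" .vcf)
    sed "/^#CHROM/s|Sample1|$name|" "$vcf" > "$workdir/renamed/$name.vcf"
done

# Show the sample column of each renamed file.
grep -h '^#CHROM' "$workdir"/renamed/*.vcf | cut -f10
```

With distinct sample names in every input, the merged header no longer needs the 2:Sample1, 3:Sample1, ... prefixes seen in the question.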


I moved my comment to an answer
