Retain file names as sample names while making pileup

6.7 years ago

bioinfo8 ▴ 230

Hi,

I have two sets of exome samples, a and b, each containing 5-10 individuals.

1) I used the following commands for variant calling:

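 >samtools mpileup -ugf ref.fa a*_sorted.bam > a.bcf          # pileup of 5 individuals (5 BAMs)
 >bcftools call -vmO v a.bcf > a.vcf
 >vcfutils.pl varFilter -Q 10 -d 10 -D 200 a.vcf > a_filtered.vcf
 # (samtools 1.10+ no longer writes BCF from mpileup; there the equivalent first step is: bcftools mpileup -Ou -f ref.fa a*_sorted.bam > a.bcf)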

b_filtered.vcf was generated the same way.

2) I have a list of 10 genes in which I want to find variants in both datasets (a and b), and I used bcftools to annotate them:

>bgzip genes_10sorted.bed
>tabix -p bed genes_10sorted.bed.gz     
>bcftools annotate -a genes_10sorted.bed.gz -c CHROM,FROM,TO,GENE -h <(echo '##INFO=<ID=GENE,Number=1,Type=String,Description="Gene name">') a_filtered.vcf.gz > a_filtered_ann10.vcf

3) The gene names now appear in the filtered, annotated file a_filtered_ann10.vcf, but I can't tell which sample is which, because the sample columns are labeled ERS561518, ERS561535, ERS561560, ERS561566 and ERS561638 rather than by BAM file name.
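To check which names actually ended up in a VCF, the sample columns can be read from the `#CHROM` header line (fields 10 onward). This is a minimal awk-only sketch; `bcftools query -l a_filtered_ann10.vcf` should print the same list. The function name `vcf_samples` is hypothetical.

```shell
#!/bin/sh
# Print the sample names of a plain-text VCF read from stdin, one per line.
# Sample columns start at field 10 of the tab-separated #CHROM header line.
vcf_samples() {
    awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }'
}

# Usage:
#   vcf_samples < a_filtered_ann10.vcf
#   gzip -dc a_filtered.vcf.gz | vcf_samples    # bgzipped input
```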

How can I keep the BAM file names as sample names when making the pileup, and carry them through the rest of the pipeline?

Any guidance in this regard would be appreciated.

Thanks!

vcf variant calling bcftools gene samtools

updated 6.7 years ago by WouterDeCoster 47k • written 6.7 years ago by bioinfo8 ▴ 230

I edited the title to make it more specific. I think you need to modify the read groups in your BAM files: samtools takes the sample names from the SM field of the @RG header lines.

6.7 years ago by WouterDeCoster 47k

Yes, the @RG line of my first sample has SM:ERS561518.

Should I edit it manually, or is there an automatic way?

6.7 years ago by bioinfo8 ▴ 230

Use Picard AddOrReplaceReadGroups.
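A minimal sketch of that, assuming each BAM is named `<sample>_sorted.bam` (as in `a*_sorted.bam` above) and that the file-name stem is the name you want in the VCF. `PICARD_JAR`, `sample_name` and the RGLB/RGPL/RGPU values are placeholders, and the arguments use Picard's KEY=VALUE style; check `java -jar picard.jar AddOrReplaceReadGroups --help` for the syntax your version expects. Note that this tool puts every read into the single new read group, so any existing read groups in a file are replaced.

```shell
#!/bin/sh
# Sketch: rewrite each BAM's read group so SM equals the file-name stem,
# e.g. a1_sorted.bam -> SM:a1. Assumes <sample>_sorted.bam naming.
PICARD_JAR=${PICARD_JAR:-picard.jar}   # hypothetical location of picard.jar

# Strip the directory and the "_sorted.bam" suffix: /data/a1_sorted.bam -> a1
sample_name() {
    basename "$1" _sorted.bam
}

for bam in a*_sorted.bam b*_sorted.bam; do
    [ -e "$bam" ] || continue    # skip patterns that matched no file
    sm=$(sample_name "$bam")
    java -jar "$PICARD_JAR" AddOrReplaceReadGroups \
        I="$bam" O="${sm}_rg.bam" \
        RGID="$sm" RGLB=lib1 RGPL=ILLUMINA RGPU=unit1 RGSM="$sm"
    samtools index "${sm}_rg.bam"
done
```

The pileup would then be rerun on the new files, e.g. `samtools mpileup -ugf ref.fa a*_rg.bam > a.bcf`, and the new SM values should carry through bcftools call, varFilter and annotate.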

6.7 years ago by WouterDeCoster 47k
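If re-running the pileup is not practical, another option is to rename the samples in the finished VCF with `bcftools reheader -s`, which (in reasonably recent bcftools versions) accepts a file of `old_name new_name` pairs. The sketch below builds that file from each BAM's current SM value and its file name. It assumes one @RG line per BAM and `<sample>_sorted.bam` naming; `rg_sample` and `rename_a.txt` are hypothetical names, and the same steps would be repeated for the b files.

```shell
#!/bin/sh
# Print the SM value of the first @RG line in a SAM header read from stdin.
rg_sample() {
    awk -F'\t' '/^@RG/ {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SM:/) { print substr($i, 4); exit }
    }'
}

# Build "old_name new_name" pairs, e.g. "ERS561518 a1".
: > rename_a.txt
for bam in a*_sorted.bam; do
    [ -e "$bam" ] || continue    # skip if the pattern matched no file
    old=$(samtools view -H "$bam" | rg_sample)
    printf '%s %s\n' "$old" "$(basename "$bam" _sorted.bam)" >> rename_a.txt
done

# Rename the sample columns only if at least one pair was written.
if [ -s rename_a.txt ]; then
    bcftools reheader -s rename_a.txt -o a_filtered_ann10.renamed.vcf a_filtered_ann10.vcf
fi
```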