Samtools Mpileup On Single Bam With Multiple Samples
12.7 years ago
Danielk ▴ 640

I have a FASTQ file where reads are named after the sample they belong to, like so:

@S1-HWI...
AAGATTG
+
qualqual
@S2-HWI...
GGTGAGG
+
qualqual
...

After aligning this data I want to call variants for each sample and would like a VCF file as output. Now to my question:

Is there a way I can use samtools mpileup for the SNP calling? Can I use the RG/LB tags somehow?

This would be extremely convenient for handling multiple samples in a single FASTQ file.

cheers

//Daniel

samtools bam sam mpileup barcode • 5.2k views
12.7 years ago
Pablo ★ 1.9k

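You should add @RG header lines to your SAM file, one per sample, each with an ID and an SM field (tab-separated: @RG, ID:S1, SM:S1).

Then add the matching RG tag to each read (which should be easy since the sample is in the read name). A small script like this should do the trick:

#!/usr/bin/perl
# Append an RG:Z:<sample> tag to every alignment line.
while ( my $l = <> ) {
  chomp $l;
  if ( $l =~ /^@/ ) { print "$l\n"; next; }   # header lines pass through unchanged
  $l =~ /^(S\d+)/ or die "Cannot find sample in read name\n";
  print "$l\tRG:Z:$1\n";
}

Assuming the script is called addSm.pl and is executable, you can do:

cat my.sam | ./addSm.pl > my_SM.sam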

Here is the quote from the manual:

SAMtools acquires sample information from the SM tag in the @RG header lines.
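
As a sketch of what that quote implies in practice: each sample needs an @RG header line carrying an SM field, and each read needs an RG:Z: tag pointing at that line's ID. The function below is a hypothetical helper (add_read_groups is not part of samtools) written in plain sh and awk. It assumes every read name starts with a sample label of the form S<number>, as in the question, and it uses that label as both ID and SM.

```shell
#!/bin/sh
# add_read_groups FILE.sam
# Hypothetical helper, not a samtools command. Assumes read names begin
# with the sample label (S1-HWI..., S2-HWI...). Writes the new SAM to stdout.
add_read_groups() {
  # The file is read twice: pass 1 collects sample labels, pass 2 rewrites.
  awk -F'\t' -v OFS='\t' '
    function sample(qname) {
      return match(qname, /^S[0-9]+/) ? substr(qname, RSTART, RLENGTH) : ""
    }
    NR == FNR {                      # pass 1: remember each sample once, in order seen
      if ($0 !~ /^@/) {
        s = sample($1)
        if (s != "" && !(s in seen)) { seen[s] = 1; order[++n] = s }
      }
      next
    }
    /^@/ { print; next }             # pass 2: copy existing header lines
    !done {                          # before the first alignment, emit one @RG per sample
      for (i = 1; i <= n; i++) print "@RG", "ID:" order[i], "SM:" order[i]
      done = 1
    }
    {
      s = sample($1)
      if (s == "") { print "no sample label in " $1 > "/dev/stderr"; exit 1 }
      print $0, "RG:Z:" s            # tag the read with its read group
    }
  ' "$1" "$1"
}
```

Called as add_read_groups my.sam > my_RG.sam (file names are placeholders), the result can then be converted to a sorted BAM and passed to the variant caller. On recent releases that would be something like bcftools mpileup -f ref.fa my_RG.bam | bcftools call -mv -o calls.vcf, which should report one genotype column per SM value.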

I hope this helps.

12.7 years ago
Swbarnes2 ★ 1.6k

I don't know if samtools mpileup will split a single .bam into multiple samples, but if you can get each sample into its own .bam, then you can input multiple .bams into mpileup, and it will put all the data side by side. This can be a way to look for SNPs that are common or unique to your samples. Unfortunately, one of the most useful measures, DP4, is reported as a single combined entry rather than per sample. But you can always consult a single-sample VCF to get that for a particular sample.
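
One way to get each sample into its own file, sketched under the same assumption as the question (read names begin with S<number>): split the aligned SAM by read-name prefix and copy the header into every output. split_by_sample is a hypothetical helper, not a samtools command, and the output naming (S1.sam, S2.sam, ...) is an arbitrary choice.

```shell
#!/bin/sh
# split_by_sample FILE.sam OUTDIR
# Hypothetical helper: writes OUTDIR/<sample>.sam for each sample label
# found at the start of the read names, each with a copy of the header.
split_by_sample() {
  mkdir -p "$2" &&
  awk -F'\t' -v dir="$2" '
    /^@/ { header = header $0 "\n"; next }   # header lines precede the alignments
    {
      if (!match($1, /^S[0-9]+/)) {
        print "no sample label in " $1 > "/dev/stderr"; exit 1
      }
      out = dir "/" substr($1, RSTART, RLENGTH) ".sam"
      if (!(out in started)) {               # first read of this sample: write header first
        printf "%s", header > out
        started[out] = 1
      }
      print > out
    }
  ' "$1"
}
```

Each per-sample SAM can then be converted to a sorted, indexed BAM and the BAMs listed together on the mpileup command line. If the reads already carry read groups, samtools split should be able to do the splitting directly on the BAM.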
