Question: How to get a fake diploid sequence(each one from a population)
gravatar for jinzheng9406
6 months ago by
jinzheng94060 wrote:

I want to use PSMC to analysis the division time of two closely human population. For that, I need to construct a pseudo-diploid sequence. Now, I have these two human populations' bam file. How should I do?

Could I directly use below command?

samtools mpileup -C50 -uf <ref.fa> <file1.bam> <file2.bam> | bcftools view -c - | vcftools vcf2fq | gzip > diploid.fq.gz

I think this command may be wrong. Another way is getting haploid sequence from each bam file, and then use seqtk mergefa to merge the two haploid sequences. But I do not know how to get haploid sequence from the bam file.

Could someone give me some suggestions, please?

snp sequence alignment genome • 323 views
ADD COMMENTlink written 6 months ago by jinzheng94060

From samtools mpileup manual:

Generate VCF, BCF or pileup for one or multiple BAM files. Alignment records are grouped by sample (SM) identifiers in @RG header lines. If sample identifiers are absent, each input file is regarded as one sample.

If your BAMs do not keep sample (individual) information as @RG tags, your samtools command probably won't produce the correct result.

ADD REPLYlink modified 6 months ago • written 6 months ago by h.mon20k

Thank you.

My BAMs have the @RG tags, that's mean, if I input two BAMs one time, it can get the haploid sequence for each one?

ADD REPLYlink written 6 months ago by jinzheng94060
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1260 users visited in the last hour