Question: How to get a fake diploid sequence(each one from a population)
gravatar for jinzheng9406
3 months ago by
jinzheng94060 wrote:

I want to use PSMC to analysis the division time of two closely human population. For that, I need to construct a pseudo-diploid sequence. Now, I have these two human populations' bam file. How should I do?

Could I directly use below command?

samtools mpileup -C50 -uf <ref.fa> <file1.bam> <file2.bam> | bcftools view -c - | vcftools vcf2fq | gzip > diploid.fq.gz

I think this command may be wrong. Another way is getting haploid sequence from each bam file, and then use seqtk mergefa to merge the two haploid sequences. But I do not know how to get haploid sequence from the bam file.

Could someone give me some suggestions, please?

snp sequence alignment genome • 165 views
ADD COMMENTlink written 3 months ago by jinzheng94060

From samtools mpileup manual:

Generate VCF, BCF or pileup for one or multiple BAM files. Alignment records are grouped by sample (SM) identifiers in @RG header lines. If sample identifiers are absent, each input file is regarded as one sample.

If your BAMs do not keep sample (individual) information as @RG tags, your samtools command probably won't produce the correct result.

ADD REPLYlink modified 3 months ago • written 3 months ago by h.mon15k

Thank you.

My BAMs have the @RG tags, that's mean, if I input two BAMs one time, it can get the haploid sequence for each one?

ADD REPLYlink written 12 weeks ago by jinzheng94060
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1301 users visited in the last hour