Generating mpileup file using samtools
10 weeks ago
Ruqaiya • 0

Hello, I am trying to reproduce the variant calling and detection part of a paper.


I asked ChatGPT about it and it gave me:

Step 1: Generate Pileup File

    samtools mpileup -uf reference.fa alignment.bam > output.pileup

Step 2: Call Variants

    bcftools call -mv -Ov output.pileup > variants.vcf

Step 3: Filter Variants

    bcftools filter -i 'QUAL > 50 && INFO/MQ > 30 && FMT/DP > 3 && FMT/GQ > 50 && (FMT/AO[0] == 0 || FMT/AO[0] == FMT/DP) && (FMT/AO[1] == 0 || FMT/AO[1] == FMT/DP) && (FMT/RO + FMT/AO[0] >= 0.8 * FMT/DP) && (FMT/RO + FMT/AO[1] >= 0.8 * FMT/DP)' variants.vcf > filtered_variants.vcf

This code doesn't work for me, starting with the first step. WSL says:

    [warning] samtools mpileup option `u` is functional, but deprecated. Please switch to using bcftools mpileup in future.
    [mpileup] 1 samples in 1 input files

I don't understand whether I can use bcftools in the first step, since the paper used samtools for it. Thanks!


It's been a long time since I did this sort of thing, but I have a vague recollection that the mpileup step was simplified and/or rolled into other tools (I could be wrong), so this process may be somewhat obsolete.

My advice would be to ignore ChatGPT, refer to the actual software manuals, and make sure you know which versions you are running. Remember that GPT is trained on a lot of historical code and may not reflect the current best way to approach things.

If you're trying to reproduce that dataset exactly, down to the last variant call, you may need to pin your software to the versions the paper used (this depends on how old the paper is and how 'clean' the SNP signal is).
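A low-effort first step toward "know your versions" is to record the exact version of each tool behind a run, so results can later be matched against a pinned setup. The sketch below is hypothetical (the function name `record_versions` and the log file name are illustrative); it relies only on the `--version` flag that samtools, bcftools and bowtie2 each provide.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: log the first line of each tool's --version output
# (e.g. "samtools 1.x") so the exact versions behind a run are on record.
record_versions() {
  local log=${1:-tool_versions.txt}
  : > "$log"    # start with an empty log file
  local tool
  for tool in samtools bcftools bowtie2; do
    if command -v "$tool" >/dev/null 2>&1; then
      # sed reads all of its input, so the tool never writes to a closed pipe
      "$tool" --version 2>&1 | sed -n '1p' >> "$log"
    else
      echo "$tool: not found on PATH" >> "$log"
    fi
  done
}
```

Running `record_versions tool_versions.txt` next to each set of results makes it easy to see later whether a difference comes from a version change.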


Seconding this. Apply current best practice (bcftools mpileup followed by bcftools call; see the bcftools manual on variant calling) rather than sticking with deprecated approaches. Be careful with ChatGPT: it's fine for getting an idea, but it is not up to date with quickly evolving software such as samtools and other bioinformatics tools.
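A minimal sketch of the bcftools-only version of the three ChatGPT steps, assuming `alignment.bam` is coordinate-sorted (mpileup needs sorted input) and a single sample. The AO and RO terms in the ChatGPT filter are FreeBayes-style FORMAT tags that bcftools call does not write, so only the QUAL, MQ, DP and GQ thresholds are kept here; those thresholds are copied from the question, not validated. FORMAT/DP and FORMAT/GQ are not emitted by default, so they are requested explicitly. The function name `call_and_filter` is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical wrapper: pileup -> call -> filter in one stream.
#   mpileup -a FORMAT/DP : add per-sample depth (needed for FMT/DP)
#   call -m -v -f GQ     : multiallelic caller, variant sites only, add GQ
#   -Ou                  : uncompressed BCF between steps (cheap to pipe)
call_and_filter() {
  local ref=$1 bam=$2 out=$3
  bcftools mpileup -Ou -f "$ref" -a FORMAT/DP "$bam" \
    | bcftools call -mv -f GQ -Ou \
    | bcftools filter -i 'QUAL>50 && INFO/MQ>30 && FMT/DP>3 && FMT/GQ>50' -Oz -o "$out"
}
```

Usage: `call_and_filter reference.fa alignment.bam filtered_variants.vcf.gz`. Piping uncompressed BCF avoids writing the intermediate pileup/BCF file that step 1 in the question produced.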


I just realised I didn't align my reads with the tool they used; I used bowtie2 instead. I can't download the older version mentioned in the paper.

Also, if I use bcftools instead of samtools, unlike the paper, will I not be able to reproduce the same data/results?


Older versions of software are usually available through package managers or the projects' websites. It may take a lot of digging, but it's almost certainly out there somewhere.
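For example, assuming conda with the bioconda channel (any package manager that keeps old releases works similarly), a pinned environment could be created as below. The function name and the default version numbers are hypothetical placeholders, not the versions used in the paper; substitute the ones from its methods section.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: create a conda environment with pinned tool versions.
# Defaults are placeholders; pass the versions listed in the paper instead.
make_pinned_env() {
  local env_name=${1:-paper-repro}
  local samtools_version=${2:-1.9}
  local bcftools_version=${3:-1.9}
  conda create -y -n "$env_name" -c conda-forge -c bioconda \
    "samtools=${samtools_version}" "bcftools=${bcftools_version}"
}
```

Usage: `make_pinned_env paper-repro 1.9 1.9`, then `conda activate paper-repro` before running the pipeline.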

You could try bcftools and just see what kind of results you get. How important it is to reproduce the paper perfectly depends on the question you're asking.

You could also try contacting the authors and asking them to send you any intermediate files they have.
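If the authors can share their variant calls, `bcftools isec` gives a quick measure of how far your results are from theirs. This sketch assumes both call sets are bgzip-compressed VCFs (`.vcf.gz`) against the same reference with matching chromosome names; the function name and output labels are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: compare your calls with a VCF obtained from the authors.
compare_calls() {
  local mine=$1 theirs=$2 outdir=$3
  # isec needs indexed inputs; -f rebuilds an index if one already exists
  bcftools index -f "$mine"
  bcftools index -f "$theirs"
  # With two inputs, -p writes 0000.vcf (only in mine), 0001.vcf (only in
  # theirs), 0002.vcf and 0003.vcf (shared sites, records from each file)
  bcftools isec -p "$outdir" "$mine" "$theirs"
  # Count non-header lines, i.e. variant records, in each output
  echo "only mine: $(grep -vc '^#' "$outdir/0000.vcf")"
  echo "only theirs: $(grep -vc '^#' "$outdir/0001.vcf")"
  echo "shared: $(grep -vc '^#' "$outdir/0002.vcf")"
}
```

Usage: `compare_calls filtered_variants.vcf.gz authors_calls.vcf.gz isec_out`. Because the alignment step also differs (bowtie2 instead of the paper's aligner), some disagreement is expected even with identical calling settings.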


I didn't use the same tool as in the paper...

