BAM dataset to Genotype data conversion using PLINK
Asked 7 weeks ago by BENYEBUGA

What commands do I run with PLINK on Ubuntu to call genotypes from this hg19 BAM dataset (66 samples) and convert them into a genotype data format (txt, csv, ...)?

Data: https://www.ebi.ac.uk/ena/browser/view/PRJEB42975?show=reads

Pardon me if I've worded the question awkwardly; I'm rather new at this. I can provide more detailed context if needed. Any help would be much appreciated!

  • The BAM and fastq.gz files from the link are also downloaded to my hard drive; I'm running Ubuntu on Windows 10.

Paper: https://www.biorxiv.org/content/10.1101/2021.02.17.431423v1

Tags: BAM, Genotype, hg19, PLINK
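Whatever genotype caller is used, its output is normally a VCF file, and turning that into the txt/csv table asked for above is a separate reshaping step. With real tools this is usually done with `bcftools query -H -f '%CHROM\t%POS\t%REF\t%ALT[\t%GT]\n' study.vcf.gz > genotypes.txt`, or with PLINK 1.9 via `plink --vcf study.vcf.gz --double-id --recode A --out study`, which writes `study.raw`, a text table with one row per sample and genotypes coded as 0/1/2 allele counts (`study.vcf.gz` is a placeholder name). To show what the reshaping involves, here is a dependency-free awk sketch; the function name `vcf_to_csv` is hypothetical. It reads an uncompressed VCF and writes one GT column per sample, joining multi-allelic ALT values with `;` so they do not break the CSV columns.

```shell
#!/usr/bin/env bash
# vcf_to_csv (hypothetical helper): reads an uncompressed VCF on stdin and
# writes CHROM,POS,ID,REF,ALT plus one genotype (GT) column per sample.
vcf_to_csv() {
  awk -F'\t' '
    /^##/ { next }                      # skip meta-information lines
    /^#CHROM/ {                         # column header; samples start at field 10
      printf "CHROM,POS,ID,REF,ALT"
      for (i = 10; i <= NF; i++) printf ",%s", $i
      printf "\n"
      next
    }
    {
      gt = 0                            # index of GT inside FORMAT (0 = absent)
      n = split($9, fmt, ":")
      for (k = 1; k <= n; k++) if (fmt[k] == "GT") gt = k
      alt = $5
      gsub(/,/, ";", alt)               # multi-allelic ALT "G,T" becomes "G;T"
      printf "%s,%s,%s,%s,%s", $1, $2, $3, $4, alt
      for (i = 10; i <= NF; i++) {
        split($i, val, ":")             # per-sample values, in FORMAT order
        printf ",%s", (gt ? val[gt] : ".")
      }
      printf "\n"
    }'
}

# Example (file names are placeholders; zcat also reads bgzipped VCFs):
#   zcat study.vcf.gz | vcf_to_csv > genotypes.csv
```

For 66 samples the result has 5 + 66 columns; the PLINK `.raw` route is usually more convenient if the table feeds into downstream statistics.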
Answer by 4galaxy77 (7 weeks ago):

You must use a genotype caller to obtain genotypes from a .bam file. It's not possible to simply 'convert' a .bam file to genotypes.

There are a lot of options, but bcftools is probably the simplest. Take a look at this pipeline:

bcftools mpileup -Ou -f <ref.fa> <sample1.bam> <sample2.bam> <sample3.bam> | bcftools call -vmO z -o <study.vcf.gz>
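The pipeline above lists each BAM on the command line, which gets unwieldy with 66 samples. A minimal sketch, assuming all BAMs sit in one directory and end in `.bam`: the hypothetical helper `make_bam_list` writes their paths to a list file, one per line, which `bcftools mpileup` reads with its `-b` option. It also warns about BAMs without an index, which bcftools needs when calling is restricted to regions with `-r`. Directory and file names are placeholders.

```shell
#!/usr/bin/env bash
# make_bam_list DIR OUT  (hypothetical helper, not part of any package)
# Writes the path of every DIR/*.bam to OUT, one per line, and warns on
# stderr about BAMs that have neither a .bam.bai nor a .bai index.
make_bam_list() {
  local dir="$1" out="$2" bam
  : > "$out"                          # create or truncate the list file
  for bam in "$dir"/*.bam; do
    [ -e "$bam" ] || continue         # no .bam files: the glob stays literal
    printf '%s\n' "$bam" >> "$out"
    if [ ! -e "$bam.bai" ] && [ ! -e "${bam%.bam}.bai" ]; then
      echo "missing index: $bam (try: samtools index $bam)" >&2
    fi
  done
}

# Example (paths are placeholders):
#   make_bam_list ~/PRJEB42975 bams.txt
#   bcftools mpileup -Ou -f ref.fa -b bams.txt | bcftools call -vmO z -o study.vcf.gz
```

Calling all samples in one `mpileup` run, rather than one VCF per BAM, yields a single multi-sample VCF with one genotype column per sample, which is the shape a genotype table needs.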

I'm getting the following error when executing:

[E::faidx_adjust_position] The sequence "1" was not found

Is this an issue with the header descriptions being misread (spaces, comments)? If so, what command can I use to correct the error downstream with bcftools?

  • I've read online that the option "trimreaddescriptions=t" might correct the issue, but I don't know where to place it within the pipeline command.
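The message means that bcftools looked up a contig named "1", taken from the BAM header, in the reference index and did not find it. `samtools faidx` already uses only the part of each FASTA header before the first whitespace as the sequence name, so header descriptions are unlikely to be the cause. A more likely cause is a naming mismatch: a BAM aligned to a reference with bare names (1 to 22, X, Y, MT) used together with the UCSC hg19.fa, whose sequences are named chr1, chr2, and so on. A minimal sketch to check this, assuming samtools is installed: the hypothetical function `missing_contigs` compares the @SQ names in a BAM header (saved with `samtools view -H`) against the first column of the reference's `.fai` index.

```shell
#!/usr/bin/env bash
# missing_contigs HEADER.sam REF.fa.fai  (hypothetical helper)
# Prints every @SQ SN: name in the BAM header that has no matching
# sequence name (column 1) in the reference .fai index.
missing_contigs() {
  awk -F'\t' '
    FILENAME == ARGV[1] { ref[$1] = 1; next }   # first file: reference names
    /^@SQ/ {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SN:/) {
          name = substr($i, 4)                  # drop the "SN:" tag
          if (!(name in ref)) print name
        }
    }' "$2" "$1"
}

# Example with the file names used in this thread:
#   samtools view -H I19139.hg19.bam > I19139.header.sam
#   samtools faidx hg19.fa                 # writes hg19.fa.fai
#   missing_contigs I19139.header.sam hg19.fa.fai | head
```

If this prints 1, 2, ..., X, Y, MT, the BAM and the reference use different naming conventions, and no bcftools option applied downstream will fix that; the reference (or the BAM header) has to match.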

Thanks again for the pointers.

For reference, this is the command I'm using with the hg19.fa reference:

bcftools mpileup -Ou -f hg19.fa I19139.hg19.bam | bcftools call -vmO z -o Nubians.vcf.gz
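If the BAM's contig names lack the "chr" prefix, the most reliable fix is to call against the exact reference the reads were aligned to (the @SQ and @PG lines of the BAM header may indicate which one). Another option, sketched here under that assumption, is a copy of hg19.fa whose headers drop the "chr" prefix. This relies on hg19 chromosomes 1 to 22, X and Y having the same sequence as GRCh37. It does not hold for chrM (hg19 uses a different mitochondrial sequence than the GRCh37 MT) or for the unplaced contigs, whose names differ between the two builds, so calling should be restricted to the matching chromosomes with `bcftools mpileup -r`, which needs an indexed BAM. The function name `strip_chr_prefix` and the output file name are hypothetical.

```shell
#!/usr/bin/env bash
# strip_chr_prefix IN.fa OUT.fa  (hypothetical helper)
# Copies a FASTA file, renaming headers ">chrN ..." to ">N ...".
# Sequence lines never start with ">", so they pass through unchanged.
strip_chr_prefix() {
  sed 's/^>chr/>/' "$1" > "$2"
}

# Example (output name is a placeholder; only 1-22, X and Y are called):
#   strip_chr_prefix hg19.fa hg19.nochr.fa
#   samtools faidx hg19.nochr.fa
#   samtools index I19139.hg19.bam
#   bcftools mpileup -Ou -f hg19.nochr.fa -r "$(seq -s, 1 22),X,Y" I19139.hg19.bam \
#     | bcftools call -vmO z -o Nubians.vcf.gz
```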
