Question

How to perform GWAS using BOLT-LMM iteratively for many phenotypes in bash

3.7 years ago

kl ▴ 10

Hello,

Does anyone know/have any code to perform GWAS using BOLT-LMM for many phenotypes iteratively in bash so it is more automated, rather than running a GWAS for each phenotype at a time?

gene genome • 2.5k views

ADD COMMENT • link updated 5 months ago by zx8754 11k • written 3.7 years ago by kl ▴ 10

zx8754 · Answer 1 · 2020-08-25

3.7 years ago

Sam ★ 4.7k

If every sample has a value for every phenotype, the easiest way is to generate a single phenotype file containing all the phenotype information and then provide all the phenotype names to --phenoCol. If, however, some samples have missing data, you can still generate a single phenotype file and then loop over the phenotypes:

pheno=( "A" "B" "C" )
# Iterate over the array elements directly; no index arithmetic needed
for p in "${pheno[@]}"; do
    bolt-lmm --phenoCol "$p" .....;
done

Fill in the ..... with the other relevant options, and give each run its own --statsFile so that one phenotype's results do not overwrite another's.
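A fuller version of this loop might look like the sketch below. It is only an illustration: the file names and options are copied from the skeleton posted later in this thread, the phenotype names A, B and C are placeholders, and the BOLT and DRY_RUN variables are conveniences made up for this example rather than anything BOLT-LMM defines. By default it only prints the commands, so you can check them before running anything.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumptions: adjust all of these to your own data.
BOLT="${BOLT:-bolt}"        # path to the BOLT-LMM executable (assumed name)
DRY_RUN="${DRY_RUN:-1}"     # 1 = only print the commands, 0 = actually run them
phenos=( "A" "B" "C" )      # column headers in the phenotype file (placeholders)

# Build the command for one phenotype in the global array "cmd".
# Using an array keeps every argument intact even if a name contains spaces.
bolt_cmd() {
    local pheno="$1"
    cmd=(
        "$BOLT"
        --bfile=EUR_subset
        --phenoFile=EUR_subset.pheno.covars
        --phenoCol="$pheno"
        --covarFile=EUR_subset.pheno.covars
        --covarCol=CAT_COV
        "--qCovarCol=QCOV{1:2}"
        --lmm
        --LDscoresFile=../tables/LDSCORE.1000G_EUR.tab.gz
        --numThreads=2
        --statsFile="${pheno}.stats"    # one output file per phenotype
    )
}

for pheno in "${phenos[@]}"; do
    bolt_cmd "$pheno"
    if [ "$DRY_RUN" = "1" ]; then
        # Print the command instead of running it
        printf '%s ' "${cmd[@]}"; printf '\n'
    else
        # Keep a log per phenotype and carry on if one phenotype fails
        "${cmd[@]}" > "${pheno}.log" 2>&1 \
            || echo "BOLT-LMM failed for ${pheno}, see ${pheno}.log" >&2
    fi
done
```

Run it once as is to inspect the printed commands, then again with DRY_RUN=0 to launch the analyses. Each phenotype writes to its own .stats and .log file, so a failure in one does not touch the others.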

ADD COMMENT • link 3.7 years ago by Sam ★ 4.7k

Hi Sam,

Thanks for your response. I was wondering if you could also advise on the following, if you have used BOLT-LMM. I have hard-called SNPs in PLINK format (.bim, .bed, .fam), with my directly genotyped and imputed SNPs combined in the same files. For example, chr1.bim contains both the directly genotyped and the imputed SNPs. For the --modelSnps flag, do we provide the .bim files all over again (they went through QC before imputation, so poor-quality SNPs have already been removed)? Do I need to provide files for the following arguments, given that they are all about dosages? I only had dosages when I downloaded my imputed data from the Michigan server, but I then converted them to PLINK format...

--dosageFile=EUR_subset.dosage.chr17first100 \
--dosageFile=EUR_subset.dosage.chr22last100.gz \
--dosageFidIidFile=EUR_subset.dosage.indivs \
--statsFileDosageSnps=example.dosageSnps.stats \
--impute2FileList=EUR_subset.impute2FileList.txt \
--impute2FidIidFile=EUR_subset.impute2.indivs \
--statsFileImpute2Snps=example.impute2Snps.stats \
--dosage2FileList=EUR_subset.dosage2FileList.txt \
--statsFileDosage2Snps=example.dosage2Snps.stats \

I've pasted the code I would use for my data type below. I would really appreciate your advice. Thanks!

SKELETON OF CODE I WOULD USE:
../bolt \
    --bfile=EUR_subset \
    --remove=EUR_subset.remove \
    --exclude=EUR_subset.exclude \
    --phenoFile=EUR_subset.pheno.covars \
    --phenoCol=PHENO \
    --covarFile=EUR_subset.pheno.covars \
    --covarCol=CAT_COV \
    --qCovarCol=QCOV{1:2} \
    --modelSnps=EUR_subset \
    --lmm \
    --LDscoresFile=../tables/LDSCORE.1000G_EUR.tab.gz \
    --numThreads=2 \
    --statsFile=example.stats

ADD REPLY • link updated 5 months ago by zx8754 11k • written 3.6 years ago by kl ▴ 10

Once you convert the files into PLINK format, you lose the dosage information. As a result, you can use the PLINK files as if you only had genotype data, so the dosage-related options do not apply.

ADD REPLY • link 3.6 years ago by Sam ★ 4.7k

Great, thanks. For the --modelSnps file, can I include all the SNPs, or would it be best to go back to each of my pre-processed per-chromosome dosage files and include only those with a good INFO score?

ADD REPLY • link 3.6 years ago by kl ▴ 10

Filtering will help to remove problematic SNPs, so do try to filter by INFO score first.
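One way to build such a filtered list is sketched below. It assumes a whitespace-delimited per-chromosome info file with a header row, the variant ID in the first column, and the imputation quality in a column named Rsq, which is how minimac-style info files are usually laid out. Check your own files, since the format differs between imputation pipeline versions. The function name filter_by_rsq, the file names and the 0.8 cutoff are all made up for illustration.

```shell
#!/usr/bin/env bash

# filter_by_rsq INFO_FILE MIN_RSQ
# Prints the IDs (first column) of variants whose Rsq is at least MIN_RSQ.
# The Rsq column is found by its header name, so its position does not matter.
filter_by_rsq() {
    awk -v min="$2" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "Rsq") col = i; next }
        col && $col != "-" && $col + 0 >= min + 0 { print $1 }
    ' "$1"
}

# Example (hypothetical file names and threshold):
#   filter_by_rsq chr1.info 0.8 > chr1.modelSnps
```

The resulting list is only useful if the IDs in the info file match the SNP IDs in your .bim files, so compare a few of them before passing the list to --modelSnps.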