How to perform GWAS using BOLT-LMM iteratively for many phenotypes in bash
Asked 3.7 years ago by kl

Hello,

Does anyone know of, or have, code to run GWAS with BOLT-LMM for many phenotypes iteratively in bash, so that it is more automated than running a GWAS for one phenotype at a time?

Answered 3.7 years ago by Sam

If all your samples have values for every phenotype, the easiest way is to generate one phenotype file containing all the phenotypes and then pass all the phenotype names to --phenoCol. If, however, some samples have missing phenotype data, you can still generate one phenotype file and then loop over the phenotypes, running BOLT-LMM once per phenotype (so each run only drops the samples missing that particular phenotype):

pheno=( "A" "B" "C" )
# Run BOLT-LMM once per phenotype name; replace ..... with the remaining options
for p in "${pheno[@]}"; do
    bolt --phenoCol="${p}" .....
done

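Where you fill in ..... with the other relevant options (e.g. --bfile, --phenoFile, --lmm), and give each phenotype its own --statsFile so that the runs do not overwrite each other's results.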
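A slightly fuller sketch of the same loop is below. It assumes a whitespace-delimited phenotype file whose header is `FID IID` followed by one column per phenotype; the file name `all_phenos.txt`, the `results/` directory and the helper names `pheno_names` and `run_all` are hypothetical. It reads the phenotype names from the header instead of hard-coding them, and gives each phenotype its own --statsFile and log file. The BOLT-LMM options are taken from the skeleton later in this thread; add --covarFile, --remove, etc. as your analysis needs.

```shell
#!/usr/bin/env bash
# Sketch: one BOLT-LMM run per phenotype column of a combined phenotype file.
# Assumptions (hypothetical, adapt to your data):
#   - PHENO_FILE is whitespace-delimited with header "FID IID pheno1 pheno2 ..."
#   - BOLT is the path to the BOLT-LMM executable (the skeleton uses ../bolt)
PHENO_FILE=${PHENO_FILE:-all_phenos.txt}
BOLT=${BOLT:-bolt}
OUT_DIR=${OUT_DIR:-results}

# Print the phenotype names, i.e. every header field after FID and IID.
pheno_names() {
    head -n 1 "$1" | awk '{ for (i = 3; i <= NF; i++) print $i }'
}

# Run BOLT-LMM once per phenotype; each run gets its own stats file and log.
run_all() {
    mkdir -p "$OUT_DIR"
    for p in $(pheno_names "$PHENO_FILE"); do
        echo "Running BOLT-LMM for phenotype: $p"
        "$BOLT" \
            --bfile=EUR_subset \
            --phenoFile="$PHENO_FILE" \
            --phenoCol="$p" \
            --lmm \
            --LDscoresFile=../tables/LDSCORE.1000G_EUR.tab.gz \
            --numThreads=2 \
            --statsFile="$OUT_DIR/$p.stats" \
            > "$OUT_DIR/$p.log" 2>&1 \
            || echo "BOLT-LMM failed for $p (see $OUT_DIR/$p.log)" >&2
    done
}
```

To use it, save it as e.g. run_bolt.sh, add a final line `run_all`, and run `PHENO_FILE=my_phenos.txt BOLT=../bolt bash run_bolt.sh`. If your phenotype file also holds covariate columns (as EUR_subset.pheno.covars does), `pheno_names` would pick those up too, so list the phenotypes explicitly instead, as in the loop above.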

Hi Sam,

Thanks for your response. If you have used BOLT-LMM, could you also advise on the following? I have hard-called SNPs in .bed/.bim/.fam files that combine my directly genotyped and imputed SNPs; for example, chr1.bim contains both the directly genotyped and the imputed SNPs. For the --modelSnps flag, do we provide the .bim files all over again? (They went through QC before imputation, so poor-quality SNPs etc. have already been removed.) Do I also need to provide files for the following arguments, given that they are all about dosages? I only had dosages when I downloaded my imputed data from the Michigan server, but I then converted them to PLINK format...

--dosageFile=EUR_subset.dosage.chr17first100 \
--dosageFile=EUR_subset.dosage.chr22last100.gz \
--dosageFidIidFile=EUR_subset.dosage.indivs \
--statsFileDosageSnps=example.dosageSnps.stats \
--impute2FileList=EUR_subset.impute2FileList.txt \
--impute2FidIidFile=EUR_subset.impute2.indivs \
--statsFileImpute2Snps=example.impute2Snps.stats \
--dosage2FileList=EUR_subset.dosage2FileList.txt \
--statsFileDosage2Snps=example.dosage2Snps.stats \

I've pasted the code I would use for my data type below. I would really appreciate your advice. Thanks!

SKELETON OF CODE I WOULD USE:
../bolt \
    --bfile=EUR_subset \
    --remove=EUR_subset.remove \
    --exclude=EUR_subset.exclude \
    --phenoFile=EUR_subset.pheno.covars \
    --phenoCol=PHENO \
    --covarFile=EUR_subset.pheno.covars \
    --covarCol=CAT_COV \
    --qCovarCol=QCOV{1:2} \
    --modelSnps=EUR_subset \
    --lmm \
    --LDscoresFile=../tables/LDSCORE.1000G_EUR.tab.gz \
    --numThreads=2 \
    --statsFile=example.stats 

Once you convert the dosages into PLINK format, you lose the dosage information (only the hard calls are kept). As a result, you can use the PLINK files as if you only had genotype files, and you do not need any of the dosage-related options listed above.

Great, thanks. For the --modelSnps file, can I include all the SNPs, or would it be best to go into each of my pre-processed dosage files (one per chromosome) and include only the SNPs with a good INFO score?

Filtering will help to remove problematic SNPs, so do try filtering by INFO score first.
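A minimal sketch of that filtering step, assuming Michigan Imputation Server-style per-chromosome info files: gzipped, tab-delimited, with a header row, the SNP ID in the first column and the minimac imputation quality (which plays the role of the INFO score) in a column named `Rsq`. The 0.8 cut-off, the output file name and the `make_model_snps` helper are illustrative assumptions. It writes one SNP ID per line; those IDs must match the IDs in your .bim files for --modelSnps to use them.

```shell
#!/usr/bin/env bash
# Sketch: collect SNP IDs that pass an imputation-quality cut-off into one
# list for --modelSnps. Assumed input (hypothetical): gzipped, tab-delimited
# info files with a header row, the SNP ID in column 1 and the quality score
# in the column named by QUAL_COL. Output: one SNP ID per line.
QUAL_COL=${QUAL_COL:-Rsq}
MIN_QUAL=${MIN_QUAL:-0.8}

# Usage: make_model_snps OUTPUT_FILE INFO_FILE...
make_model_snps() {
    out_file=$1
    shift
    : > "$out_file"    # create or truncate the output file
    for info in "$@"; do
        gzip -dc "$info" | awk -F'\t' -v col="$QUAL_COL" -v min="$MIN_QUAL" '
            NR == 1 {
                # Locate the quality column by its header name.
                for (i = 1; i <= NF; i++) if ($i == col) c = i
                if (!c) { print "column " col " not found" > "/dev/stderr"; exit 1 }
                next
            }
            ($c + 0) >= (min + 0) { print $1 }
        ' >> "$out_file" || return 1
    done
}
```

For example, `make_model_snps model_snps.txt chr{1..22}.info.gz`, then pass `--modelSnps=model_snps.txt` to BOLT-LMM. Set QUAL_COL if your quality column has a different name.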

Is there a way to create a separate job report for each phenotype?

That depends on your job submission system.
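For example, if your cluster happens to run SLURM (an assumption; other schedulers have equivalent options), submitting one job per phenotype with sbatch's --job-name and --output options gives every phenotype its own job report. The wrapper script `run_one_pheno.sh` (which would call bolt with --phenoCol set to its first argument), the `logs/` directory and the `submit_per_pheno` helper are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch, assuming a SLURM cluster: one job per phenotype, each with its own log.
# SBATCH can be overridden (e.g. SBATCH=echo) to print the commands instead.
SBATCH=${SBATCH:-sbatch}

# Usage: submit_per_pheno WRAPPER_SCRIPT PHENO...
# WRAPPER_SCRIPT is a hypothetical script that runs bolt with --phenoCol="$1".
submit_per_pheno() {
    wrapper=$1
    shift
    mkdir -p logs
    for p in "$@"; do
        # SLURM replaces %j with the job ID, so reruns do not overwrite old logs.
        "$SBATCH" --job-name="bolt_$p" \
            --output="logs/bolt_${p}_%j.log" \
            "$wrapper" "$p"
    done
}
```

For example, `submit_per_pheno run_one_pheno.sh A B C` produces logs/bolt_A_<jobid>.log and so on. Unless --error is also given, SLURM writes both stdout and stderr to the --output file.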
