Hi, I am very new to this area and am taking a bioinformatics class. For an independent project assignment, I need to run a GWAS. I am working in a bash terminal. I downloaded all the FASTQ files I need, trimmed them, converted them to SAM/BAM, then to VCF, then to PLINK bed/bim/fam files. However, when I tried to run the GWAS in PLINK, I realized I don't have any phenotype data. There are supposed to be two phenotypes.
Basically, there are two groups (phenotypes) of FASTQ files, each containing 29 samples. Call them group 1 and group 2. For each group, I converted every FASTQ to SAM and then BAM, then merged the 29 BAMs into one BAM. Then I combined the two merged BAMs (one per group) into a single vcf.gz. The PLINK files made from that VCF contain no phenotype data.
I would really appreciate any help, such as which step I got wrong or what I should do to incorporate the phenotype data. Ultimately this is only an assignment, so I don't have to get every detail perfect (like the QC steps), and I am afraid I won't be able to follow very complicated code. I just want to get to the end and produce a Manhattan plot or something similar. If there is another pipeline that does this, that's also fine.
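Please don't put 'urgent' in all caps. Your question is no more important than anyone else's. The error is that you combined the .bams prior to variant calling. Once 29 samples are merged into one BAM, the variant caller can no longer tell them apart, so the VCF has one genotype per group instead of one per sample, and a GWAS needs per-sample genotypes. I think you should have called variants separately for each sample and then run a GWAS on those variants.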
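A minimal sketch of keeping the samples separate during variant calling, assuming bcftools is the caller (the thread does not say which tool was used). The file names `ref.fa`, `bams/`, and `all_samples.vcf.gz` are hypothetical placeholders. Passing all per-sample BAMs to one `bcftools mpileup` call should produce a single VCF with one genotype column per sample, which avoids having to merge VCFs afterwards.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical inputs: a reference FASTA and one sorted BAM per sample in bams/.
REF="ref.fa"
BAM_DIR="bams"

# Write the path of every per-sample BAM to a list file, one per line.
# (The list is simply empty if no BAMs are found.)
ls "$BAM_DIR"/*.bam 2>/dev/null > bam_list.txt || true

if command -v bcftools >/dev/null 2>&1 && [ -s bam_list.txt ] && [ -f "$REF" ]; then
    # Call variants on all BAMs together, without merging the BAMs first.
    bcftools mpileup -f "$REF" -b bam_list.txt \
        | bcftools call -mv -Oz -o all_samples.vcf.gz

    # Print the sample names stored in the VCF; expect one per input BAM.
    bcftools query -l all_samples.vcf.gz
else
    echo "bcftools, $REF, or the BAM files were not found; nothing was run" >&2
fi
```

With 2 x 29 samples, the last command should list 58 names. If it lists fewer, check whether the BAMs share read-group sample tags.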
Sorry for the confusion and the wording, and thank you so much for the response! I see your point, so I will try to create VCF files for the two groups separately. What should I do after that? Is there a way to run PLINK with two VCF files? Or how should I combine the two VCFs while incorporating the phenotypes?
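One possible way to attach the phenotypes, sketched under assumptions: PLINK 1.9 is the `plink` on the PATH, and a multi-sample VCF (hypothetically named `all_samples.vcf.gz`) has one column per sample. The sample IDs below are made up for illustration and should be replaced with the real names in the VCF. PLINK reads phenotypes from a separate text file with columns FID, IID, and phenotype, where 1 means control and 2 means case, so the phenotype does not need to be inside the VCF.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical sample IDs; replace with the real sample names from the VCF.
printf '%s\n' A01 A02 > group1_samples.txt
printf '%s\n' B01 B02 > group2_samples.txt

# Build the phenotype file: FID IID PHENO (1 = group 1 / control, 2 = group 2 / case).
# FID is set equal to IID to match PLINK's --double-id import option.
awk '{ print $1, $1, 1 }' group1_samples.txt  > pheno.txt
awk '{ print $1, $1, 2 }' group2_samples.txt >> pheno.txt

if command -v plink >/dev/null 2>&1 && [ -f all_samples.vcf.gz ]; then
    # Convert the VCF to bed/bim/fam, using each sample ID as both FID and IID.
    plink --vcf all_samples.vcf.gz --double-id --allow-extra-chr \
          --make-bed --out gwas

    # Case/control association test using the external phenotype file.
    plink --bfile gwas --pheno pheno.txt --allow-no-sex --allow-extra-chr \
          --assoc --out gwas_assoc
else
    echo "plink or all_samples.vcf.gz not found; only pheno.txt was written" >&2
fi
```

If the association step runs, the results should be in `gwas_assoc.assoc`, and its CHR, BP, and P columns are what a Manhattan plot needs.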