PCA Biplot For RNA-Seq Study Samples With 1000 Genomes Reference Panel
5 weeks ago
ssankar3 • 0

Hey there,

I'm very new to bioinformatics in general, so I apologize in advance for any confusion.

I have about 30 RNA-seq samples from individuals who self-report as White American or Black American. Because ancestry is self-reported, I want to examine it genetically (note: I make no claims about the ethnicity of these individuals) and perform an admixture analysis. I hope to compare my samples against the 1000 Genomes dataset.

So far, I've generated the 1000 Genomes reference panel as a PLINK fileset by following this tutorial (Produce PCA bi-plot for 1000 Genomes Phase III - Version 2). However, I'm not sure how to do the same for my study samples. I've run similar steps on them (variant calling with samtools mpileup, then conversion to a BCF file, then conversion to PLINK format), but I'm not sure how to perform LD pruning on these samples. As I understand it, PLINK uses founder information when calculating linkage disequilibrium (I'm confused about what this means), so I don't think I'll be able to use the --indep flag. Would it be fine to just skip this step?
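To make the founder question concrete, here is a minimal sketch of how I think I could check my own .fam file. It assumes the standard six-column PLINK .fam layout (family ID, sample ID, father ID, mother ID, sex, phenotype) and that a "founder" is a sample whose father and mother IDs are both 0. The function name and the example lines are made up for illustration and are not from my data.

```python
def count_founders(fam_lines):
    """Count founders and nonfounders in PLINK .fam-formatted lines.

    Assumption: a sample is a founder when both parental ID columns
    (columns 3 and 4) are "0", i.e. no parents are recorded.
    """
    founders = 0
    nonfounders = 0
    for line in fam_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        father_id, mother_id = fields[2], fields[3]
        if father_id == "0" and mother_id == "0":
            founders += 1
        else:
            nonfounders += 1
    return founders, nonfounders


# Hypothetical example: two unrelated samples and one child with recorded parents.
example_fam = [
    "S1 S1 0 0 0 -9",
    "S2 S2 0 0 0 -9",
    "F1 C1 P1 P2 1 -9",
]
print(count_founders(example_fam))
```

If every sample in my study .fam file has 0 in both parental columns, they would all count as founders under this assumption, which is what I would expect for unrelated individuals.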

I then intend to restrict both datasets to the variants they have in common, merge them, and make a PCA plot. After that, I intend to run ADMIXTURE on my study samples, using the 1000 Genomes reference panel.
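For the common-variant step, this is roughly what I have in mind, as a sketch only. It assumes both datasets are on the same genome build and have PLINK .bim files (chromosome, variant ID, cM position, bp position, allele 1, allele 2). Since my own calls may not carry rsIDs, it matches on chromosome, position, and the unordered pair of alleles instead of on the ID column. It does not handle strand flips or multi-allelic sites, and the function names and example lines are hypothetical.

```python
def bim_keys(bim_lines):
    """Build a set of (chrom, pos, alleles) keys from .bim-formatted lines."""
    keys = set()
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        chrom, _variant_id, _cm, pos, a1, a2 = fields[:6]
        # Assumption: "chr1" and "1" refer to the same chromosome.
        if chrom.lower().startswith("chr"):
            chrom = chrom[3:]
        # frozenset makes the allele pair order-independent (A/G == G/A).
        keys.add((chrom, int(pos), frozenset((a1.upper(), a2.upper()))))
    return keys


def shared_variants(study_bim, ref_bim):
    """Return keys present in both datasets, sorted by chromosome label and position."""
    common = bim_keys(study_bim) & bim_keys(ref_bim)
    return sorted(common, key=lambda k: (k[0], k[1]))


# Hypothetical example lines, not real data.
study = ["chr1 . 0 1000 A G", "chr1 . 0 2000 C T", "chr2 . 0 500 G A"]
ref = ["1 rs1 0 1000 G A", "1 rs2 0 2000 C G", "2 rs3 0 500 G A", "2 rs4 0 900 T C"]
for chrom, pos, alleles in shared_variants(study, ref):
    print(chrom, pos, sorted(alleles))
```

The resulting positions could then be written to a file and used to subset both filesets before merging.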

I just want to confirm that my general workflow is sensible and that I'm not missing any glaring issues.



ethnicity snp hapmap • 119 views

