PCA Biplot for RNA-Seq Study Samples with 1000 Genomes Reference Panel
5 weeks ago
ssankar3 • 0

Hey there,

I'm very new to bioinformatics in general, so I apologize for any confusion.

I have ~30 RNA-seq samples from self-reported White American and Black American individuals. Because ancestry is self-reported, I want to examine it genetically (note: I make no claims about the ethnicity of these individuals) and perform an admixture analysis. I hope to compare my samples against the 1000 Genomes dataset.

So far, I've generated the 1000 Genomes reference panel as a PLINK fileset by following this tutorial (Produce PCA bi-plot for 1000 Genomes Phase III - Version 2). I've repeated similar steps with my study samples (variant calling with samtools mpileup, conversion to a BCF file, then conversion to PLINK format), but I'm not sure how to perform LD pruning on them. Because PLINK uses founder information to calculate linkage disequilibrium (I'm not sure what this means), I don't think I'll be able to use the --indep flag. Would it be fine to just skip this step?
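To make the pruning step more concrete, here is a minimal Python/NumPy sketch of greedy sliding-window LD pruning, loosely mirroring the window, step and r² threshold parameters of PLINK's `--indep-pairwise`. It is a simplified illustration, not PLINK's exact algorithm (PLINK's rule for which variant of a correlated pair to drop differs). It assumes a samples × variants matrix of 0/1/2 alt-allele dosages sorted by position on one chromosome, and the names `ld_prune` and `_r2` are hypothetical. The sketch treats every sample as unrelated; under PLINK's .fam convention, a sample whose father and mother IDs are both 0 is a founder, which is typically the case for data converted from VCF/BCF without pedigree information.

```python
import numpy as np


def _r2(x, y):
    """Squared Pearson correlation between two dosage vectors."""
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc @ xc) * (yc @ yc))
    if denom == 0:
        # Monomorphic variant: correlation is undefined, treat as unlinked.
        return 0.0
    return float(xc @ yc / denom) ** 2


def ld_prune(genotypes, window=50, step=5, r2_threshold=0.2):
    """Greedy sliding-window LD pruning (simplified illustration).

    genotypes: (n_samples, n_variants) array of 0/1/2 alt-allele dosages,
    variants sorted by position on a single chromosome.
    Returns the indices of the variants that are kept.
    """
    g = np.asarray(genotypes, dtype=float)
    n_variants = g.shape[1]
    keep = np.ones(n_variants, dtype=bool)
    start = 0
    while start < n_variants:
        end = min(start + window, n_variants)
        # Compare every still-kept pair of variants inside the current window.
        for i in range(start, end):
            if not keep[i]:
                continue
            for j in range(i + 1, end):
                if keep[j] and _r2(g[:, i], g[:, j]) > r2_threshold:
                    keep[j] = False  # drop the later variant of the pair
        if end == n_variants:
            break
        start += step  # slide the window forward
    return np.flatnonzero(keep).tolist()
```

For example, given two perfectly correlated variants followed by a weakly correlated third one, only the first variant of the correlated pair and the third variant survive.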

I then intend to filter both datasets down to the variants they have in common, merge them, and make a PCA plot. After that, I intend to run ADMIXTURE on my study samples using the 1000 Genomes reference panel.
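The intersect, merge and PCA steps can be sketched in Python/NumPy under simplifying assumptions: variants in both datasets are identified by hypothetical `chrom:pos:ref:alt` strings on the same genome build with the same alt allele, so strand and allele flips (which a real merge must handle) are ignored, and genotypes are 0/1/2 dosage matrices. The helper names `merge_on_common` and `pca_scores` are hypothetical; in practice these steps would be run with PLINK, and this only illustrates what they compute. The PCA uses an SVD of the genotype matrix after centering each variant and scaling it by sqrt(p(1 − p)), one common normalization for genotype PCA.

```python
import numpy as np


def merge_on_common(ref_ids, ref_g, study_ids, study_g):
    """Keep variants present in both datasets and stack the samples.

    ref_g / study_g: (n_samples, n_variants) 0/1/2 dosage matrices whose
    columns follow ref_ids / study_ids. Returns (common_ids, merged).
    """
    study_col = {v: i for i, v in enumerate(study_ids)}
    common = [v for v in ref_ids if v in study_col]  # keep reference order
    ref_col = {v: i for i, v in enumerate(ref_ids)}
    ref_part = np.asarray(ref_g, dtype=float)[:, [ref_col[v] for v in common]]
    study_part = np.asarray(study_g, dtype=float)[:, [study_col[v] for v in common]]
    # Reference samples first, study samples after them.
    return common, np.vstack([ref_part, study_part])


def pca_scores(genotypes, n_components=2):
    """Principal-component scores for each sample via SVD."""
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0            # alt-allele frequency per variant
    scale = np.sqrt(p * (1.0 - p))
    poly = scale > 0                    # drop monomorphic variants
    x = (g[:, poly] - 2.0 * p[poly]) / scale[poly]
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]
```

The sign of each principal component is arbitrary, so plots from different runs or tools may appear mirrored.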

I just want to confirm that my general workflow is sensible and that I'm not missing any glaring issues.


