Infer ancestry for RNA-seq data
12 weeks ago

I generated VCF files with bcftools for 4 patient RNA-seq samples.

I was also able to generate bed, bim, and fam files with PLINK from these VCF files.

I'd like some guidance on how to infer ancestry for these RNA-seq samples. How do I find the SNPs that are common to all 4 samples?
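For the common-SNP part, here is a minimal sketch of what I have in mind, assuming a variant can be identified by chromosome, position, and its unordered allele pair (columns 1, 4, 5, and 6 of a bim file). The function names `read_bim_keys` and `common_variants` are made up for illustration and are not part of PLINK or any library.

```python
def read_bim_keys(lines):
    """Return the set of (chrom, pos, alleles) keys found in bim-formatted lines."""
    keys = set()
    for line in lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        chrom, _variant_id, _cm, pos, a1, a2 = fields
        # Sort the alleles so that A1/A2 order does not affect matching
        # (assumption: allele order is not meaningful for this comparison).
        keys.add((chrom, int(pos), tuple(sorted((a1, a2)))))
    return keys


def common_variants(bim_line_sets):
    """Intersect the variant keys across several samples' bim lines."""
    key_sets = [read_bim_keys(lines) for lines in bim_line_sets]
    return set.intersection(*key_sets) if key_sets else set()
```

With real data, each element passed to `common_variants` could be an open file handle for one sample's bim file, since iterating a file yields its lines.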

How do I combine the PLINK files (bed/bim/fam) for my data with the corresponding files for reference data (e.g., the 1000 Genomes Project)?

The head of the bim file for one of my RNA-seq samples looks like this:

NC_000001.11    .       0       14653   T       C
NC_000001.11    .       0       14775   T       C
NC_000001.11    .       0       16141   T       C
NC_000001.11    .       0       16288   G       C
NC_000001.11    .       0       16298   T       C

The head of the reference bim file that I found looks like this:

1       1:11008 0       11008   G       C
1       1:11012 0       11012   G       C
1       1:13110 0       13110   A       G
1       1:13116 0       13116   G       T
1       1:13118 0       13118   G       A

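I'm not sure PLINK would be able to merge these files as they are: my bim files use RefSeq accessions (NC_000001.11) as chromosome names and have missing variant IDs (.), while the reference uses plain chromosome numbers and chrom:pos IDs.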
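One way the naming mismatch might be reconciled is to rewrite my bim lines into the reference's convention. The sketch below assumes the NC_ names are human RefSeq chromosome accessions (NC_000001 through NC_000022 for the autosomes, NC_000023 for X, NC_000024 for Y, NC_012920 for the mitochondrion) and that the reference IDs really are chrom:pos. The helper names are hypothetical. It only fixes the labels: if the reference panel is on a different genome build than my data, the positions would still not match.

```python
import re

# Matches accessions such as NC_000001.11 and captures the numeric part.
_REFSEQ = re.compile(r"^NC_0+(\d+)\.\d+$")


def refseq_to_chrom(accession):
    """Map a human RefSeq chromosome accession to a plain chromosome label."""
    match = _REFSEQ.match(accession)
    if match is None:
        return accession  # not a recognised accession; leave unchanged
    number = int(match.group(1))
    if 1 <= number <= 22:
        return str(number)
    if number == 23:
        return "X"
    if number == 24:
        return "Y"
    if number == 12920:
        return "MT"
    return accession


def harmonize_bim_line(line):
    """Rewrite one bim line with a plain chromosome label and a chrom:pos ID."""
    chrom, _variant_id, cm, pos, a1, a2 = line.split()
    chrom = refseq_to_chrom(chrom)
    return "\t".join([chrom, f"{chrom}:{pos}", cm, pos, a1, a2])


print(harmonize_bim_line("NC_000001.11    .       0       14653   T       C"))
```

Only the bim text file would need rewriting, since the bed file stores genotypes in the same variant order and does not contain chromosome names or IDs. After that, PLINK's `--bmerge` would be the step to try.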

