Question: 1000 genomes PCA for whole genome
0
gravatar for padakanti.sridevi
16 days ago by
padakanti.sridevi0 wrote:

I am trying to do PCA for the whole genome, is it possible with snp relate package of R to get the principal components for the whole genome ? If not please suggest some feasible methods

R 1000 genomes vcf • 262 views
ADD COMMENTlink modified 15 days ago by Kevin Blighe1.2k • written 16 days ago by padakanti.sridevi0
1
gravatar for Kevin Blighe
15 days ago by
Kevin Blighe1.2k
Republic of Ireland
Kevin Blighe1.2k wrote:

I have created a tutorial to do this. Take a look here: Produce PCA for 1000 Genomes Phase III in VCF format

ADD COMMENTlink modified 15 days ago • written 15 days ago by Kevin Blighe1.2k

Thank you for sharing. I tried to run this part of the code : plink --noweb --bfile /YourDir/1000Genomes/chr$chr.1kg.phase3.v5 --list-duplicate-vars

I get the error as :plink: "unknown option --noweb" plink: "unknown option --bcf" plink: "unknown option --keep-allele-order" plink: "unknown option --vcf -idspace-to"

. I downloaded plink-1.07

ADD REPLYlink modified 13 days ago • written 13 days ago by padakanti.sridevi0

It requires at least plink 1.9. I will add this note to the tutorial.

ADD REPLYlink written 13 days ago by Kevin Blighe1.2k

1.9 is working . Also, sed -i 's/.bim//g' /YourDir/ForMerge.list shows invalid command F.

ADD REPLYlink written 13 days ago by padakanti.sridevi0

This part below : cd /YourDir/ ; find /YourDir/ -name "*.bim" | grep -e "MAF" > /YourDir/ForMerge.list ; sed -i 's/.bim//g' /YourDir/ForMerge.list ;

Does it find .bim files from DupsRemoved directory ? I get a ForMerge.list with zero bytes

ADD REPLYlink written 13 days ago by padakanti.sridevi0

Hi friend,

The find command should list all *.bim files under /YourDir/

The grep command then extracts only those BIM files in the /YourDir/1000Genomes/MAF/ directory

The sed command just removes .bim from the ends of the files (required for when reading into plink)

You could also just write

find /YourDir/1000Genomes/MAF/ -name "*.bim" > /YourDir/ForMerge.list ;

sed -i 's/.bim//g' /YourDir/ForMerge.list ;

Does that work?

ADD REPLYlink modified 13 days ago • written 13 days ago by Kevin Blighe1.2k

I will try that ...thank you Kevin :)

ADD REPLYlink written 13 days ago by padakanti.sridevi0

I am using Mac , i tried sed -i' ' 's/.bim//g' ForMerge.list and it works. Iam trying to use plink --merge-list and i get this Error message : Problem with merge-list file line :should be either 2(PED/MAP) or 3(BED/BIM/FAM) fields.

ADD REPLYlink modified 12 days ago • written 13 days ago by padakanti.sridevi0

my ForMerge.list contains .bim files and MAF prune.out files

ADD REPLYlink modified 12 days ago • written 12 days ago by padakanti.sridevi0

I don't use MAC, apologies, but I will direct you to this thread about this specific issue of using sed on MAC: http://techslides.com/grep-awk-and-sed-in-bash-on-osx

I think that it is the -i switch/parameter that causes the issue. You could also not use the -i switch/parameter and then redirect the output to a new file. This switch/parameter just instructs sed to edit and save the file that you specify. Without it, it sends the modified version to stdout.

ForMerge.list should only contain bim files. Here is the version that I have on my laprop:

1000Genomes/MAF/chr11.1kg.phase3.v5.bim
1000Genomes/MAF/chr4.1kg.phase3.v5.bim
1000Genomes/MAF/chr22.1kg.phase3.v5.bim
1000Genomes/MAF/chr21.1kg.phase3.v5.bim
1000Genomes/MAF/chr2.1kg.phase3.v5.bim
1000Genomes/MAF/chr12.1kg.phase3.v5.bim
1000Genomes/MAF/chr13.1kg.phase3.v5.bim
1000Genomes/MAF/chrX.1kg.phase3.v5.bim
1000Genomes/MAF/chr6.1kg.phase3.v5.bim
1000Genomes/MAF/chr3.1kg.phase3.v5.bim
1000Genomes/MAF/chr8.1kg.phase3.v5.bim
1000Genomes/MAF/chr16.1kg.phase3.v5.bim
1000Genomes/MAF/chr14.1kg.phase3.v5.bim
1000Genomes/MAF/chr19.1kg.phase3.v5.bim
1000Genomes/MAF/chr10.1kg.phase3.v5.bim
1000Genomes/MAF/chr9.1kg.phase3.v5.bim
1000Genomes/MAF/chr5.1kg.phase3.v5.bim
1000Genomes/MAF/chr17.1kg.phase3.v5.bim
1000Genomes/MAF/chr18.1kg.phase3.v5.bim
1000Genomes/MAF/chr1.1kg.phase3.v5.bim
1000Genomes/MAF/chr15.1kg.phase3.v5.bim
1000Genomes/MAF/chr7.1kg.phase3.v5.bim
1000Genomes/MAF/chr20.1kg.phase3.v5.bim
ADD REPLYlink written 12 days ago by Kevin Blighe1.2k
0
gravatar for Sam
16 days ago by
Sam2.0k
London
Sam2.0k wrote:

If your genotype is in vcf format, you can convert it to plink format then use the --pca function from PLINK.

If you have large number of samples (which shouldn't be the case if you are using 1000G), then you can consider flashPCA

Note: If you are directly converting the vcf to plink format, make sure you handled the multi-allelic SNPs and you might also need to manually edit the bim file such that there isn't any duplicated SNP id

ADD COMMENTlink written 16 days ago by Sam2.0k

Thank you for reply :) I used the below code to convert the file to plink vcftools --gzvcf file.gz --out myfile --plink - tped. It gives me .tfam and .tped files. Will these two files be sufficient to proceed with PCA?

ADD REPLYlink modified 15 days ago • written 15 days ago by padakanti.sridevi0
1

As mentioned by GabrielMontenegro, you will need to do the following:

  1. clean the multi-allelic sites from your files
  2. remove or rename duplicated SNPs
  3. perform pruning using either --indep or --indep-pairwise
  4. Extract pruned SNPs with --extract
  5. perform PCA using --pca

(You can perform 4 and 5 together in one single command)

ADD REPLYlink written 15 days ago by Sam2.0k

Thank you . It was helpful

ADD REPLYlink written 13 days ago by padakanti.sridevi0

Can u please let me know how to go with cleaning the multi allelic sites. I am new to Genomics

ADD REPLYlink written 10 days ago by padakanti.sridevi0

You can read here. Though I do suggest you to follow Kevin's tutorial

ADD REPLYlink written 10 days ago by Sam2.0k

PLINK can handle VCF files. The problem as Sam mentioned is that you will first need to exclude multi-allelic sites. Also, before doing a PCA, you will have to remove SNPs on LD (i.e. prune your data). You can use the --indep or indep-pairwise function in PLINK too.

ADD REPLYlink written 15 days ago by GabrielMontenegro330
0
gravatar for padakanti.sridevi
15 days ago by
padakanti.sridevi0 wrote:

Will these two files be sufficient to proceed with PCA ?

ADD COMMENTlink written 15 days ago by padakanti.sridevi0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 746 users visited in the last hour