plink bim file stops at chrM
Asked 7 weeks ago by rturba

Hello! I need some help. I have a VCF file that was generated with a reference genome whose chromosomes are named with Roman numerals: chrI, chrII... chrM, chrV, etc. This means they are sorted alphabetically rather than numerically, so my chromosomes are in a silly order, with chrM listed in the middle, for example (why, Lord!).
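To see why chrM (and whatever comes right after it) ends up in the middle, here is a quick shell sketch that sorts Roman-numeral contig names byte by byte, i.e. "alphabetically". The name list (chrI to chrXXI plus chrM and chrUn) is an assumption for illustration, not taken from the actual reference.

```shell
#!/usr/bin/env bash
# Illustrative contig names only: chrI..chrXXI plus chrM and chrUn (assumed list).
roman_names() {
  printf '%s\n' chrI chrII chrIII chrIV chrV chrVI chrVII chrVIII chrIX chrX \
    chrXI chrXII chrXIII chrXIV chrXV chrXVI chrXVII chrXVIII chrXIX chrXX chrXXI \
    chrM chrUn
}

# LC_ALL=C forces plain byte-order comparison, which is what "alphabetical" means here.
roman_names | LC_ALL=C sort
```

The first lines come out as chrI, chrII, chrIII, chrIV, chrIX, chrM, chrUn: the same 1, 2, 3, 4, 9, M order that shows up in the BIM file later in this thread, with chrUn sitting immediately after chrM.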

I've tried renaming them to plain numbers and letters (1, 2, 3... M, X, Y) with bcftools annotate before generating my bfiles with plink. The issue is that because chrM sits somewhere in the middle, my BIM file stops when it reaches M. This is the command I used:

bcftools norm -Ou -m -any "$file.vcf.gz" | bcftools norm -Ou -f "$ref" |
bcftools annotate -Ob -x ID \
  -I +'%CHROM:%POS:%REF:%ALT' |
plink --bcf /dev/stdin \
  --keep-allele-order \
  --const-fid \
  --allow-extra-chr \
  --make-bed \
  --chr-set 24 \
  --out "$file"

(I also tried --output-chr M.)

Is there a simple way to fix this in the plink command? I'm also trying to figure out how to sort my VCF so that chrM is listed last, but so far it has been a struggle, and I must be thinking about this wrong! Ugh D:
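One way to do the renaming is a map file for bcftools annotate --rename-chrs (two whitespace-separated columns: old name, new name). The sketch below is hypothetical (roman_to_int and make_rename_map are made-up helper names). It converts Roman numerals to integers but special-cases chrM (mitochondrial, not Roman 1000) and chrUn (renamed to U, as in the reference mentioned later in this thread), and it can optionally rename one chosen Roman-numeral chromosome to X. Note that a contig literally called chrX in this naming scheme is chromosome 10, not the sex chromosome. Whatever map is used, the VCF and reference names need to stay consistent.

```shell
#!/usr/bin/env bash
# roman_to_int: convert an uppercase Roman numeral (e.g. XIX) to an integer (19).
# Returns non-zero if the string contains anything other than I, V, X, L, C, D, M.
roman_to_int() {
  local s=$1 total=0 prev=0 v i
  [[ -n $s ]] || return 1
  for (( i=${#s}-1; i>=0; i-- )); do      # walk right to left
    case ${s:i:1} in
      I) v=1 ;; V) v=5 ;; X) v=10 ;; L) v=50 ;;
      C) v=100 ;; D) v=500 ;; M) v=1000 ;;
      *) return 1 ;;
    esac
    # Subtractive notation: a smaller numeral left of a larger one is subtracted.
    if (( v < prev )); then total=$(( total - v )); else total=$(( total + v )); fi
    prev=$v
  done
  echo "$total"
}

# make_rename_map [SEX_ROMAN]: read contig names (e.g. chrXIX) on stdin and print
# "old new" pairs. chrM and chrUn are handled BEFORE the Roman conversion.
make_rename_map() {
  local sex=${1:-} name core n
  while read -r name; do
    [[ -n $name ]] || continue
    core=${name#chr}
    if [[ -n $sex && $core == "$sex" ]]; then
      echo "$name X"                       # species-specific sex chromosome
    elif [[ $core == M ]]; then
      echo "$name M"                       # mitochondrion
    elif [[ $core == Un ]]; then
      echo "$name U"                       # unplaced scaffolds
    elif n=$(roman_to_int "$core"); then
      echo "$name $n"
    else
      echo "$name $core"                   # unrecognized: keep name without "chr"
    fi
  done
}
```

Possible usage (untested against real data): bcftools index -s "$file.vcf.gz" | cut -f1 | make_rename_map XIX > chr_rename.txt, then add --rename-chrs chr_rename.txt to the bcftools annotate step. bcftools index -s needs an index for the VCF.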

Tags: plink, chr, bim
7 weeks ago

plink 1.x --make-bed automatically sorts the variants in an order that puts chrM at the end.

(This behavior was changed in plink 2.0; you can still sort with --make-bed --sort-vars, but if you don't include --sort-vars in the command line the original VCF order is preserved.)


Hi @chrchang523, thanks for the reply! I'm using plink 1.9, but the variants are not being sorted. As far as I can tell, the BIM file gives chrM the last number, but the order stays the same and the file ends there, so I only have: 1 (chrI), 2 (chrII), 3 (chrIII), 4 (chrIV), 9 (chrIX), M (chrM). My species has 24 chromosomes in total (including chrM).


Please post or send me a VCF file that illustrates what you're talking about, along with the plink .log file.


Hi @chrchang523. I checked my VCF file and noticed that right after chrM I had renamed chrUn to 0, while my reference file had it named U. I think that instead of skipping that part, the whole thing just stopped there, so I'm re-running it to check. I'm running into --memory issues now, so when I'm done I'll come back here to say whether the issue persists.


OK, so it was my fault! chrUn was named differently in my VCF and reference files, so I think that part is solved. However, now I'm trying to update the FIDs with --update-ids and I get the error: Invalid chromosome code '28' on line 40749796 of .bim file. That line is my chrM. The weird thing is that I did use --chr-set 24. Hmmmm... now I think I understand the --chr-set instructions. So I define only the number of autosomes, and the program automatically recognizes the rest as X, Y and M? And will it still treat my data as human, even though I've defined a different set?
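The codes behind --chr-set can be worked out by hand: in plink 1.9, --chr-set N (without the no-x/no-y/no-xy/no-mt modifiers) makes codes 1..N autosomes and gives X, Y, XY and MT the codes N+1 to N+4. The helper below (chr_set_codes is a made-up name) only does that arithmetic; it does not call plink.

```shell
#!/usr/bin/env bash
# chr_set_codes N: print the numeric chromosome codes implied by --chr-set N
# (default modifiers): autosomes 1..N, then X=N+1, Y=N+2, XY=N+3, MT=N+4.
chr_set_codes() {
  local n=$1
  printf 'autosomes=1-%d X=%d Y=%d XY=%d MT=%d\n' \
    "$n" $(( n + 1 )) $(( n + 2 )) $(( n + 3 )) $(( n + 4 ))
}

chr_set_codes 22   # human default
chr_set_codes 24   # setting used in the original command
```

Under --chr-set 24, MT is coded 28, which matches the '28' in the error. If the --update-ids run did not repeat the same --chr-set, plink would presumably fall back to the human set, whose highest code is 26 (MT), and reject 28.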


So how would you advise me to handle my chrUn (unassigned)? It's currently named just U. Should I assign it a number and treat it as an autosome?


That is what the --allow-extra-chr flag is for.


Awesome! Thank you so much for the help. When I used --chr-set 20 (20 autosomes, excluding chrUn) together with --allow-extra-chr, I was able to run --update-ids with no error.


Actually (sorry @chrchang523, this seems like a never-ending issue!), I've just checked my BIM output and there seems to be a problem with chrX. The output looks like this:

21  X:2937481:C:T   0   2937481 T   C
21  X:2937493:C:CT  0   2937493 CT  C
21  21:2937504:AT:A 0   2937504 A   AT
21  21:2937731:A:G  0   2937731 G   A
21  21:2937776:C:T  0   2937776 T   C

When I renamed my chromosomes, I did have both a chr21 and a chrX. It seems they are being merged. Is there a way to prevent this?
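A quick way to spot this kind of merge is to compare column 1 of the BIM file with the chromosome prefix baked into each variant ID. The sketch below (find_chr_collisions is a hypothetical helper) assumes the IDs were built as CHROM:POS:REF:ALT by the bcftools annotate -I step in the original command, and it lists every BIM code that collects variants from more than one original contig.

```shell
#!/usr/bin/env bash
# find_chr_collisions: read a .bim file on stdin; for each chromosome code in
# column 1, collect the distinct ID prefixes (text before the first ':') and
# print the codes that received more than one prefix.
find_chr_collisions() {
  awk '{
    split($2, f, ":")
    key = $1 SUBSEP f[1]
    if (!(key in seen)) { seen[key] = 1; names[$1] = names[$1] " " f[1]; cnt[$1]++ }
  }
  END { for (c in cnt) if (cnt[c] > 1) print "code " c ":" names[c] }'
}
```

Running find_chr_collisions < "$file.bim" on the lines above would report code 21 receiving both X and 21; a file without collisions prints nothing.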


That's due to your incorrect use of --chr-set 20.


Hmmmmmmmmmmmm... It's because of how this genome is defined: chr19 is the X chromosome. I renamed chrXIX to X, so 19 is skipped altogether, and I was counting 20 in total. Thanks so much!
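The fix comes down to count versus highest code: --chr-set N treats codes 1..N as autosomes and uses N+1 for X, so N apparently has to be at least the largest numeric name left after renaming, not the number of autosomes. Assuming the other chromosomes keep their Roman-numeral numbers (1-18, 20 and 21, with 19 renamed to X), there are 20 autosomes but the highest code is 21. The helper min_chr_set below is hypothetical and simply takes the maximum numeric contig name.

```shell
#!/usr/bin/env bash
# min_chr_set: read renamed contig names (one per line) on stdin and print the
# largest purely numeric name; non-numeric names such as X, M or U are ignored.
min_chr_set() {
  awk '$1 ~ /^[0-9]+$/ && $1 + 0 > max { max = $1 + 0 } END { print max + 0 }'
}

# Assumed renamed set: 1-18, 20, 21, plus X (old chrXIX), M and U.
printf '%s\n' $(seq 1 18) 20 21 X M U | min_chr_set
```

For that list it prints 21. With --chr-set 21, X would get code 22 instead of sharing code 21 with chrXXI.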