Question

PLINK 1.9 and ADMIXTURE 1.3.0 Help with renaming

0

Entering edit mode

4 months ago

Jens • 0

Hello everyone,

I want to analyse my genomic data. I already created the .bim .bed and .fam files from PLINK. But for Admixture I have to renamed my chromsome names:

CM039442.1  --> 2
CM039443.1 --> 3
CM039444.1 --> 4
CM039445.1 --> 5
CM039446.1 --> 6
CM039447.1 --> 7
CM039448.1 --> 8
CM039449.1 --> 9
CM039450.1 --> 10

I tried a lot of different approaches, but eather i got invalid chr names, empty .bim files, use integers, no variants remeining or what ever. I would show you two of my approaches, i don´t know how to solve this problem.

The new file is always not accepted by Admixture.

One of my code approaches is followed:

 input_dir="/data/XY"
output_dir="$input_dir"

#Go to directory
cd "$input_dir" || { echo "Input not found"; exit 1; }

#Copy old .bim .bed .fam
cp filtered_genomedata.bim filtered_genomedata_renamed.bim
cp filtered_genomedata.bed filtered_genomedata_renamed.bed
cp filtered_genomedata.fam filtered_genomedata_renamed.fam

#Renaming old chromosome names to simple 1, 2, 3 ... (1 = ChrX = 51)
#FS=field seperator
#"\t" seperate only with tabulator
#OFS=output field seperator
#echo 'Renaming chromosomes in .bim file'
#awk 'BEGIN{FS=OFS="\t"; map["CM039442.1"]=2; map["CM039443.1"]=3; map["CM039444.1"]=4; map["CM039445.1"]=5; map["CM039446.1"]=6; map["CM039447.1"]=7; map["CM039448.1"]=8; map["CM039449.1"]=9; map["CM039450.1"]=10;}
#{if ($1 in map) $1 = map[$1]; print }' filtered_genomedata_renamed.bim > tmp && mv tmp filtered_genomedata_renamed.bim

#Creating a list of allowed chromosomes (2 to 10)
#END as a label in .txt
cat << END > allowed_chromosomes.txt
CM039442.1 2
CM039443.1 3
CM039444.1 4
CM039445.1 5
CM039446.1 6
CM039447.1 7
CM039448.1 8
CM039449.1 9
CM039450.1 10
END

#Names of the chromosomes and their numbers
#2 CM039442.1 2
#3 CM039443.1 3
#4 CM039444.1 4
#5 CM039445.1 5
#6 CM039446.1 6
#7 CM039447.1 7
#8 CM039448.1 8
#9 CM039449.1 9
#10 CM039450.1 10

#Second filter with only including chromosomes (renamed ones)
#NR=the running line number across all files
#FNR=the running line number only in the current file
echo 'Starting second filtering'
awk 'NR==FNR { chrom[$1]; next } ($1 in chrom)' allowed_chromosomes.txt filtered_genomedata_renamed.bim > filtered_genomedata_renamed.filtered.bim



echo 'Renaming chromosomes in .bim file'
awk 'BEGIN{FS=OFS="\t"} NR==FNR { map[$1]=$2; next } { if ($1 in map) $1=map[$1]; print }' allowed_chromosomes.txt filtered_genomedata_renamed.filtered.bim > filtered_genomedata.renamed.bim




#awk '$1 >= 2 && $1 <= 10' filtered_genomedata_renamed.bim > tmp_bim
cut -f2 filtered_genomedata.renamed.bim > Hold_SNPs.txt

#Creating new .bim .bed .fam data for using in admixture
#ATTENTION admixture cannot use letters
echo 'Creating new files for ADMIXTURE'
plink --bfile filtered_genomedata.renamed --extract Hold_SNPs.txt --make-bed --aec --threads 30 --out filtered_genomedata_admixture

if [ $? -ne 0 ]; then
echo 'PLINK failed. Go to exit.'
exit 1
fi

#Reading PLINK data .bed .bim .fam
#Finding the best K-value for calculation
echo 'Running ADMIXTURE K2...K10'
for K in $(seq 2 10); do
echo "Finding best ADMIXTURE K value K=$K"
admixture -j30 --cv filtered_genomedata_admixture.bed $K | tee "${output_dir}/log${K}.out"
done

echo "Log data for K value done"

Second apporach:
-----------------------------------------------------
input_dir="/data/XY"
output_dir="$input_dir"

cd "$input_dir" || { echo "Input directory not found"; exit 1; }

cp filtered_genomedata.bim filtered_genomedata_work.bim
cp filtered_genomedata.bed filtered_genomedata_work.bed
cp filtered_genomedata.fam filtered_genomedata_work.fam

cat << END > chr_map.txt
CM039442.1 2
CM039443.1 3
CM039444.1 4
CM039445.1 5
CM039446.1 6
CM039447.1 7
CM039448.1 8
CM039449.1 9
CM039450.1 10
END

plink --bfile filtered_genomedata_work --aec --update-chr chr_map.txt --make-bed --out filtered_genomedata_numericchr


head filtered_genomedata_numericchr.bim
cut -f1 filtered_genomedata_numericchr.bim | sort | uniq

cut -f2 filtered_genomedata_numericchr.bim > Hold_SNPs.txt

plink --bfile filtered_genomedata_numericchr --aec --extract Hold_SNPs.txt --make-bed --threads 30 --out filtered_genomedata_admixture

if [ $? -ne 0 ]; then
  echo "PLINK failed. Exiting."
  exit 1
fi

echo "Running ADMIXTURE K2...K10"
for K in $(seq 2 10); do
  echo "Running ADMIXTURE for K=$K"
  admixture -j30 --cv filtered_genomedata_admixture.bed $K | tee "${output_dir}/log${K}.out"
done

echo "ADMIXTURE analysis completed."

I am really lost and i don´t see the problem.

Thank you for any help.

Admixture error Plink debugging coding • 659 views

ADD COMMENT • link updated 4 months ago by Corentin ▴ 660 • written 4 months ago by Jens • 0

0

Entering edit mode

Hi, it would be helpful if you could give examples of the errors you are obtaining from Admixture.

Maybe try to just run update-chr on the original set of plink file and then launch Admixture with a specific value of k.

ADD REPLY • link 4 months ago by Corentin ▴ 660