PLINK 1.9 and ADMIXTURE 1.3.0 Help with renaming
0
0
Entering edit mode
6 weeks ago
Jens • 0

Hello everyone,

I want to analyse my genomic data. I already created the .bim .bed and .fam files from PLINK. But for Admixture I have to renamed my chromsome names:

CM039442.1  --> 2
CM039443.1 --> 3
CM039444.1 --> 4
CM039445.1 --> 5
CM039446.1 --> 6
CM039447.1 --> 7
CM039448.1 --> 8
CM039449.1 --> 9
CM039450.1 --> 10

I tried a lot of different approaches, but eather i got invalid chr names, empty .bim files, use integers, no variants remeining or what ever. I would show you two of my approaches, i don´t know how to solve this problem.

The new file is always not accepted by Admixture.

One of my code approaches is followed:

 input_dir="/data/XY"
output_dir="$input_dir"

#Go to directory
cd "$input_dir" || { echo "Input not found"; exit 1; }

#Copy old .bim .bed .fam
cp filtered_genomedata.bim filtered_genomedata_renamed.bim
cp filtered_genomedata.bed filtered_genomedata_renamed.bed
cp filtered_genomedata.fam filtered_genomedata_renamed.fam

#Renaming old chromosome names to simple 1, 2, 3 ... (1 = ChrX = 51)
#FS=field seperator
#"\t" seperate only with tabulator
#OFS=output field seperator
#echo 'Renaming chromosomes in .bim file'
#awk 'BEGIN{FS=OFS="\t"; map["CM039442.1"]=2; map["CM039443.1"]=3; map["CM039444.1"]=4; map["CM039445.1"]=5; map["CM039446.1"]=6; map["CM039447.1"]=7; map["CM039448.1"]=8; map["CM039449.1"]=9; map["CM039450.1"]=10;}
#{if ($1 in map) $1 = map[$1]; print }' filtered_genomedata_renamed.bim > tmp && mv tmp filtered_genomedata_renamed.bim

#Creating a list of allowed chromosomes (2 to 10)
#END as a label in .txt
cat << END > allowed_chromosomes.txt
CM039442.1 2
CM039443.1 3
CM039444.1 4
CM039445.1 5
CM039446.1 6
CM039447.1 7
CM039448.1 8
CM039449.1 9
CM039450.1 10
END

#Names of the chromosomes and their numbers
#2 CM039442.1 2
#3 CM039443.1 3
#4 CM039444.1 4
#5 CM039445.1 5
#6 CM039446.1 6
#7 CM039447.1 7
#8 CM039448.1 8
#9 CM039449.1 9
#10 CM039450.1 10

#Second filter with only including chromosomes (renamed ones)
#NR=the running line number across all files
#FNR=the running line number only in the current file
echo 'Starting second filtering'
awk 'NR==FNR { chrom[$1]; next } ($1 in chrom)' allowed_chromosomes.txt filtered_genomedata_renamed.bim > filtered_genomedata_renamed.filtered.bim



echo 'Renaming chromosomes in .bim file'
awk 'BEGIN{FS=OFS="\t"} NR==FNR { map[$1]=$2; next } { if ($1 in map) $1=map[$1]; print }' allowed_chromosomes.txt filtered_genomedata_renamed.filtered.bim > filtered_genomedata.renamed.bim




#awk '$1 >= 2 && $1 <= 10' filtered_genomedata_renamed.bim > tmp_bim
cut -f2 filtered_genomedata.renamed.bim > Hold_SNPs.txt

#Creating new .bim .bed .fam data for using in admixture
#ATTENTION admixture cannot use letters
echo 'Creating new files for ADMIXTURE'
plink --bfile filtered_genomedata.renamed --extract Hold_SNPs.txt --make-bed --aec --threads 30 --out filtered_genomedata_admixture

if [ $? -ne 0 ]; then
echo 'PLINK failed. Go to exit.'
exit 1
fi

#Reading PLINK data .bed .bim .fam
#Finding the best K-value for calculation
echo 'Running ADMIXTURE K2...K10'
for K in $(seq 2 10); do
echo "Finding best ADMIXTURE K value K=$K"
admixture -j30 --cv filtered_genomedata_admixture.bed $K | tee "${output_dir}/log${K}.out"
done

echo "Log data for K value done"

Second apporach:
-----------------------------------------------------
input_dir="/data/XY"
output_dir="$input_dir"

cd "$input_dir" || { echo "Input directory not found"; exit 1; }

cp filtered_genomedata.bim filtered_genomedata_work.bim
cp filtered_genomedata.bed filtered_genomedata_work.bed
cp filtered_genomedata.fam filtered_genomedata_work.fam

cat << END > chr_map.txt
CM039442.1 2
CM039443.1 3
CM039444.1 4
CM039445.1 5
CM039446.1 6
CM039447.1 7
CM039448.1 8
CM039449.1 9
CM039450.1 10
END

plink --bfile filtered_genomedata_work --aec --update-chr chr_map.txt --make-bed --out filtered_genomedata_numericchr


head filtered_genomedata_numericchr.bim
cut -f1 filtered_genomedata_numericchr.bim | sort | uniq

cut -f2 filtered_genomedata_numericchr.bim > Hold_SNPs.txt

plink --bfile filtered_genomedata_numericchr --aec --extract Hold_SNPs.txt --make-bed --threads 30 --out filtered_genomedata_admixture

if [ $? -ne 0 ]; then
  echo "PLINK failed. Exiting."
  exit 1
fi

echo "Running ADMIXTURE K2...K10"
for K in $(seq 2 10); do
  echo "Running ADMIXTURE for K=$K"
  admixture -j30 --cv filtered_genomedata_admixture.bed $K | tee "${output_dir}/log${K}.out"
done

echo "ADMIXTURE analysis completed."

I am really lost and i don´t see the problem.

Thank you for any help.

Admixture error Plink debugging coding • 484 views
ADD COMMENT
0
Entering edit mode

Hi, it would be helpful if you could give examples of the errors you are obtaining from Admixture.

Maybe try to just run update-chr on the original set of plink file and then launch Admixture with a specific value of k.

ADD REPLY

Login before adding your answer.

Traffic: 4622 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6