Entering edit mode
6 weeks ago
Jens
•
0
Hello everyone,
I want to analyse my genomic data. I already created the .bim .bed and .fam files from PLINK. But for Admixture I have to renamed my chromsome names:
CM039442.1 --> 2
CM039443.1 --> 3
CM039444.1 --> 4
CM039445.1 --> 5
CM039446.1 --> 6
CM039447.1 --> 7
CM039448.1 --> 8
CM039449.1 --> 9
CM039450.1 --> 10
I tried a lot of different approaches, but eather i got invalid chr names, empty .bim files, use integers, no variants remeining or what ever. I would show you two of my approaches, i don´t know how to solve this problem.
The new file is always not accepted by Admixture.
One of my code approaches is followed:
input_dir="/data/XY"
output_dir="$input_dir"
#Go to directory
cd "$input_dir" || { echo "Input not found"; exit 1; }
#Copy old .bim .bed .fam
cp filtered_genomedata.bim filtered_genomedata_renamed.bim
cp filtered_genomedata.bed filtered_genomedata_renamed.bed
cp filtered_genomedata.fam filtered_genomedata_renamed.fam
#Renaming old chromosome names to simple 1, 2, 3 ... (1 = ChrX = 51)
#FS=field seperator
#"\t" seperate only with tabulator
#OFS=output field seperator
#echo 'Renaming chromosomes in .bim file'
#awk 'BEGIN{FS=OFS="\t"; map["CM039442.1"]=2; map["CM039443.1"]=3; map["CM039444.1"]=4; map["CM039445.1"]=5; map["CM039446.1"]=6; map["CM039447.1"]=7; map["CM039448.1"]=8; map["CM039449.1"]=9; map["CM039450.1"]=10;}
#{if ($1 in map) $1 = map[$1]; print }' filtered_genomedata_renamed.bim > tmp && mv tmp filtered_genomedata_renamed.bim
#Creating a list of allowed chromosomes (2 to 10)
#END as a label in .txt
cat << END > allowed_chromosomes.txt
CM039442.1 2
CM039443.1 3
CM039444.1 4
CM039445.1 5
CM039446.1 6
CM039447.1 7
CM039448.1 8
CM039449.1 9
CM039450.1 10
END
#Names of the chromosomes and their numbers
#2 CM039442.1 2
#3 CM039443.1 3
#4 CM039444.1 4
#5 CM039445.1 5
#6 CM039446.1 6
#7 CM039447.1 7
#8 CM039448.1 8
#9 CM039449.1 9
#10 CM039450.1 10
#Second filter with only including chromosomes (renamed ones)
#NR=the running line number across all files
#FNR=the running line number only in the current file
echo 'Starting second filtering'
awk 'NR==FNR { chrom[$1]; next } ($1 in chrom)' allowed_chromosomes.txt filtered_genomedata_renamed.bim > filtered_genomedata_renamed.filtered.bim
echo 'Renaming chromosomes in .bim file'
awk 'BEGIN{FS=OFS="\t"} NR==FNR { map[$1]=$2; next } { if ($1 in map) $1=map[$1]; print }' allowed_chromosomes.txt filtered_genomedata_renamed.filtered.bim > filtered_genomedata.renamed.bim
#awk '$1 >= 2 && $1 <= 10' filtered_genomedata_renamed.bim > tmp_bim
cut -f2 filtered_genomedata.renamed.bim > Hold_SNPs.txt
#Creating new .bim .bed .fam data for using in admixture
#ATTENTION admixture cannot use letters
echo 'Creating new files for ADMIXTURE'
plink --bfile filtered_genomedata.renamed --extract Hold_SNPs.txt --make-bed --aec --threads 30 --out filtered_genomedata_admixture
if [ $? -ne 0 ]; then
echo 'PLINK failed. Go to exit.'
exit 1
fi
#Reading PLINK data .bed .bim .fam
#Finding the best K-value for calculation
echo 'Running ADMIXTURE K2...K10'
for K in $(seq 2 10); do
echo "Finding best ADMIXTURE K value K=$K"
admixture -j30 --cv filtered_genomedata_admixture.bed $K | tee "${output_dir}/log${K}.out"
done
echo "Log data for K value done"
Second apporach:
-----------------------------------------------------
input_dir="/data/XY"
output_dir="$input_dir"
cd "$input_dir" || { echo "Input directory not found"; exit 1; }
cp filtered_genomedata.bim filtered_genomedata_work.bim
cp filtered_genomedata.bed filtered_genomedata_work.bed
cp filtered_genomedata.fam filtered_genomedata_work.fam
cat << END > chr_map.txt
CM039442.1 2
CM039443.1 3
CM039444.1 4
CM039445.1 5
CM039446.1 6
CM039447.1 7
CM039448.1 8
CM039449.1 9
CM039450.1 10
END
plink --bfile filtered_genomedata_work --aec --update-chr chr_map.txt --make-bed --out filtered_genomedata_numericchr
head filtered_genomedata_numericchr.bim
cut -f1 filtered_genomedata_numericchr.bim | sort | uniq
cut -f2 filtered_genomedata_numericchr.bim > Hold_SNPs.txt
plink --bfile filtered_genomedata_numericchr --aec --extract Hold_SNPs.txt --make-bed --threads 30 --out filtered_genomedata_admixture
if [ $? -ne 0 ]; then
echo "PLINK failed. Exiting."
exit 1
fi
echo "Running ADMIXTURE K2...K10"
for K in $(seq 2 10); do
echo "Running ADMIXTURE for K=$K"
admixture -j30 --cv filtered_genomedata_admixture.bed $K | tee "${output_dir}/log${K}.out"
done
echo "ADMIXTURE analysis completed."
I am really lost and i don´t see the problem.
Thank you for any help.
Hi, it would be helpful if you could give examples of the errors you are obtaining from Admixture.
Maybe try to just run
update-chr
on the original set of plink file and then launch Admixture with a specific value of k.