Hello, I want to download the Aquaporin 1 gene sequence for all the individuals from the 1000 Genomes Project. I have tried a lot, using bcftools and vcftools, but I keep getting errors. The Aquaporin 1 gene is located at chromosome 7:30911853-30925516. I first downloaded the VCF file for that region:
bcftools view -Oz -r 7:30911853-30925516 "http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20190312_biallelic_SNV_and_INDEL/ALL.chr7.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz" > aqp1.1000g.vcf.gz
tabix -p vcf aqp1.1000g.vcf.gz
Then I downloaded the reference FASTA sequence from http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/ and named it human_ref.fa.gz.
Then I indexed the FASTA file:
samtools faidx human_ref.fa.gz
and then tried to build each sample's sequence by applying that sample's variants to the reference:
#!/bin/bash
for sample in `bcftools view -h aqp1.1000g.vcf.gz | grep "^#CHROM" | cut -f10-`; do
  bcftools view -c1 -Oz -s $sample -o 1000g.$sample.vcf.gz aqp1.1000g.vcf.gz
  tabix -p vcf 1000g.$sample.vcf.gz
  samtools faidx human_ref.fa.gz 7:30911853-30925516 | bcftools consensus 1000g.$sample.vcf.gz -o 1000g.aqp1.$sample.fa
done
But this gives me the following error:
Note: the --sample option not given, applying all records regardless of the genotype
[W::fai_get_val] Reference 7:30911853-30925516 not found in FASTA file, returning empty sequence
[faidx] Failed to fetch sequence in 7:30911853-30925516
Applied 0 variants
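In case it helps to see what the loop iterates over: the sample names come from columns 10 onward of the `#CHROM` header line. Below is a minimal sketch of just that step, using a toy header instead of the real VCF. The three sample IDs are placeholders I picked for illustration, and the `echo` only stands in for the real per-sample commands, so it runs without bcftools.

```shell
#!/bin/bash
# Toy "#CHROM" header line (tab-separated). The sample IDs are
# placeholders for illustration, not read from the real file.
header=$(printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tHG00096\tHG00097\tHG00099\n')

# Columns 1-9 are the fixed VCF fields; columns 10 onward are sample names.
samples=$(printf '%s\n' "$header" | grep "^#CHROM" | cut -f10-)

# Unquoted expansion splits on tabs, so the loop sees one sample at a time.
count=0
for sample in $samples; do
    echo "would write 1000g.aqp1.$sample.fa"
    count=$((count + 1))
done
echo "samples found: $count"
```

On the real file, `bcftools query -l aqp1.1000g.vcf.gz` should print the same list, one sample name per line.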