Consensus sequences from the 1000 Genomes Project
21 months ago
Peerzada

Hello, I want to download the Aquaporin 1 (AQP1) gene sequence for every individual in the 1000 Genomes Project. I have tried many things, including bcftools and vcftools, but I keep getting errors. The AQP1 gene is located at chromosome 7:30911853-30925516. First, I downloaded the VCF file for this region:

bcftools view -Oz -r 7:30911853-30925516 "http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20190312_biallelic_SNV_and_INDEL/ALL.chr7.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz">aqp1.1000g.vcf.gz
tabix -p vcf aqp1.1000g.vcf.gz
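
The errors reported later in this post suggest a mismatch between contig names, so it may be worth checking which names the region VCF actually uses right after downloading it. A minimal check, assuming `tabix` and `bcftools` are installed:

```shell
#!/bin/bash
VCF=aqp1.1000g.vcf.gz

# Sequence names present in the tabix index (e.g. "7" or "chr7")
tabix -l "$VCF"

# Contig names declared in the VCF header
bcftools view -h "$VCF" | grep '^##contig'
```

If this prints `7` while the reference FASTA uses `chr7` (or the other way round), `bcftools consensus` will not find the sequence.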

Next, I downloaded the reference FASTA sequence from http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/ and saved it as human_ref.fa.gz.

Then I indexed the FASTA file:

samtools faidx human_ref.fa.gz
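
Note that `samtools faidx` can index a compressed FASTA only if it was compressed with `bgzip`; a file compressed with plain `gzip` cannot be indexed. If the downloaded reference is plain gzip, a minimal sketch that recompresses it with `bgzip` and then lists the sequence names recorded in the index (the first column of a `.fai` file) could look like this. The output name `human_ref.bgz.fa.gz` is hypothetical:

```shell
#!/bin/bash
REF=human_ref.bgz.fa.gz

# samtools faidx needs BGZF (bgzip) compression, not plain gzip,
# so decompress the reference and recompress it with bgzip
gzip -dc human_ref.fa.gz | bgzip -c > "$REF"

# Build the index (.fai, plus .gzi for a bgzip-compressed file)
samtools faidx "$REF"

# First few sequence names, e.g. to see whether they are "7" or "chr7"
cut -f1 "$REF.fai" | head
```

Whatever names this prints are the names that region strings passed to `samtools faidx` must use.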

Finally, I built each sample's sequence by applying its variants to the reference:

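#!/bin/bash

# Loop over every sample name in the region VCF
for sample in $(bcftools query -l aqp1.1000g.vcf.gz); do
  # Keep only sites where this sample carries at least one ALT allele
  bcftools view -c1 -Oz -s "$sample" -o "1000g.$sample.vcf.gz" aqp1.1000g.vcf.gz
  tabix -p vcf "1000g.$sample.vcf.gz"
  # Extract the AQP1 region from the reference and apply the sample's variants
  samtools faidx human_ref.fa.gz 7:30911853-30925516 | bcftools consensus "1000g.$sample.vcf.gz" -o "1000g.aqp1.$sample.fa"
done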

But this gives the following error:

Note: the --sample option not given, applying all records regardless of the genotype
[W::fai_get_val] Reference 7:30911853-30925516 not found in FASTA file, returning empty sequence
[faidx] Failed to fetch sequence in 7:30911853-30925516
Applied 0 variants

It may be important for the reference sequence names to match exactly, e.g. both files should use either chr7 or just 7.
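
If the VCF uses `7` and the FASTA uses `chr7`, one way to make them match is to rename the VCF contigs with `bcftools annotate --rename-chrs`, which takes a two-column file of old and new names. Renaming the small region VCF is cheaper than rewriting the whole reference. A sketch, assuming the VCF has bare numeric names and the FASTA has `chr`-prefixed ones; `chr_rename.txt` and `aqp1.1000g.chr.vcf.gz` are hypothetical file names:

```shell
#!/bin/bash
# Two-column mapping file: old contig name <TAB> new contig name
for c in {1..22} X Y; do
  printf '%s\tchr%s\n' "$c" "$c"
done > chr_rename.txt

# Rewrite the contig names in the region VCF, then index the result
bcftools annotate --rename-chrs chr_rename.txt -Oz -o aqp1.1000g.chr.vcf.gz aqp1.1000g.vcf.gz
tabix -p vcf aqp1.1000g.chr.vcf.gz
```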

I used chr7 for both files, and now the error is:

Note: the --sample option not given, applying all records regardless of the genotype
Warning: Sequence "chr7" not in 1000g.HG00111.vcf.gz
Applied 0 variants

Note: the --sample option not given, applying all records regardless of the genotype
Warning: Sequence "chr7" not in 1000g.HG00112.vcf.gz
Applied 0 variants

The same error occurs for every sample.
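
One possible rewrite of the loop, as a sketch under two assumptions: (1) a bgzip-compressed, indexed region VCF whose contigs have been renamed to match the FASTA, here called `aqp1.1000g.chr.vcf.gz`, and (2) a bgzip-compressed, indexed reference with `chr`-prefixed names, here called `human_ref.bgz.fa.gz`. Both file names are hypothetical. Instead of splitting the VCF per sample, it passes `-s` to `bcftools consensus` so that only that sample's genotypes are applied (which also addresses the "--sample option not given" note), and `-H 1` / `-H 2` to take the first and second allele of each genotype, i.e. the two haplotypes of the phased calls:

```shell
#!/bin/bash
# Assumed inputs (hypothetical names):
#   VCF : region VCF whose contig names match the FASTA (chr7)
#   REF : bgzip-compressed, samtools-indexed reference with chr-prefixed names
VCF=aqp1.1000g.chr.vcf.gz
REF=human_ref.bgz.fa.gz
REGION=chr7:30911853-30925516

# Extract the AQP1 region once; the header becomes ">chr7:30911853-30925516",
# which bcftools consensus interprets as a region on chr7
samtools faidx "$REF" "$REGION" > aqp1.ref.fa

# bcftools query -l prints one sample name per line
for sample in $(bcftools query -l "$VCF"); do
  for hap in 1 2; do
    # -s applies only this sample's genotypes; -H selects one allele of each GT
    bcftools consensus -f aqp1.ref.fa -s "$sample" -H "$hap" "$VCF" \
      > "1000g.aqp1.${sample}.hap${hap}.fa"
  done
done
```

Each output file should keep the input header `>chr7:30911853-30925516`, so the sample and haplotype are identified only by the file name; rename the headers before concatenating the files.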
