Hello everyone, I come from a computer science background and am working on a bioinformatics project. I want to download the SCN9a gene sequence for all the individuals in the 1000 Genomes Project. I have been struggling a lot: I tried the SRA Toolkit and other tools, but nothing worked. Could you please help me out? The SCN9a gene is located on chromosome 2, base pairs 166,195,185 to 166,375,987.
If only FASTA sequences are needed, I would first download all the variants for that region:
bcftools view -Oz -r 2:166195185-166375987 ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr2.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz > SCN9a.1000g.vcf.gz
tabix -p vcf SCN9a.1000g.vcf.gz
Then the reference must be downloaded and indexed locally (a pity that 1000 Genomes doesn't host that index remotely, because then the reference could be queried directly rather than downloaded):
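wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz
# faidx needs an uncompressed or BGZF-compressed file; if it rejects this one as plain gzip, recompress it with bgzip first
samtools faidx human_g1k_v37.fasta.gz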
Finally, build each sample's sequence by applying that sample's variants to the reference:
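# list the sample names from the region VCF created above
for sample in $(bcftools query -l SCN9a.1000g.vcf.gz); do
    # keep only the sites where this sample carries a non-reference allele
    bcftools view -c1 -Oz -s "$sample" -o "1000g.$sample.vcf.gz" SCN9a.1000g.vcf.gz
    tabix -p vcf "1000g.$sample.vcf.gz"
    # cut the region out of the reference and apply the sample's variants to it
    samtools faidx human_g1k_v37.fasta.gz 2:166195185-166375987 \
        | bcftools consensus "1000g.$sample.vcf.gz" -o "1000g.SCN9a.$sample.fa"
done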
To do all this you will "only" need samtools and bcftools (plus tabix, which ships with htslib) in a Unix-like environment with internet access.
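One quick way to sanity-check the per-sample FASTA files is to compare their lengths with the size of the region, 166375987 - 166195185 + 1 = 180803 bp. A sample whose applied variants are all SNPs should match exactly, while insertions and deletions will shift the length. The sketch below is not part of the recipe above: `region_length` and `fasta_length` are hypothetical helper names made up for illustration, and they rely only on the shell and awk.

```shell
# Hypothetical helpers, not part of samtools or bcftools.

# Length of a 1-based, inclusive region: end - start + 1.
region_length() {
    echo $(( $2 - $1 + 1 ))
}

# Total number of sequence characters in a FASTA file (header lines skipped).
fasta_length() {
    awk '!/^>/ { n += length($0) } END { print n + 0 }' "$1"
}

# Possible use on the loop's output (file names as in the loop above):
# expected=$(region_length 166195185 166375987)
# for fa in 1000g.SCN9a.*.fa; do
#     [ "$(fasta_length "$fa")" -eq "$expected" ] || echo "$fa: length differs (indels?)"
# done
```

A differing length is not an error in itself; it only flags samples that carry indels in the region.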
You can download the whole genomes using the SRA Toolkit, and since you say you already know the gene's position, you can extract the sequence from that region. A Perl script (or one in any other language) would do for that. Hope it helps.
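For the extraction step, a short awk function can stand in for the Perl script, assuming the chromosome is already available as a FASTA file. This is only a sketch: `extract_region` is a hypothetical name, it reads just the first record in the file, and it streams line by line so the whole chromosome is never held in memory. With an indexed reference, `samtools faidx` does the same job faster.

```shell
# Hypothetical helper: print bases start..end (1-based, inclusive) of the
# first record in a FASTA file, handling sequences wrapped over many lines.
extract_region() {
    awk -v s="$2" -v e="$3" '
        /^>/ { if (seen++) exit; next }      # skip first header, stop at the second
        {
            len = length($0)
            lo = pos + 1                     # coordinate of first base on this line
            hi = pos + len                   # coordinate of last base on this line
            if (hi >= s && lo <= e) {        # line overlaps the requested region
                a = (s > lo ? s : lo) - pos  # offsets within the line
                b = (e < hi ? e : hi) - pos
                out = out substr($0, a, b - a + 1)
            }
            pos = hi
            if (pos >= e) exit               # region fully collected
        }
        END { print out }
    ' "$1"
}
```

With a hypothetical single-chromosome file `chr2.fa`, the call would look like `extract_region chr2.fa 166195185 166375987`.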