Question: Downloading BAM files from phase 3 of the 1000 Genomes Project
Ana wrote, 10 months ago:

Hello everyone,

I am new to the 1000 Genomes Project data. I want to download all the BAM files belonging to phase 3. Can anyone tell me how to download all of them (from the command line)? Do you have an estimate of how long it will take?

I want to compute the depth of coverage only for some specific intervals, not the entire genome. Is there a way to do that without downloading the data? I found this, but I am not sure whether it is relevant to what I want to do:

samtools view -b  ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/HG01375/alignment/HG01375.mapped.ILLUMINA.bwa.CLM.low_coverage.20120522.bam 2:1,000,000-2,000,000 | genomeCoverageBed -ibam stdin -bg > coverage.bg

I would appreciate it if anyone could guide me.

written 10 months ago by Ana (modified 10 months ago by Pierre Lindenbaum)
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/

The data has moved.

written 10 months ago by Pierre Lindenbaum
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 10 months ago:

you wrote:

samtools view -b  ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/HG01375/alignment/HG01375.mapped.ILLUMINA.bwa.CLM.low_coverage.20120522.bam 2:1,000,000-2,000,000 |  (...)

you want:

 samtools view -bu 'http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG01375/alignment/HG01375.mapped.ILLUMINA.bwa.CLM.low_coverage.20120522.bam' "2:1000000-2000000" | (...)

written 10 months ago by Pierre Lindenbaum

Hi Pierre, thanks. So you mean I can use your command above without downloading the BAM files? Can I also run it in a loop over all of the BAM files? There are 2504 individuals.

written 10 months ago by Ana (modified 10 months ago)

"Thanks, so you mean I can use your command above without downloading the BAM files?"

Yes, although the index (*.bai) is still downloaded.

"Can I also run it in a loop over all of the BAM files? There are 2504 individuals."

http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO-7.html

written 10 months ago by Pierre Lindenbaum

To download the data, I typed ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/ directly, but could not download anything!

written 10 months ago by Ana

Get the paths from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/current.tree

written 10 months ago by Pierre Lindenbaum
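
One way to turn that listing into BAM URLs, as a minimal sketch only. It assumes, as the `cut -f 1` used later in this thread implies, that current.tree is tab-separated with the path (relative to vol1/) in the first column. The function name `bam_urls` and its optional filter argument are hypothetical and not part of any 1000 Genomes tooling.

```shell
#!/bin/sh
# bam_urls (hypothetical helper): read a current.tree-style listing on
# stdin and print one HTTP URL per BAM file.
# Optional $1: a fixed substring the path must contain, e.g. a sample ID.
bam_urls() {
    filter="${1:-}"
    cut -f 1 |                    # first tab-separated column = path
        grep '\.bam$' |           # keep BAM files, drop .bam.bai etc.
        { if [ -n "$filter" ]; then grep -F -- "$filter"; else cat; fi; } |
        sed 's|^|http://ftp.1000genomes.ebi.ac.uk/vol1/|'
}
```

It could be fed from the listing like this (the sample ID is just the one already used in this thread):

 wget -q -O - "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/current.tree" | bam_urls HG01375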

For the loop I am trying the following, but I still get the warning message "no such file or directory":

 for file in http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG*/alignment/*.bam;
    do /data/programs/samtools-1.3.1/samtools view -c "${file}" 2:1000000-2000000
    done

Am I doing something wrong here?

written 10 months ago by Ana (modified 10 months ago)
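
A likely cause of that message: the shell expands wildcards such as `HG*` and `*.bam` only against local files, never against a remote server, so the loop runs exactly once with the unexpanded pattern as the "file". A minimal sketch demonstrating this; the function name `show_glob_words` is made up for the demonstration, and it assumes default shell settings (no `nullglob` or `failglob`) and that no local directory named `http:` exists.

```shell
#!/bin/sh
# Print every word the shell produces for the URL pattern from the loop.
# With nothing local to match, the pattern is passed through unchanged
# as a single literal word.
show_glob_words() {
    for file in http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG*/alignment/*.bam; do
        printf '%s\n' "$file"
    done
}

show_glob_words
```

The list of remote files therefore has to come from somewhere else, such as the current.tree listing mentioned earlier in the thread.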

Try:

 wget -q -O - "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/current.tree" |
   cut -f 1 | grep '\.bam$' |
   while read -r B; do
     # print the BAM path, then the read count for the region;
     # remove the .bai index that samtools fetched into the current directory
     echo -n "$B " && ~/packages/samtools/samtools view -c "http://ftp.1000genomes.ebi.ac.uk/vol1/$B" "2:1000000-2000000" && rm -f *.bam.bai
   done

written 10 months ago by Pierre Lindenbaum
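
For the original goal (coverage over several specific intervals rather than one region), the interval list can be converted into region strings of the form `chr:start-end` that `samtools view` accepts. A minimal sketch, assuming the intervals are kept in a tab-separated BED-style file (0-based start, exclusive end), which this thread does not actually specify. The function name `bed_to_regions` is hypothetical. samtools regions are 1-based and inclusive, so the start is shifted by one.

```shell
#!/bin/sh
# bed_to_regions (hypothetical helper): read BED-style lines on stdin
# (chrom <TAB> start <TAB> end, 0-based start, exclusive end) and print
# one samtools region string per interval (1-based, inclusive).
bed_to_regions() {
    awk -F '\t' '
        /^#/ || NF < 3 { next }                    # skip comments and short lines
        { printf "%s:%d-%d\n", $1, $2 + 1, $3 }    # shift start to 1-based
    '
}
```

Each printed region can then take the place of "2:1000000-2000000" in the samtools command.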