Question: Downloading BAM files from phase 3 of the 1000 Genomes Project
Ana (170) wrote, 14 months ago:

Hello everyone,

I am new to the 1000 Genomes Project data. I want to download all BAM files belonging to phase 3. Can anyone tell me how to download all of them (from the command line)? Do you have an estimate of how long it will take?

I want to compute the depth of coverage only for some specific intervals, not the entire genome. Is there a way to do this without downloading the data? I found the command below, but I am not sure whether it is relevant to what I want to do:

samtools view -b  ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/HG01375/alignment/HG01375.mapped.ILLUMINA.bwa.CLM.low_coverage.20120522.bam 2:1,000,000-2,000,000 | genomeCoverageBed -ibam stdin -bg > coverage.bg

I would appreciate it if anyone could guide me.

modified 14 months ago by Pierre Lindenbaum (120k) • written 14 months ago by Ana (170)
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/

data has moved

written 14 months ago by Pierre Lindenbaum (120k)
Pierre Lindenbaum (120k), France/Nantes/Institut du Thorax - INSERM UMR1087, wrote, 14 months ago:

you wrote:

samtools view -b  ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/HG01375/alignment/HG01375.mapped.ILLUMINA.bwa.CLM.low_coverage.20120522.bam 2:1,000,000-2,000,000 |  (...)

you want:

 samtools view -bu 'http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG01375/alignment/HG01375.mapped.ILLUMINA.bwa.CLM.low_coverage.20120522.bam' "2:1000000-2000000" | (...)
written 14 months ago by Pierre Lindenbaum (120k)
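
For the depth-of-coverage part of the question, here is a minimal sketch that works on one remote BAM without downloading it. It assumes samtools is on the PATH and that the BAM has an index next to it on the server; `region_depth` and `mean_depth` are hypothetical helper names, not something from the thread, and `samtools depth` is swapped in for the genomeCoverageBed step as one possible way to get per-base depth.

```shell
# Hypothetical helpers (not from the thread); assume samtools is on the PATH.

# Print per-base depth (chrom, pos, depth) for one region of a remote, indexed BAM.
# Only the index and the parts of the file overlapping the region are fetched.
region_depth() {
    local bam_url=$1 region=$2
    samtools depth -r "$region" "$bam_url"
}

# Read `samtools depth` output on stdin and print the mean of column 3.
# Without -a, positions with zero coverage are not reported, so this is the
# mean over covered positions only.
mean_depth() {
    awk '{ sum += $3; n++ } END { if (n) printf "%.2f\n", sum / n; else print "NA" }'
}
```

It could be used as `region_depth 'http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG01375/alignment/HG01375.mapped.ILLUMINA.bwa.CLM.low_coverage.20120522.bam' 2:1000000-2000000 | mean_depth`. If uncovered positions should count as zero, passing `-a` to `samtools depth` is probably what you want.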

Hi Pierre, thanks. So you mean I can use your command above without downloading the BAM files? Can I also run it in a loop for all of the BAM files? There are 2504 individuals.

modified 14 months ago • written 14 months ago by Ana (170)

So you mean I can use your command above without downloading the BAM files?

Yes. The index (*.bai) is still downloaded, though.

Can I also run it in a loop for all of the BAM files? There are 2504 individuals.

http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO-7.html

written 14 months ago by Pierre Lindenbaum (120k)

To download the data, I typed ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/ directly, but could not download it!

written 14 months ago by Ana (170)

get the paths from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/current.tree

written 14 months ago by Pierre Lindenbaum (120k)
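
As a rough sketch of how the paths could be pulled out of current.tree: the filter below assumes the file is tab-separated with the path in the first column, and that phase 3 paths start with `ftp/phase3/` (inferred from the URLs used elsewhere in this thread). `phase3_bam_paths` is a hypothetical name.

```shell
# Hypothetical helper: read current.tree on stdin and print only the
# phase 3 BAM paths. Assumes column 1 holds the path and that phase 3
# files live under ftp/phase3/ (an assumption based on the URLs above).
phase3_bam_paths() {
    # The escaped dot and the $ anchor keep *.bam.bai and *.bam.bas files out.
    cut -f 1 | grep '^ftp/phase3/.*\.bam$'
}
```

Something like `wget -q -O - "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/current.tree" | phase3_bam_paths > phase3_bams.txt` would then save the list for later use. The list may still mix several kinds of alignments, so a further `grep` on the file name (for example on `low_coverage`) might be needed.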

For the loop I am trying the following, but I still get the message "no such file or directory":

 for file in http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG*/alignment/*.bam;
    do /data/programs/samtools-1.3.1/samtools view -c "${file}" 2:1000000-2000000
    done

Am I doing something wrong here?

modified 14 months ago • written 14 months ago by Ana (170)
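
A likely reason for the "no such file or directory" message: the shell expands `*` only against the local filesystem, so a pattern inside a URL is never matched and (with default bash settings) is passed to samtools unchanged. The small sketch below shows this; `expand_glob` is a hypothetical helper used only for illustration.

```shell
# Hypothetical helper: print whatever the shell expands a pattern to,
# one word per line. $1 is deliberately left unquoted so that pathname
# expansion is attempted.
expand_glob() {
    printf '%s\n' $1
}

# A URL pattern matches no local file, so it comes back as the literal string:
expand_glob 'http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG*/alignment/*.bam'
```

The loop above therefore runs exactly once, with the unexpanded pattern as the "file" name. The list of remote files has to come from somewhere else, such as current.tree.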

try:

 wget -q -O - "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/current.tree" | cut -f 1 | grep '\.bam$' | while read -r B; do echo -n "$B " && ~/packages/samtools/samtools view -c "http://ftp.1000genomes.ebi.ac.uk/vol1/$B" "2:1000000-2000000"; rm -f *.bam.bai; done
written 14 months ago by Pierre Lindenbaum (120k)
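
The same idea, unrolled into a function that may be easier to adapt. This is only a sketch under a few assumptions: paths arrive on stdin in the form found in current.tree, the index that samtools fetches lands in the current directory as `<file name>.bai`, and the `SAMTOOLS` variable and the name `count_region_reads` are made up for this example.

```shell
BASE_URL="http://ftp.1000genomes.ebi.ac.uk/vol1"
# Assumption: override SAMTOOLS to point at a specific samtools binary.
SAMTOOLS=${SAMTOOLS:-samtools}

# Read BAM paths (relative to BASE_URL) on stdin and print
# "path<TAB>read count" for the given region, or NA if samtools fails.
count_region_reads() {
    local region=$1 path count
    while IFS= read -r path; do
        # stdin is redirected so samtools cannot consume the path list.
        if count=$("$SAMTOOLS" view -c "$BASE_URL/$path" "$region" </dev/null); then
            printf '%s\t%s\n' "$path" "$count"
        else
            printf '%s\tNA\n' "$path"
        fi
        # Remove only the index fetched for this file (assumed naming),
        # rather than every *.bam.bai in the directory.
        rm -f -- "${path##*/}.bai"
    done
}
```

For example: `wget -q -O - "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/current.tree" | cut -f 1 | grep '\.bam$' | count_region_reads 2:1000000-2000000 > counts.tsv`. Printing NA instead of stopping means one unreachable file does not abort a run over thousands of samples.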