Question: [Resolved] Local alternative to galaxy "Extract Genomic DNA using coordinates" tool
giroudpaul (European Union) wrote, 4.1 years ago:

Hello,

For a simple script I am writing, I need to extract the genomic data using coordinates, but I would need to do it locally on my computer.

Is the Galaxy tool downloadable? Is there an alternative? It seems that bedtools can do something like this, but then I would need the FASTA for mm9. Where can I get it?

Thanks

galaxy • 1.2k views

Yes, getfasta of BEDtools can do it. mm9 FASTA sequence can be downloaded from UCSC.

Comment by Tej Sowpati, 4.1 years ago

Is it in the mm9.2bit file? How do I extract it? The page says to use their twoBitToFa tool, but I don't understand how to install it.

Comment by giroudpaul, 4.1 years ago

No, you need the chromFa.tar.gz file, which when uncompressed gives you one FASTA file per chromosome. You can then create a master FASTA file by concatenating all of them into one with the 'cat' command.

Comment by Tej Sowpati, 4.1 years ago
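
The concatenation step described in the comment above can be sketched as a small shell function. The function name `build_master_fasta`, the directory layout, and the output name `mm9.fa` are illustrative assumptions, not something taken from the thread.

```shell
# Hypothetical helper: concatenate per-chromosome FASTA files into one file.
# Assumes the archive was already unpacked, e.g. with: tar -xzf chromFa.tar.gz
# Usage: build_master_fasta DIR OUT
build_master_fasta() {
  dir=$1
  out=$2
  # The glob expands in sorted order (chr1.fa, chr10.fa, chr11.fa, ...).
  # OUT must not itself match chr*.fa inside DIR.
  cat "$dir"/chr*.fa > "$out"
}
```

With the master file in place, a call along the lines of `bedtools getfasta -fi mm9.fa -bed regions.bed -fo regions.fa` (where `regions.bed` is a hypothetical BED file of coordinates) should write the extracted sequences to `regions.fa`.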
Alex Reynolds (Seattle, WA, USA) wrote, 4.1 years ago:

To get mm9 FASTA files via the command-line:

$ wget http://hgdownload.cse.ucsc.edu/goldenPath/mm9/bigZips/mm9.2bit
$ wget http://hgdownload.cse.ucsc.edu/admin/exe/macOSX.x86_64/twoBitToFa
$ chmod +x ./twoBitToFa
$ for i in $(seq 1 19) X Y M; do echo "converting chr$i"; ./twoBitToFa -seq="chr$i" mm9.2bit "chr$i.fa"; done

If you are using Linux, get the twoBitToFa Kent tool from the following URL instead:

$ wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/twoBitToFa
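
If the same commands need to run on both platforms, the choice between the two download URLs could be wrapped in a function. This is only a sketch: the name `twobittofa_url` is made up here, and it assumes an x86_64 machine, since those are the only two builds mentioned above.

```shell
# Hypothetical helper: print the twoBitToFa download URL for an OS name.
# With no argument, the OS is detected with `uname -s`.
twobittofa_url() {
  os=${1:-$(uname -s)}
  base=http://hgdownload.cse.ucsc.edu/admin/exe
  case "$os" in
    Darwin) echo "$base/macOSX.x86_64/twoBitToFa" ;;
    Linux)  echo "$base/linux.x86_64/twoBitToFa" ;;
    *)      echo "unsupported OS: $os" >&2; return 1 ;;
  esac
}
```

It could then be used as `wget "$(twobittofa_url)"` followed by the same `chmod +x ./twoBitToFa` as above.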

Install samtools. On OS X, if you have Homebrew installed, you could use brew install samtools. On Ubuntu, you might run sudo apt-get install samtools. Or on a RedHat-like Linux, you might run sudo yum install samtools.

Index the FASTA files with samtools faidx:

$ for i in $(seq 1 19) X Y M; do echo "indexing chr$i"; samtools faidx "chr$i.fa"; done

Then query coordinates with samtools faidx. Here is a convenience Perl script I wrote that wraps around samtools, which reads stranded or unstranded BED from standard input and writes FASTA to standard output:

To use this script, e.g.:

$ ./bed2faidxsta.pl < foo.bed > foo.fa
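
The part of such a wrapper that is easiest to get wrong is the coordinate conversion: BED intervals are 0-based and half-open, while samtools faidx region strings are 1-based and inclusive. The following is a minimal shell sketch of that conversion, not the Perl script referred to above. The function name `bed_to_regions` is hypothetical, and the sketch ignores the strand column.

```shell
# Hypothetical helper: read BED on stdin, write samtools-style regions on stdout.
# BED start is 0-based, so it is shifted by one; BED end is already the last
# included base when counted 1-based, so it is left unchanged.
bed_to_regions() {
  awk '!/^(#|track|browser)/ && NF >= 3 { printf "%s:%d-%d\n", $1, $2 + 1, $3 }'
}

# Possible use with the per-chromosome files indexed above (requires samtools):
# bed_to_regions < foo.bed | while read -r region; do
#   samtools faidx "${region%%:*}.fa" "$region"
# done
```

For example, the BED line `chr1 10 20` becomes the region `chr1:11-20`, which covers the same ten bases.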

Ian (University of Manchester, UK) wrote, 4.1 years ago:

The following link should also be helpful: Perl To Retrieve Sequences From Ucsc Genome Browser

Matt Shirley (Cambridge, MA) wrote, 4.1 years ago:

pyfaidx has a script for this that is easy to install and works well: https://github.com/mdshw5/pyfaidx#cli-script-faidx

swbarnes2 (United States) wrote, 4.1 years ago:

samtools faidx can do it too.
