Question: [Resolved] Local alternative to the Galaxy "Extract Genomic DNA using coordinates" tool
giroudpaul (European Union) wrote, 3.3 years ago:

Hello,

For a simple script I am writing, I need to extract genomic DNA sequences for a set of coordinates, but I need to do it locally on my computer.

Is the Galaxy tool downloadable? Is there an alternative? It seems that bedtools can do something like this, but then I need the mm9 genome in FASTA format. Where can I get it?

Thanks


Yes, the getfasta command of bedtools can do it. The mm9 FASTA sequence can be downloaded from UCSC.

(reply by Tej Sowpati, 3.3 years ago)

Is it in the mm9.2bit file? How do I extract it? The page says to use their twoBitToFa tool, but I don't understand how to install it.

(reply by giroudpaul, 3.3 years ago)

You don't need the 2bit file for this: download chromFa.tar.gz instead, which when uncompressed gives you one FASTA file per chromosome. You can then create a single master FASTA file by concatenating all the files into one with the cat command.

(reply by Tej Sowpati, 3.3 years ago)
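Something like cat chr*.fa > mm9.fa is enough for indexing, because samtools faidx does not care about record order. The shell glob sorts names lexically, though, so chr10 ends up before chr2. If you would rather have the usual order, here is a small Python sketch. The function names and the "numbered autosomes, then X, Y, M" order are assumptions for illustration; any other files (such as *_random contigs) are appended after those, and each input is assumed to end with a newline.

```python
import re
import shutil
from pathlib import Path

# Assumed mouse chromosome order: numbered autosomes, then X, Y, M.
SEX_AND_MITO = {"X": 1, "Y": 2, "M": 3}

def chrom_sort_key(path):
    """Sort key that puts chr2 before chr10 and X, Y, M after the autosomes."""
    name = Path(path).name
    match = re.fullmatch(r"chr(\w+)\.fa", name)
    label = match.group(1) if match else name
    if label.isdigit():
        return (0, int(label), label)
    # Unknown labels (e.g. 1_random, Un_random) sort after X, Y, M.
    return (1, SEX_AND_MITO.get(label, len(SEX_AND_MITO) + 1), label)

def concatenate_fasta(paths, out_path):
    """Stream each per-chromosome FASTA into one multi-FASTA file.

    Assumes every input file ends with a newline, so headers stay on their own line.
    """
    with open(out_path, "wb") as out:
        for path in sorted(paths, key=chrom_sort_key):
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)  # streams without loading whole chromosomes
```

For example, concatenate_fasta(glob.glob("chr*.fa"), "mm9.fa") writes the records as chr1, chr2, ..., chr19, chrX, chrY, chrM, followed by any remaining contigs.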
Answer: Alex Reynolds (Seattle, WA USA) wrote, 3.3 years ago:

To get the mm9 FASTA files (one per chromosome) from the command line:

$ wget http://hgdownload.cse.ucsc.edu/goldenPath/mm9/bigZips/mm9.2bit
$ wget http://hgdownload.cse.ucsc.edu/admin/exe/macOSX.x86_64/twoBitToFa
$ chmod +x ./twoBitToFa
$ for i in `seq 1 19` X Y M; do echo "converting chr$i"; ./twoBitToFa -seq=chr$i mm9.2bit chr$i.fa; done

The twoBitToFa binary above is the macOS build. If you are using Linux, download it from the following URL instead (and make it executable with chmod as above):

$ wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/twoBitToFa
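The UCSC utilities are generally distributed as standalone precompiled binaries, so "installing" twoBitToFa amounts to downloading it and making it executable, as above. To show what the tool actually does, here is a minimal Python sketch that decodes a .2bit file according to UCSC's published layout: a header, a name/offset index, and per-sequence records holding runs of N, soft-masked (lowercase) blocks, and DNA packed four bases per byte. It handles only version-0 files and loads everything into memory, so it is meant for illustration and small files, not as a replacement for twoBitToFa on the full mm9 genome. The function name read_twobit is hypothetical.

```python
import struct

SIGNATURE = 0x1A412743
CODE_TO_BASE = "TCAG"  # 2-bit codes: T=0, C=1, A=2, G=3

def read_twobit(path):
    """Decode every record of a version-0 .2bit file into {name: sequence}."""
    with open(path, "rb") as fh:
        data = fh.read()

    # The signature's byte order tells us how the file was written.
    for endian in ("<", ">"):
        if struct.unpack_from(endian + "I", data, 0)[0] == SIGNATURE:
            break
    else:
        raise ValueError("not a .2bit file")

    def u32(offset):
        return struct.unpack_from(endian + "I", data, offset)[0]

    version, seq_count = u32(4), u32(8)
    if version != 0:
        raise ValueError("this sketch only handles version 0 (32-bit offsets)")

    # Index: 1-byte name length, the name, then a 32-bit record offset.
    index, pos = [], 16
    for _ in range(seq_count):
        name_len = data[pos]
        name = data[pos + 1:pos + 1 + name_len].decode("ascii")
        pos += 1 + name_len
        index.append((name, u32(pos)))
        pos += 4

    sequences = {}
    for name, pos in index:
        dna_size, n_count = u32(pos), u32(pos + 4)
        pos += 8
        n_starts = [u32(pos + 4 * i) for i in range(n_count)]
        n_sizes = [u32(pos + 4 * (n_count + i)) for i in range(n_count)]
        pos += 8 * n_count
        mask_count = u32(pos)
        pos += 4
        mask_starts = [u32(pos + 4 * i) for i in range(mask_count)]
        mask_sizes = [u32(pos + 4 * (mask_count + i)) for i in range(mask_count)]
        pos += 8 * mask_count + 4  # skip the mask arrays and the reserved word

        # Packed DNA: four bases per byte, first base in the high-order bits.
        seq = [CODE_TO_BASE[(byte >> shift) & 3]
               for byte in data[pos:pos + (dna_size + 3) // 4]
               for shift in (6, 4, 2, 0)][:dna_size]

        for start, size in zip(n_starts, n_sizes):  # runs of N
            seq[start:start + size] = "N" * size
        for start, size in zip(mask_starts, mask_sizes):  # soft-masked repeats
            seq[start:start + size] = "".join(seq[start:start + size]).lower()
        sequences[name] = "".join(seq)
    return sequences
```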

Install samtools. On OS X, if you have Homebrew installed, you could use brew install samtools. On Ubuntu, you might run sudo apt-get install samtools. Or on a RedHat-like Linux, you might run sudo yum install samtools.

Index the FASTA files with samtools faidx:

$ for i in `seq 1 19` X Y M; do echo "indexing chr$i"; samtools faidx chr$i.fa; done
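samtools faidx writes a small tab-separated .fai index next to each FASTA file, with one row per sequence: name, length, byte offset of the first base, bases per line, and bytes per line (including the newline). Those five numbers are enough to seek straight to any region instead of scanning the whole chromosome. The Python sketch below rebuilds those columns and uses them for a lookup; the function names are hypothetical, coordinates are 0-based and half-open, and, like samtools faidx, it assumes all sequence lines of a record have the same length except the last.

```python
def build_fai(fasta_path):
    """Return one (name, length, offset, linebases, linewidth) row per record."""
    rows, current, offset = [], None, 0
    with open(fasta_path, "rb") as fh:
        for line in fh:
            if line.startswith(b">"):
                if current is not None:
                    rows.append(tuple(current))
                name = line[1:].split()[0].decode()
                # Sequence bytes start right after the header line.
                current = [name, 0, offset + len(line), 0, 0]
            elif current is not None:
                bases = len(line.rstrip(b"\r\n"))
                if current[3] == 0:
                    # The first sequence line fixes the layout for the record.
                    current[3], current[4] = bases, len(line)
                current[1] += bases
            offset += len(line)
    if current is not None:
        rows.append(tuple(current))
    return rows

def fetch(fasta_path, row, start, end):
    """Return bases [start, end) of one record by seeking with its index row."""
    _name, length, offset, linebases, linewidth = row
    end = min(end, length)

    def byte_pos(i):
        # Whole lines skipped (newlines included) plus the position within the line.
        return offset + (i // linebases) * linewidth + i % linebases

    with open(fasta_path, "rb") as fh:
        fh.seek(byte_pos(start))
        raw = fh.read(byte_pos(end) - byte_pos(start))
    return raw.replace(b"\r", b"").replace(b"\n", b"").decode()
```

Writing "\t".join(map(str, row)) for each row gives text in the same column layout as a .fai file.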

Then query coordinates with samtools faidx. Here is a convenience Perl script I wrote that wraps around samtools, which reads stranded or unstranded BED from standard input and writes FASTA to standard output:

Example usage:

$ ./bed2faidxsta.pl < foo.bed > foo.fa
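A rough Python equivalent of such a wrapper (not the Perl script referred to above) might look like this. The main point it illustrates is the coordinate conversion: BED intervals are 0-based and half-open, while samtools faidx regions such as chr1:101-200 are 1-based and inclusive. It assumes samtools is on the PATH; the file name bed2fa.py and the >chrom:start-end(strand) header format are arbitrary choices for this sketch. Minus-strand intervals (BED column 6) are reverse-complemented.

```python
import subprocess
import sys
from functools import partial

# Only A/C/G/T/N are complemented; mm9 uses just these (plus lowercase).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def samtools_fetch(fasta, chrom, start, end):
    """Fetch BED-style [start, end) by calling samtools faidx.

    BED is 0-based, half-open; samtools regions are 1-based, inclusive.
    """
    region = f"{chrom}:{start + 1}-{end}"
    result = subprocess.run(["samtools", "faidx", fasta, region],
                            check=True, capture_output=True, text=True)
    return "".join(result.stdout.splitlines()[1:])  # drop the ">region" line

def bed_to_fasta(bed_lines, fetch):
    """Yield one FASTA record per BED line; strand (column 6) is optional."""
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        strand = fields[5] if len(fields) >= 6 else None
        seq = fetch(chrom, start, end)
        header = f"{chrom}:{start}-{end}"
        if strand == "-":
            seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
        if strand is not None:
            header += f"({strand})"
        yield f">{header}\n{seq}\n"

def main(argv):
    """Hypothetical CLI: python bed2fa.py genome.fa < foo.bed > foo.fa"""
    fetch = partial(samtools_fetch, argv[1])
    sys.stdout.writelines(bed_to_fasta(sys.stdin, fetch))

if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv)
```

Because bed_to_fasta takes the lookup function as a parameter, the same loop can be driven by samtools, by an in-memory dictionary, or by another indexed reader without changing the BED handling.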

Answer: Ian (University of Manchester, UK) wrote, 3.3 years ago:

The following link should also be helpful: "Perl To Retrieve Sequences From Ucsc Genome Browser"

Answer: Matt Shirley (Cambridge, MA) wrote, 3.3 years ago:

pyfaidx has a script for this that is easy to install and works well: https://github.com/mdshw5/pyfaidx#cli-script-faidx

Answer: swbarnes2 (United States) wrote, 3.3 years ago:

samtools faidx can do it too.

Powered by Biostar version 2.3.0