How to get Gene Sequence from BAM file on Galaxy.
5.8 years ago
Pedro Morell ▴ 10

I have several BAM files from different species for the chromosome where the gene I want to study is located, so how do I select the region where this gene is supposed to be located? 

 

Thank you.

genome gene Galaxy • 2.6k views

Am I correct in guessing that you want the consensus sequence of the gene given the alignments? BTW, you might want to post this on the galaxy site.


I have the sequence of the model my samples are mapped to. I want to do variant calling on a certain gene, but I don't know how to trim my files to the region where it's supposed to be.

5.8 years ago

Hello,

The tool BEDTools: Intersect BAM alignments with intervals will perform this operation. For the second input, use a BED/Interval file that contains the gene location *based on the same reference genome as the first (BAM) input. UCSC could be a source for such a BED file, along with others. See the tools under Get Data for built-in data-fetching options, or locate the data elsewhere and upload it.

*If this is not known, the LiftOver function may be able to transform coordinates from one genome build to another (even across genomes). The output is based on genome alignment concordance, so it is not necessarily the actual location of the gene in the target genome. That said, it can be a useful way to gene hunt. LiftOver can be used in Galaxy, at UCSC in web form, or on the command line using UCSC's data and tools. Only genomes from UCSC have this specific flavor of coordinate mapping data.
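To make the coordinate intersection step concrete, here is a minimal Python sketch of the overlap test that such a tool performs conceptually. It is not how BEDTools is implemented; the names (Interval, parse_bed_line, overlaps, to_region_string) and the example coordinates are made up for illustration. It assumes BED conventions (0-based start, exclusive end) and also shows how the same interval would be written as a 1-based chrom:start-end region string, the form used by tools such as samtools.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive (BED convention)
    name: str


def parse_bed_line(line: str) -> Interval:
    """Parse the first four tab-separated columns of a BED line."""
    fields = line.rstrip("\n").split("\t")
    name = fields[3] if len(fields) > 3 else ""
    return Interval(fields[0], int(fields[1]), int(fields[2]), name)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def to_region_string(iv: Interval) -> str:
    """Convert a BED interval to a 1-based, inclusive chrom:start-end string."""
    return f"{iv.chrom}:{iv.start + 1}-{iv.end}"


# Hypothetical gene location and read alignments, for illustration only.
gene = parse_bed_line("chr7\t1000\t2000\tmyGene")
reads = [
    Interval("chr7", 900, 1001, "read1"),   # overlaps the first base of the gene
    Interval("chr7", 2000, 2100, "read2"),  # starts exactly where the gene ends
    Interval("chr8", 1500, 1600, "read3"),  # different chromosome
    Interval("chr7", 1500, 1600, "read4"),  # fully inside the gene
]

kept = [r.name for r in reads if overlaps(r, gene)]
region = to_region_string(gene)
print(kept, region)
```

Note that the chromosome names must match exactly between the BAM and the BED file (for example "chr7" vs "7"), which is one reason both need to come from the same reference genome.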

Another approach is to use multiple sequence alignment (MAF) data and gene hunt from there. This is very useful if any of the target genomes happen to be included in UCSC's suite of databases, but other MAF data can be used. By "gene hunt" I mean that the gene location is not known in the other genomes, but the location of the original gene of interest is, and that known gene is in a genome that has a track named Conservation at UCSC or can be found in another MAF data source that includes both genomes.

How to access Conservation data: Extract the gene (specifically, its transcript(s)) in BED format from the UCSC Table Browser (a keyword search by gene name is possible). Use that as input to the tools in the group Fetch Alignments/Sequences. Some genomes have MAF data built in, but you can also upload MAFs to use with the tools (these accept input from the history, making each very flexible). The MAF does not have to come from UCSC, but it must meet the MAF file format specification.
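For anyone curious what the MAF data look like underneath, here is a small Python sketch that reads the "s" (sequence) lines of a MAF block and strips the alignment gaps to recover each species' sequence. The Galaxy tools do this for you; the function names, the source names (speciesA.chr7, speciesB.chr3), and the sequences below are invented for illustration, and only the basic "a"/"s" line layout of the MAF format is assumed.

```python
from typing import List, NamedTuple


class MafRow(NamedTuple):
    src: str       # usually "genome.chromosome"
    start: int     # 0-based start in the source sequence
    size: int      # number of non-gap bases in this row
    strand: str    # "+" or "-"
    src_size: int  # total length of the source sequence
    text: str      # aligned bases, with "-" marking gaps


def parse_maf(text: str) -> List[List[MafRow]]:
    """Return alignment blocks, each a list of the block's 's' rows."""
    blocks: List[List[MafRow]] = []
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and header/comment lines
        if fields[0] == "a":
            current = []  # an 'a' line opens a new alignment block
            blocks.append(current)
        elif fields[0] == "s" and current is not None:
            current.append(MafRow(fields[1], int(fields[2]), int(fields[3]),
                                  fields[4], int(fields[5]), fields[6]))
    return blocks


def ungapped(row: MafRow) -> str:
    """Remove alignment gaps to get the row's plain sequence."""
    return row.text.replace("-", "")


# Hypothetical two-species alignment block, for illustration only.
example = """##maf version=1
a score=100.0
s speciesA.chr7 1000 6 + 50000 ACG-TTA
s speciesB.chr3 2500 7 + 40000 ACGATTA
"""

blocks = parse_maf(example)
sequences = {row.src: ungapped(row) for row in blocks[0]}
print(sequences)
```

Here the row for the second species gives the candidate location of the region in that genome (source name, start, and size), which is the information a "gene hunt" is after.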

From either approach, downstream data conversion (extracting sequences, etc.) is all possible within Galaxy, and most of it on the public Main server at http://usegalaxy.org. The Galaxy Main Tool Shed has even more tools for use in a local or cloud Galaxy.

For an example of using the Fetch Alignments/Sequences tools along with some downstream manipulations, see protocol #5 in this prior publication (includes a video): https://usegalaxy.org/u/galaxyproject/p/using-galaxy-2012

A few links:

http://usegalaxy.org - public Main Galaxy server
http://usegalaxy.org/galaxy101 - Tutorial covering basic data manipulations. The first could easily be adjusted for your type of query. Convert the BAMs to BED format and perform the coordinate intersection from there.
http://galaxyproject.org - wiki, see Learn and Support for tutorials, file format help, other usage
http://genome.ucsc.edu - public UCSC Genome browser server that includes many tools and their help docs
https://biostar.usegalaxy.org - Galaxy biostars forum. This question has been asked previously in a few different ways. Maybe search and see what is helpful and ask questions about Galaxy usage if you need more assistance?

Jen, Galaxy team (who also formerly worked at UCSC :) )
