How to get Fasta Sequences with their gene name for many genes?
9.4 years ago
seta ★ 1.9k

Hi all,

I have a list of many gene names (about 5000) and I want to retrieve their corresponding FASTA sequences. To this end, I tried BioMart, but it gives me only the nucleotide sequences. In the "Attributes" section we cannot select more than one option, and once we choose "sequence" there, I cannot get the associated gene names for these sequences. Could you explain how to get both the nucleotide sequences and their related gene names? Any suggestion would be highly appreciated.

gene sequence • 5.5k views
9.4 years ago

I think this query is what you are looking for. You just need to select the gene name in the header information for the returned data file.
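For a list of about 5000 genes it may be easier to build the query programmatically than to click through the web form. The following is only a sketch of what such a BioMart query could look like as XML, not the query linked above. The dataset name `hsapiens_gene_ensembl`, the filter `external_gene_name`, and the attributes `ensembl_gene_id`, `external_gene_name` and `gene_exon_intron` (unspliced gene sequence) are assumptions based on Ensembl BioMart for human; they should be checked against the mart and species actually used. The function name `build_biomart_query` is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(gene_names, dataset="hsapiens_gene_ensembl"):
    """Return BioMart query XML asking for FASTA output with gene names in the header.

    Dataset, filter and attribute names are assumptions (Ensembl, human);
    adjust them to the mart you are querying.
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "FASTA",
        "header": "0",
        "uniqueRows": "0",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Restrict the query to the gene names of interest (comma-separated list).
    ET.SubElement(ds, "Filter", {
        "name": "external_gene_name",
        "value": ",".join(gene_names),
    })
    # The non-sequence attributes become the FASTA header fields;
    # the last attribute selects which sequence type is returned.
    for attr in ("ensembl_gene_id", "external_gene_name", "gene_exon_intron"):
        ET.SubElement(ds, "Attribute", {"name": attr})
    return ET.tostring(query, encoding="unicode")


if __name__ == "__main__":
    print(build_biomart_query(["BRCA1", "TP53"]))
```

The resulting string would then be submitted as the `query` parameter of the BioMart martservice endpoint; very long gene lists may need to be split into several smaller queries.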

9.4 years ago
Manvendra Singh ★ 2.2k

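You can put your genes in a BED file (if you don't have one, download genes.gtf and convert it to genes.bed).

The first column is 'Chr', the second column 'Start', the third column 'End', and then 'strand' (in standard BED format the gene name goes in column 4 and the strand in column 6).

If you have the whole-genome FASTA, e.g. genome.fa,

just use bedtools to extract the FASTA sequences: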

getFastaFromBed -fi genome.fa -bed genes.bed -fo genes.fasta.out

Details are here:

http://bedtools.readthedocs.org/en/latest/content/tools/getfasta.html
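The GTF-to-BED conversion mentioned above is where the gene name can be carried along, so that it ends up in the FASTA header. A minimal sketch, assuming an Ensembl/GENCODE-style GTF that contains `gene` feature rows with a `gene_name` attribute (not every genes.gtf has these); the function name `gtf_genes_to_bed` and the file name `wanted_genes.txt` are hypothetical:

```python
import re


def gtf_genes_to_bed(gtf_lines, wanted_names=None):
    """Yield 6-column BED lines (chrom, start, end, name, score, strand).

    Only rows whose feature type is 'gene' and that carry a gene_name
    attribute are used (assumption: Ensembl/GENCODE-style GTF).
    If wanted_names is given, only those gene names are kept.
    """
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = re.search(r'gene_name "([^"]+)"', fields[8])
        if match is None:
            continue
        name = match.group(1)
        if wanted_names is not None and name not in wanted_names:
            continue
        # GTF coordinates are 1-based inclusive; BED is 0-based, half-open.
        start = int(fields[3]) - 1
        yield "\t".join([fields[0], str(start), fields[4], name, "0", fields[6]])


if __name__ == "__main__":
    # One gene name per line in wanted_genes.txt (hypothetical file name).
    with open("wanted_genes.txt") as fh:
        wanted = {ln.strip() for ln in fh if ln.strip()}
    with open("genes.gtf") as gtf, open("genes.bed", "w") as bed:
        for bed_line in gtf_genes_to_bed(gtf, wanted):
            bed.write(bed_line + "\n")
```

With the name in column 4, running bedtools getfasta with the `-name` option (and `-s` to respect the strand) should put the gene name in each FASTA header. Depending on the bedtools version, the header may also include the coordinates; newer versions offer `-nameOnly` for the name alone.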
