Question: How to get gene names or a BED file from FASTA sequences
Kai_Qi (Chicago, IL) wrote, 7 weeks ago:


I extracted a FASTA file from a BED file. Then I analyzed the FASTA file and found many motifs in the sequences (about 400 bp each).

Next I used grep -i motif fastafile >> new_fastafile to collect all the sequences that contain the motif. The new_fastafile looks like this (only part of the head output is shown):

$ head new_fastafile.fasta 

How can I get the coordinates or gene names, so that I know which genes these sequences come from? Thanks,

Tags: rna-seq, chip-seq, gene, genome
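One likely reason the coordinates went missing: grep prints only the matching lines, and in a FASTA file each header sits on its own line above the sequence, so the headers (which usually record where each sequence came from) are dropped. A minimal Python sketch that keeps each header together with its sequence, assuming a plain FASTA file; the function names and example records are hypothetical:

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    Multi-line sequences are joined and the leading '>' is stripped.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def records_with_motif(lines, motif):
    """Return (header, sequence) for every record whose sequence contains
    `motif` as a literal, case-insensitive substring (like grep -i)."""
    motif = motif.upper()
    return [(h, s) for h, s in read_fasta(lines) if motif in s.upper()]


if __name__ == "__main__":
    # Made-up records for illustration only.
    example = [">chr1:100-115", "acgtGGTACAAAttt", ">chr2:500-508", "CCCCCCCC"]
    for header, seq in records_with_motif(example, "ggtacaaa"):
        print(header)  # prints: chr1:100-115
```

The motif is matched literally here, as grep -i would do it; degenerate letters such as N need a pattern rather than a plain substring.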
Answer by xiaoguang, 7 weeks ago:

You can use NCBI BLAST to align this motif sequence back to your reference.


Thanks for your reply. I did not describe the situation fully. First, my motifs look like GGTNNAAA, which I cannot BLAST on NCBI. Second, I got the motifs from filtered FASTA files.

(Reply by Kai_Qi, 7 weeks ago)
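As an alternative to BLAST for short degenerate motifs (a swapped-in technique, not what was suggested in this thread), the sequences can be scanned directly with a regular expression built from the standard IUPAC codes. A minimal Python sketch; the helper names are hypothetical, and hit positions are 0-based, half-open and relative to the start of each sequence, so they still need the offset recorded in the header to become genome coordinates:

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each IUPAC code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif):
    return motif.upper().translate(COMPLEMENT)[::-1]


def motif_to_regex(motif):
    """Compile a degenerate motif such as GGTNNAAA into a regex.
    The lookahead lets finditer report overlapping hits."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile(f"(?=({body}))")


def find_motif(seq, motif):
    """Return sorted (start, end, strand) hits, 0-based and half-open,
    relative to the start of seq; both strands are scanned."""
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        for match in motif_to_regex(pattern).finditer(seq):
            hits.append((match.start(), match.start() + len(motif), strand))
    return sorted(hits)


if __name__ == "__main__":
    print(find_motif("AAGGTACAAATT", "GGTNNAAA"))  # [(2, 10, '+')]
```

Scanning the reverse complement matters because a motif found in a FASTA sequence may sit on either strand of the genome.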

I think you must have a complete reference genome; otherwise you could not have produced the filtered FASTA file. First download the local NCBI BLAST program and build a database from your reference. Then you can BLAST your motif against that database to get coordinates.

(Reply by xiaoguang, 7 weeks ago)
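The local BLAST route could look roughly like this Python sketch, assuming the NCBI BLAST+ command-line tools (makeblastdb, blastn) are installed and that reference.fa and motifs.fa are placeholder file names. The blastn-short task is intended for short queries; even so, an 8-mer such as GGTNNAAA will likely hit the genome many times, and BLAST is not designed to treat N as a wildcard the way a motif scanner does. The second function converts the default tabular output (-outfmt 6) into BED:

```python
import subprocess


def run_blast(reference_fasta, query_fasta, db_name="ref_db", out_tsv="hits.tsv"):
    """Build a local nucleotide database and search it with blastn-short.
    Needs the NCBI BLAST+ executables on PATH; file names are placeholders."""
    subprocess.run(["makeblastdb", "-in", reference_fasta, "-dbtype", "nucl",
                    "-out", db_name], check=True)
    subprocess.run(["blastn", "-task", "blastn-short", "-query", query_fasta,
                    "-db", db_name, "-outfmt", "6", "-out", out_tsv], check=True)
    return out_tsv


def outfmt6_to_bed(lines):
    """Convert BLAST tabular hits (-outfmt 6) into 6-column BED lines.

    Default outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore. Subject coordinates are 1-based
    and inclusive, and sstart > send means a minus-strand hit.
    """
    bed = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        qseqid, sseqid = f[0], f[1]
        sstart, send = int(f[8]), int(f[9])
        strand = "+" if sstart <= send else "-"
        start = min(sstart, send) - 1            # BED start is 0-based
        end = max(sstart, send)                  # BED end is exclusive
        score = min(1000, round(float(f[11])))   # BED scores are 0-1000 integers
        bed.append("\t".join([sseqid, str(start), str(end), qseqid, str(score), strand]))
    return bed


if __name__ == "__main__":
    # A made-up hit line for illustration; run_blast("reference.fa", "motifs.fa")
    # would produce a real hits.tsv.
    hits = ["motif1\tchr1\t100.00\t8\t0\t0\t1\t8\t1001\t1008\t0.5\t16.4"]
    print(outfmt6_to_bed(hits))
```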

I see what you mean, and I will try that. Yesterday I regenerated the FASTA file and used grep to extract its headers into a CSV/TXT file.

The output TXT file looks like this:


I am wondering how to convert the TXT file into BED format, so that I can compare the BED file with the GTF.

I have tried cut -f, but it does not work well.

(Reply by Kai_Qi, 7 weeks ago; edited)
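Once the regions are in BED format, finding the genes they fall in is an interval-overlap problem. bedtools intersect (for example bedtools intersect -a regions.bed -b genes.gtf -wa -wb) is the usual tool for this; the Python sketch below shows the same idea, assuming an Ensembl/GENCODE-style GTF that contains "gene" lines with gene_name or gene_id attributes. The function names and gene records are hypothetical, and the linear scan is fine for a few hundred regions but slow for very large inputs:

```python
import re

GENE_NAME = re.compile(r'gene_name "([^"]+)"')
GENE_ID = re.compile(r'gene_id "([^"]+)"')


def load_genes(gtf_lines):
    """Return (chrom, start, end, name) for every 'gene' line in a GTF.
    GTF is 1-based and inclusive, so start is shifted by -1 to get the
    0-based, half-open convention used by BED."""
    genes = []
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "gene":
            continue
        # Prefer gene_name, fall back to gene_id, else "."
        m = GENE_NAME.search(f[8]) or GENE_ID.search(f[8])
        genes.append((f[0], int(f[3]) - 1, int(f[4]), m.group(1) if m else "."))
    return genes


def genes_overlapping(chrom, start, end, genes):
    """Names of genes overlapping the BED interval [start, end) on chrom."""
    return [name for g_chrom, g_start, g_end, name in genes
            if g_chrom == chrom and g_start < end and start < g_end]


if __name__ == "__main__":
    gtf = ['chr1\tsrc\tgene\t1001\t2000\t.\t+\t.\tgene_id "G1"; gene_name "GENE_A";']
    print(genes_overlapping("chr1", 1500, 1600, load_genes(gtf)))  # ['GENE_A']
```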

Probably you need Python or R. You could parse each coordinate into chromosome, start, end and strand, then write a six-column, tab-delimited ("\t") file containing chromosome, start, end, name, score and strand.

(Reply by xiaoguang, 7 weeks ago)
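Following that suggestion, a minimal Python sketch that turns headers into the six tab-delimited BED columns. It assumes headers of the form chr1:1000-1400, which is what bedtools getfasta writes by default (some versions also write name::chr1:1000-1400 or add a (+)/(-) strand suffix), with the start already 0-based; the actual header format was not shown in the thread, so adjust the pattern to match. This may also explain why cut -f struggled: cut splits on tabs unless -d is given, and such headers contain no tabs.

```python
import re

# Assumed header layout: [name::]chrom:start-end[(strand)], e.g. "chr1:1000-1400"
HEADER = re.compile(
    r"^>?(?:(?P<name>[^:]+)::)?(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)"
    r"(?:\((?P<strand>[+-])\))?"
)


def header_to_bed(header, score=0):
    """Turn one FASTA header into a 6-column, tab-delimited BED line:
    chrom, start, end, name, score, strand."""
    m = HEADER.match(header.strip())
    if m is None:
        raise ValueError(f"unrecognised header: {header!r}")
    # Use the region itself as the name when the header has no name:: prefix.
    name = m["name"] or f"{m['chrom']}:{m['start']}-{m['end']}"
    strand = m["strand"] or "."
    return "\t".join([m["chrom"], m["start"], m["end"], name, str(score), strand])


def headers_to_bed(lines):
    """Convert every non-empty header line; blank lines are skipped."""
    return [header_to_bed(line) for line in lines if line.strip()]


if __name__ == "__main__":
    for h in [">chr1:1000-1400", ">peak7::chr2:50-450(-)"]:
        print(header_to_bed(h))
```

Writing the result with "\n".join(...) to a .bed file gives input that bedtools and most genome browsers accept.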

Also, Excel can help you.

(Reply by xiaoguang, 7 weeks ago)
Powered by Biostar version 2.3.0