Question: How to get gene names or a BED file from FASTA sequences
7 weeks ago by Kai_Qi (Chicago, IL)
Kai_Qi wrote:

Hi:

I generated a FASTA file from a BED file, then analyzed it and found many motifs in the sequences (about 400 bp each).

I then ran grep -i motif fastafile >> new_fastafile to get all the sequences that contain the motif. The new_fastafile looks like this (only part of the head output is shown):

$ head new_fastafile.fasta 
GTAAGTGGCACCCTGCCAGAGATCCCTCTCTGCCCTGGGTCTCATGCCTTCCTTTCTGCACCTCCAGACAATTTCTGCTGCCCCTAGGTCCCAGATTTCAGCTGTCCAGATGTCCAGGCCTTTTAAAGGGTCTAGGCAGGGGGTCCTACTGCTCACACAGTCCTCCCACTGGCTGTTATGTTTAAAATCCTAACCTGGC
GTAGGTGTGGACGACAGACAGCTGGGTGGCATGAGAATGCAGGTGCCAGGCGAACTAGAGGGTGGTGCTGGGTGCGTCGTACCATCGGGAGAAGATCCCCTCCCCCTCAGCCTCTGCTGAAAGCAACAAGGGAACCCCTAAAAGAAGGGCTAAGAAGGTATGCACAAGATACTGGGTCTTCCCCAAGAATGGGGCTGGA
GTGGGTAGCCTGGGGACCCCTAGCACCCCAGCCTTCACCACCATCACCTTCATCGCCACCATTACTGCGCTCACCTCCGGCTTGATCACTCAGTGTCATCCTGTGCTGGACGCTGTGCTGGGCCACCATGCCATGTTAAGTCATCCTGCCTCTCATACCATCATCACCTTGTTCACCTGTCAGGGGAGATGTAGGGGAG

How can I get the coordinates or gene names so that I know which genes these sequences come from? Thanks,

rna-seq chip-seq gene genome • 174 views
7 weeks ago by xiaoguang
xiaoguang wrote:

You can use NCBI BLAST to align the motif sequence back to your reference.


Thanks for your reply. I did not describe the situation fully. First, the motif looks like GGTNNAAA, which I cannot BLAST at NCBI. Second, I got the motif from filtered FASTA files.

Reply written 7 weeks ago by Kai_Qi
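
One likely reason the gene information is lost is that grep only prints the matching sequence lines, so the FASTA headers are dropped. The following is a minimal Python sketch, not a definitive tool, that scans whole FASTA records for a degenerate motif such as GGTNNAAA and reports the header of each record that contains it. The function names (motif_to_regex, read_fasta, find_motif) are made up for this illustration, and the sketch only searches the sequence as written, not its reverse complement.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Turn an IUPAC motif (e.g. GGTNNAAA) into a case-insensitive regex."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()), re.IGNORECASE)


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_motif(lines, motif):
    """Return (header, 0-based offset, matched text) for the first hit in each record."""
    pattern = motif_to_regex(motif)
    hits = []
    for header, seq in read_fasta(lines):
        match = pattern.search(seq)
        if match:
            hits.append((header, match.start(), match.group()))
    return hits
```

To use it on a file, pass the open file handle, for example find_motif(open("fastafile"), "GGTNNAAA"), where "fastafile" is the original FASTA with headers intact.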

I think you must have a complete reference genome; otherwise you could not have produced the filtered FASTA file. First install the NCBI BLAST program locally and build a database from your reference. Then you can BLAST your motif against that reference database to get coordinates.

Reply written 7 weeks ago by xiaoguang
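
The local BLAST steps described above could be sketched in Python roughly as follows, assuming BLAST+ (makeblastdb and blastn) is installed and on the PATH. The file and database names passed in (for example reference.fa, query.fa, refdb) are placeholders, not names from this thread. With -outfmt 6 the tabular output has the subject sequence ID in column 2 and the subject start and end in columns 9 and 10. A short degenerate motif is probably a poor BLAST query, so the full extracted sequences may work better as the query.

```python
import subprocess


def makeblastdb_cmd(reference_fasta, db_name):
    """Command that builds a nucleotide BLAST database from a reference FASTA."""
    return ["makeblastdb", "-in", reference_fasta, "-dbtype", "nucl", "-out", db_name]


def blastn_cmd(query_fasta, db_name, short_query=False):
    """Command that aligns the query against the database, tabular output."""
    cmd = ["blastn", "-query", query_fasta, "-db", db_name, "-outfmt", "6"]
    if short_query:
        # blastn-short is the blastn task intended for very short queries.
        cmd += ["-task", "blastn-short"]
    return cmd


def run_blast(reference_fasta, query_fasta, db_name="refdb", short_query=False):
    """Build the database, run blastn, and return the hits as lists of columns."""
    subprocess.run(makeblastdb_cmd(reference_fasta, db_name), check=True)
    result = subprocess.run(
        blastn_cmd(query_fasta, db_name, short_query),
        check=True, capture_output=True, text=True,
    )
    return [line.split("\t") for line in result.stdout.splitlines()]
```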

I see what you mean. I will try this advice. Yesterday I regenerated the FASTA file and used grep to write its headers to a CSV or TXT file.

The format in the output txt file is like this:

>7:127020805-127021004(-)

I am wondering how to convert the TXT file to BED so that I can use the BED file with a GTF.

I tried cut -f, but it does not work well.

Reply modified 7 weeks ago • written 7 weeks ago by Kai_Qi

You probably need Python or R. Parse each coordinate into chromosome, start, end, and strand, then write a six-column tab-delimited file containing chromosome, start, end, name, score, and strand.

Reply written 7 weeks ago by xiaoguang
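
A minimal Python sketch of that parsing step, for headers shaped like >7:127020805-127021004(-). The function names are made up for this example. Two assumptions are baked in: the start and end in the header are already BED-style (0-based start) and are copied unchanged, and since the header carries no name or score, the name column reuses the coordinate string and the score is a placeholder 0. If the coordinates are 1-based, subtract 1 from the start.

```python
import re

# Matches headers such as ">7:127020805-127021004(-)"; the leading ">" is optional.
HEADER_RE = re.compile(
    r"^>?(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)\((?P<strand>[+-])\)$"
)


def header_to_bed6(header, score="0"):
    """Convert one FASTA header into a tab-delimited BED6 line."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError(f"unrecognized header: {header!r}")
    chrom, start, end, strand = match.group("chrom", "start", "end", "strand")
    # Assumption: no real feature name is available, so reuse the coordinates.
    name = f"{chrom}:{start}-{end}({strand})"
    return "\t".join([chrom, start, end, name, score, strand])


def headers_to_bed(lines):
    """Convert an iterable of header lines, skipping blank lines."""
    return [header_to_bed6(line) for line in lines if line.strip()]
```

For example, "\n".join(headers_to_bed(open("headers.txt"))) gives the text of a BED file, where headers.txt stands for the TXT file of headers described above.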

Also, Excel can help you.

Reply written 7 weeks ago by xiaoguang