Question

How to get gene names or a BED file from FASTA sequences

0

3.4 years ago

Kai_Qi ▴ 130

Hi:

I generated a FASTA file from a BED file, then analyzed it and obtained many motifs from the FASTA sequences (about 400 bp each).

I then used grep -i motif fastafile >> new_fastafile to extract all the sequences that contain the motif. The new file looks like this (head output, truncated):

$ head new_fastafile.fasta 
GTAAGTGGCACCCTGCCAGAGATCCCTCTCTGCCCTGGGTCTCATGCCTTCCTTTCTGCACCTCCAGACAATTTCTGCTGCCCCTAGGTCCCAGATTTCAGCTGTCCAGATGTCCAGGCCTTTTAAAGGGTCTAGGCAGGGGGTCCTACTGCTCACACAGTCCTCCCACTGGCTGTTATGTTTAAAATCCTAACCTGGC
GTAGGTGTGGACGACAGACAGCTGGGTGGCATGAGAATGCAGGTGCCAGGCGAACTAGAGGGTGGTGCTGGGTGCGTCGTACCATCGGGAGAAGATCCCCTCCCCCTCAGCCTCTGCTGAAAGCAACAAGGGAACCCCTAAAAGAAGGGCTAAGAAGGTATGCACAAGATACTGGGTCTTCCCCAAGAATGGGGCTGGA
GTGGGTAGCCTGGGGACCCCTAGCACCCCAGCCTTCACCACCATCACCTTCATCGCCACCATTACTGCGCTCACCTCCGGCTTGATCACTCAGTGTCATCCTGTGCTGGACGCTGTGCTGGGCCACCATGCCATGTTAAGTCATCCTGCCTCTCATACCATCATCACCTTGTTCACCTGTCAGGGGAGATGTAGGGGAG

How can I get the coordinates or gene names so that I know which genes these sequences come from? Thanks.

RNA-Seq genome gene ChIP-Seq • 868 views

ADD COMMENT • link updated 3.4 years ago by xiaoguang ▴ 140 • written 3.4 years ago by Kai_Qi ▴ 130

score 0 · Answer 1 · 2020-12-02

0

3.4 years ago

xiaoguang ▴ 140

You can use NCBI BLAST to realign the motif sequence to your reference.

ADD COMMENT • link 3.4 years ago by xiaoguang ▴ 140

0

Thanks for your reply. I did not describe the situation fully. First, my motifs look like this: GGTNNAAA, which I cannot BLAST at NCBI. Second, I got the motifs from filtered FASTA files.
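A degenerate motif such as GGTNNAAA can also be searched for directly in the FASTA file by translating its IUPAC codes into a regular expression and keeping each record's header next to its sequence (plain grep on the sequence lines drops the header). The following is a minimal sketch, not a definitive implementation: the function names `iupac_to_regex`, `read_fasta`, and `records_with_motif` and the demo records are made up for illustration, and only the strand given in the file is searched.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def iupac_to_regex(motif):
    """Compile an IUPAC motif (e.g. GGTNNAAA) into a case-insensitive regex."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()), re.IGNORECASE)

def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def records_with_motif(lines, motif):
    """Return the (header, sequence) records whose sequence contains the motif."""
    pattern = iupac_to_regex(motif)
    return [(h, s) for h, s in read_fasta(lines) if pattern.search(s)]

# Hypothetical records, for illustration only.
demo = [">7:100-114(-)", "ACGTGGTCCAAATT", ">7:200-212(+)", "ACGTACGTACGT"]
for header, seq in records_with_motif(demo, "GGTNNAAA"):
    print(">" + header)
    print(seq)
```

To run it on a real file, pass an open file handle instead of `demo`, for example `records_with_motif(open("fastafile"), "GGTNNAAA")`. Because the headers survive, the coordinates of every matching sequence remain available.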

ADD REPLY • link 3.4 years ago by Kai_Qi ▴ 130

1

I think you must have a complete reference genome; otherwise you could not have produced the filtered FASTA file. First download the standalone NCBI BLAST program and build a database from your reference. Then you can BLAST your motif against that database to get the coordinates.

ADD REPLY • link 3.4 years ago by xiaoguang ▴ 140

0

I see what you mean, and I will try your advice. Yesterday I regenerated the FASTA file and used grep to write its headers to a csv or txt file.

The format in the output txt file is like this:

>7:127020805-127021004(-)

I am wondering how to convert the txt file into BED format, so that I can try the BED file with a GTF.

I tried cut -f, but it does not work well.

ADD REPLY • link 3.4 years ago by Kai_Qi ▴ 130

0

You probably need Python or R. Parse each coordinate string into chromosome, start, end, and strand, then write a six-column file delimited by "\t" that contains chromosome, start, end, name, score, and strand.
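The parsing step described above could be sketched in Python roughly as follows. This is an illustration under stated assumptions rather than a definitive implementation: the function names `header_to_bed6` and `headers_to_bed` are invented here, the name column simply reuses the coordinate string, the score is set to a placeholder 0, and the start and end values are assumed to already be BED-style (0-based start, exclusive end). If the headers are 1-based, subtract 1 from the start.

```python
import re

# Matches headers such as ">7:127020805-127021004(-)".
# The leading ">" and the "(strand)" suffix are both optional.
HEADER_RE = re.compile(
    r"^>?(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)(?:\((?P<strand>[-+.])\))?$"
)

def header_to_bed6(header, score="0"):
    """Convert one 'chrom:start-end(strand)' header into a tab-delimited BED6 line."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError(f"unrecognized header: {header!r}")
    chrom = match.group("chrom")
    start = match.group("start")
    end = match.group("end")
    strand = match.group("strand") or "."  # "." means strand unknown
    name = f"{chrom}:{start}-{end}"        # assumption: reuse coordinates as the name
    return "\t".join([chrom, start, end, name, score, strand])

def headers_to_bed(lines):
    """Convert every non-empty header line into a BED6 line."""
    return [header_to_bed6(line) for line in lines if line.strip()]

print(header_to_bed6(">7:127020805-127021004(-)"))
```

The `cut -f` attempt most likely struggled because these headers contain no tab characters: the fields are separated by `:`, `-`, and parentheses, which is why a regular expression is used here. To convert a whole file, write the result of `headers_to_bed(open("headers.txt"))` to a new file, one line per record (`headers.txt` is a placeholder name).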