How to extract whole gene sequence with 5' and 3' UTR from genome
7.3 years ago
Bioinfonext ▴ 460

I found that in radish some of the CDS (coding) gene sequences are very similar to each other, so I want to extract the whole gene sequence, including the 5' and 3' UTRs, from the genome so that I can differentiate these genes from each other.

This is the link for Radish genome database:

http://radish-genome.org/Data_resource/

Rs.1.0 Gene: GFF file
Rs.1.0 CDS: coding regions of genes
Rs.1.0 Chromosome: complete genome sequence

Orthologous gene in Arabidopsis    Similar coding transcripts in radish
Wox4                               Rs267160, Rs429330
ANT                                Rso41500, Rs041540, Rs148980

Please suggest how I can do this.

7.3 years ago
Dan D 7.4k

Step 1:

Download and index the genome with samtools faidx:

samtools faidx [radish.fa]

Step 2:

Extract the region of interest:

samtools faidx [radish.fa] [contig_name:start_pos-end_pos]

The result will be printed to STDOUT.


I get an error when running the above command:

[root@psgl UTR_ANALYSIS]# /home/yog/software/samtools-1.3.1/samtools faidx Rs_1.0.chromosomes.fix.fasta
[root@psgl UTR_ANALYSIS]# /home/yog/software/samtools-1.3.1/samtools faidx [Rs267160:32800241-32801875]
[fai_build] fail to open the FASTA file [Rs267160:32800241-32801875]
Could not build fai index [Rs267160:32800241-32801875].fai
[root@psgl UTR_ANALYSIS]#

Yikes! Do not run common programs as user root. That is bad practice. You could do serious damage to your system if you make a mistake.

Otherwise, the error is clear: samtools cannot open those strangely named FASTA files from the local directory. Provide the full path if needed.


Thanks,

I think there is some problem with the FASTA file. Can you suggest how I can format it?


FASTA is about the simplest file format there is. What kind of formatting issue do you think there is with your files? Where (or from what program) did you get them?


Earlier, while indexing a FASTA file, I ran into problems such as white space, but that is not the case with this file.

I cannot understand what the problem is.


This is what I see when I open the .fai index file with vi:

[root@psgl UTR_ANALYSIS]# vi Radish_chro.fasta.fai
R1      26309735        4       100     101
R2      43799612        26572841        100     101
R3      29132933        70810454        100     101
R4      50002108        100234721       100     101
R5      45943323        150736855       100     101
R6      53636577        197139616       100     101
R7      27187321        251312563       100     101
R8      29681327        278771762       100     101
R9      38354807        308749907       100     101
RUS00001        13014   347488273       100     101
RUS00002        56597   347501428       100     101
RUS00005        12068   347558601       100     101
RUS00007        15522   347570800       100     101
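
The first column of a .fai index holds the sequence names that samtools faidx accepts in a region string, and the second column holds each sequence's length. A minimal sketch for checking a name against the index before building a region follows; the helper name has_seq is made up for illustration, and the index file name is the one from the listing above.

```shell
# Hypothetical helper: succeed only if NAME appears in column 1 of a .fai index.
# Usage: has_seq INDEX.fai NAME
has_seq() {
    awk -F '\t' -v name="$2" '$1 == name { found = 1 } END { exit !found }' "$1"
}

# Example (assumes Radish_chro.fasta.fai is in the current directory):
# has_seq Radish_chro.fasta.fai R5       && echo "R5 is a sequence name"
# has_seq Radish_chro.fasta.fai Rs267160 || echo "Rs267160 is not a sequence name"
```

With the index shown here, only names such as R1 to R9 and the RUS scaffolds would pass; a gene ID like Rs267160 would not, because the index lists chromosomes and scaffolds rather than genes.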

Yes, and after that, when I run the command below, it shows an error:

[root@psgl UTR_ANALYSIS]# /home/yog/software/samtools-1.3.1/samtools faidx [Rs267160:32800241-32801875]

[fai_build] fail to open the FASTA file [Rs267160:32800241-32801875]

Could not build fai index [Rs267160:32800241-32801875].fai


@DanD omitted a crucial argument from the command posted above: you need to name the FASTA file that you want to extract the interval from. Can you try this?

samtools faidx Radish_chro.fasta Rs267160:32800241-32801875

If you want the output captured to a file, then:

samtools faidx Radish_chro.fasta Rs267160:32800241-32801875 > my_seq.fa


Thanks a lot, I am now able to retrieve the sequence. Instead of the gene name Rs267160, I typed the chromosome name R5 followed by the position of the sequence on the chromosome:

/home/yog/software/samtools-1.3.1/samtools faidx Radish_chro.fasta R5:32800241-32801875
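
Looking up the chromosome and coordinates by hand for every gene gets tedious, so the lookup could be scripted against the GFF file. The sketch below is only an illustration under stated assumptions: the helper name gene_region is made up, and it assumes GFF3-style lines in which column 3 is "gene" and column 9 contains ID=<gene_id>, which may not match the radish annotation exactly. An optional flank pads both ends, which can help if the annotated gene span does not already cover the UTRs.

```shell
# Hypothetical helper: print "seqid:start-end" for a gene ID found in a GFF file.
# Assumes GFF3-style lines: column 1 = sequence name, 3 = feature type,
# 4/5 = 1-based start/end, 9 = attributes containing "ID=<gene_id>".
# FLANK (optional, default 0) pads both sides; the start is clamped at 1,
# the end is NOT checked against the chromosome length.
# Usage: gene_region ANNOTATION.gff GENE_ID [FLANK]
gene_region() {
    awk -F '\t' -v id="$2" -v flank="${3:-0}" '
        $3 == "gene" && $9 ~ ("(^|;)ID=" id "(;|$)") {
            start = $4 - flank
            if (start < 1) start = 1
            printf "%s:%d-%d\n", $1, start, $5 + flank
            exit    # stop at the first matching gene line
        }' "$1"
}

# Example (genes.gff is a placeholder file name; requires samtools on PATH):
# samtools faidx Radish_chro.fasta "$(gene_region genes.gff Rs267160 500)" > Rs267160.fa
```

The helper prints nothing when the gene ID is not found, so it is worth checking that the region string is non-empty before passing it to samtools. The extracted sequence is the reference strand as stored in the FASTA file, regardless of the gene's orientation.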
