Question: How to extract whole gene sequence with 5' and 3' UTR from genome
Bioinfonext (Korea) wrote, 3.2 years ago:

I found that in radish some of the CDS (coding) sequences are very similar to each other, so I want to extract the whole gene sequence, including the 5' and 3' UTRs, from the genome so that I can tell these genes apart.

This is the link for Radish genome database:

http://radish-genome.org/Data_resource/

Rs.1.0 Gene: GFF file
Rs.1.0 CDS: coding regions of genes
Rs.1.0 Chromosome: complete genome sequence

Orthologous gene in Arabidopsis: similar coding transcripts in radish
Wox4: Rs267160, Rs429330
ANT: Rso41500, Rs041540, Rs148980

Please suggest how I can do this.

Dan D (Tennessee) wrote, 3.2 years ago:

Step 1:

Download and index the genome with samtools faidx:

samtools faidx [radish.fa]

Step 2:

Extract the region of interest:

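samtools faidx [radish.fa] [contig_name:start_pos-end_pos]

The square brackets mark placeholders and are not typed. contig_name must be a sequence name as it appears in the FASTA headers (a chromosome or scaffold), not a gene ID. The result is printed to STDOUT.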

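The contig name and coordinates for a gene can usually be read from the GFF file rather than typed by hand. A minimal sketch, assuming the gene ID appears as `ID=Rs267160` in column 9 of a `gene` feature: the file name `genes.gff` and the one-line sample record are made up for illustration (the coordinates match those used later in this thread), and the real Rs.1.0 GFF may name its features and attributes differently.

```shell
# Hypothetical one-record GFF for illustration only; check the real file's
# feature types and attribute keys before relying on this pattern.
printf 'R5\tRs1.0\tgene\t32800241\t32801875\t.\t+\t.\tID=Rs267160\n' > genes.gff

gene_id="Rs267160"

# GFF columns: 1 = sequence name, 3 = feature type, 4 = start, 5 = end,
# 9 = attributes. Print contig:start-end for the first matching gene record.
region=$(awk -F'\t' -v id="$gene_id" '
    $3 == "gene" && $9 ~ ("(^|;)ID=" id "(;|$)") { print $1 ":" $4 "-" $5; exit }
' genes.gff)

echo "$region"

# With the indexed genome in place, the region string can be passed on:
# samtools faidx Radish_chro.fasta "$region"
```

The region string is built from column 1 of the GFF, so it carries the chromosome name rather than the gene ID.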

Error while running the above command:

[root@psgl UTR_ANALYSIS]# /home/yog/software/samtools-1.3.1/samtools faidx Rs_1.0.chromosomes.fix.fasta
[root@psgl UTR_ANALYSIS]# /home/yog/software/samtools-1.3.1/samtools faidx [Rs267160:32800241-32801875]
[fai_build] fail to open the FASTA file [Rs267160:32800241-32801875]
Could not build fai index [Rs267160:32800241-32801875].fai
[root@psgl UTR_ANALYSIS]#
written 3.2 years ago by Bioinfonext

Yikes! Do not run common programs as the root user. That is bad practice: a single mistake could do serious damage to your system.

Otherwise the error is clear. Samtools cannot open those strangely named FASTA files in the local directory. Provide the full path if needed.

written 3.2 years ago by genomax

Thanks,

I think there is some problem with the FASTA file. Can you suggest how I can format it?

written 3.2 years ago by Bioinfonext

Fasta is about the simplest file format that there is. What kind of formatting issue do you think there is with your files? Where (or what program) did you get them from?

written 3.2 years ago by genomax

Earlier, while indexing a FASTA file, I had problems such as white space in the file, but that is not the case here.

I cannot work out what the problem is.

written 3.2 years ago by Bioinfonext

When I use the vi command to view the .fai file:

[root@psgl UTR_ANALYSIS]# vi Radish_chro.fasta.fai
R1      26309735        4       100     101
R2      43799612        26572841        100     101
R3      29132933        70810454        100     101
R4      50002108        100234721       100     101
R5      45943323        150736855       100     101
R6      53636577        197139616       100     101
R7      27187321        251312563       100     101
R8      29681327        278771762       100     101
R9      38354807        308749907       100     101
RUS00001        13014   347488273       100     101
RUS00002        56597   347501428       100     101
RUS00005        12068   347558601       100     101
RUS00007        15522   347570800       100     101
written 3.2 years ago by Bioinfonext
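In a .fai index the first column is the sequence name and the second is its length; the remaining columns are the byte offset, bases per line, and bytes per line. The name before the colon in a region string has to match column 1. A small sketch of that check, using two index lines copied from the listing above and written to a scratch file (`sample.fai` and the function name `region_in_index` are hypothetical):

```shell
# Two lines from the index listing, saved tab-separated to a scratch file.
printf 'R5\t45943323\t150736855\t100\t101\nR6\t53636577\t197139616\t100\t101\n' > sample.fai

# Succeed if the sequence name before the colon is listed in column 1 of the index.
region_in_index() {
    name=${1%%:*}
    awk -F'\t' -v n="$name" '$1 == n { found = 1 } END { exit !found }' "$2"
}

for region in Rs267160:32800241-32801875 R5:32800241-32801875; do
    if region_in_index "$region" sample.fai; then
        echo "$region: sequence name found in index"
    else
        echo "$region: sequence name NOT in index"
    fi
done
```

Here the gene ID Rs267160 is reported as missing, while the chromosome name R5 is found.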

Yes, and after that, when I run the command below, it shows an error:

[root@psgl UTR_ANALYSIS]# /home/yog/software/samtools-1.3.1/samtools faidx [Rs267160:32800241-32801875]

[fai_build] fail to open the FASTA file [Rs267160:32800241-32801875]

Could not build fai index [Rs267160:32800241-32801875].fai

written 3.2 years ago by Bioinfonext

@DanD omitted a crucial argument from the command posted above. You need to name the FASTA file from which you want to extract the interval. Can you try this?

samtools faidx Radish_chro.fasta Rs267160:32800241-32801875

If you want the output captured to a file, redirect it:

samtools faidx Radish_chro.fasta Rs267160:32800241-32801875 > my_seq.fa

written 3.2 years ago by genomax

Thanks a lot, I am now able to retrieve the sequence. Instead of the gene name Rs267160, I typed the chromosome name R5 followed by the position of the sequence on that chromosome:

/home/yog/software/samtools-1.3.1/samtools faidx Radish_chro.fasta R5:32800241-32801875

written 3.2 years ago by Bioinfonext
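If the annotated gene interval turns out not to cover the UTRs, one possible approach is to widen the region by a fixed flank before extraction, keeping it within the chromosome bounds given in the .fai index. A rough sketch under those assumptions: the flank of 1000 bp is an arbitrary illustrative value, and `sample.fai` and `my_seq_with_flanks.fa` are hypothetical file names.

```shell
# R5 line copied from the .fai listing in this thread, saved to a scratch file.
printf 'R5\t45943323\t150736855\t100\t101\n' > sample.fai

contig=R5; start=32800241; end=32801875
flank=1000   # arbitrary illustrative value, not a recommended UTR length

# Widen the interval by $flank on both sides, clamped to 1..contig length
# (contig length is column 2 of the index).
padded=$(awk -F'\t' -v c="$contig" -v s="$start" -v e="$end" -v f="$flank" '
    $1 == c {
        s = s - f; if (s < 1) s = 1
        e = e + f; if (e > $2) e = $2
        printf "%s:%d-%d\n", c, s, e
    }' sample.fai)

echo "$padded"

# With the indexed genome in place:
# samtools faidx Radish_chro.fasta "$padded" > my_seq_with_flanks.fa
```

A fixed flank is only a stand-in for real UTR boundaries; where the GFF carries UTR or mRNA features, their coordinates are the better source.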