Question: list of gene or transcript IDs and their length

Fereshteh wrote (21 months ago):

Hi,

How can I get a list of gene or transcript IDs and their lengths for Saccharomyces cerevisiae?

Thank you.

EagleEye wrote (21 months ago):

See the answer "A: Converting gtf format to bed format".

If you have a GTF file from GENCODE, the shell script in that answer should work at both the gene level and the transcript level.
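
The linked script is not reproduced in this thread. As a minimal sketch of the same idea at the transcript level, the awk function below sums exon lengths per transcript_id from a GTF file. It assumes that GTF coordinates are 1-based and inclusive and that exon lines carry a transcript_id attribute; the function name gtf_transcript_lengths and the file name annotation.gtf are hypothetical. Gene-level lengths would also need overlapping exons from different transcripts merged, which this sketch does not attempt.

```shell
# Sum exon lengths per transcript from a GTF file.
# Assumes 1-based, inclusive coordinates (start in column 4, end in column 5).
# Usage: gtf_transcript_lengths annotation.gtf
gtf_transcript_lengths() {
    awk -F'\t' '
        $0 !~ /^#/ && $3 == "exon" {
            # Pull the value of transcript_id "..." out of the attribute column.
            if (match($9, /transcript_id "[^"]+"/)) {
                id = substr($9, RSTART + 15, RLENGTH - 16)
                len[id] += $5 - $4 + 1
            }
        }
        END { for (id in len) print id "\t" len[id] }
    ' "$1" | sort
}
```

For example, `gtf_transcript_lengths annotation.gtf > transcript_lengths.tsv` would write one "transcript ID, tab, length" line per transcript.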

Fereshteh replied (21 months ago):

Yes, right, the GTF file contains both.

Fereshteh wrote (21 months ago):

With featureCounts, using a BAM file and a GTF file, we can get all of the gene IDs and their lengths, but I need to select only the coding genes among them.
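
One way to keep only the coding genes is sketched below. It relies on two assumptions: the featureCounts table has the usual layout (Geneid in column 1, Length in column 6, preceded by a comment line and a header line), and the GTF marks coding genes with a gene_biotype "protein_coding" attribute on its gene lines, as Ensembl-style GTF files do (other sources may name this attribute differently). The function name coding_gene_lengths and the file names are hypothetical.

```shell
# Print "gene_id<TAB>length" for protein-coding genes only.
# $1: GTF whose gene lines carry gene_biotype "protein_coding" (an assumption)
# $2: featureCounts output table (Geneid in column 1, Length in column 6)
coding_gene_lengths() {
    awk -F'\t' '
        # First file (GTF): remember gene_ids whose biotype is protein_coding.
        NR == FNR {
            if ($3 == "gene" && $9 ~ /gene_biotype "protein_coding"/) {
                if (match($9, /gene_id "[^"]+"/))
                    coding[substr($9, RSTART + 9, RLENGTH - 10)] = 1
            }
            next
        }
        # Second file (featureCounts): skip the comment and header lines.
        /^#/ || $1 == "Geneid" { next }
        $1 in coding { print $1 "\t" $6 }
    ' "$1" "$2"
}
```

A call such as `coding_gene_lengths annotation.gtf counts.txt` would then print the two-column table for coding genes only.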

trausch wrote (21 months ago):

In R, using Bioconductor, this should work (for Ensembl genes):

    library(GenomicFeatures)  # provides transcriptLengths()
    library(TxDb.Scerevisiae.UCSC.sacCer3.ensGene)
    txdb = TxDb.Scerevisiae.UCSC.sacCer3.ensGene
    # One row per transcript: tx_id, tx_name, gene_id, nexon, tx_len, utr5_len, utr3_len
    txlen = transcriptLengths(txdb, with.utr5_len=TRUE, with.utr3_len=TRUE)
    head(txlen)

Giovanni M Dall'Olio replied (21 months ago):

You can also use transcripts(txdb) to get all transcript coordinates.

Fereshteh replied (21 months ago):

Thank you for your answer.

Prakki Rama wrote (21 months ago):

From Ensembl:

ftp://ftp.ensembl.org/pub/release-83/fasta/saccharomyces_cerevisiae/cds/

From NCBI RefSeq:

Paste "Saccharomyces cerevisiae"[porgn:__txid4932] into the NCBI search bar. From the left-side menu, select mRNA and RefSeq to get the list of genes.

Below the search bar, click Send to, then choose Destination: File and Format: FASTA.

Sequence lengths: you can get the sequence lengths with an awk one-liner:

awk '/^>/ {if (sqlen){print sqlen}; print ;sqlen=0;next; } { sqlen = sqlen +length($0)}END{print sqlen}' file.fa


Fereshteh replied (21 months ago):

Thank you very much.

I need the transcript or gene IDs and their lengths; I don't need their FASTA sequences.

I need something like this:

genesID geneslength
R0010W 1272
R0020C 1122
R0030W 546
R0040C 891
YAL069W 315

But NCBI RefSeq only has 8 IDs.

Prakki Rama replied (21 months ago):

Check this screenshot of what I see in the NCBI gene list for Saccharomyces cerevisiae: http://imgur.com/DxeyoY1

Once you have downloaded the FASTA file, run the following to print each header and its length on a single tab-separated line:

awk '/^>/ {if (sqlen){print sqlen}; print ;sqlen=0;next; } { sqlen = sqlen +length($0)}END{print sqlen}' filename.fa | paste - -

Fereshteh replied (21 months ago):

Thank you for your attention.

I mapped my reads to the cDNA instead of the genome FASTA with Bowtie2, so I need something like all of the coding sequence IDs and their lengths.

What I got with your kind tip looks like this:

>gi|891176844|ref|NM_001305015.2| Saccharomyces cerevisiae S288c hypothetical protein partial mRNA
5410
>gi|891176612|ref|NM_001310667.1| Saccharomyces cerevisiae S288c hypothetical protein partial mRNA
333
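
To get the two-column "ID, length" table requested earlier in the thread, rather than full headers and lengths on alternating lines, a variant of the awk one-liner could keep only the first word of each header. This is a sketch under the assumption that the sequence ID is the header text up to the first whitespace; the function name fasta_lengths and the file name cds.fa are hypothetical. It applies equally to the Ensembl CDS FASTA and to the NCBI download.

```shell
# Print "sequence_id<TAB>length" for every record in a FASTA file.
# The ID is the header text up to the first whitespace, without the ">".
fasta_lengths() {
    awk '
        /^>/ {
            # A new record starts: emit the previous one, if any.
            if (id != "") print id "\t" len
            id = substr($1, 2)
            len = 0
            next
        }
        # Sequence line: add its length to the current record.
        { len += length($0) }
        END { if (id != "") print id "\t" len }
    ' "$1"
}
```

For example, `fasta_lengths cds.fa > cds_lengths.tsv`. Unlike the original one-liner, this also prints records whose sequence is empty, with a length of 0.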