Question: List of gene or transcript IDs and their lengths
F3.0k wrote:

Hi,

How can I get a list of gene or transcript IDs and their lengths in Saccharomyces cerevisiae, please?

Thank you

modified 2.3 years ago by EagleEye5.0k • written 2.3 years ago by F3.0k

EagleEye5.0k wrote:

A: Converting gtf format to bed format

If you have a GTF file from GENCODE, the shell script in the answer linked above should work at both the gene level and the transcript level.

written 2.3 years ago by EagleEye5.0k

Yes, right, the GTF file contains both.

reply written 2.3 years ago by F3.0k

F3.0k wrote:

With featureCounts, using a BAM file and a GTF file, we can get all of the gene IDs and their lengths, but I would still have to select the coding genes among them.
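
One way to select the coding genes is to filter the GTF directly. The following is only a minimal sketch, assuming an Ensembl-style GTF whose gene lines carry gene_id and gene_biotype attributes (GENCODE files use gene_type instead). The file names genes.gtf and coding_gene_lengths.tsv and the two records are toy placeholders, not real annotation. It reports the genomic span (end - start + 1), which can differ from the merged-exon length that featureCounts reports for spliced genes.

```shell
# Toy Ensembl-style GTF (tab-separated); replace genes.gtf with your real annotation.
printf 'I\tensembl\tgene\t335\t649\t.\t+\t.\tgene_id "YAL069W"; gene_biotype "protein_coding";\n' > genes.gtf
printf 'I\tensembl\tgene\t1000\t1072\t.\t-\t.\tgene_id "toy_tRNA"; gene_biotype "tRNA";\n' >> genes.gtf

# Keep only protein-coding gene lines and print gene ID and genomic span.
awk -F'\t' -v OFS='\t' '
  $3 == "gene" && $9 ~ /gene_biotype "protein_coding"/ {
    # extract the value of gene_id from the attribute column
    if (match($9, /gene_id "[^"]+"/)) {
      id = substr($9, RSTART + 9, RLENGTH - 10)
      print id, $5 - $4 + 1
    }
  }' genes.gtf > coding_gene_lengths.tsv

cat coding_gene_lengths.tsv
```

With the toy input, only the protein-coding record is kept and the tRNA record is dropped.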

modified 2.3 years ago • written 2.3 years ago by F3.0k

trausch910 wrote:

In R, using Bioconductor, this should work (for Ensembl genes):

# transcriptLengths() is provided by GenomicFeatures
library(GenomicFeatures)
library(TxDb.Scerevisiae.UCSC.sacCer3.ensGene)
txdb <- TxDb.Scerevisiae.UCSC.sacCer3.ensGene
# one row per transcript: tx_id, tx_name, gene_id, nexon, tx_len, plus the UTR lengths
txlen <- transcriptLengths(txdb, with.utr5_len=TRUE, with.utr3_len=TRUE)
head(txlen)

written 2.3 years ago by trausch910

You can also use transcripts(txdb) to get all transcript coordinates.

reply written 2.3 years ago by Giovanni M Dall'Olio26k

Thank you for your answer.

reply written 2.3 years ago by F3.0k

Prakki Rama2.1k wrote:

From Ensembl:

ftp://ftp.ensembl.org/pub/release-83/fasta/saccharomyces_cerevisiae/cds/

From NCBI RefSeq:

Paste "Saccharomyces cerevisiae"[porgn:__txid4932] into the NCBI search box. From the left-side menu, select mRNA and RefSeq to get the list of genes.

Below the search bar, click Send to, choose File as the destination and FASTA as the format.

Sequence lengths: you can get the length of each sequence with an awk one-liner:

awk '/^>/ {if (sqlen){print sqlen}; print ;sqlen=0;next; } { sqlen = sqlen +length($0)}END{print sqlen}' file.fa

 

modified 2.3 years ago • written 2.3 years ago by Prakki Rama2.1k

Thank you very much.

I need the transcript or gene IDs and their lengths; I don't need their FASTA sequences.

I need something like this:

genesID geneslength
R0010W 1272
R0020C 1122
R0030W 546
R0040C 891
YAL069W 315

But NCBI RefSeq only has 8 IDs.

reply modified 2.3 years ago • written 2.3 years ago by F3.0k

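Check this screenshot of what I see in the NCBI gene list for Saccharomyces cerevisiae: http://imgur.com/DxeyoY1

Once you have downloaded the sequences, pipe the one-liner's output through paste so that each header and its length end up on the same line: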

awk '/^>/ {if (sqlen){print sqlen}; print ;sqlen=0;next; } { sqlen = sqlen +length($0)}END{print sqlen}' filename.fa | paste - -
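
If only the IDs and lengths are wanted, a variant of the one-liner can print them directly as a two-column table. This is a rough sketch, assuming the ID is the first whitespace-delimited token of each FASTA header. The file names toy.fa and lengths.tsv and the two records are placeholders for illustration.

```shell
# Toy FASTA with a multi-line sequence; replace toy.fa with your downloaded file.
cat > toy.fa <<'EOF'
>seqA first toy record
ACGTACGT
ACG
>seqB second toy record
ACGT
EOF

awk '
  /^>/ {
    if (id != "") print id "\t" len   # flush the previous record
    id = substr($1, 2)                # first header token without the leading ">"
    len = 0
    next
  }
  { len += length($0) }               # add up the sequence lines
  END { if (id != "") print id "\t" len }
' toy.fa > lengths.tsv

cat lengths.tsv
```

With NCBI headers of the gi|...|ref|...| form, the first token is the whole pipe-delimited identifier, so a further split on "|" would be needed to keep only the accession.
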
reply modified 2.3 years ago • written 2.3 years ago by Prakki Rama2.1k

Thank you for your attention.

I mapped my reads to the cDNA instead of the genome FASTA with bowtie2, so I need something like all of the coding sequence IDs and their lengths.

What I got with your kind tip looks like this:

>gi|891176844|ref|NM_001305015.2| Saccharomyces cerevisiae S288c hypothetical protein partial mRNA
5410
>gi|891176612|ref|NM_001310667.1| Saccharomyces cerevisiae S288c hypothetical protein partial mRNA
333

 

reply written 2.3 years ago by F3.0k