Question: Quantification of repeats expression
valerie50 wrote:

Hi guys,

I want to quantify the expression of repeats in my RNA-seq data. I have BAM files from TopHat and now need a GTF file of repeats to compute the counts. Where can I download one?

Thanks!

Tags: rna-seq, repeats, ngs
modified 18 months ago by Constantine210 • written 18 months ago by valerie50

Remember to get a GTF file that matches your genome. If your genome came from Ensembl, then get the GTF from Ensembl as well. Otherwise the chromosome identifiers may not match (for example, UCSC uses "chr1" where Ensembl uses "1").

BTW: Repeat tracks are under the "Variation and Repeats" group in the UCSC Table Browser.

modified 18 months ago • written 18 months ago by genomax51k
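
A quick way to check the chromosome-name point above is to compare column 1 of the GTF with the @SQ lines of the BAM header (as printed by `samtools view -H`). The following is only a minimal sketch: the function names and the toy input lines are made up for illustration, and it works on plain text lines rather than reading BAM directly.

```python
def gtf_seqnames(gtf_lines):
    """Collect the sequence names found in column 1 of GTF lines."""
    names = set()
    for line in gtf_lines:
        # Skip blanks, comments, and UCSC "track"/"browser" header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def sam_header_seqnames(header_lines):
    """Collect reference names from the @SQ lines of a SAM/BAM header."""
    names = set()
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def missing_from_alignment(gtf_names, bam_names):
    """Return GTF sequence names that the alignment header does not contain."""
    return sorted(gtf_names - bam_names)


# Toy inputs (made up for illustration): a UCSC-style header and a GTF
# that mixes UCSC-style ("chr1") and Ensembl-style ("1") names.
toy_header = [
    "@HD\tVN:1.0\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:195471971",
    "@SQ\tSN:chr2\tLN:182113224",
]
toy_gtf = [
    'chr1\trmsk\texon\t3000001\t3000097\t.\t-\t.\tgene_id "L1_Mus3";',
    '1\tother\texon\t3000001\t3000097\t.\t-\t.\tgene_id "x";',
]
print(missing_from_alignment(gtf_seqnames(toy_gtf),
                             sam_header_seqnames(toy_header)))
```

Any name this reports is a GTF sequence that no read can ever be counted against, so an empty list is what you want before counting.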
Constantine210 wrote:

Go to UCSC

https://genome.ucsc.edu/

and under Tools > Table Browser

Choose your genome and track (ideally RefSeq genes), and select "Output format: GTF"

modified 18 months ago • written 18 months ago by Constantine210

Thank you! I need repeats, so why RefSeq genes? Is it correct to choose 'Variation and Repeats' as the group and 'RepeatMasker' as the track?

written 18 months ago by valerie50

Yes. See my comment above.

written 18 months ago by genomax51k

Thank you! I used the mouse mm10 genome for TopHat and will use mm10 here again.

written 18 months ago by valerie50

Did the genome come from UCSC or Ensembl or someplace else? Also keep in mind the "multi-hits" setting for TopHat. Since you are interested in repeats that setting may affect your results significantly.

modified 18 months ago • written 18 months ago by genomax51k
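
For context on the "multi-hits" remark: TopHat's `-g/--max-multihits` option caps how many alignments are reported per read (the default is 20), and each reported alignment carries an `NH:i:` tag with the number of hits. A rough sketch, assuming SAM text lines such as those from `samtools view`, that tallies alignment records by NH value to show how much of the data is multi-mapped. The function name and the toy records are made up for illustration.

```python
from collections import Counter


def nh_distribution(sam_lines):
    """Tally alignment records by their NH:i tag (number of reported hits).

    Counts records, not reads: a read with NH:i:3 normally contributes
    three records. Records without an NH tag are tallied under None.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line, not an alignment
            continue
        fields = line.rstrip("\n").split("\t")
        nh = None
        for tag in fields[11:]:  # optional tags follow the 11 mandatory fields
            if tag.startswith("NH:i:"):
                nh = int(tag[5:])
                break
        counts[nh] += 1
    return counts


# Toy SAM records (made up): one unique read, one read with three hits,
# and one record without an NH tag.
toy_sam = [
    "@SQ\tSN:chr1\tLN:195471971",
    "r1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1",
    "r2\t0\tchr1\t200\t1\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:3",
    "r2\t256\tchr1\t900\t1\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:3",
    "r2\t256\tchr2\t300\t1\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:3",
    "r3\t0\tchr2\t500\t50\t4M\t*\t0\t0\tACGT\tIIII",
]
for nh, n in sorted(nh_distribution(toy_sam).items(), key=lambda kv: str(kv[0])):
    print(nh, n)
```

If most records overlapping repeats have NH greater than 1, the choice of `-g` and the way the counting tool treats non-unique alignments will largely determine the repeat counts.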

Actually, I downloaded an archive with the genome, Bowtie2 indexes and other files from here: ftp://ussd-ftp.illumina.com/Mus_musculus/UCSC/mm10/ So it is UCSC, as far as I understand.

written 18 months ago by valerie50

That is correct. So you are fine with getting the repeats GTF from UCSC.

written 18 months ago by genomax51k
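
If you end up with the raw RepeatMasker table rather than the Table Browser's GTF output, converting it is straightforward. The sketch below assumes tab-separated rows in the column order of UCSC's `rmsk` table (verify this against the table schema for your assembly) and converts UCSC's 0-based, half-open coordinates to GTF's 1-based, inclusive ones. The function name, the `rmsk` source label, the extra attributes and the toy row are my own choices for illustration.

```python
# Assumed column order of UCSC's rmsk table; check it against the schema.
RMSK_COLUMNS = [
    "bin", "swScore", "milliDiv", "milliDel", "milliIns",
    "genoName", "genoStart", "genoEnd", "genoLeft", "strand",
    "repName", "repClass", "repFamily", "repStart", "repEnd",
    "repLeft", "id",
]


def rmsk_row_to_gtf(row, source="rmsk"):
    """Convert one tab-separated rmsk row into a GTF 'exon' line."""
    rec = dict(zip(RMSK_COLUMNS, row.rstrip("\n").split("\t")))
    start = int(rec["genoStart"]) + 1  # 0-based half-open -> 1-based inclusive
    end = int(rec["genoEnd"])
    # gene_id groups all copies of a repeat name; transcript_id marks one copy.
    attrs = (
        f'gene_id "{rec["repName"]}"; '
        f'transcript_id "{rec["repName"]}_{rec["genoName"]}_{start}"; '
        f'class_id "{rec["repClass"]}"; family_id "{rec["repFamily"]}";'
    )
    return "\t".join([rec["genoName"], source, "exon", str(start), str(end),
                      ".", rec["strand"], ".", attrs])


# Toy row (values made up for illustration).
toy_row = ("585\t1892\t83\t59\t14\tchr1\t3000000\t3000097\t-192471874\t-\t"
           "L1_Mus3\tLINE\tL1\t-3055\t3592\t3466\t1")
print(rmsk_row_to_gtf(toy_row))
```

Using the repeat name as `gene_id` means a counting tool will sum reads over every copy of that repeat; put the class or family there instead if you want coarser groups.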

Thank you for your help!

written 18 months ago by valerie50

Hi Valerie,

I have a similar project, working on SSR repeats. Could you please tell me what your workflow is?

modified 18 months ago • written 18 months ago by seta1000

Hi Seta,

I simply use tophat2 to map the reads to the reference genome. Then I sort the alignments with samtools and use htseq-count to obtain counts from the BAM file. At this stage I needed the GTF file we discussed here. Then you can apply any normalization to the counts; I prefer DESeq. Let me know if you still have questions.

written 17 months ago by valerie50
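
The workflow described above could look roughly like the commands assembled below. This is a sketch, not the poster's exact invocation: the file names and index prefix are hypothetical, name-sorting and `-s no` (unstranded library) are assumptions, and `samtools sort -o` is the newer syntax (older samtools releases take an output prefix instead). The script only builds and prints the commands; it does not run them.

```python
import shlex


def build_pipeline(index, reads, gtf, outdir="tophat_out", max_multihits=20):
    """Return the three command lines of the workflow as argument lists."""
    bam = f"{outdir}/accepted_hits.bam"  # TopHat writes this file into outdir
    sorted_bam = f"{outdir}/accepted_hits.namesorted.bam"
    return [
        # 1. Map reads; -g caps the number of alignments reported per read.
        ["tophat2", "-o", outdir, "-g", str(max_multihits), index, reads],
        # 2. Sort alignments by read name (assumption: matches "-r name" below).
        ["samtools", "sort", "-n", "-o", sorted_bam, bam],
        # 3. Count reads per feature; htseq-count prints the table to stdout.
        #    -t and -i must match the feature type and attribute in your GTF.
        ["htseq-count", "-f", "bam", "-r", "name", "-s", "no",
         "-t", "exon", "-i", "gene_id", sorted_bam, gtf],
    ]


# Hypothetical inputs for illustration only.
for cmd in build_pipeline("genome", "sample.fastq", "mm10_rmsk.gtf"):
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

htseq-count reports reads it considers non-uniquely aligned under `__alignment_not_unique` rather than assigning them to a feature, so for repeats it is worth checking how large that category is before normalizing.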

Hi friend, thank you very much for your explanation. Since you mentioned "repeat" in the title of your question, I thought you had a specific method for surveying these regions. Now I see that you follow the standard workflow.

modified 17 months ago • written 17 months ago by seta1000