Question: Quantification of repeats expression
0
22 months ago by
valerie50
valerie50 wrote:

Hi guys,

I want to quantify repeat expression in my RNA-seq data. I've obtained BAM files using TopHat, and now I need a GTF file of repeats to calculate the counts. Where can I download one?

Thanks!

rna-seq repeats ngs • 743 views
ADD COMMENTlink modified 22 months ago by Constantine210 • written 22 months ago by valerie50
2

Remember to get a GTF file that matches your genome. If your genome came from Ensembl, then you need to get the GTF from Ensembl; otherwise the chromosome identifiers may not match (for example, UCSC uses "chr1" where Ensembl uses "1").

BTW: Repeat tracks are under the "Variation and Repeats" group in the UCSC Table Browser.
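
One way to check the chromosome-naming point before counting is to compare the sequence names in the GTF with those in the BAM header. The following is only a minimal sketch: the function names and the example lines are made up for illustration, and it assumes the header lines have already been extracted as text (for example with "samtools view -H").

```python
def gtf_seqnames(gtf_lines):
    """Collect the sequence names (column 1) from GTF lines, skipping comments."""
    names = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def sam_header_seqnames(header_lines):
    """Collect reference names from the SN: field of @SQ header lines."""
    names = set()
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def unmatched_seqnames(gtf_lines, header_lines):
    """Return the GTF sequence names that do not occur in the BAM header."""
    return sorted(gtf_seqnames(gtf_lines) - sam_header_seqnames(header_lines))


# Made-up example: UCSC-style header name ("chr1") vs. Ensembl-style GTF name ("1").
header = ["@HD\tVN:1.0\tSO:coordinate", "@SQ\tSN:chr1\tLN:1000"]
gtf = ['1\trmsk\texon\t101\t197\t.\t-\t.\tgene_id "L1_Mus3";']
print(unmatched_seqnames(gtf, header))  # prints ['1']
```

An empty list means every sequence name in the GTF also appears in the alignment header; any names listed are features that could never receive counts.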

ADD REPLYlink modified 22 months ago • written 22 months ago by genomax59k
3
22 months ago by
Constantine210
USA
Constantine210 wrote:

Go to UCSC

https://genome.ucsc.edu/

and under Tools > Table Browser

Choose your genome and track (ideally RefSeq genes), and select "Output format: GTF"

ADD COMMENTlink modified 22 months ago • written 22 months ago by Constantine210

Thank you! But I need repeats, so why RefSeq genes? Is it correct to choose 'Variation and Repeats' as the group and 'RepeatMasker' as the track?

ADD REPLYlink written 22 months ago by valerie50
1

Yes. See my comment above.

ADD REPLYlink written 22 months ago by genomax59k

Thank you! I used the mouse mm10 genome for TopHat and will use mm10 here as well.

ADD REPLYlink written 22 months ago by valerie50

Did the genome come from UCSC, Ensembl, or someplace else? Also keep in mind the "multi-hits" setting for TopHat (the -g/--max-multihits option). Since you are interested in repeats, where reads often align to many locations, that setting may affect your results significantly.
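
To get a feel for how much the multi-hit setting matters for a given sample, one can count how many alignments are flagged as multi-mapped. The sketch below assumes SAM text (for example from "samtools view") in which the aligner records the number of reported hits in the optional NH:i tag, as TopHat does. The function name and the sample lines are hypothetical.

```python
def count_multimapped(sam_lines):
    """Return (unique, multi) alignment counts based on the NH:i tag.

    Counts alignment records, not reads. Records without an NH tag are
    treated as unique, which is an assumption of this sketch.
    """
    unique = multi = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete SAM record
            continue
        if int(fields[1]) & 0x4:  # FLAG bit 0x4: read is unmapped
            continue
        nh = 1
        for tag in fields[11:]:
            if tag.startswith("NH:i:"):
                nh = int(tag[5:])
                break
        if nh > 1:
            multi += 1
        else:
            unique += 1
    return unique, multi


# Made-up records: one unique hit, one read reported at two loci, one unmapped read.
sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1",
    "r2\t0\tchr1\t200\t1\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:2",
    "r2\t256\tchr1\t300\t1\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:2",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(count_multimapped(sam))  # prints (1, 2)
```

This matters downstream as well: as far as I know, htseq-count in its default mode does not assign alignments with NH greater than 1 to any feature and reports them as "__alignment_not_unique", so a large multi-mapped fraction can mean that many repeat-derived reads are never counted.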

ADD REPLYlink modified 22 months ago • written 22 months ago by genomax59k

Actually, I downloaded an archive with the genome, Bowtie2 indexes and other files from here: ftp://ussd-ftp.illumina.com/Mus_musculus/UCSC/mm10/ So it is UCSC, as far as I understand.

ADD REPLYlink written 22 months ago by valerie50

That is correct. So you are fine with getting the repeats GTF from UCSC.

ADD REPLYlink written 22 months ago by genomax59k

Thank you for your help!

ADD REPLYlink written 22 months ago by valerie50

Hi Valerie,

I have a similar project, working on SSR repeats. Could you please tell me what your workflow is?

ADD REPLYlink modified 22 months ago • written 22 months ago by seta1.1k

Hi Seta,

I simply use tophat2 to map the reads to the reference genome. Then I sort the alignments using samtools and use htseq-count to obtain counts from the BAM file. At this stage I needed the GTF file we discussed here. Then you can apply any normalization to the counts; I prefer DESeq. Let me know if you still have questions.
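
The steps above could be sketched roughly as follows. This is an illustration under several assumptions, not the exact commands used in this thread: the helper name, file names and index prefix are hypothetical; the samtools call assumes a version that accepts "sort -n -o"; "-s no" assumes an unstranded library; and "-t exon -i gene_id" assumes the repeats GTF uses exon features with a gene_id attribute, which should be checked against the downloaded file. The code only builds and prints the command lines, it does not run anything.

```python
def repeat_count_commands(index_prefix, fastq, gtf,
                          out_dir="tophat_out", max_multihits=20):
    """Build the three command lines: align, name-sort, count.

    max_multihits is passed to TopHat's -g option (20 is, to my knowledge,
    TopHat's default).
    """
    hits = f"{out_dir}/accepted_hits.bam"  # TopHat's alignment output file
    sorted_bam = f"{out_dir}/accepted_hits.namesorted.bam"
    return [
        # 1. Map reads to the reference with TopHat2.
        ["tophat2", "-g", str(max_multihits), "-o", out_dir, index_prefix, fastq],
        # 2. Sort alignments by read name.
        ["samtools", "sort", "-n", "-o", sorted_bam, hits],
        # 3. Count alignments per repeat feature; htseq-count writes to stdout.
        ["htseq-count", "-f", "bam", "-r", "name", "-s", "no",
         "-t", "exon", "-i", "gene_id", sorted_bam, gtf],
    ]


# Hypothetical file names, for illustration only.
for cmd in repeat_count_commands("mm10/genome", "sample.fastq", "mm10_rmsk.gtf"):
    print(" ".join(cmd))
```

Each list can be handed to subprocess.run once the paths are adjusted; the output of the last command would need to be redirected to a file to keep the counts.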

ADD REPLYlink written 22 months ago by valerie50

Hi friend, thank you very much for your explanation. Since you mentioned "repeats" in the title of your question, I thought you had a specific method for surveying these regions. Now I see that you follow the standard workflow.

ADD REPLYlink modified 22 months ago • written 22 months ago by seta1.1k