Question

Help: RNA-seq analysis

7.9 years ago

lucilepain • 0

Hello everyone,

My post will probably look basic to many of you, but here goes. I am a real beginner (I have the foundations from the modules available on Coursera), so my terminology may be incorrect.

I have to analyze RNA-seq data to find the top differentially expressed genes between my disease and healthy samples. The analyst in charge of performing the RNA-seq on my samples handled the read alignment and sent me the aligned data in .bed, .bam and .bai format.

When I look for analysis tools, I understand that I must proceed to the assembly, then the quantification of transcript expression, to arrive at differential splicing and expression. I found a huge number of tools, commands and environments, but it is all still nebulous to me. I wanted to use Galaxy (Cufflinks -> Cuffmerge -> Cuffdiff) to get my results, but apparently my files are too large for this tool :/ Would anyone have suggestions of pipelines, workflows, or even better, tools or environments that are better than others for analyzing this type of data, or that are routinely used? Every suggestion or piece of advice is welcome.

Thank you very much for your help.

rna-seq • 2.0k views


score 2 · Accepted Answer · 2017-08-14

7.9 years ago

mforde84 ★ 1.4k

https://www.bioconductor.org/help/workflows/rnaseqGene/


Thanks, I already read that page, but what remained unclear to me was the use of the .bed and .bai files if, according to the page, only my .bam files are required. Any idea?

Thank you very much for your help.

7.9 years ago by lucilepain • 0

Bam files are compressed alignment files and bai files are indices on those files. You shouldn't really have to work directly with bai files, though occasionally you will have to generate them explicitly for other programs to use (e.g., IGV). Bed files are interval files which group alignments and read pileups by genomic coordinates; you will use these for specific applications, and I'm not entirely sure what your pipeline will need them for. Ultimately you can generate them from bam files. Also, there's some general documentation of file formats from UCSC: https://genome.ucsc.edu/FAQ/FAQformat

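to view a bam file in plain text you can use samtools (one file at a time, since samtools view treats any extra arguments as regions; sample.bam is a placeholder name):

samtools view sample.bam | head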

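to generate bai files you can use samtools (looping over the files, because some samtools versions treat a second file name as the output index):

for f in *.bam; do samtools index "$f"; done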
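Since you were already sent .bai files, it may be worth checking which bam files actually have an index before regenerating anything. A minimal sketch, assuming the index sits next to the bam file and is named either sample.bam.bai or sample.bai (missing_bai is a hypothetical helper name, not a samtools command, and .csi indices are not checked):

```shell
# missing_bai: hypothetical helper that lists bam files in a directory
# that have no .bai index next to them.
# Assumes the index is named sample.bam.bai or sample.bai.
missing_bai() {
    dir="${1:-.}"
    for bam in "$dir"/*.bam; do
        [ -e "$bam" ] || continue    # the glob matched nothing
        if [ ! -e "$bam.bai" ] && [ ! -e "${bam%.bam}.bai" ]; then
            printf '%s\n' "$bam"
        fi
    done
}

# Example use (needs samtools installed), indexing only what is missing:
#   missing_bai . | while read -r f; do samtools index "$f"; done
```

The function only prints file names, so it is safe to run first and look at the output before indexing anything.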

to generate bed files you can use bedtools:

bedtools bamtobed ...
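For a sense of what is inside a bed file: each line holds at least a chromosome name, a start and an end, where the start is 0-based and the end is not included in the interval (see the UCSC format page linked above). A minimal sketch, using a hypothetical helper name bed_lengths and assuming whitespace-separated columns, that prints the length of each interval:

```shell
# bed_lengths: hypothetical helper that prints chrom, start, end and
# interval length for each line of a bed file (or of standard input).
# Length is end - start because bed starts are 0-based and ends are exclusive.
bed_lengths() {
    awk '
        /^(track|browser|#)/ { next }    # skip header and comment lines
        NF >= 3 { print $1, $2, $3, $3 - $2 }
    ' "$@"
}

# Example use on one of your files (the file name is a placeholder):
#   bed_lengths sample.bed | head
```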

7.9 years ago by mforde84 ★ 1.4k