RNA-seq for differential expression analysis
7.9 years ago

Hi,

I am a beginner in bioinformatics and would like to learn the basic pipeline for quantifying gene expression from transcriptomic (RNA-seq) data. I am using a bacterial transcriptome as a model. Here is what I have done so far:

I downloaded the transcriptomic data from NCBI. The file was in SRA format, so I used fastq-dump from the SRA Toolkit to convert it into two FASTQ files (the data is paired-end).

I used bowtie2 to align the FASTQ files to a reference genome for the same bacterium, which I also found on NCBI (the FASTA file ends in .ffn).

I converted the SAM file to BAM, then sorted and indexed the BAM file using samtools.

I know I need to use something like bedtools to get read counts from the BAM file, but I am not sure exactly how to go about this. Also, can someone check that the reference genome I am using for this pipeline is correct? The link to it is ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Clostridium_difficile_630_uid57679/ and the file is NC_009089.ffn. Any help is appreciated!

Update: I was able to get the read counts using bedtools! Now I have a file in this format:

gi|126697566|ref|NC_009089.1|:1-1320    0    1320    7904


Basically, it tells me the location (start 0, end 1320) and the read count (7904).
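
For illustration, a row in this format could be split into its parts with a few lines of Python. This is only a minimal sketch: the function name `parse_count_row` is hypothetical, and it assumes that every row has four whitespace-separated columns and that the sequence name always looks like the one shown above (accession in the fourth `|`-separated field, followed by `:first-last`).

```python
def parse_count_row(row):
    """Split one count row into accession, coordinates, and read count.

    Assumes the layout shown in the post: name, start, end, count.
    """
    name, start, end, count = row.split()
    # The name splits into ['gi', '126697566', 'ref', 'NC_009089.1', ':1-1320'].
    fields = name.split("|")
    accession = fields[3]
    # The trailing ':1-1320' part appears to give the position on the genome.
    first, last = fields[4].lstrip(":").split("-")
    return {
        "accession": accession,
        "genome_start": int(first),
        "genome_end": int(last),
        "interval_start": int(start),
        "interval_end": int(end),
        "count": int(count),
    }


row = "gi|126697566|ref|NC_009089.1|:1-1320    0    1320    7904"
print(parse_count_row(row))
```

The `genome_start` and `genome_end` values taken from the name are what would be compared against an annotation file to find the matching gene.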

How do I take this data and figure out which genes these intervals correspond to? I want to get the read counts for another strain of the same bacterium and see which genes are differentially expressed between the two.

Thank you

differential-expression transcriptome

Maybe you can try HTSeq? It will give you per-gene read counts. However, you will need to find a GTF file for your bacterium to use it.

Thank you for the reply. I have found the GFF file for the bacterium and will try HTSeq now!
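
Separately from HTSeq, one way to see which gene a coordinate range such as 1-1320 belongs to is to look it up in the annotation directly. The Python sketch below is an assumption-laden illustration, not HTSeq's method: it assumes a GFF3 file with nine tab-separated columns and `key=value` attributes, the function name `genes_by_interval` is hypothetical, and the example annotation line (including the name `exampleGene`) is made up rather than taken from the real file.

```python
def genes_by_interval(gff_lines):
    """Map (seqid, start, end) to the attribute dict of each 'gene' feature."""
    genes = {}
    for line in gff_lines:
        # Skip comment/directive lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        seqid, start, end, attrs = cols[0], int(cols[3]), int(cols[4]), cols[8]
        # Attributes look like 'ID=gene0;Name=exampleGene'.
        fields = dict(item.split("=", 1) for item in attrs.split(";") if "=" in item)
        genes[(seqid, start, end)] = fields
    return genes


# Made-up annotation line for illustration only.
example_gff = [
    "##gff-version 3",
    "NC_009089.1\texample\tgene\t1\t1320\t.\t+\t.\tID=gene0;Name=exampleGene",
]
genes = genes_by_interval(example_gff)
print(genes[("NC_009089.1", 1, 1320)]["Name"])
```

With a real file, the lines would come from `open(path)` instead of the example list, and the lookup key would be built from the accession and coordinates in each count row. An exact-match lookup like this only works if the coordinates in the count file and the annotation agree exactly.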
