Question

Help with my differential expression analysis (homework)

0

3.8 years ago

gatoxos • 0

Hi! I would like some advice, please. I have to do the work in Bioconductor, with the edgeR package.

I have to do a differential expression analysis: compare expression between two samples of the mosquito Aedes aegypti (my teacher provided us with two FASTQ files). So far, I have used the Burrows-Wheeler Aligner (BWA) to build an index from a reference genome (also Aedes aegypti, downloaded from www.vectorbase.org) and aligned both FASTQ files with the "sampe" method, which gave me a SAM file. It looks like this:

According to the edgeR manual, I need a count table that lists the genes and their read counts, but I don't know how to generate one.

I tried the Rsubread package, but its output looks very different from the table used in the example:

I asked the teacher, but he said the problem is that I need a tool that can create count tables for miRNA or ncRNA, and I can't find a suitable one. Could you help me?

I also read that it can be done in Excel. Could you point me to a guide?

RNA-Seq alignment fastq Bioconductor • 1.1k views

updated 3.8 years ago by h.mon 35k • written 3.8 years ago by gatoxos • 0

1

Try using STAR for your alignment! For the count tables, Rsubread is perfect for the job; just take a look at the manual, it's pretty easy! I'll leave my code here, maybe it works for you...

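library(Rsubread)

# BAM files to count, one per sample
files <- c("path/to/file_1.bam", "path/to/file_2.bam", "path/to/file_3.bam",
           "path/to/file_4.bam", "path/to/file_5.bam", "path/to/file_6.bam")

# Count read pairs per gene against a GTF annotation.
# reportReads asks for per-read assignment output; newer Rsubread releases
# may expect "CORE", "SAM" or "BAM" here instead of TRUE (see ?featureCounts).
counts <- featureCounts(files,
                        annot.ext = "/home/public/RNA_Seq/GenRef/Homo_sapiens.GRCh38.88.chr.gtf",
                        isGTFAnnotationFile = TRUE, isPairedEnd = TRUE,
                        reportReads = TRUE, nthreads = 16,
                        tmpDir = "/home/bruno/RNA_seq/temp_files/")

# Count matrix (genes in rows, samples in columns) and gene annotation
count <- counts$counts
colnames(count) <- c("name_1", "name_2", "name_3", "name_4", "name_5", "name_6")
annot <- counts$annotation
# Sample sheet: which condition each sample belongs to
condition <- c("SS", "SS", "SS", "SC", "SC", "SC")
colData <- data.frame(
  row.names = colnames(count),
  condition = condition,
  samples = colnames(count)
)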

Just keep in mind that this code is tailored to my project (sickle cell disease); you need to read the manual and see what you have to change for your own data.

edgeR is also pretty easy to use, and the manual contains everything you need for your job! Just read it carefully.
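For orientation, here is a minimal sketch of the edgeR side once a count matrix exists. It uses made-up counts: the `count` matrix, the sample names and the group labels are placeholders, not real data. With only two samples and no replicates, dispersion cannot be estimated from the data, so the sketch assumes a biological coefficient of variation (`bcv`) of 0.2. That number is purely an assumption; the edgeR user guide discusses what to do when there are no replicates.

```r
library(edgeR)

# Toy count matrix: genes in rows, samples in columns (made-up numbers).
# With featureCounts output this would be counts$counts.
set.seed(1)
count <- matrix(rnbinom(200, mu = 50, size = 5), ncol = 2,
                dimnames = list(paste0("gene", 1:100),
                                c("sample_1", "sample_2")))

# One sample per group (placeholder labels)
group <- factor(c("A", "B"))
y <- DGEList(counts = count, group = group)

# Drop genes with too few reads to be informative
keep <- filterByExpr(y)
y <- y[keep, , keep.lib.sizes = FALSE]

# TMM normalization for library composition
y <- calcNormFactors(y)

# No replicates: supply an assumed dispersion instead of estimating it
bcv <- 0.2  # assumption, not estimated from the data
et <- exactTest(y, dispersion = bcv^2)

# Top genes ranked by p-value
top <- topTags(et)
print(top)
```

With real replicates you would estimate dispersion from the data (for example with `estimateDisp`) rather than fixing `bcv` by hand.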

Hope it helps!

3.8 years ago by brunobsouzaa ▴ 830

1

That is a good catch.

If you have a genome reference, you should use a splice-aware aligner such as TopHat2, STAR, or HISAT2.

If you are aligning to a transcriptome, then you could use Bowtie2 or BWA (or quantify reads without aligning, using a program like Salmon or Kallisto).

That said, my understanding is that this is for a homework assignment. It is not appropriate for us to complete the homework assignment for you, since you need the experience of doing this on your own to set appropriate expectations in the future.

One caveat could be if the teacher was wrong, or another could be that they were not providing enough support for the class. The former could require assistance from the community (but is hopefully rare). The latter is probably an issue that would need to be brought up with the graduate school (either in terms of the teacher or in terms of you picking the program that is the best fit for yourself).

If the teacher can't answer the question, is there a teaching assistant?

3.8 years ago by Charles Warden 8.2k

1

I agree with you... I hope the OP realizes that my code will NOT work as-is for what he intends, but it will force him to at least read the documentation and dig deeper into the subject!

3.8 years ago by brunobsouzaa ▴ 830

score 3 · Answer 1 · 2020-06-24

3

3.8 years ago

Charles Warden 8.2k

In general, I don't think Biostars is supposed to be providing support for specific homework questions.

I can't see the images that you tried to provide, but what you quantify depends upon what annotations you use.

Also, there are separate library preparation kits for miRNA-Seq, and even that data may require additional pre-processing (such as read trimming). In other words, if you have miRNA-Seq samples, then I agree that you need to use miRNA annotations for counts. However, if you don't have miRNA-Seq data, then I think you are really only supposed to be quantifying precursor miRNA at best (and I think those tend to have more variation / noise than other genes). In other words, the insert size is important for what you can quantify.
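To illustrate the point about annotations, here is a rough sketch of restricting an annotation to one feature class before counting. The `annotation` table, its column names and the "miRNA" biotype label are all made up for illustration; a real GTF/GFF from your source may label these features differently. The only Rsubread-specific assumption is the SAF column layout (GeneID, Chr, Start, End, Strand), and the `featureCounts` call itself is left commented out.

```r
# Toy annotation table standing in for a parsed GTF/GFF (all values made up)
annotation <- data.frame(
  gene_id = c("g1", "g2", "g3", "g4"),
  biotype = c("protein_coding", "miRNA", "lncRNA", "miRNA"),
  chr     = c("1", "1", "2", "3"),
  start   = c(1000, 5000, 200, 7000),
  end     = c(3000, 5080, 900, 7075),
  strand  = c("+", "-", "+", "+"),
  stringsAsFactors = FALSE
)

# Keep only the feature class you actually want to count
mirna <- annotation[annotation$biotype == "miRNA", ]

# SAF layout used by featureCounts: GeneID, Chr, Start, End, Strand
saf <- data.frame(
  GeneID = mirna$gene_id,
  Chr    = mirna$chr,
  Start  = mirna$start,
  End    = mirna$end,
  Strand = mirna$strand,
  stringsAsFactors = FALSE
)
print(saf)

# With Rsubread installed, this table could then be passed in place of a GTF:
# fc <- Rsubread::featureCounts("path/to/file.bam", annot.ext = saf,
#                               isPairedEnd = TRUE)
```

Whatever ends up in the annotation table is what ends up as rows in the count table, so the same BAM file can give very different count tables depending on this step.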

3.8 years ago by Charles Warden 8.2k

1

Upvote for the first sentence.

3.8 years ago by e.rempel ★ 1.1k

0

Also, I can see the pictures now. So, thank you for fixing that.

3.8 years ago by Charles Warden 8.2k