Question

How to compare bulk RNA seq data demultiplexed in different ways?

28 days ago

bioinfo ▴ 150

Hello,

I was trying to demultiplex some bulk RNA seq data and I was trying a few different parameters.

I compared two ways and then made a scatter plot of the gene-level TPM values after pseudoaligning with kallisto and summarizing to gene names with tximport. I am trying to understand how to interpret the scatter plot, so I have attached it here. To me, the data look quite similar, with a few outliers. Does that seem right? Also, how can I add the gene names for the genes that are on the bottom of the graph which seem to be quite different between the 2 ways?

This is the code for how I prepared the scatter plot:

library(ggplot2)
library(ggpubr)  # provides stat_cor()

# Extract the column names for each demultiplexing run
way1_columns <- grep("^way1", colnames(merged_tpm), value = TRUE)
way2_columns <- grep("^way2", colnames(merged_tpm), value = TRUE)

# Loop through the columns and generate one plot per pair
# (pairs way1/way2 columns by position, so both sets must be in the same sample order)
for (i in seq_along(way1_columns)) {
  way1 <- way1_columns[i]
  way2 <- way2_columns[i]
  wd_number <- gsub("^way1", "", way1)  # Extract WD number from column name

  plot <- ggplot(data = merged_tpm, aes(x = log(.data[[way1]]), y = log(.data[[way2]]))) +
    geom_point() +
    labs(title = paste("Scatter Plot", wd_number),
         x = paste("Log", way1),
         y = paste("Log", way2)) +
    stat_cor(method = "pearson")

  print(plot)  # ggplot objects are not auto-printed inside a for loop
}

[Image: scatter plot of log TPM values, way1 vs. way2]

RNA-seq • 297 views

ADD COMMENT • link updated 24 days ago by Ram 43k • written 28 days ago by bioinfo ▴ 150

demultiplex some bulk RNA seq data and I was trying a few different parameters

Can you clarify? Are you demultiplexing on the indexes included in the sample or something else?

ADD REPLY • link 28 days ago by GenoMax 141k

I am demultiplexing on the indexes. I was just having issues with the adapter removal. The software is supposed to remove the adapter, but when I checked with FastQC, adapters still seemed to be present. I tried another sequence (the regular adapter sequence plus 3 more bases), and then FastQC showed that the adapters were removed. The results seem pretty similar, so I was thinking of using the second sequence to demultiplex. I get a 0.1% increase in alignment rate for some of my samples when using the second adapter sequence.

ADD REPLY • link 28 days ago by bioinfo ▴ 150

I am demultiplexing on the indexes. I was just having issues with the adapter removal.

Illumina indexes and adapters have little to do with each other as far as demultiplexing goes. Illumina indexes are read in separate cycle(s) from the main sequencing cycles. So at best you will recover some additional data when you allow for errors in the indexes, but that should not affect adapter detection and removal.
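
To illustrate the point about allowing errors in the indexes, here is a minimal base R sketch of index matching with a mismatch tolerance. It is not how any particular demultiplexer is implemented; the sample names, index sequences, and the `hamming()` and `assign_sample()` helpers are all hypothetical. Note that it only ever looks at the index read, never at the insert read where adapters would appear.

```r
# Hypothetical sample indexes; real ones would come from the sample sheet
sample_indexes <- c(sampleA = "ATCACGTT", sampleB = "CGATGTAA", sampleC = "TTAGGCAT")

# Number of mismatching positions between two equal-length sequences
hamming <- function(a, b) {
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Assign an observed index read to a sample, tolerating up to max_mismatch errors.
# Returns NA when no sample, or more than one sample, is within the threshold.
assign_sample <- function(index_read, indexes, max_mismatch = 1) {
  d <- vapply(indexes, hamming, integer(1), b = index_read)
  hits <- names(d)[d <= max_mismatch]
  if (length(hits) == 1) hits else NA_character_
}

exact     <- assign_sample("ATCACGTT", sample_indexes, max_mismatch = 0)  # exact match
rejected  <- assign_sample("ATCACGTA", sample_indexes, max_mismatch = 0)  # one error, not tolerated
recovered <- assign_sample("ATCACGTA", sample_indexes, max_mismatch = 1)  # one error, tolerated
```

With zero mismatches allowed, the read carrying one index error goes unassigned; allowing one mismatch recovers it for `sampleA`. That is the "additional data" you gain, and it is independent of adapter trimming.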

ADD REPLY • link 28 days ago by GenoMax 141k

It does seem that specifying the second adapter sequence removes the adapters, as shown by FastQC. Do you think I should not trust FastQC when it shows that adapters are still present with the original sequence? The adapters are specified in the sample sheet, so the adapter trimming is done while generating the FASTQ files. Also, do you think the data on the scatter plot look similar enough?

ADD REPLY • link 28 days ago by bioinfo ▴ 150

Also, how can I add the gene names for the genes that are on the bottom of the graph which seem to be quite different between the 2 ways?

plotly allows you to convert your static plots to interactive ones. It is handy in many ways; for example, you will not have to worry about overlapping labels.

library(plotly)
ggplotly(plot)
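
For the gene names to show up on hover, the plot needs to carry them. Below is a rough sketch that assumes the table has a gene name column; the toy data frame, the `gene_name`, `way1_WD1`, and `way2_WD1` column names, the pseudocount of 1, and the cutoff are all placeholders rather than anything taken from the original data. It also flags the genes that differ most between the two ways, so they can be listed directly instead of picked out by eye.

```r
library(ggplot2)
library(plotly)

# Toy stand-in for merged_tpm; all column names and values here are hypothetical
toy_tpm <- data.frame(
  gene_name = c("GeneA", "GeneB", "GeneC", "GeneD"),
  way1_WD1  = c(10, 200, 35, 0),
  way2_WD1  = c(11, 190, 0.5, 0)
)

# log2 ratio with a pseudocount of 1 so that zero TPM values stay finite
toy_tpm$log2_ratio <- log2(toy_tpm$way2_WD1 + 1) - log2(toy_tpm$way1_WD1 + 1)

# Flag genes whose pseudocounted TPM differs more than 2-fold (arbitrary cutoff)
toy_tpm$discordant <- abs(toy_tpm$log2_ratio) > 1
discordant_genes <- toy_tpm$gene_name[toy_tpm$discordant]

# The text aesthetic is not drawn by ggplot2; ggplotly() picks it up for the tooltip
p <- ggplot(toy_tpm, aes(x = log1p(way1_WD1), y = log1p(way2_WD1),
                         colour = discordant, text = gene_name)) +
  geom_point()

# tooltip = "text" restricts the hover label to the gene name
interactive_plot <- ggplotly(p, tooltip = "text")
```

Printing `discordant_genes` gives the flagged names as a plain vector, and `interactive_plot` shows each gene name when you hover over its point.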

ADD REPLY • link 28 days ago by Haci ▴ 680