Question

RNA-seq low alignment rate: please help with the parameters

4.5 years ago

Max • 0

Hi, I'm totally new to this field, so I'm grateful for any tips. Also, please excuse any mistakes.

I've received human RNA-seq raw data for analysis. The experiment was performed on an Illumina instrument in paired-end mode (hence two files per sample) with the NuGEN reagents and should be rRNA depleted and directional.

First question: despite the information that the library is directional, I made a strandedness test which suggests the usage of an unstranded protocol.

The alignment with HISAT2 in paired-end mode with the "second strand" parameter (that's what I read should be used for NuGEN) yields:

16666260 reads; of these:
  16666260 (100.00%) were paired; of these:
    10321799 (61.93%) aligned concordantly 0 times
    6052815 (36.32%) aligned concordantly exactly 1 time
    291646 (1.75%) aligned concordantly >1 times
    ----
    10321799 pairs aligned concordantly 0 times; of these:
      31463 (0.30%) aligned discordantly 1 time
    ----
    10290336 pairs aligned 0 times concordantly or discordantly; of these:
      20580672 mates make up the pairs; of these:
        19949833 (96.93%) aligned 0 times
        473079 (2.30%) aligned exactly 1 time
        157760 (0.77%) aligned >1 times
40.15% overall alignment rate
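
For anyone reading the summary above, here is a minimal Python sketch of how the 40.15% figure appears to be derived from the individual counts. The function name and argument names are made up for illustration; the formula (aligned mates divided by total mates) is an assumption that happens to reproduce the reported number.

```python
def hisat2_overall_rate(total_pairs, conc_once, conc_multi, disc_once,
                        mate_once, mate_multi):
    """Recompute the overall alignment rate from a paired-end summary.

    Hypothetical helper: the argument names are illustrative and simply
    mirror the lines of the summary printed by the aligner.
    """
    # Every pair that aligned (concordantly or discordantly) counts as two mates.
    aligned_mates = 2 * (conc_once + conc_multi + disc_once)
    # Mates from otherwise unaligned pairs that still aligned on their own.
    aligned_mates += mate_once + mate_multi
    return 100.0 * aligned_mates / (2 * total_pairs)


rate = hisat2_overall_rate(
    total_pairs=16666260,
    conc_once=6052815,
    conc_multi=291646,
    disc_once=31463,
    mate_once=473079,
    mate_multi=157760,
)
print(f"{rate:.2f}% overall alignment rate")
```

Read this way, roughly 60% of all mates found no alignment at all, which is the part that needs explaining.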

With TopHat it yields:

Left reads:
          Input     :  16666260
           Mapped   :   6172883 (37.0% of input)
            of these:    472887 ( 7.7%) have multiple alignments (97823 have >20)
Right reads:
          Input     :  16666260
           Mapped   :   6172883 (37.0% of input)
            of these:    472887 ( 7.7%) have multiple alignments (97823 have >20)
37.0% overall read mapping rate.

Aligned pairs:   6172883
     of these:    472887 ( 7.7%) have multiple alignments
                  230048 ( 3.7%) are discordant alignments
35.7% concordant pair alignment rate.

I used Trimmomatic for careful trimming in order to get better results, but the improvement is only 1%. Would you suggest some stronger trimming?

Because of the strandedness test, I also tried to run it as unstranded with HISAT2:
16666260 reads; of these:
  16666260 (100.00%) were paired; of these:
    10321799 (61.93%) aligned concordantly 0 times
    6052815 (36.32%) aligned concordantly exactly 1 time
    291646 (1.75%) aligned concordantly >1 times
    ----
    10321799 pairs aligned concordantly 0 times; of these:
      31463 (0.30%) aligned discordantly 1 time
    ----
    10290336 pairs aligned 0 times concordantly or discordantly; of these:
      20580672 mates make up the pairs; of these:
        19949833 (96.93%) aligned 0 times
        473079 (2.30%) aligned exactly 1 time
        157760 (0.77%) aligned >1 times
40.15% overall alignment rate

Now, do these percentages indicate that the data is poor? Or is there something else I can do?

Also: a month later I got a second batch of data, with the following message:

"...requested two lanes of sequencing. I actually stopped the second lane as after the first batch of sequences came out one sample was massively over-represented. I remeasured everything (same results), and re-pooled with significantly lower mix of this one sample, it didn't seem to change things too much unfortunately. But this is why there was a delay in the delivery of the second lane."

Do you think these are replicates? Or do they belong together, so that I have to combine the data and four files represent one sample?

Many thanks for any help with this mess!

RNA-Seq rna-seq • 2.4k views

ADD COMMENT • link updated 4.5 years ago by V ▴ 380 • written 4.5 years ago by Max • 0

NuGEN reagents and should be rRNA depleted and directional.

Since you have only ~40% alignment, it is very likely that you still have rRNA in your samples (assuming that you have no rRNA in your reference genome). If you do have rRNA in your reference, then you will want to grab a sample of reads that do not align and BLAST them at NCBI to make sure you don't have contamination in your data. You can also align your data to the human rDNA repeat to see if you are able to detect rRNA.

If you do see rRNA, then the kit may not have worked as advertised, and you will be within your rights to contact the facility to see what they can do to help.
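
As a rough illustration of "grab a sample of reads that do not align", here is a minimal Python sketch that reservoir-samples a FASTQ stream and emits FASTA that can be pasted into the NCBI BLAST web form. It assumes the unaligned reads have already been written to a plain 4-line-per-record FASTQ file (for example via HISAT2's `--un-conc` option); the function names are hypothetical and the in-memory demo stands in for a real file.

```python
import io
import random


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def sample_as_fasta(handle, n, seed=0):
    """Reservoir-sample up to n reads in one pass and return FASTA text."""
    rng = random.Random(seed)
    reservoir = []
    for i, record in enumerate(read_fastq(handle)):
        if i < n:
            reservoir.append(record)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = record
    # Drop the leading '@' of each FASTQ header to form the FASTA header.
    return "".join(f">{h[1:]}\n{s}\n" for h, s, _ in reservoir)


# Tiny in-memory demo; a real run would pass an open file of unaligned reads.
demo = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nGGCC\n+\nIIII\n"
    "@r3\nTTAA\n+\nIIII\n"
)
fasta = sample_as_fasta(demo, n=2)
print(fasta)
```

A few hundred reads are usually plenty to see whether the hits are dominated by rRNA or by another organism.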

I made a strandedness test which suggests the usage of an unstranded protocol.

Which test?

A month later I got a second batch of data:

Are these the same samples run again, or biological replicates that were processed separately (which would be a bad idea)?

I remeasured everything (same results), and re-pooled with significantly lower mix of this one sample, it didn't seem to change things too much unfortunately.

That does not sound very good. Your samples may not be of very good quality (I am going to give the facility the benefit of the doubt that they know what they are doing).

do you think these are replicates?

You should know that, or you will need to find it out.

ADD REPLY • link 4.5 years ago by GenoMax 141k

score 1 · Answer 1 · 2019-10-19

How big were your fragments, and what was the read length (X bp paired-end)? If you did something like 100 bp or 75 bp sequencing, you can attempt to trim the reads more (e.g. down to 50 bp) to see if that improves the alignment rate.
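
To make the "trim down to 50 bp" idea concrete, here is a minimal Python sketch that hard-crops every read in a FASTQ stream to a fixed length. The function name `crop_fastq` is made up for illustration, and it assumes plain 4-line-per-record FASTQ; in practice a trimmer's own cropping step (Trimmomatic has `CROP:<length>`) would do the same job.

```python
from itertools import islice


def crop_fastq(lines, max_len=50):
    """Yield FASTQ lines with sequence and quality cut to max_len bases.

    Hypothetical helper: expects an iterable of lines, 4 per record.
    """
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if len(record) < 4:
            return  # end of input (or a truncated final record)
        header, seq, plus, qual = record
        yield header
        yield seq[:max_len]   # keep the first max_len bases
        yield plus
        yield qual[:max_len]  # keep the matching quality scores


# In-memory demo: one 60 bp read and one read already shorter than 50 bp.
demo = [
    "@read1\n", "A" * 60 + "\n", "+\n", "I" * 60 + "\n",
    "@read2\n", "ACGT\n", "+\n", "IIII\n",
]
cropped = list(crop_fastq(demo, max_len=50))
print(len(cropped[1]), len(cropped[5]))
```

Both mate files should be cropped with the same settings so that the record order, and therefore the pairing, is preserved.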

Based on what you wrote about the sequencing facility's comment on the second batch, I have to assume that you submitted low-quality (or low-quantity, or both) total RNA. If they pooled the same amounts of starting RNA and one sample in the pool still managed to grossly over-amplify, that indicates to me that there is something in the other samples that is incompatible with the chemistry and doesn't let them amplify appropriately.

When isolating total RNA, do you run it on a bioanalyser to see if you get good peaks? If not, then you definitely should. The sequencing facility should probably do this as well after they construct the libraries.

My suggestion would be to try a different total RNA isolation kit. For low-quantity RNA samples (like FACS-sorted cells) I would have a look at the RNA isolation kits from Zymo Research.

Try trimming slightly more aggressively to see if it improves your stats. If not, I would go back to the wet-lab protocols and try to refine them. Hope this helps.