3.4 years ago

Dear all,

[Image: plot]

RNA-Seq

My other question is about the trimming instructions in the SENSE mRNA-Seq Library Prep Kit V2 protocol. I tried to trim the reads again, this time following the protocol's instruction to remove 9 nucleotides from R1 and 6 nucleotides from R2 reads. So I removed 9 nucleotides from the 5' end of R1 and 6 nucleotides from the 3' end of R2. But it did not help, and the problem in the adapter content graph persists. Could you please advise how I can solve this problem and how I should trim these reads?

Can you post images (or full FastQC report) for a representative sample so we can see what is going on?

Also keep in mind that FastQC does not indicate "absolute failures". The FastQC author had to set limits for the various test parameters (and they are set for normal genomic sequencing), so a test "fail" (red X) does not automatically mean that your data is bad. You need to consider the context of your experiment when looking at FastQC results.

This shows that your data still contains some Illumina Universal Adapter sequence and needs to be trimmed. Try bbduk.sh from the BBMap suite. The adapters.fa file is included in the resources directory of the BBMap suite; use the correct path to it in the command below.

bbduk.sh -Xmx1g in1=read1.fq.gz in2=read2.fq.gz out1=clean1.fq.gz out2=clean2.fq.gz ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo
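To check whether adapter read-through is still present before and after running the command above, a minimal bash sketch like the following can help. It assumes uncompressed 4-line FASTQ on stdin (pipe .gz files through zcat), and `count_adapter_reads` is a hypothetical helper, not part of BBMap. The default search string AGATCGGAAGAG is, as far as I know, the Illumina Universal Adapter entry in FastQC's default adapter list.

```shell
# Count reads whose sequence contains the 12-nt Illumina Universal Adapter
# prefix (or another string passed as $1). Hypothetical helper for illustration.
# Input: uncompressed 4-line FASTQ on stdin.
count_adapter_reads() {
  local adapter="${1:-AGATCGGAAGAG}"
  # Line 2 of every 4-line record is the sequence; count lines containing the adapter.
  awk -v a="$adapter" 'NR % 4 == 2 && index($0, a) > 0 { n++ } END { print n + 0 }'
}
```

For example, `zcat clean1.fq.gz | count_adapter_reads` should report far fewer hits than the same command run on read1.fq.gz.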

3.4 years ago
GenoMax

You appear to be confusing Illumina adapter trimming with the specific trimming that Lexogen recommends. Lexogen's instructions appear to remove specific adapter/nucleotide sequences that their kit presumably adds to the fragments. You probably did not trim your reads correctly with Trimmomatic to remove the Illumina adapter. Based on that graph, after proper trimming your reads will have a range of lengths (i.e., they will not all remain 101 bp).
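A quick way to confirm this after adapter trimming is to look at the read-length distribution. A minimal bash/awk sketch, assuming uncompressed 4-line FASTQ on stdin; `length_histogram` is a hypothetical name:

```shell
# Print a read-length histogram as "length<TAB>count", shortest length first.
# Input: uncompressed 4-line FASTQ on stdin.
length_histogram() {
  # Tally the length of every sequence line (line 2 of each record),
  # then sort numerically on the length column.
  awk 'NR % 4 == 2 { c[length($0)]++ } END { for (l in c) print l "\t" c[l] }' |
    sort -n -k1,1
}
```

If every read still reports 101 after adapter trimming (e.g. `zcat out.R1.fq.gz | length_histogram`), the adapter step probably did not run as intended.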

3.4 years ago
michael.ante

Hi rahmati.razieh83,

Lexogen's mRNA-Seq protocol uses random primers. AFAIK, they suggest removing the primer-derived bases if you want to use TopHat2 or a similarly sensitive aligner. If you use STAR or BBMap, you can keep the complete read, but you should increase the allowed number of mismatches.

FastQC needs a minimum length to identify adapter sequences; apparently this is 12 or 13 nt. Regardless of how you trim the start of the reads, adapter detection will start at this position. Note that the adapter content graph also ends before the final cycle is reached.
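The detection limit can be illustrated with a simplified exact-match model. This is not FastQC's actual code: it only reports where a full 12-nt Illumina Universal Adapter prefix (AGATCGGAAGAG) starts in each read, assuming uncompressed 4-line FASTQ on stdin; `adapter_start_cycles` is a hypothetical helper.

```shell
# For each read, print the 1-based cycle where the full 12-nt adapter prefix
# starts, or 0 if no complete 12-mer is present. A partial adapter at the
# very end of a read (fewer than 12 bases) is therefore never reported.
# Input: uncompressed 4-line FASTQ on stdin.
adapter_start_cycles() {
  awk 'NR % 4 == 2 { print index($0, "AGATCGGAAGAG") }'
}
```

Because a 12-mer needs 12 cycles to fit, a read of length L can only report starts up to L - 11, which is one reason an adapter content curve stops short of the last cycle.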

I hope this helped a bit.

Cheers,

Michael

Thanks a lot for your comment. Following your recommendation, I decided to keep the Lexogen primers but use BBMap instead of TopHat2 to map the reads. I assembled the paired-end reads with Trinity and now have a trinity.fasta file. For the next step I need to map the paired-end reads to the de novo assembled trinity.fasta file and count the reads. But I do not know the command to map these paired-end reads with BBMap in a way that allows more mismatches, as you recommended. Could you please advise?

The Trinity assembly does not include reads with the Lexogen primer, correct? It is not clear from this post.

Before assembly I trimmed the adapters but not the Lexogen primers, so I think this assembly contains Lexogen primer sequences.

Do you think that is a good assembly? You do not want extraneous sequences that don't belong to your genome in your assembly.

According to Michael's suggestion, I can keep the primers. What do you suggest?

My comment about trimming referred to the aligner. For de novo assembly, I would remove the primer sequences, since they are more likely to contain incorporated mismatches. I don't know how well Trinity copes with that.

I do not know how Trinity deals with primers, so based on these comments it may be better to remove the primers from the reads. Which software is most efficient for primer removal? According to the Lexogen protocol, I must remove 9 nucleotides from the 5' end of R1 and 6 nucleotides from the 3' end of R2. How can I trim these primers? Could you please advise?

You can use bbduk.sh from the BBMap suite to remove a specific number of bases from the ends of the reads.

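# Read 1: remove the first 9 bases (ftl: bases left of this 0-based position are trimmed)
bbduk.sh -Xmx20g in=R1.fastq.gz out=R1_trimmed.fastq.gz ftl=9

# Read 2: remove the last 6 bases (ftr2: number of bases trimmed from the right end)
bbduk.sh -Xmx20g in=R2.fastq.gz out=R2_trimmed.fastq.gz ftr2=6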
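If you would rather not depend on a specific tool for this fixed-length step, the same trimming can be sketched with awk. This assumes uncompressed 4-line FASTQ records (no wrapped sequence lines); `trim_fixed` is a hypothetical helper and does not replace BBDuk's other features.

```shell
# Remove a fixed number of bases from the 5' end (N5) and the 3' end (N3)
# of every read, trimming the sequence and quality lines together.
# Usage: trim_fixed N5 N3 < in.fq > out.fq   (hypothetical helper)
trim_fixed() {
  local n5="$1" n3="$2"
  awk -v h="$n5" -v t="$n3" '
    NR % 4 == 2 || NR % 4 == 0 {
      keep = length($0) - h - t
      if (keep < 0) keep = 0        # read shorter than the trim: emit an empty line
      print substr($0, h + 1, keep)
      next
    }
    { print }                        # header and "+" lines pass through unchanged
  '
}
```

For example, `zcat R1.fastq.gz | trim_fixed 9 0 | gzip > R1_trimmed.fastq.gz` and `zcat R2.fastq.gz | trim_fixed 0 6 | gzip > R2_trimmed.fastq.gz`. Because no reads are dropped, R1 and R2 stay in the same order and remain properly paired.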

Many thanks for the commands. Are these commands for removing a specific number of nucleotides from the end of the read? Is there another command to remove a specific number of nucleotides from the start of the read, since I need to remove 9 nucleotides from the 5' end of R1?

Yes. The two commands above are meant to remove 9 bases from the front of Read 1 and 6 bases from the end of Read 2. BBMap uses 0-based counting for positions, so verify that the correct number of bases is being removed by the commands above.
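One way to do that verification is to compare read lengths before and after trimming, record by record. A minimal bash sketch (it uses process substitution, so it needs bash); it assumes both files list the same reads in the same order and that every read is longer than the trimmed amount. `seq_lengths` and `trim_delta` are hypothetical helpers.

```shell
# Print the length of every read in a FASTQ file (.gz or plain, 4-line records).
seq_lengths() {
  case "$1" in
    *.gz) gzip -dc -- "$1" ;;
    *)    cat -- "$1" ;;
  esac | awk 'NR % 4 == 2 { print length($0) }'
}

# Tally "bases_removed<TAB>reads" between an untrimmed and a trimmed file.
trim_delta() {
  paste <(seq_lengths "$1") <(seq_lengths "$2") |
    awk -F'\t' '{ c[$1 - $2]++ } END { for (d in c) print d "\t" c[d] }' |
    sort -n -k1,1
}
```

For Read 1 after the 5' trim, `trim_delta R1.fastq.gz R1_trimmed.fastq.gz` should print a single line starting with 9; any other value means a different number of bases was removed.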

Thanks a lot for your help

3.4 years ago
chen

You can use fastp to preprocess your Illumina sequencing data (RNA-Seq or DNA-Seq, paired-end or single-end). It can trim adapters automatically for both PE and SE data, which means that you don't have to supply the adapter sequences.

Besides trimming adapters, this tool also performs quality filtering and other operations to improve data quality, and most of its features are automated. All you have to do is install fastp and run:

fastp -i in.R1.fq.gz -I in.R2.fq.gz -o out.R1.fq.gz -O out.R2.fq.gz


This tool is very fast (written in C++ with multithreading support). You can get it from: https://github.com/OpenGene/fastp

Where does fastp get the adapter sequences from? The OP is using a kit that has specific instructions about removing additional bases from the front/end of the reads.

For paired-end data, adapters are removed by finding the insert length from the overlap of the two reads; cycles beyond the insert length are treated as adapter.
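The idea can be sketched with exact matching: when the insert is shorter than the reads, the first n bases of R1 equal the reverse complement of the first n bases of R2, where n is the insert length. fastp's real implementation is more elaborate (for example, it tolerates mismatches in the overlap); the function name `insert_length` and the 10-nt minimum overlap below are assumptions for illustration only.

```shell
# Exact-match sketch of overlap-based insert detection for one read pair.
# Prints the longest n (>= 10) with R1[1..n] == revcomp(R2[1..n]);
# if no such overlap exists, prints the R1 length (no adapter assumed).
# Usage: insert_length R1_SEQ R2_SEQ   (hypothetical helper)
insert_length() {
  awk -v r1="$1" -v r2="$2" -v min=10 '
    function revcomp(s,   i, out, b) {
      out = ""
      for (i = length(s); i >= 1; i--) {
        b = substr(s, i, 1)
        out = out (b == "A" ? "T" : b == "T" ? "A" : b == "C" ? "G" : b == "G" ? "C" : "N")
      }
      return out
    }
    BEGIN {
      max = (length(r1) < length(r2)) ? length(r1) : length(r2)
      for (n = max; n >= min; n--)
        if (substr(r1, 1, n) == revcomp(substr(r2, 1, n))) { print n; exit }
      print length(r1)
    }'
}
```

With the insert length known, cutting both reads to that length removes the adapter without ever needing its sequence, which is why no adapter has to be supplied for paired-end data.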

For single-end data, the adapter can be specified on the command line or, if not specified, detected automatically. I developed an algorithm that detects the adapter sequence by doing a simple assembly of the high-frequency last 10 bp of reads. See my code: https://github.com/OpenGene/fastp/blob/master/src/evaluator.cpp (string Evaluator::evaluateRead1Adapter() ). The detected adapter may be a bit shorter than the real one, but it is enough to trim most adapters.
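The counting step of that idea can be sketched as below. It only tallies the last 10 bases of each read and does not perform the assembly step the reply describes, so it is a simplification rather than fastp's algorithm; `top_tails` is a hypothetical helper, and the input is assumed to be uncompressed 4-line FASTQ on stdin.

```shell
# Tally the last 10 bases of every read and print the most frequent tails
# as "count<TAB>tail", highest count first. $1 = how many to show (default 5).
top_tails() {
  local n="${1:-5}"
  awk 'NR % 4 == 2 && length($0) >= 10 { c[substr($0, length($0) - 9)]++ }
       END { for (k in c) print c[k] "\t" k }' |
    sort -k1,1nr | head -n "$n"
}
```

On adapter-contaminated single-end data the top entries would typically be pieces of the adapter; per the reply, fastp then assembles such frequent tails into a longer adapter sequence.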