Question: RNA seq FASTX quality trimming
3.7 years ago by
Rahul30
India
Rahul30 wrote:

Hello,

I have filtered my Illumina paired-end reads (forward library: 24 million reads; reverse library: 24 million reads) using the FASTX-Toolkit quality filter (fastq_quality_filter), requiring a quality score of at least Q20 in 90 percent of bases (75 bp reads, insert size 200 bp).

After filtering, I see around 18 million reads in the forward library and 20 million reads in the reverse library, a difference of 2 million reads between the two libraries. Can I use these libraries for transcriptome assembly, given that the numbers of reads are unequal?

Regards, Rahul

modified 3.7 years ago by Brian Bushnell16k • written 3.7 years ago by Rahul30
3.7 years ago by
Walnut Creek, USA
Brian Bushnell16k wrote:

fastx-toolkit is not pair-aware and should never be used for paired reads. There are many modern tools (such as BBDuk, which I wrote) that properly handle paired reads, and will give you paired reads as output, along with singletons in which the mate was discarded.
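To illustrate what "pair-aware" means here, the following is a minimal Python sketch, not the implementation of BBDuk or any other tool. It applies the "Q20 in 90 percent of bases" rule from the question to both mates together, so the two paired outputs always stay the same length and a surviving read whose mate failed goes to a singleton list. The function names, the `(name, sequence, quality)` tuple layout, and the Phred+33 offset are assumptions made for the example.

```python
from typing import Iterable, List, Tuple

# Hypothetical in-memory read representation: (name, sequence, quality string).
Read = Tuple[str, str, str]


def passes(read: Read, min_q: int = 20, min_pct: float = 90.0, offset: int = 33) -> bool:
    """Return True if at least min_pct percent of bases have Phred quality >= min_q.

    Assumes Phred+33 encoded quality strings.
    """
    _, _, qual = read
    if not qual:
        return False
    good = sum(1 for ch in qual if ord(ch) - offset >= min_q)
    return 100.0 * good / len(qual) >= min_pct


def filter_pairs(
    pairs: Iterable[Tuple[Read, Read]], min_q: int = 20, min_pct: float = 90.0
) -> Tuple[List[Read], List[Read], List[Read]]:
    """Filter mates together so the paired outputs never fall out of sync."""
    kept_r1: List[Read] = []
    kept_r2: List[Read] = []
    singletons: List[Read] = []
    for r1, r2 in pairs:
        ok1 = passes(r1, min_q, min_pct)
        ok2 = passes(r2, min_q, min_pct)
        if ok1 and ok2:
            # Both mates survive: keep them at the same index in both outputs.
            kept_r1.append(r1)
            kept_r2.append(r2)
        elif ok1:
            # Only the forward mate survives: it becomes a singleton.
            singletons.append(r1)
        elif ok2:
            # Only the reverse mate survives: it becomes a singleton.
            singletons.append(r2)
    return kept_r1, kept_r2, singletons
```

Filtering each file separately, as in the question, makes the pass/fail decision per read, which is why the forward and reverse files ended up with different read counts.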

Q20 is too high for RNA-seq filtering (or pretty much anything), anyway - that will increase the bias of your output. Trimming to, say, Q10 is a much better idea.
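As a rough illustration of trimming rather than discarding, here is a minimal Python sketch that removes low-quality bases from both ends of a read until it reaches a base at or above the threshold. This is a deliberately simple end-trimming rule chosen for the example, not the algorithm used by BBDuk or any other specific tool; the function name, the default of Q10, and the Phred+33 offset are assumptions.

```python
from typing import Tuple


def trim_ends(seq: str, qual: str, min_q: int = 10, offset: int = 33) -> Tuple[str, str]:
    """Trim bases with Phred quality below min_q from both ends of a read.

    Assumes Phred+33 encoded quality strings. Interior low-quality bases
    are left untouched; only the ends are shortened.
    """
    scores = [ord(ch) - offset for ch in qual]
    start, end = 0, len(scores)
    # Advance past low-quality bases at the 5' end.
    while start < end and scores[start] < min_q:
        start += 1
    # Step back past low-quality bases at the 3' end.
    while end > start and scores[end - 1] < min_q:
        end -= 1
    return seq[start:end], qual[start:end]
```

A read that is low quality only at its ends is shortened rather than removed, so fewer reads are lost than with a whole-read filter at a strict threshold.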

written 3.7 years ago by Brian Bushnell16k

FastX is not for paired-end data; it's for single-end data.

You can also try Cutadapt.

written 3.7 years ago by ####190

Can I use Trimmomatic or PRINSEQ for paired-end reads?

Thanks

written 3.7 years ago by Rahul30
3.7 years ago by
geek_y10.0k
Barcelona
geek_y10.0k wrote:

The simple answer is "Yes, you can". Just check how the program you are going to use treats singleton reads (i.e. the 2 million extra reads in one of the files) and how to supply them as input.

P.S. My answer was to the original question, whether we can use singletons for assembly along with paired-end reads. The context (and title?) of the question changed later.
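If the two files have already been filtered independently, one way to see which reads are singletons is to match mates by read name. The sketch below is a hedged illustration under stated assumptions: reads are held in memory as `(name, sequence, quality)` tuples, and mates share a name once an optional `/1` or `/2` suffix and any text after the first whitespace are removed. The helper names `pair_key` and `resync` are made up for the example and do not belong to any tool.

```python
from typing import List, Set, Tuple

# Hypothetical in-memory read representation: (name, sequence, quality string).
Read = Tuple[str, str, str]


def pair_key(name: str) -> str:
    """Reduce a read name to a key shared by both mates (assumed naming scheme)."""
    key = name.split()[0]
    if key.endswith("/1") or key.endswith("/2"):
        key = key[:-2]
    return key


def resync(reads1: List[Read], reads2: List[Read]) -> Tuple[List[Read], List[Read], List[Read]]:
    """Split two independently filtered read lists into synchronized pairs and singletons."""
    by_key2 = {pair_key(r[0]): r for r in reads2}
    paired1: List[Read] = []
    paired2: List[Read] = []
    singletons: List[Read] = []
    matched: Set[str] = set()
    for r in reads1:
        key = pair_key(r[0])
        mate = by_key2.get(key)
        if mate is not None:
            # Mate survived filtering too: keep both at the same index.
            paired1.append(r)
            paired2.append(mate)
            matched.add(key)
        else:
            singletons.append(r)
    # Reverse reads whose forward mate was discarded are singletons as well.
    singletons.extend(r for r in reads2 if pair_key(r[0]) not in matched)
    return paired1, paired2, singletons
```

This version keeps everything in memory, which is fine for a demonstration but not for tens of millions of reads; a pair-aware trimming tool avoids the problem in the first place.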

modified 3.7 years ago • written 3.7 years ago by geek_y10.0k

Thank you very much for commenting on my query. I am using SOAPdenovo-Trans (on the iPlant Collaborative site) to assemble the reads with default parameters.

I got about 50% completeness in the CEGMA report when I assembled (with scaffolding) the trimmed and quality-filtered reads. When I assembled the raw reads instead, I got 81% CEGMA completeness. Hence, I am confused about whether I am giving the right input. Even after the cleanup steps, my results are still not up to the mark.

written 3.7 years ago by Rahul30

I don't think that's the best practice, though...

written 3.7 years ago by Brian Bushnell16k

I edited my answer. The original question was different. It was about using singleton reads in assembly.

written 3.7 years ago by geek_y10.0k
3.7 years ago by
Spain. Universidad de Córdoba
Antonio R. Franco4.2k wrote:

If you are using Illumina data, try comparing against the results you get with fastq_quality_trimmer. It may keep your files synchronized, with the same number of sequences, simply because it shortens sequences rather than discarding them, preserving more high-quality sequence.

written 3.7 years ago by Antonio R. Franco4.2k

Thanks for your valuable comments and suggestions.

A) After assembling scaffolds from the trimmed and quality-filtered reads, I get: Average_number_of_contigs_per_scaffold: 1.0

B) For the untrimmed raw reads: Average_number_of_contigs_per_scaffold: 1.2-1.4

C) The assembly in the published paper shows around: Average_number_of_contigs_per_scaffold: 1.9

I don't know whether the problem in my scaffolds is due to the input reads or something else.

Any suggestion will be highly appreciated.

Regards, Rahul

written 3.7 years ago by Rahul30

If you look at the Assemblathon 2 contest, you will notice that the number of contigs depends on the source of the DNA. In Assemblathon 2 you will read that some assemblers work better with the fish and not with the boa, while the opposite happens with other assemblers. The source of the DNA, and in particular its complexity and number of repeated sequences, plays a key role in the formation of contigs and scaffolds. If the statistics in the published paper were computed from a different genome, I believe you cannot compare them with yours.

modified 3.7 years ago • written 3.7 years ago by Antonio R. Franco4.2k

The authors of the published paper used SOAPdenovo2 for assembly. I am trying to assemble the same published reads with SOAPdenovo-Trans, using almost the same parameters except for the quality-trimming parameters.

written 3.7 years ago by Rahul30