QC of RNA seq
4.5 years ago
Arash

Hi colleagues, I have some problems with QC of RNA-seq data. I already have some chicken RNA-seq data. When I ran QC on the data with FastQC, I got errors in:

1- Sequence Duplication Levels
2- Kmer Content
3- GC content

Now, I have some questions:

a: What do these errors mean?
b: How can I trim the reads? (My main question)
c: Do I have to trim the two files of paired reads separately?

I did not fully understand your question, but you can find information about the QC errors here: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

And I recommend using Trimmomatic for trimming: http://www.usadellab.org/cms/?page=trimmomatic
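For reference, a minimal sketch of how a paired-end Trimmomatic call could be assembled from Python. The jar name, the FASTQ file names and the output prefix are placeholders I made up; the adapter file (TruSeq3-PE.fa) and the step settings follow the example in the Trimmomatic documentation and are only an assumption about what suits your library. Note that in PE mode both read files go into one run, so the pairs stay in sync and you do not trim them separately.

```python
from typing import List


def build_trimmomatic_pe_command(
    jar_path: str,
    forward_reads: str,
    reverse_reads: str,
    output_prefix: str,
    adapter_fasta: str,
) -> List[str]:
    """Return the argument list for one Trimmomatic paired-end (PE) run."""
    return [
        "java", "-jar", jar_path, "PE",
        forward_reads,
        reverse_reads,
        # PE mode writes four files: paired (P) and unpaired (U) survivors
        # for the forward (1) and reverse (2) reads.
        f"{output_prefix}_1P.fastq.gz",
        f"{output_prefix}_1U.fastq.gz",
        f"{output_prefix}_2P.fastq.gz",
        f"{output_prefix}_2U.fastq.gz",
        # Step settings copied from the Trimmomatic documentation example;
        # treat them as a starting point, not as tuned values.
        f"ILLUMINACLIP:{adapter_fasta}:2:30:10",
        "SLIDINGWINDOW:4:15",
        "MINLEN:36",
    ]


# All paths below are hypothetical placeholders.
command = build_trimmomatic_pe_command(
    jar_path="trimmomatic-0.39.jar",
    forward_reads="chicken_R1.fastq.gz",
    reverse_reads="chicken_R2.fastq.gz",
    output_prefix="chicken_trimmed",
    adapter_fasta="TruSeq3-PE.fa",
)
print(" ".join(command))
# To actually run it (Java and the jar must be available):
#   import subprocess; subprocess.run(command, check=True)
```

Since this only needs Java and Python, the same call should work on Windows as well as Linux.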

Dear Farbod,

Thanks so much for your nice reply. Regarding Trimmomatic, can I install it on a Windows-based system?

Hi,

I have not used it on Windows and there is nothing about it in their manual, but in this post they used it on Windows 8.

In addition, you can use the Trimmomatic tool embedded in Galaxy from your Windows system.

If your assembler of choice is Trinity (it is Linux-based), you can easily run Trimmomatic and in silico normalization just by adding the corresponding options (--trimmomatic / --normalize_reads).

Hi, for Galaxy, I have problems with file upload. It is so time-consuming... Is this the correct way, or is another way available?

Dear Arash,

Yes, you are right, but you can also use the FTP upload method or run your own local Galaxy.

But I guess running tools such as the FASTX-Toolkit (FASTA/Q Trimmer; usage: fastx_trimmer [-h] [-f N] [-l N] [-z] [-v] [-i INFILE] [-o OUTFILE]) and Trimmomatic would be easier than installing and running a local Galaxy.
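To make the -f and -l options concrete: -f is the first base to keep and -l is the last base to keep, both counted from 1. Here is a small Python sketch of that behaviour on a single read. The function name trim_read and the example read are made up for illustration; this is not the FASTX-Toolkit code itself.

```python
from typing import Optional, Tuple


def trim_read(
    sequence: str,
    quality: str,
    first: int = 1,
    last: Optional[int] = None,
) -> Tuple[str, str]:
    """Keep bases first..last (1-based, inclusive), like -f N and -l N."""
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    if first < 1:
        raise ValueError("first must be at least 1")
    start = first - 1  # convert the 1-based position to a 0-based index
    end = len(sequence) if last is None else last
    # The quality string is cut at the same positions as the bases.
    return sequence[start:end], quality[start:end]


# Hypothetical 10-base read: keep bases 3 to 8.
trimmed_sequence, trimmed_quality = trim_read(
    "ACGTACGTAC", "IIIIIHHHGG", first=3, last=8
)
print(trimmed_sequence)  # GTACGT
print(trimmed_quality)   # IIIHHH
```

This kind of fixed-position trimming cuts every read at the same coordinates, whatever its quality, which is different from the quality-based steps in Trimmomatic.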

And by the way, according to the suggestions of many Biostars experts here, maybe you do not need any data manipulation at all!

Sorry for my stupid questions... I am a beginner, so I have to learn more.

Not at all !

How to manage computational resources and how to get a job done faster are natural questions that everybody asks sooner or later!

Take care Arash jan

Farbod jan, may I have your email address?

We prefer to have all discussions via the community to keep all interesting results open for everyone. Perhaps someone else can benefit from the answer as well.

You seem to be responding to the wrong person, but that aside, yes, you can use it on any Windows version since it is a Java program. You could also take a look at bbduk from the BBMap suite.

Take a look at the blog posts here from Dr. Simon Andrews, author of FastQC. While you provide no details of the "failures" you are noticing, my guess is that these posts should address most of your worries.

4.5 years ago
Asaf

Since you don't mention an adapter content problem in your samples, trimming is unnecessary. Duplicates and overrepresented k-mers are expected in RNA-seq data. Uneven GC content is also very common in such libraries. I think you have nothing to worry about if you have good-quality reads. Carry on with the processing and make sure you get a nice mapping to the reference genome. Good luck!
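To see why duplicates are expected, it may help to look at what a duplication number measures. Below is a deliberately simplified Python sketch that reports the fraction of reads that are exact copies of an earlier read. It is not the method FastQC uses for its Sequence Duplication Levels module, and the function name and toy reads are invented for illustration. In RNA-seq, a highly expressed transcript produces many identical reads, so a high value here can simply reflect expression level.

```python
from collections import Counter
from typing import Iterable


def duplicate_fraction(sequences: Iterable[str]) -> float:
    """Fraction of reads that repeat a sequence already seen (exact match)."""
    counts = Counter(sequences)
    total_reads = sum(counts.values())
    if total_reads == 0:
        return 0.0
    distinct_reads = len(counts)
    return 1.0 - distinct_reads / total_reads


# Toy example: one "highly expressed" sequence seen three times.
reads = ["ACGT", "ACGT", "ACGT", "TTGA"]
print(duplicate_fraction(reads))  # 0.5
```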

4.5 years ago
Farbod

Dear Arash, Hi.

About the sequence duplication levels and GC content: sometimes they reflect the characteristics of the species you have used (e.g. the duplication rate is high in most fish and plants). About the K-mer content: sometimes it is caused by the barcodes/adapters used in the sequencing procedure or by poor libraries.
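If you want to look at GC content yourself, the per-read value is easy to compute. Here is a minimal Python sketch, assuming plain sequence strings and ignoring N bases; the function name is my own, and FastQC additionally compares the distribution of these per-read values against a theoretical curve, which this sketch does not do.

```python
def gc_percent(sequence: str) -> float:
    """Percentage of G and C among the called (A/C/G/T) bases of one read."""
    bases = sequence.upper()
    called = sum(bases.count(base) for base in "ACGT")
    if called == 0:
        return 0.0  # read contains only N or other ambiguous symbols
    gc_count = bases.count("G") + bases.count("C")
    return 100.0 * gc_count / called


print(gc_percent("GGCCAATT"))  # 50.0
```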

For trimming (if needed), you can use FASTX (which is very easy to use) or Trimmomatic (the latter is also embedded in some assemblers, such as Trinity).

Additionally, you can check the "Common reasons for warnings" part under each title on this page.

If you are using Galaxy, you can find some useful tools under "NGS: QC and manipulation".

By the way, remember to keep a backup of your data before you begin trimming :-)

Take care.

Dear Farbod, thanks so much! My sample is from chicken and virus.