Processing RNA-seq raw data from an open-access source
4.8 years ago
Rimma ▴ 30

Hello, can you please explain why, when I download an SRA file with RNA-seq raw data (SRR8437305) from an open source, there is no separation into Read 1 and Read 2, even though it says that it's paired-end sequencing? Moreover, when I process it in FastQC, it shows me the following picture, which looks like the reads are 200 bases long with adapter contamination in the first 10 bases.

[Image: FastQC report screenshot]

My question is: why is there no adapter contamination at the other end, and why does it look like there is a bias in the middle? Should I trim it from the middle as well? If yes, how can I do it? Also, for further steps of the analysis, should I define the data as paired-end or single-end, since I have only one file?

Thanks a lot!

sequencing next-gen RNA-Seq • 3.0k views

Those first 10 bases are not adapter contamination. This is a well-known bias seen in RNA-seq datasets. See this blog entry for more information.

You don't need to do anything with the first 10 bases. The data should align without any issues.
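
If you want to look at this bias yourself rather than rely on the FastQC plot, a minimal Python sketch along the following lines computes the fraction of each base at every read position. The function name `per_position_composition` and the toy reads are made up for illustration; this is not how FastQC itself is implemented. The uneven composition over roughly the first 10 positions is commonly attributed to random hexamer priming during library preparation, not to adapters.

```python
from collections import Counter


def per_position_composition(reads, max_pos=None):
    """Return one dict per read position mapping base -> fraction of reads.

    `reads` is any iterable of sequence strings; `max_pos` optionally limits
    how many leading positions are inspected. Hypothetical helper for
    illustration only.
    """
    counts = []
    for seq in reads:
        limit = len(seq) if max_pos is None else min(len(seq), max_pos)
        for i in range(limit):
            # Grow the per-position table the first time a position is seen.
            if i == len(counts):
                counts.append(Counter())
            counts[i][seq[i].upper()] += 1

    fractions = []
    for position_counts in counts:
        total = sum(position_counts.values())
        fractions.append({base: n / total for base, n in position_counts.items()})
    return fractions


# Toy input, not real data from SRR8437305.
toy_reads = ["ACGT", "ACGA", "TCGA", "ACTA"]
composition = per_position_composition(toy_reads)
for pos, fracs in enumerate(composition, start=1):
    print(pos, fracs)
```

On real data you would feed it the sequence lines of the FASTQ file; a composition that is uneven at the start and then levels off would be consistent with the bias described above.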

4.8 years ago
ATpoint 82k

See "Fast download of FASTQ files from the European Nucleotide Archive (ENA)" for downloading files. If you are going to use fastq-dump, use the --split-3 option. Also, please google adapter trimming and search for tutorials on RNA-seq preprocessing; there are plenty out there, e.g. https://www.bioconductor.org/packages/devel/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html
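
As a rough sketch of the fastq-dump route, the Python snippet below builds the command line with `--split-3` and lists the file names that option is expected to produce. The helper names `build_fastq_dump_command` and `expected_outputs` and the `fastq` output directory are assumptions for illustration, and the snippet only prints the command; it assumes `fastq-dump` from sra-tools is on your PATH if you actually run it.

```python
import shlex


def build_fastq_dump_command(accession, outdir="."):
    """Build a fastq-dump call that splits paired-end reads into two files.

    Hypothetical helper; it only assembles the argument list.
    """
    return ["fastq-dump", "--split-3", "--outdir", outdir, accession]


def expected_outputs(accession):
    """File names written by --split-3 for a paired-end run.

    Mates go to *_1.fastq and *_2.fastq; reads without a mate, if any,
    go to a third file named after the accession alone.
    """
    return {
        "read1": f"{accession}_1.fastq",
        "read2": f"{accession}_2.fastq",
        "unpaired": f"{accession}.fastq",
    }


cmd = build_fastq_dump_command("SRR8437305", outdir="fastq")
print(shlex.join(cmd))
print(expected_outputs("SRR8437305"))

# To actually run the download (requires sra-tools to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

With the two mate files in hand, downstream tools should be told the data are paired-end.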


Yes, I used plain fastq-dump... I will add the --split-3 option. Thanks a lot!
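
For context: plain fastq-dump without a split option writes both mates of each spot as one concatenated record, which would be consistent with the 200-base reads and the apparent "bias in the middle" (the start of the second mate). The Python sketch below shows what splitting such a record means. It assumes both mates have a known, fixed length (for example 2 x 100), which is not confirmed for this run, and the function name and toy record are made up. Re-downloading with --split-3 is the simpler route.

```python
def split_concatenated_record(seq, qual, read1_len):
    """Split one concatenated spot into (read1, read2) sequence/quality pairs.

    `read1_len` is the assumed length of the first mate; everything after
    it is treated as the second mate. Hypothetical helper for illustration.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if not 0 < read1_len < len(seq):
        raise ValueError("read1_len must fall strictly inside the record")
    read1 = (seq[:read1_len], qual[:read1_len])
    read2 = (seq[read1_len:], qual[read1_len:])
    return read1, read2


# Toy 12-base record standing in for a 200-base concatenated spot.
toy_seq = "ACGTAC" + "TTGGCA"
toy_qual = "IIIIII" + "FFFFFF"
read1, read2 = split_concatenated_record(toy_seq, toy_qual, read1_len=6)
print(read1)
print(read2)
```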


It is always recommended to read the documentation first. This can save you quite some time ;-)
