How to tell whether an SRA sample is trimmed or untrimmed
16 months ago

I am searching the SRA database for a human mRNA sample that is untrimmed, but I do not know how to check whether a sample has been trimmed or not.

FASTQC NGS SRA FASTQ NCBI

As I understand it, raw (not trimmed) data are submitted to the NCBI GEO database, together with their md5sum checksums. That said, you can run a quality check to see whether the data have been trimmed.

16 months ago
sc-ruzafa

You can use FastQC to check whether the sequences have been trimmed or whether you still need to remove adapters, low-quality bases, etc.


FastQC can help. If the data are untrimmed, all reads will be full length and will match the reported sequencing read length. After trimming, the FastQC "Sequence Length Distribution" plot will generally show a range of lengths, because not all reads remain full length once they have been trimmed.

Note: It is possible that the data contain NO extraneous sequence, in which case the reads would still be full length even after trimming.
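If you want to check this without FastQC, a minimal sketch in Python could tabulate the read-length distribution directly from a downloaded FASTQ file. It assumes standard 4-line FASTQ records (no wrapped sequence lines) and a fixed-length Illumina-style run; platforms that natively produce variable-length reads (e.g. Ion Torrent) would break the heuristic. The function names and the script name `read_lengths.py` are hypothetical, and a single read length is only a hint, not proof, that the file is untrimmed (see the note above).

```python
import gzip
import sys
from collections import Counter
from typing import Iterable, TextIO


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed (.gz) FASTQ file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_length_distribution(lines: Iterable[str]) -> Counter:
    """Count read lengths, assuming 4-line FASTQ records (line 2 = sequence)."""
    counts: Counter = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:  # sequence line of each record
            counts[len(line.rstrip("\r\n"))] += 1
    return counts


def looks_untrimmed(counts: Counter) -> bool:
    """Heuristic only: a single read length across the file suggests no trimming."""
    return len(counts) == 1


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python read_lengths.py reads.fastq.gz
    with open_fastq(sys.argv[1]) as handle:
        dist = read_length_distribution(handle)
    for length, n in sorted(dist.items()):
        print(f"{length}\t{n}")
    if looks_untrimmed(dist):
        print("single read length: likely untrimmed")
    else:
        print("variable read lengths: possibly trimmed")
```

On large files you may want to stop after the first few million records; the length pattern usually shows up quickly.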

16 months ago
ATpoint

Basically, you can't. While it is the convention (afaik) to upload the raw data as they come out of demultiplexing, the uploaded data are whatever the authors, well... uploaded, and in theory that can be anything. There is no bulletproof way to know other than emailing them.

That said, trimming usually results in unequal read lengths throughout the files (adapter-containing reads get trimmed, others remain untrimmed), so this is something you can check. In the end, though, does it really matter? If you want to use this public dataset, you have to use what is provided, and a good QC should always start with something like FastQC to assess whether adapter or quality trimming is necessary. You have to do this anyway, regardless of how the uploader treated the data beforehand.
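As a rough complement to FastQC's "Adapter Content" module, a minimal Python sketch could estimate what fraction of reads still carry adapter sequence; a non-trivial fraction suggests adapter trimming was not done. It assumes 4-line FASTQ records and an Illumina TruSeq library, using `AGATCGGAAGAGC` (the Illumina adapter string that tools such as Trim Galore look for). The `min_overlap` of 8 bases is an arbitrary illustrative choice, matching is exact (no mismatches allowed), and all function names are hypothetical.

```python
from typing import Iterable, Iterator

# First 13 bp shared by standard Illumina TruSeq adapters (assumed library type).
ADAPTER = "AGATCGGAAGAGC"


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line from each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.rstrip("\r\n")


def has_adapter(seq: str, adapter: str = ADAPTER, min_overlap: int = 8) -> bool:
    """Exact match: full adapter anywhere, or a partial adapter at the 3' end."""
    if adapter in seq:
        return True
    # Try the longest partial overlaps first: does the read end with adapter[:k]?
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return True
    return False


def adapter_fraction(lines: Iterable[str], adapter: str = ADAPTER,
                     min_overlap: int = 8) -> float:
    """Fraction of reads containing (partial) adapter; 0.0 for empty input."""
    total = hits = 0
    for seq in fastq_sequences(lines):
        total += 1
        if has_adapter(seq, adapter, min_overlap):
            hits += 1
    return hits / total if total else 0.0
```

Any iterable of lines works as input, e.g. an open file handle (use `gzip.open(path, "rt")` for `.fastq.gz`). Dedicated trimmers such as Cutadapt allow mismatches and handle more adapter types, so treat this only as a quick sanity check.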