Question

How to work with long read (Pacbio) Data from SRA

0

17 months ago

Thomas • 0

Hi everyone, I am not a bioinformatician (I am a pharmaceutical scientist) and have some minor experience with short-read RNA-seq analysis but none whatsoever with long-read sequencing. At the moment, I am trying to characterize full-length isoform expression under certain treatment conditions, for which, as far as I understand, long-read RNA-seq is the gold standard. I therefore tried to download some long-read (PacBio) data from the SRA (SRP091981) but got confused pretty quickly: there are many different runs associated with each treatment condition, even though the original paper (https://doi.org/10.1038/s41467-016-0008-7, supplementary table 5) only specifies 3 different libraries that were prepared, one for each treatment condition.

So my question is: Is it normal for long-read data to be split up like this, and if so, why? How do you properly download this data and prepare it for downstream analysis (i.e. mapping) once you have downloaded the individual files?

Please excuse me if this is a stupid question or has been answered before. I genuinely couldn't find anything.

RNAseq Longread • 863 views

score 1 · Answer 1 · 2022-11-29

1

17 months ago

GenoMax 142k

If you look at this project in SRA Run Selector (I have sorted the table so you see the PacBio data at the top), you can see that there were multiple runs of these three libraries. Take a look at the metadata table and see if it makes sense. There are multiple sequencers involved in this set.
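To check which runs belong to which library, the metadata table can be grouped programmatically. Below is a minimal sketch, assuming the table has been exported from Run Selector as comma-separated text. The column names `Run`, `Platform` and `Library Name` and the platform value `PACBIO_SMRT` are assumptions about that export, so adjust them to whatever your file actually contains. The accessions in the toy table are made up.

```python
import csv
import io
from collections import defaultdict


def group_runs_by_library(table_text, platform="PACBIO_SMRT",
                          run_col="Run", platform_col="Platform",
                          library_col="Library Name"):
    """Map each library name to the run accessions sequenced from it.

    Column names and the platform value are assumptions about the
    exported metadata table; pass different ones if your file differs.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(table_text)):
        # Keep only rows from the requested sequencing platform.
        if row[platform_col] == platform:
            groups[row[library_col]].append(row[run_col])
    return dict(groups)


# Toy table with made-up accessions, only to show the shape of the result.
example = """Run,Platform,Library Name,Instrument
SRR0000001,PACBIO_SMRT,lib_A,PacBio RS II
SRR0000002,PACBIO_SMRT,lib_A,PacBio RS II
SRR0000003,PACBIO_SMRT,lib_B,PacBio RS II
SRR0000004,ILLUMINA,lib_C,Illumina HiSeq 2500
"""

runs_by_library = group_runs_by_library(example)
print(runs_by_library)
```

With a real export you would read the file contents into a string first and pass that to `group_runs_by_library`.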

As for downloading the data, you can either get FASTQ files via sra-explorer (sra-explorer: find SRA and FastQ download URLs in a couple of clicks) or get the data in the original PacBio format as submitted, using the Data Access tab of each run record. One example here.
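Once the per-run FASTQ files are on disk, one possible way to prepare them for mapping is to pool all runs of a library into a single file. This is only a sketch under the assumption that pooling is appropriate for your analysis; whether it is depends on the experimental design. The function name `merge_fastq_gz` and the file names are hypothetical. It relies on the fact that gzip files concatenated byte for byte form a valid multi-member gzip file.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def merge_fastq_gz(run_files, merged_path):
    """Concatenate gzipped FASTQ files into one gzipped file, in the given order."""
    with open(merged_path, "wb") as out:
        for path in run_files:
            # Raw byte copy: each input becomes one gzip member of the output.
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return merged_path


# Demo on two tiny made-up files standing in for two runs of one library.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    parts = []
    for i in range(2):
        part = tmp / f"run{i}.fastq.gz"
        with gzip.open(part, "wt") as fh:
            fh.write(f"@read{i}\nACGT\n+\nIIII\n")
        parts.append(part)
    merged = merge_fastq_gz(parts, tmp / "library.fastq.gz")
    # Read the merged file back to confirm that both runs are present.
    with gzip.open(merged, "rt") as fh:
        merged_lines = fh.read().splitlines()

print(len(merged_lines) // 4, "reads in merged file")
```

If you would rather keep runs separate, for example to compare them before pooling, you can map each file on its own and merge at the alignment stage instead.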

0

Thank you so much. So if I understand you correctly, these are just repeated measurements of the same library, which I can basically treat like technical replicates?