
Aligning paired-end files in R/RStudio using the align function


3.0 years ago

student • 0

Hello! I have paired-end data, but when I initially converted it from SRA to FASTQ I forgot to split the file, so I ended up with FASTQ files with a read length of 200 (because the forward and reverse strands were combined).

Now I've correctly redone the SRA -> FASTQ conversion, but I'm wondering whether it was necessary. Can I just use the original file rather than the split ones? For example, if I used only the forward strand for my analysis, wouldn't I get fewer reads aligned than with the file that combines both strands? I don't know whether the combined file would cause issues in the function.

Here is my code: align(index="./path/full_hg19", readfile1 = "./path/Sample.fastq")
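One quick way to confirm the symptom described above is to tabulate the read lengths in the file: concatenated 2 x 100 bp mates show up as a single peak at 200. This is a minimal base-R sketch that assumes an uncompressed FASTQ file with strict 4-line records. The helper name `read_length_table` is hypothetical and not part of Rsubread. Because it reads the whole file into memory, it is only meant for small files or subsets.

```r
# Hypothetical helper (not part of Rsubread): tabulate read lengths in an
# uncompressed FASTQ file with strict 4-line records (header, sequence, "+", quality).
read_length_table <- function(fastq) {
  lines <- readLines(fastq)
  # The sequence is the 2nd line of every 4-line record.
  seqs <- lines[seq(2, length(lines), by = 4)]
  table(nchar(seqs))
}

# Example (path taken from the question):
# read_length_table("./path/Sample.fastq")
```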

Any help would be appreciated!

Tags: RStudio, RNA-Seq, R

Written 3.0 years ago by student; last updated by colindaven.


I strongly recommend repeating the download. You can search sra-explorer.info to download FASTQ files directly rather than SRA files.

Reply by ATpoint, 3.0 years ago


Okay, thank you! I will do that.

Reply by student, 3.0 years ago

score 1 · Answer 1 · 2021-04-04


3.0 years ago

Istvan Albert 100k

The unsplit FASTQ format (both mates of a pair concatenated into one read) is just an oddity of data representation; it is not in general use. Most tools won't be able to use it properly, and there is no reason to bother with it. It only adds to the complexity you already have to deal with.

Get the right data and unpack it correctly.
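Re-downloading, or re-extracting with splitting enabled, is the cleaner route recommended in this thread. To show what "unpacked correctly" means, here is a minimal base-R sketch that splits concatenated records into separate read 1 and read 2 files. It assumes an uncompressed FASTQ file with 4-line records in which every record holds exactly two mates of equal length, read 1 first. The function name `split_concatenated_fastq` is hypothetical and not part of Rsubread or the SRA Toolkit. Headers are copied unchanged to both mates, and the whole file is read into memory, so this is not suitable for large files.

```r
# Hypothetical helper (not part of Rsubread or the SRA Toolkit): split records whose
# sequence is read 1 and read 2 concatenated end to end into two mate files.
# Assumes uncompressed 4-line records, equal mate lengths, and read 1 first.
split_concatenated_fastq <- function(infile, out1, out2) {
  lines <- readLines(infile)
  stopifnot(length(lines) %% 4 == 0)
  n <- length(lines) / 4
  idx <- (seq_len(n) - 1) * 4
  hdr  <- lines[idx + 1]  # "@" header lines, copied unchanged to both mates
  seqs <- lines[idx + 2]
  plus <- lines[idx + 3]
  qual <- lines[idx + 4]
  stopifnot(all(nchar(qual) == nchar(seqs)))
  half <- nchar(seqs) / 2
  if (any(half != floor(half))) stop("odd-length record: mates are not of equal length")
  # rbind() builds a 4 x n matrix; as.vector() reads it column by column,
  # which restores the header/sequence/+/quality order of each record.
  r1 <- rbind(hdr, substr(seqs, 1, half), plus, substr(qual, 1, half))
  r2 <- rbind(hdr, substr(seqs, half + 1, 2 * half), plus, substr(qual, half + 1, 2 * half))
  writeLines(as.vector(r1), out1)
  writeLines(as.vector(r2), out2)
  invisible(n)  # number of records split
}
```

Note that this only works if the mates really are of fixed, equal length; with trimmed or variable-length reads the midpoint is meaningless, which is one more reason to prefer a fresh download.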



Hi Istvan! Thanks for the reply! I will split the SRA file properly into two files. However, these files are large; do you think I can use just one of them (for example, just the forward strand) to create the BAM file and get the count data? I'm only trying to get an idea of whether there is differential expression between samples. I don't know how the featureCounts function works, so I don't know whether this would completely mess up the results.

Reply by student, 3.0 years ago


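Yes, you can use just one file as well; use the first file (pair 1, or read 1). Don't call it the "forward strand" file, though: that naming, while common, is fundamentally incorrect, because read 1 and read 2 are the two ends of the same fragment and read 1 does not necessarily come from the forward strand.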

As a matter of fact, the counts will be the same with single-end and paired-end reads, since both reads in a pair measure the same fragment and should therefore be counted only once.
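A toy base-R illustration of why both mates add up to one count. The data frame is made up: each row is one aligned read, and mates of a pair share a read name. This only mimics the idea; it is not how featureCounts is implemented. In Rsubread, fragment-level counting is requested with `featureCounts(..., isPairedEnd = TRUE)`.

```r
# Made-up read-to-gene assignments; mates of the same fragment share read_name.
reads <- data.frame(
  read_name = c("frag1", "frag1", "frag2", "frag2", "frag3"),
  mate      = c(1, 2, 1, 2, 1),  # frag3's mate 2 did not align
  gene      = c("GeneA", "GeneA", "GeneA", "GeneA", "GeneB")
)

# Counting every read double-counts paired fragments: GeneA 4, GeneB 1.
read_counts <- table(reads$gene)

# Counting each fragment (unique read name) once per gene: GeneA 2, GeneB 1.
fragment_counts <- tapply(reads$read_name, reads$gene, function(x) length(unique(x)))

# Using read 1 only, as in a single-end analysis of the first file: GeneA 2, GeneB 1.
read1_counts <- table(reads$gene[reads$mate == 1])
```

Here the read-1-only counts match the fragment counts because every fragment's read 1 aligned. In real data some read 1s fail to align where the pair would have succeeded, which is where paired-end information can help.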

Paired-endness does help with alignment, though depending on the situation it may not make much of a difference.

Reply by Istvan Albert, 3.0 years ago

score 0 · Answer 2 · 2021-04-06


3.0 years ago

colindaven 6.3k

Although I avoid loading read files into R like the plague, the Rsubread package apparently does a good job.

See the vignette here:

https://bioconductor.org/packages/release/bioc/html/Rsubread.html
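A minimal sketch of a paired-end run with Rsubread, assuming the package is installed and an hg19 index was already built (the index path is the one from the question). The wrapper name `run_paired_pipeline` and the output file names are hypothetical. The arguments used (`index`, `readfile1`, `readfile2`, `type`, `output_file` for `align()`; `files`, `annot.inbuilt`, `isPairedEnd` for `featureCounts()`) should be checked against the reference manual of your installed Rsubread version, since defaults have changed between releases.

```r
# Hypothetical wrapper; requires Rsubread at call time (not at definition time).
run_paired_pipeline <- function(index, r1, r2, bam) {
  # Supplying readfile2 tells align() that the reads are paired-end.
  Rsubread::align(index = index, readfile1 = r1, readfile2 = r2,
                  type = "rna", output_file = bam)
  # isPairedEnd = TRUE counts fragments rather than individual reads,
  # using Rsubread's built-in hg19 annotation.
  fc <- Rsubread::featureCounts(files = bam, annot.inbuilt = "hg19",
                                isPairedEnd = TRUE)
  fc$counts
}

# Example call (hypothetical file names):
# counts <- run_paired_pipeline("./path/full_hg19",
#                               "./path/Sample_1.fastq", "./path/Sample_2.fastq",
#                               "./path/Sample.bam")
```

For the single-file route discussed above, the same calls work with only `readfile1` (read 1) and `isPairedEnd = FALSE`.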

I second sra-explorer for getting reads quickly and easily.
