Question

Using bwa mem to align many RNA-seq reads to a reference

3.3 years ago

Duminda ▴ 10

Hi all,

I want to use bwa mem to align many RNA-seq reads into a reference genome. I am a beginner and your support is appreciated.

RNA-Seq alignment genome • 2.7k views

Thank you so much. I will look for more information. I am trying to build a draft genome from RNA-seq data using bwa mem. I tried this command: "bwa mem ref.fa reads.fq > aln.sam". I was confused by "reads.fq": does it hold just one read, or can I put many reads in it? I think I understand it now, though. I will try to combine all my reads into a single file.
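
On the "reads.fq" question: a FASTQ file normally holds many reads, four lines per read, so a single file can carry a whole sequencing run. Below is a minimal sketch with made-up toy reads and hypothetical file names (sample_A.fq, sample_B.fq) showing how several files could be combined with cat and how to check the number of reads.

```shell
# Toy example: file names and reads are made up for illustration only.
workdir=$(mktemp -d)

# Two small FASTQ files; every read takes exactly 4 lines
# (header, sequence, separator, qualities).
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' > "$workdir/sample_A.fq"
printf '@read3\nGGCA\n+\nIIII\n' > "$workdir/sample_B.fq"

# Concatenate them into one FASTQ file that holds all reads.
cat "$workdir/sample_A.fq" "$workdir/sample_B.fq" > "$workdir/reads.fq"

# Number of reads = number of lines / 4.
n_reads=$(( $(wc -l < "$workdir/reads.fq") / 4 ))
echo "reads.fq contains $n_reads reads"

# The combined file is then passed as a single argument, e.g.:
#   bwa mem ref.fa reads.fq > aln.sam
```

With the toy data above this should print "reads.fq contains 3 reads".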

Reply • 3.3 years ago by Duminda ▴ 10

You cannot make a genome out of RNA-seq, only a transcriptome. In any case, bwa-mem is the wrong tool because it is not a splice-aware aligner. Aligners, so to speak, "put reads back onto a reference genome", whereas what you need is de novo assembly. You should google for de novo transcriptome assembly pipelines (not genome). This is not my field at all, but I hear people use tools such as Trinity: https://github.com/trinityrnaseq/trinityrnaseq/wiki Be sure to read previous threads as well; this has surely been asked before.
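
To give a rough idea of what a de novo transcriptome assembly run might look like, here is a minimal sketch of a Trinity call for single-end reads. The file name reads.fq, the output directory and the memory/CPU values are assumptions, not recommendations. The Trinity wiki is the authoritative source for the options.

```shell
# Hypothetical input/output names and resources; adjust to your own data.
reads="reads.fq"        # single-end FASTQ file holding all reads
out_dir="trinity_out"   # as far as I know, Trinity wants "trinity" in this name

# Only run if Trinity is installed and the input file exists.
if command -v Trinity >/dev/null 2>&1 && [ -f "$reads" ]; then
    Trinity --seqType fq --single "$reads" \
            --max_memory 10G --CPU 4 \
            --output "$out_dir"
    trinity_status="ran"
else
    echo "Trinity or $reads not found; nothing was run." >&2
    trinity_status="skipped"
fi
```

For paired-end data, --single would be replaced by --left and --right with the two mate files.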

Reply • 3.3 years ago by ATpoint 81k

Thank you. Your comment is really helpful.

Reply • 3.3 years ago by Duminda ▴ 10

score 2 · Answer 1 · 2020-12-27

"I want to use bwa mem to align many RNA-seq reads into a reference genome."

I don't think that's the recommended aligner for this purpose. Please have a look at STAR.
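
For orientation, a splice-aware alignment with STAR usually has two steps: build a genome index once, then align the reads against it. The sketch below assumes hypothetical files ref.fa, annotation.gtf and reads.fq and an arbitrary thread count. Please check the STAR manual for the settings that fit your data.

```shell
# Hypothetical file names; adjust to your own data.
ref="ref.fa"              # reference genome (FASTA)
gtf="annotation.gtf"      # gene annotation (optional but commonly supplied)
reads="reads.fq"          # single-end FASTQ file holding all reads
index_dir="star_index"    # directory that will hold the genome index

# Only run if STAR is installed and the inputs exist.
if command -v STAR >/dev/null 2>&1 && [ -f "$ref" ] && [ -f "$gtf" ] && [ -f "$reads" ]; then
    mkdir -p "$index_dir"

    # Step 1: build the genome index (done once per reference).
    STAR --runMode genomeGenerate \
         --genomeDir "$index_dir" \
         --genomeFastaFiles "$ref" \
         --sjdbGTFfile "$gtf" \
         --runThreadN 4

    # Step 2: align the reads and write a coordinate-sorted BAM file.
    STAR --genomeDir "$index_dir" \
         --readFilesIn "$reads" \
         --runThreadN 4 \
         --outSAMtype BAM SortedByCoordinate \
         --outFileNamePrefix sample_
    star_status="ran"
else
    echo "STAR or input files not found; nothing was run." >&2
    star_status="skipped"
fi
```

For paired-end reads, both mate files would be given to --readFilesIn, separated by a space.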

"I am a beginner and your support is appreciated."

Please show us what you tried. Many tools are well documented, and we would rather help with a specific question. Let us know if something doesn't work, and show the command you used and the error you get.

score 1 · Answer 2 · 2020-12-27

Hello, I am actually tempted to close this one since there seems to be no effort shown in reading existing threads and literature. Instead I will just link some good literature that you can go through, so that you can hopefully solve this yourself or ask a specific question:

RNA-seq for starters: https://www.annualreviews.org/doi/abs/10.1146/annurev-biodatasci-072018-021255

Edit: As OP seems to be aiming for assembly, see for example Trinity: https://github.com/trinityrnaseq/trinityrnaseq/wiki

RNA-seq workflow at Bioconductor: https://bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html

Some buzzwords to google: Traditional alignment vs pseudo- and selective alignment. Tools involved: hisat2, STAR, featureCounts, HTSeq, salmon, kallisto, tximport. hisat2 and STAR are aligners; salmon and kallisto are selective aligners or pseudoaligners. All have extensive documentation, including recommended syntax for running them from the command line.
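
To illustrate the selective-alignment route, here is a minimal sketch of a salmon run. Unlike hisat2 or STAR, it works against a transcriptome FASTA rather than a genome. The file names transcripts.fa and reads.fq, the output directory and the thread count are assumptions. The salmon documentation has the recommended settings.

```shell
# Hypothetical file names; adjust to your own data.
transcripts="transcripts.fa"   # transcriptome (FASTA), not the genome
reads="reads.fq"               # single-end FASTQ file holding all reads
index_dir="salmon_index"
out_dir="salmon_out"

# Only run if salmon is installed and the inputs exist.
if command -v salmon >/dev/null 2>&1 && [ -f "$transcripts" ] && [ -f "$reads" ]; then
    # Step 1: build the transcriptome index (done once).
    salmon index -t "$transcripts" -i "$index_dir"

    # Step 2: quantify; "-l A" asks salmon to detect the library type itself.
    salmon quant -i "$index_dir" -l A -r "$reads" -p 4 -o "$out_dir"
    salmon_status="ran"
else
    echo "salmon or input files not found; nothing was run." >&2
    salmon_status="skipped"
fi
```

For paired-end reads, -r would be replaced by -1 and -2 with the two mate files. The resulting transcript-level estimates can then be brought into R with tximport.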

Some tools for differential expression: DESeq2, edgeR, limma and sleuth (the last is used downstream of kallisto). All have extensive manuals and all are R packages; the first three are part of the Bioconductor project.

Some other buzzwords: Splice-awareness (that is, respecting splice junctions during alignment, which bwa-mem does not do). Gene-level versus transcript-level differential analysis.

What you don't need unless you want to do transcript-level differential analysis (and even then there are probably more recent workflows): https://www.nature.com/articles/nprot.2016.095

As your post does not contain details, please go through all of that content first, and if something is still unclear afterwards, please comment.