Question: denovo assembly of unmapped reads
0
gravatar for ashish
5 weeks ago by
ashish160
ashish160 wrote:

I have RNA-seq data of a wheat translocation line which has its chromosome short arm replaced by another wild variety chromosome short arm. There is no genomic or transcriptomic information available for that wild variety. I want to see what kind of genes are expressing from the new short arm. For this I was thinking if I can do denovo assembly of the unmapped reads and then blast the assembled transcripts with nr database. I want to know if this strategy is correct or not or if there is a better way to do it.

update: number of unmapped read pairs is around 8 million, total reads were around 33 million and I used the following parameters in hisat2

hisat2 -x wheat_hisat_index -1 13R_R1.fastq -2 13R_R2.fastq -S 13.sam -p 12 --mp 8,4 --rdg 7,5 --rfg 7,5 --un-conc 13_unmapped_pairs
rna-seq • 226 views
ADD COMMENTlink modified 4 weeks ago • written 5 weeks ago by ashish160

Provided you have an adequate amount of reads left over to do a reasonable assembly. What fraction of data has been left over? You are only going to capture genes that are unique for that short arm.

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by genomax37k

I also have the same doubt. Currently the job is running, Soon I will update the number of unmapped reads left and my aim is to capture only the unique genes.

ADD REPLYlink written 4 weeks ago by ashish160

I've updated the question.

ADD REPLYlink written 4 weeks ago by ashish160

Good point, I gave you my answer below but it holds only as long as @genomax's point is fulfilled.

ADD REPLYlink written 4 weeks ago by Macspider1.6k

To the updated question:

your mismatch penalty seems to be quite high compared to what the threshold for alignment scores is if you don't touch the scoring function.

Increasing penalties for gaps and mismatches obviously results in more unmapped reads.

ADD REPLYlink written 4 weeks ago by Macspider1.6k

I did not understand the --score-min parameter very well from the manual. Can you suggest me some better values?

ADD REPLYlink written 4 weeks ago by ashish160
1

There is no such thing as "better values". Parameters have to be chosen according to your data. From the manual:

--score-min <func> : Sets a function governing the minimum alignment score needed for an alignment to be onsidered "valid" (i.e. good enough to report). This is a function of read length. For instance, specifying L,0,-0.6 sets the minimum core function f to f(x) = 0 + -0.6 * x, where x is the read length. See also: [setting function options]. The default is L,0,-0.2.

What is your maximum read length after trimming? Let's say it's 100 nt for example. In this function, this would be multiplied by the third parameter, and after the multiplication the second parameter will be added. The final value is the threshold for the alignment score that is applied to keep or discard a mapping result of a read. That score is negative and every time you have a mismatch or a gap the algorithm will subtract the penalties that you specified for gaps and mismatches.

I think you should calculate the alignment score threshold in a way that you get what you want. For example, if you want to call variants between strains of the same species, you might allow very few mismtaches because you know that the genomes are very similar. If you're mapping reads from different species to a reference, you might allow more gaps/mismtaches because biologically speaking they're expected.

ADD REPLYlink written 4 weeks ago by Macspider1.6k

Thanks Macspider, I got it.

ADD REPLYlink written 4 weeks ago by ashish160
1
gravatar for Macspider
4 weeks ago by
Macspider1.6k
Vienna - BOKU
Macspider1.6k wrote:

I think the best way to proceed would be to:

  • map your RNASeq reads against the wheat genome with stringent criteria with hisat2 or bowtie2 using:
  --mp <int>,<int>   max and min penalties for mismatch; lower qual = lower penalty <6,2>
  --rdg <int>,<int>  read gap open, extend penalties (5,3)
  --rfg <int>,<int>  reference gap open, extend penalties (5,3)
  --score-min <func> min acceptable alignment score w/r/t read length
  
  • save the unmapped reads to a file using:
  --un <path>           write unpaired reads that didn't align to <path>
  --al <path>           write unpaired reads that aligned at least once to <path>
  --un-conc <path>      write pairs that didn't align concordantly to <path>
  --al-conc <path>      write pairs that aligned concordantly at least once to <path>
  
  • assemble those unmapped reads with trinity or any other RNASeq read assembler
  • run a quantification method
  • run a differential expression method :)
ADD COMMENTlink written 4 weeks ago by Macspider1.6k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1072 users visited in the last hour