Question

How to incorporate gene IDs in a TopHat query

7.0 years ago

Mirza.Jawad

I have a reads file and GRCh38 reference genome indexes, and I am using TopHat. I got a BAM file as the result, then used 'samtools view -h -o out.sam in.bam' to get a SAM file. But my SAM file contains IDs like 'NM:i:2', which are not like NM_000014.4, the accession IDs I got from the transcriptome alignment. So are these (NM:i:2 etc.) accession IDs useful? Secondly, for the transcriptome I extracted the transcript accession IDs and then ran my counting code. But for the genome alignment, where can I get the list of accession IDs from? All I found was the file containing the genome scaffolds, of which there are 557, and I know the number of gene accession IDs should be greater than 20,000. Need quick response please.


Need quick response please.

Your question is no more a priority than others.

using tophat

TopHat has been replaced by HISAT2, or you could use STAR; both perform far better than TopHat.

Your question is unclear on how specifically you've done your counting, which is most likely where you're getting problems. I'd suggest you amend your OP with the code you used for your counting.

Reply by andrew.j.skelton73, 7.0 years ago

Your question is hard to read and I'm not sure I interpret everything correctly.

'NM:i:2' which is not like NM_000014.4 which i found from transcriptome alignment. so are these (NM:i:2 etc) acession ids are valuable.

You are looking at the NM tag in the SAM record, an optional field that has nothing to do with accession IDs.
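To make the distinction concrete, here is a small Python sketch (the function names and the example read are hypothetical) that splits a SAM line into its mandatory columns and its TAG:TYPE:VALUE optional fields. The third column, RNAME, is where an ID such as NM_000014.4 typically appears when reads are aligned to a transcriptome whose sequences carry RefSeq names; after a genome alignment it holds a chromosome or scaffold name. The SAM specification defines NM as the edit distance to the reference, so the sketch also recomputes it from the CIGAR string and the MD tag as a cross-check.

```python
import re

def parse_sam_record(line):
    """Split one tab-delimited SAM alignment line into named fields and optional tags."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has at least 11 mandatory fields")
    record = {
        "qname": fields[0],
        "flag": int(fields[1]),
        "rname": fields[2],  # reference name: chromosome/scaffold for genome alignments
        "pos": int(fields[3]),
        "cigar": fields[5],
        "tags": {},
    }
    # Optional fields after the 11 mandatory columns look like TAG:TYPE:VALUE, e.g. NM:i:2
    for field in fields[11:]:
        tag, type_code, value = field.split(":", 2)
        if type_code == "i":
            value = int(value)
        elif type_code == "f":
            value = float(value)
        record["tags"][tag] = value
    return record

def edit_distance(cigar, md):
    """Recompute NM: inserted + deleted bases (CIGAR I/D) plus mismatches (MD letters)."""
    indels = sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar) if op in "ID")
    # Letters after '^' in MD are deleted reference bases, already counted via CIGAR D
    mismatches = len(re.findall(r"[A-Z]", re.sub(r"\^[A-Z]+", "", md)))
    return indels + mismatches

# Hypothetical read aligned to chr1 with two mismatches (MD:Z:10A30C34)
line = "\t".join(["read1", "0", "chr1", "11874", "50", "76M", "*", "0", "0",
                  "A" * 76, "I" * 76, "NM:i:2", "MD:Z:10A30C34"])
rec = parse_sam_record(line)
print(rec["rname"], rec["tags"]["NM"], edit_distance(rec["cigar"], rec["tags"]["MD"]))
# prints: chr1 2 2
```

Soft and hard clips (S, H) are deliberately ignored by edit_distance, since the specification excludes clipping from NM.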

secondly for transcriptome i used to extract transcriptome accession ids then i run counter code. but for genome alignment where i can get the list of acession ids from?

Can you elaborate on this part? Unclear.

That being said, you should know that TopHat is no longer the "advisable" tool for RNA-seq analysis. The software is deprecated (in low-maintenance mode) and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR or BBMap.

Reply by WouterDeCoster, 7.0 years ago

Thanks to all of you for taking time out of your busy routines. I am switching to HISAT2, StringTie and Ballgown.

Reply by Mirza.Jawad, 7.0 years ago

Please use ADD COMMENT or ADD REPLY to respond to earlier posts, so that the thread stays logically structured and easy to follow. I have now moved your post, but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.

Reply by WouterDeCoster, 7.0 years ago

Answer 1 · 2017-04-13

7.0 years ago

Devon Ryan

NM:i:2 is not an NCBI accession number; it's a SAM optional (auxiliary) tag whose value, 2, is the edit distance to the reference: the number of mismatched, inserted and deleted bases, excluding clipping. There are no accession numbers in your SAM or BAM file. Do the following:

Delete the SAM file.
Use featureCounts with the BAM file and a GRCh38 gene annotation file (GTF).
Done
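featureCounts reads its gene definitions from an annotation file passed with -a (by default it counts over "exon" features grouped by the "gene_id" attribute of a GTF), and that annotation is also where the gene and transcript IDs come from; the genome FASTA only names chromosomes and scaffolds. The Python below is a deliberately simplified, hypothetical illustration of that idea, not featureCounts' actual algorithm: it collects exon intervals per gene_id and counts reads that overlap exactly one gene, ignoring strand. The function names and annotation lines are made up.

```python
import re

GENE_ID = re.compile(r'gene_id "([^"]+)"')

def exons_by_gene(gtf_lines):
    """Collect (chrom, start, end) exon intervals per gene_id from GTF lines."""
    exons = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue  # only exon rows define countable regions in this sketch
        match = GENE_ID.search(cols[8])
        if match:
            exons.setdefault(match.group(1), []).append((cols[0], int(cols[3]), int(cols[4])))
    return exons

def count_reads(exons, reads):
    """Count (chrom, start, end) reads per gene, 1-based inclusive coordinates.

    Strand is ignored, and a read overlapping exons of more than one gene is
    left uncounted as ambiguous.
    """
    counts = {gene: 0 for gene in exons}
    for chrom, start, end in reads:
        hits = {
            gene
            for gene, intervals in exons.items()
            for c, s, e in intervals
            if c == chrom and start <= e and end >= s
        }
        if len(hits) == 1:
            counts[hits.pop()] += 1
    return counts

gtf = [
    "#!hypothetical annotation",
    'chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_A1";',
    'chr1\tdemo\texon\t300\t400\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_A1";',
    'chr1\tdemo\texon\t350\t500\t.\t-\t.\tgene_id "GENE_B"; transcript_id "TX_B1";',
]
reads = [("chr1", 150, 225), ("chr1", 360, 380), ("chr1", 450, 480), ("chr2", 150, 160)]
print(count_reads(exons_by_gene(gtf), reads))
# prints: {'GENE_A': 1, 'GENE_B': 1}
```

The read at chr1:360-380 touches exons of both genes, so it is skipped; a real tool also handles strandedness, paired-end fragments, spliced alignments and multi-mapping reads.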

The output of featureCounts will be a text file with the number of reads mapping per feature that you're interested in (the transcripts).
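Assuming featureCounts' usual tab-delimited layout (a "#" comment line recording the program and command, then a header of Geneid, Chr, Start, End, Strand, Length followed by one column per BAM file), a short Python sketch for loading the counts could look like this; the function name and the example lines are hypothetical.

```python
def read_featurecounts(lines):
    """Parse featureCounts' main output into {gene_id: {bam_name: count}}."""
    rows = (line.rstrip("\n") for line in lines
            if line.strip() and not line.startswith("#"))
    header = next(rows).split("\t")
    samples = header[6:]  # columns after Geneid, Chr, Start, End, Strand, Length
    table = {}
    for row in rows:
        cols = row.split("\t")
        table[cols[0]] = {name: int(n) for name, n in zip(samples, cols[6:])}
    return table

example = [
    "# Program:featureCounts (hypothetical example lines)",
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tin.bam",
    "GENE_A\tchr1;chr1\t100;300\t200;400\t+;+\t202\t1",
    "GENE_B\tchr1\t350\t500\t-\t151\t1",
]
counts = read_featurecounts(example)
print(counts["GENE_A"]["in.bam"])
# prints: 1
```

With a real run you would pass the opened output file given to -o instead of the example list, e.g. open("counts.txt"), where counts.txt is a placeholder name.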


This is already a well-articulated answer, but I will still say it: why TopHat? I feel sad when the efforts of benchmarking papers go unnoticed, or when labs or PIs do not read such papers to keep up with the latest developments in RNA-seq.

Reply by ivivek_ngs, 7.0 years ago