Raw Read count from bam file
2
0
Entering edit mode
7.4 years ago
Bioinfonext ▴ 460

I mapped pair end reads to CDS transcripts using BWA.

If I do sorting of bam fine by not using -n flag than its worting:

For sorting: /home/yog/software/samtools-1.3.1/samtools sort 216_5W_Ca1.bam -O BAM -o 216_5W_Ca1.sort.bam

For indexing: /home/yog/software/samtools-1.3.1/samtools index 216_5W_Ca1.sort.bam

For read count: /home/yog/software/samtools-1.3.1/samtools idxstats 216_5W_Ca1.sort.bam > readcount[bam_idxstats]

But read count I am getting like this:

Rs025080 1341 239 27

Rs035250 621 0 0

Rs035280 408 0 0

Rs035290 318 0 0

Rs035300 456 87 0

Why there is three count column for each transcripts.

RNA-Seq • 6.9k views
ADD COMMENT
1
Entering edit mode

I quick look into the manual gives you: "Retrieve and print stats in the index file. The output is TAB-delimited with each line consisting of reference sequence name, sequence length, # mapped reads and # unmapped reads".

ADD REPLY
0
Entering edit mode

How do you have # unmapped reads per chromosome xD

ADD REPLY
1
Entering edit mode

Here are two reasons: Samtools Idxstats

ADD REPLY
0
Entering edit mode

Yes the 'Read is unmapped but has the chromosome of it's mapped pair" was my first thought, but then the sum of mapped/unmapped per chromosome would be an even number - but for the last entry it is not, so i don't think that's what's going on.

I didn't know about the BWA concatenating contigs though, that's new to me - thanks for the info :) And that could certainly be what's causing it. I also didn't know the SAM spec calls for unmapped pairs to be on the same chromosome - I always thought they came at the end with all the other unmapped reads. Two biologically meaningless but bioinformatically important bits of information today :)

ADD REPLY
0
Entering edit mode

It stands for "number" in many manuals.

ADD REPLY
2
Entering edit mode

That's not what John meant, but rather "if a read is unmapped - how do you know it belongs to a certain chromosome" ;-)

ADD REPLY
0
Entering edit mode

BWA for instance assigns both the unmapped flag and a position. In case the read maps somewhere but is not fulfilling the criteria of being reported as mapped (e.g. too many mismatches).

ADD REPLY
0
Entering edit mode

So the read "almost" maps there :p

ADD REPLY
0
Entering edit mode

Here I map raw reads to a CDS transcripts of a draft genome. So there are around 30% reads of a paired-end library do not map to CDS.

But I am not sure how number were given of unmapped reads according to CDS transcripts ID.

ADD REPLY
0
Entering edit mode

Could you reformulate this question in a more explicit fashion? I am not sure if I got the point. I also have a question: when you say:

I map raw reads to a CDS transcripts

Do you mean against a transcriptome, or against a genome with a GFF/GTF/GFF3 file that indicates the CDS of the transcripts, and you are considering only those?

ADD REPLY
0
Entering edit mode

I took CDS from this database. but its genome is not completely sequenced. it's a draft genome.

http://radish-genome.org/Data_resource/

ADD REPLY
0
Entering edit mode

As far as I know, the notation CDS defines those regions of the exons that are translated, because exons also have UTRs. So you might have some RNASeq reads that belong to UTRs and are therefore not present in your CDS sequences. Could this be the case?

ADD REPLY
0
Entering edit mode
7.4 years ago
Vitis ★ 2.5k

I think to get read mapping stats you should use "samtools flagstat" or for the newer version of samtools, "samtools stats".

ADD COMMENT
0
Entering edit mode
7.4 years ago

to get read counts you can use -c option, plus all the other options you may want to use to filter the reads to count, like -F4 to count mapped reads only:

samtools view -c input.bam
ADD COMMENT

Login before adding your answer.

Traffic: 2281 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6