I run BWA for mapping pair end raw reads to CDS transcripts. Now can suggest me how to get unmapped Right and Left reads in two separate file using samtool or sambanba cammand from the output sam file.
I run BWA for mapping pair end raw reads to CDS transcripts. Now can suggest me how to get unmapped Right and Left reads in two separate file using samtool or sambanba cammand from the output sam file.
Dear Yogesh,
You should be clear with your requirements. However, very likely you need the following (using samtools):
To get the unmapped reads from a bam file:
samtools view -f 4 file.bam > unmapped.sam
To get the output in bam:
samtools view -b -f 4 file.bam > unmapped.bam
using bedtools
bedtools bamtofastq [OPTIONS] -i <BAM> -fq <FASTQ>
These simple tasks have been already discussed on biostars. A simple way to search is googling in the following manner
Can you please also suggest me how to count no. of mapped reads to each transcripts from bam file. When I am using using:samtool idstats it showing error: /home/yog/software/samtools-1.3.1/samtools idxstats 216_5W_Ca1.bam >readcount [bam_idxstats] fail to load the index.
Thanks
From your questions, you seem to be newbie for this kind of stuff. For the benefit of you, I am not going to answer this question. I strongly recommend you to first understand the "very basics" of the "very basic" tools for analysing sequencing data.
Hence, please go through this page thoroughly. Try to understand the whys and whats of basic steps like sorting and indexing a bam file.
Don't get me wrong, but your error points to a key step before you can proceed for idxstats. So, first go through the help link above and then post the problem if you encounter any.
Thanks I am trying to understand samtools flags.
But when I used to convert bam to into files R1 and R2, it show warning on screen as below I posted: /home/yog/software/bedtools2/bin/bamToFastq -i Ca2._resorted -fq Ca2_read1.fastq -fq2 Ca2_read2.fastq
but it able to complete to write R1 and R2 reads in separate files: is it ok?
WARNING: Query PC168224:127:C98A1ANXX:1:2316:21235:23135 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping. WARNING: Query PC168224:127:C98A1ANXX:1:2316:21236:13229 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping. *WARNING: Query PC168224:127:C98A1ANXX:1:2316:21237:66490 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping.
If I Indexing on sorted file it shows error:
/home/yog/software/samtools-1.3.1/samtools index unmap_216_5W_Ca2.resorted.bam [E::hts_idx_push] NO_COOR reads not in a single block at the end 9170 -1 samtools index: "unmap_216_5W_Ca2.resorted.bam" is corrupted or unsorted
Getting error when doing indexing of sorted bam file:
[root@psgl 216_raw_reads]# /home/yog/software/samtools-1.3.1/samtools sort -n 216_5W_Ca2.bam -O BAM -o 216_5W_Ca2.resorted.bam [bam_sort_core] merging from 14 files... [root@psgl 216_raw_reads]# /home/yog/software/samtools-1.3.1/samtools index 216_5W_Ca2.resorted.bam [E::hts_idx_push] NO_COOR reads not in a single block at the end 22591 -1 samtools index: "216_5W_Ca2.resorted.bam" is corrupted or unsorted
If I do sorting of bam fine by not using -n flag than its worting:
For sorting: /home/yog/software/samtools-1.3.1/samtools sort 216_5W_Ca1.bam -O BAM -o 216_5W_Ca1.sort.bam
For indexing: /home/yog/software/samtools-1.3.1/samtools index 216_5W_Ca1.sort.bam
For read count: /home/yog/software/samtools-1.3.1/samtools idxstats 216_5W_Ca1.sort.bam > readcount[bam_idxstats]
But read count I am getting like this:
Rs025080 1341 239 27 Rs035250 621 0 0 Rs035280 408 0 0 Rs035290 318 0 0 Rs035300 456 87 0 Rs035320 1827 498 105 Rs035330 1119 18 0 Rs035340 924 317 56 Rs000040 321 0 0 Rs000940 948 0 0
Why there is three count column for each transcript.
Please also suggest about the error:I extracted unmapped reads in bam file.
But when I used to convert bam to into files R1 and R2, it show warning on screen as below I posted:
For sorting: /home/yog/software/samtools-1.3.1/samtools sort -n unmap_216_5W_Ca1.bam -o unmap_216_5W_Ca1._resorted
For Fastq:
/home/yog/software/bedtools2/bin/bamToFastq -i Ca2._resorted -fq Ca2_read1.fastq -fq2 Ca2_read2.fastq
but it able to complete to write R1 and R2 reads in separate files: Here also for sorting I should use cammand without -n flag .is it ok?
WARNING: Query PC168224:127:C98A1ANXX:1:2316:21235:23135 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping. WARNING: Query PC168224:127:C98A1ANXX:1:2316:21236:13229 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping. *WARNING: Query PC168224:127:C98A1ANXX:1:2316:21237:66490 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping.
If I do sorting of bam fine by not using -n flag than its worting:
For sorting: /home/yog/software/samtools-1.3.1/samtools sort 216_5W_Ca1.bam -O BAM -o 216_5W_Ca1.sort.bam
For indexing: /home/yog/software/samtools-1.3.1/samtools index 216_5W_Ca1.sort.bam
For read count: /home/yog/software/samtools-1.3.1/samtools idxstats 216_5W_Ca1.sort.bam > readcount[bam_idxstats]
But read count I am getting like this:
Rs025080 1341 239 27 Rs035250 621 0 0 Rs035280 408 0 0 Rs035290 31 0 0 Rs035300 456 87 0
Why there is three count column for each transcript.
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
This is not a tool post, it's a question. When you have questions, please mark them as questions.
By right and left, do you mean R1 and R2? Plus and minus? Please be very specific about what you wish to accomplish. Also, the reason why is helpful; I'm not sure that what you are trying to do is useful.
Thanks, These are CDS transcripts from Draft genome. After getting unmaaped reads I want to de novo assembled them so that for raw read mapping I can use CDS transcript plus unique de novo assembled transcript as a reference.
Right means R1 and left means R2 reads.
What you want to do does not make sense to me. I suggest you de-novo assemble all reads, or else just use the transcriptome. Assembling from reads whose mates are mapped, but that are not mapped, is a bad idea. You'll basically get a bunch of junk.
Hi Brian, If any of those R1and R2 shows similarity with CDS, I will not collect them for de novo assembly.
I want to do this because I found that a large no. of genes transcripts were not present in reported CDS.
But I am not sure I can collect those R1 and R2 reads from bam file.
In that case you should assemble only pairs in which both R1 and R2 are not mapped. You can gather them like this, with the BBMap package:
bbmap.sh in1=r1.fq in2=r2.fq outu=unmapped.fq ref=ref.fa