Question: How to extract only the alignments to the chloroplast reference genome of a sam file
0
gravatar for macielrodriguez2
5 months ago by
macielrodriguez220 wrote:

Hi! I’m pretty new to bioinformatics in linux :D

My lab received paired-end reads (151 bp) of WGS of purple corn and we mapped them (using bowtie2) against the maize B73v4 reference genome (against all the 10 chromosomes, mitochondria and chloroplast genomes). They got a SAM file (zm.sam) of 371,9 GB.

We used the following commands:

#Maize B73v4 reference genome: GCF_000005005.2_B73_RefGen_v4_genomic.fna

#To index the reference genome:
bowtie2-build GCF_000005005.2_B73_RefGen_v4_genomic.fna GCF_000005005.2_B73_RefGen_v4_genomic.fna

#To align the reads to the reference genome:
bowtie2 -p 10 -I 0 -X 600 --fr --very-fast-local -x ../ref/GCF_000005005.2_B73_RefGen_v4_genomic.fna -1 ZM1_R1.fastq.gz -2 ZM1_R2.fastq.gz -S zm.sam

My task is to assembly the chloroplast genome of purple maize, so I only want the alignments of the reads that mapped/aligned with the chloroplast reference genome (Name: Pltd, NC_001666.2).

Please, can someone guide me in how to extract only the alignments to the chloroplast reference genome (Name: Pltd, NC_001666.2) from that SAM file?

(I will map the reads against the chloroplast reference genome alone later, but my advisor wants me to do this first, to extract the alignments with the chloroplast reference genome from this big sam file)

I have been told to use grep, but I have looked at other posts and I don’t know if that is the best way to deal with this.

Thank you in advance for your help :)

ADD COMMENTlink modified 5 months ago by ATpoint24k • written 5 months ago by macielrodriguez220

You could use grep... but use samtools. (After compressing the sam to bam)

ADD REPLYlink written 5 months ago by swbarnes26.7k

Yes! I'm going to do that! thank you :)

ADD REPLYlink written 5 months ago by macielrodriguez220
4
gravatar for ATpoint
5 months ago by
ATpoint24k
Germany
ATpoint24k wrote:

Hi macielrodriguez2,

a couple of things:

First, do not store SAM files. They are uncompressed and only take up space. Use BAM files via:

bowtie2 (commands...) | samtools sort -o zm_sorted.bam

Second, do not align against single chromosomes. Always align against the full genome as using single chromosomes can lead to false-positive alignments. The aligner always try to find the best match and if the actual chromosome that a read belongs to is not present in the reference it will be mapped to the next best (=false) region that is included in the reference. For quick extraction of a chromsome, index the bam file and then use samtools view:

samtools index zm.bam
samtools view -o zm_chloroplast.bam zm.bam NC_001666.2
ADD COMMENTlink written 5 months ago by ATpoint24k

Oh! So, it is not convenient to map my WGS paired-end reads only against the chloroplast reference genome???? I wanted to do that because I read papers in which the authors did just that (map all WGS reads against a chloroplast reference genome :O )

Thank you so much ATpoint! I'll try that right away :D

ADD REPLYlink modified 5 months ago • written 5 months ago by macielrodriguez220

Dear macielrodriguez2, how do you map your FASTQ against a chloroplast reference genome? I have a chloroplast genome from another species (but very close with my study species) maybe I could use this genome. What do you think? Best

Mariana

ADD REPLYlink written 4 months ago by marianfasanella0

Dear ATpoint, Can I find chloroplast SNPs from my FASTQ if I do not have a reference genome? I´m working on a species with no reference genome, I used UNEAK to find SNPs but now I would like to find only chloroplast SNPs and I do not know how to do it, could you help me please? Thanks! Mariana

ADD REPLYlink written 4 months ago by marianfasanella0

Please open a new question after using google and the search function towards your issue :)

ADD REPLYlink written 4 months ago by ATpoint24k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1743 users visited in the last hour