Question: Extracting chimeric reads from mapping
0
gravatar for Anushka
19 months ago by
Anushka20
France
Anushka20 wrote:

Hello, I am struggling to processing and analyse bam files (from bwa alignment), to extracting the chimeric read alignment. I am aligning human cell line RNA-seq data (paired end) to virus, aimed to find the viral integration sites in the genome. For that, after reading a bit here from following links like: Confirming Viruses detected from NGS data

RNA Seq analysis of virally infected cells

For find out the chimeric reads (partially mapping to one and partially to other genome),I have concatenated the two genomes (human + viral) and prepared the index using bwa index

bwa index -a bwtsw human_virus.fa

and then performed the mapping using bwa-mem and piping the output samtools of alignment to

bwa mem -R '@RG\tID:ID\tSM:SM\tPL:Illumina' index/human_virus.fa input_pair_1.fastq input_pair_2.fastq | samtools view -bS > input.bam

I have sorted and indexed the resulting input.bam (input_sorted.bam and input.bai)

Now for the post alignment processing, I am trying samtools view -f 0x04 (for unmapped reads) and flagstat to find other statistics.

My question question is how can I extract the numbers of reads that are partially mapped to either of the two genome, how many reads mapped to viral genome and which to human. Conceptually, I might not be understanding this problem, that is why I cab not implement proper samtool view options.

Plus I have approx 50 samples how could I evaluate how many reads have mapped to the viral genome in each sample.

I would really appreciate your inputs.

bwa rna-seq samtools alignment • 1.9k views
ADD COMMENTlink modified 19 months ago by Pierre Lindenbaum119k • written 19 months ago by Anushka20
1
gravatar for Pierre Lindenbaum
19 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum119k wrote:

My question question is how can I extract the numbers of reads that are partially mapped to either of the two genome, how many reads mapped to viral genome and which to human. Conceptually, I might not be understanding this problem, that is why I cab not implement proper samtool view options.

using samjdk: http://lindenb.github.io/jvarkit/SamJdk.html .

For paired end we're searching for reads mapping one organism, and the mate mapping another. We also look for split reads.

java -jar dist/samjdk.jar -e '
    final Set<String> virus_chrom = new HashSet<>(Arrays.asList("viral_seg1", "viral_seg2"));
    if(record.getReadUnmappedFlag()) return false;
    if(record.getReadPairedFlag() && !record.getMateUnmappedFlag())
        {
        if( virus_chrom.contains(record.getReferenceName()) != 
            virus_chrom.contains(record.getMateReferenceName())
            )
            {
            return true;
            }
        }
    for(final SAMRecord other: SAMUtils.getOtherCanonicalAlignments(record))
        {
        if( virus_chrom.contains(record.getReferenceName()) != 
            virus_chrom.contains(other.getReferenceName())
            )
            {
            return true;
            }
        }
    return false;' input.bam
ADD COMMENTlink modified 19 months ago • written 19 months ago by Pierre Lindenbaum119k

Thanks Pierre for your help . I am trying to understand the output of SamJdk, basically it has given me all he SQ tag with reference sequence name and corresponding length.

Additionally is it possible to retrieve the chimeric reads using samtools FLAG options. As in the manual states definition of a chimeric alignment is:

Typically, one of the linear alignments in a chimeric alignment is considered the “representative” alignment, and the others are called “supplementary” and are distinguished by the supplementary alignment flag. All the SAM records in a chimeric alignment have the same QNAME and the same values for 0x40 and 0x80 flags

ADD REPLYlink written 19 months ago by Anushka20
1

Additionally is it possible to retrieve the chimeric reads using samtools FLAG options

that's the purpose of SAMUtils.getOtherCanonicalAlignments . But you need some code to get the difference between virus/host.

ADD REPLYlink written 19 months ago by Pierre Lindenbaum119k

yes that make sense. Thank you so much for your help

ADD REPLYlink written 19 months ago by Anushka20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 859 users visited in the last hour