Question: No alignment, only header in sam file
0
gravatar for ss
9 months ago by
ss0
ss0 wrote:

I am currently working with highthroughput sequencing of the genome and have generated a SAM file.

However, all of my lines have the @sq and SN body tag, which, according to the SAMTools website is only supposed to be in the Header lines of the file.

Does anyone know why this happened and why my file does not have an alignment section even though I did align sequences using the BWA program?

Thank you in advance for your help!

UPDATE based on question:

  • First, I concatenated two files together: a file with the reference genome and a file with the RNA splice junctions using the cat function.
  • Second, I indexed this concatenated file using the bwa index -a bwtsw reference.fa command
  • Third, I took this indexed file and aligned it to the read sequence using the bwa aln function and bwa samse -n4 function
  • Fourth, I applied the java code to find the start and end coordinates of the exons and apply them
  • Fifth, I tried to use the Picard MarkDuplicates method but then got the error that I "cannot copy duplicates to the SAMdictionary." I opened the SAM file that was inputted into this method, and it had only @sq lines with SN tags which means that they are part of the header. There was no alignment section and I am not sure why.

This is the code from the fourth part - I am actually using someone else's source code for this and am not completely sure what this code means. So here it is

import java.util.ArrayList;
import java.util.Arrays;




public class convertCoordinates {

    private static String convertLine(String line) {
        // TODO Auto-generated method stub
        //System.out.println(line);
        String[] fields = line.split("\t");
        //System.out.println(fields[2]);
        String[] chrom = fields[2].split("[-]");
        //System.out.println(Arrays.toString(chrom));
        //System.out.println(line.charAt(1));
        if(line.charAt(0) == '@'){
            return line;
        }
        else if ( fields[5].equals("*") && ( fields[2].length() != 1 || Integer.parseInt(fields[3]) != 0 ) ){
            //  System.out.println("malformed read");
                //return (fields[0] + "malformed read");
                //System.exit(0);
                fields[2] = "*";
                fields[3] = "0";
                String newline = Arrays.toString(fields).replace(", ", "\t").replaceAll("[\\[\\]]", "");
                return newline;

            }
        //else if (chrom.length == 7 && )
        else if (chrom.length == 1) {
            return line;
        }
        //if(chrom.length != 6){
        //  return line;
        //}
        else if(chrom.length == 100){

            int start1 = Integer.parseInt(chrom[1]);
            int stop1 = Integer.parseInt(chrom[2]);
            int start2 = Integer.parseInt(chrom[3]);
            int beginning = Integer.parseInt(fields[3]);
            int read1start = start1 + beginning - 1;
            //System.out.println(read1start);
            int read1length = stop1 - read1start + 1;
            int Nlength = start2-stop1-1;
            String cigar = fields[5];
            String newcigar = "";


            //System.out.println(cigarletters[1]);

            if(cigar.equals("*")){
                newcigar = "*";
            }
            else if (read1length <= 0) {
                newcigar = cigar;
                //read1start = start2 - (read1length + 1);
                read1start = start2 + beginning - stop1 + start1 - 2;
                //System.out.println(read1start);
            }
            else{
                String[] cigarnumsTmp = cigar.split("[MIDNSHP]");
                int[] cigarnums = new int[cigarnumsTmp.length];
                for(int i = 0; i < cigarnumsTmp.length; i++){
                    cigarnums[i] = Integer.parseInt(cigarnumsTmp[i]);
                }
                String cigarlettersTmp[] = cigar.split("[0-9]+");//.toString().toCharArray();
                String cigarletters[] = Arrays.copyOfRange(cigarlettersTmp, 1, cigarlettersTmp.length);
                int currentpos = 0;
                boolean found = false;

                for(int i = 0; i < cigarnums.length; i++){
                    if(!cigarletters[i].equals("I") && !cigarletters[i].equals("S") && !cigarletters[i].equals("H")){
                        //System.out.println("adding cignum" + cigarnums[i] + " for " + );
                        currentpos += cigarnums[i];
                    }

                    if(currentpos > read1length && !found){
                        found = true;
                        int len1 = cigarnums[i] - (currentpos - read1length);
                        int len2 = cigarnums[i] - len1;
                        newcigar = newcigar + String.valueOf(len1) + cigarletters[i] + String.valueOf(Nlength) + "N" + String.valueOf(len2 + cigarletters[i]);
                    }
                    else{
                        newcigar = newcigar + String.valueOf(cigarnums[i]) + cigarletters[i];
                    }
                }
            }                
            String sep = "\t";
            String newLine = fields[0] + sep + fields[1] + sep + chrom[0] + sep + read1start + sep + fields[4] + sep + newcigar + sep;
            for(int i = 6; i < fields.length-1; i++){
                newLine += fields[i]+sep;
            }
            newLine += fields[fields.length-1];

            return newLine;

        }
        else{

            String newline = parseMultiexon(fields,chrom);
            return newline;


        }
    }



    private static String parseMultiexon(String[] fields, String[] chrom) {
        ArrayList<Integer> exonStarts = new ArrayList<Integer>();
        ArrayList<Integer> exonEnds = new ArrayList<Integer>();
        ArrayList<Integer> cmLength = new ArrayList<Integer>();
        //ArrayList<Integer> exonLength = new ArrayList<Integer>();

        int readStartRel = Integer.parseInt(fields[3]);
        String newcigar = "";

        /*
         * get numbers and characters from cigar string
         */
        String cigar = fields[5];
        String[] cigarnumsTmp = cigar.split("[MIDNSHP]");
        int[] cigarnums = new int[cigarnumsTmp.length];
        int cigarsum = 0;
        for(int i = 0; i < cigarnumsTmp.length; i++){
            cigarnums[i] = Integer.parseInt(cigarnumsTmp[i]);
            cigarsum += cigarnums[i]; 
        }
        String cigarlettersTmp[] = cigar.split("[0-9]+");//.toString().toCharArray();
        String cigarletters[] = Arrays.copyOfRange(cigarlettersTmp, 1, cigarlettersTmp.length);

        /*
         * get exon starts and ends as well as individual and cumulative exon lengths
         */
        int len = 0;
        for(int i = 1; i < chrom.length; i++){
            //System.err.println(i + "\t" + Arrays.toString(chrom));
            if( i % 2 == 0){
                exonEnds.add(Integer.parseInt(chrom[i]));
                len += exonEnds.get(exonEnds.size()-1) - exonStarts.get(exonStarts.size()-1) +1;
                //exonLength.add(new Integer(exonEnds.get(exonEnds.size()-1) - exonStarts.get(exonStarts.size()-1) +1));
                cmLength.add(new Integer(len));
            }
            else{
                exonStarts.add(Integer.parseInt(chrom[i]));
            }
        }
        //System.err.println(cmLength);
        /*
         * find the exon that contains the read start
         */
        int readStartIdx = -1;
        int readStartAbs = -1;
        if(readStartRel <= cmLength.get(0)){
            readStartIdx = 0;
            readStartAbs = exonStarts.get(0) + readStartRel - 1;
        }
        for(int i = 1; i < cmLength.size(); i++){
            //System.err.println(readStartRel + "\t" +  cmLength.get(i)  + "\t" +  readStartRel  + "\t" +   (cmLength.get(i)-cmLength.get(i-1)));
            if(readStartRel <= cmLength.get(i) && readStartRel > cmLength.get(i-1)){
                readStartIdx = i;
                readStartAbs = exonStarts.get(i) + (readStartRel-cmLength.get(i-1))-1;
            }
        }


        //System.err.println(readStartIdx);
        if(cigar.equals("*")){
            newcigar = "*";
        }
        else{
            int currCigarPos = 0;
            int currReadPos = exonEnds.get(readStartIdx) - readStartAbs + 1;
            int readRemain = cigarsum;
            boolean found = false;

            for(int i = 0; i < cigarnums.length; i++){

                if(!cigarletters[i].equals("I") && !cigarletters[i].equals("S") && !cigarletters[i].equals("H")){
                    //System.out.println("adding cignum" + cigarnums[i] + " for " + );
                    currCigarPos += cigarnums[i];
                    //readRemain -= cigarnums[i];
                }

                while(currCigarPos > currReadPos && !found && readStartIdx < exonEnds.size()-1){

                    //found = true;
                    int len1 = cigarnums[i] - (currCigarPos - currReadPos);
                    cigarnums[i] -= len1;
                    int Nlength = exonStarts.get(readStartIdx+1) - exonEnds.get(readStartIdx) - 1 ;
                    newcigar = newcigar + String.valueOf(len1) + cigarletters[i] + String.valueOf(Nlength) + "N";// + String.valueOf(len2 + cigarletters[i]);
                    readRemain -= len1;
                    //System.err.println(readStartIdx + "\t" + readRemain);
                    readStartIdx++;
                    //System.err.println(exonEnds.get(readStartIdx) - exonStarts.get(readStartIdx) + 1);
                    if(exonEnds.get(readStartIdx) - exonStarts.get(readStartIdx) + 1 < readRemain){
                        currReadPos += (exonEnds.get(readStartIdx) - exonStarts.get(readStartIdx) + 1);
                        //System.err.println("here");
                    }
                    else{
                        currReadPos += readRemain;
                    }

                }
                //else{
                newcigar = newcigar + String.valueOf(cigarnums[i]) + cigarletters[i];
                //}
            }
        }
//      System.err.println(readStartRel);
//      System.err.println(readStartIdx);
//      System.err.println(readStartAbs);
        //System.err.println(exonStarts);
        //System.err.println(exonEnds);
//      System.err.println(exonLength);
//      System.err.println(cmLength);
//      System.err.println(cigar);
//      System.err.println(newcigar);

        String sep = "\t";
        String newLine = fields[0] + sep + fields[1] + sep + chrom[0] + sep + readStartAbs + sep + fields[4] + sep + newcigar + sep;
        for(int i = 6; i < fields.length-1; i++){
            newLine += fields[i]+sep;
        }
        newLine += fields[fields.length-1];

        return newLine;
    }



    /**
     * @param args
     */
    public static void main(String[] args) {
        try {
           java.io.BufferedReader stdin = new java.io.BufferedReader(new java.io.InputStreamReaderSystem.in));
           String line;


            while((line = stdin.readLine()) !=null){
                //System.out.println();
                 //System.err.println('>'+line);
                 String convertedLine = convertLine(line);

                 System.out.println(convertedLine);
            }
        }     
        catch (java.io.IOException e) { System.out.println(e); }
        catch (NumberFormatException e) { System.out.println(e); }


    }


}
rna-seq unix samtools • 370 views
ADD COMMENTlink modified 9 months ago by ATpoint15k • written 9 months ago by ss0
4

Rule # 1: Don't reinvent the wheel. Are you sure that you had to write a custom script? That something like bedtools wouldn't do the trick? Because the simplest explanation is that your custom script has a bug.

ADD REPLYlink written 9 months ago by swbarnes25.2k
1

do not reinvent the wheel , you're re-writing htsjdk : https://github.com/samtools/htsjdk/

ADD REPLYlink written 9 months ago by Pierre Lindenbaum119k
1

I'm not sure I understand this java code but it looks like you're re-inventing bam2bed http://bedtools.readthedocs.io/en/latest/content/tools/bamtobed.html

ADD REPLYlink written 9 months ago by Pierre Lindenbaum119k
2

Please post the command line args, especially samtools view -H would cause this.

ADD REPLYlink written 9 months ago by e.benn110

Please refer to my update above. Thanks!

ADD REPLYlink written 9 months ago by ss0

"Fourth, I applied the java code to find the start and end coordinates of the exons and apply them" You attempted to filter the output of BWA to exon-only regions? And the output of this command was no data. It seems like your filter was wrong in some manner. If you don't post the actual commands you use we can only guess what was wrong though.

ADD REPLYlink written 9 months ago by e.benn110

Thank you for your reply. sorry for not including code earlier. Can you please look above for the code for the fourth part. As for the specific commands, I am just using trial and error based on what I see online and what is working. I gave the basic functions of what I am using above as well.

ADD REPLYlink written 9 months ago by ss0
2

What is the code above trying to do? Filter for certain regions? See gatk PrintReads -L or samtools view -L on how to do this using various region specifying files (beds vcfs etc).

Also did the output of BWA contain data apart from the header? You only mention the output of your java code. Another possibility is that you ran this java code on some kind of compressed sam file (I wont speculate). That may have left the header intact and destroyed the data as the headers are sometimes uncompressed.

ADD REPLYlink written 9 months ago by e.benn110
2

Make your life easier and just use STAR.

ADD REPLYlink written 9 months ago by Devon Ryan89k

Hello ss,

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

code_formatting

Thank you!

fin swimmer

ADD REPLYlink written 9 months ago by finswimmer11k

I have adapted your title to make it more specific.

Additionally:

I concatenated two files together: a file with the reference genome and a file with the RNA splice junctions using the cat function.

Why?

ADD REPLYlink modified 9 months ago • written 9 months ago by WouterDeCoster38k
1
gravatar for ATpoint
9 months ago by
ATpoint15k
Germany
ATpoint15k wrote:

Because there are plenty of answers already, and this becomes a bit messy, please give some details on the following:

1) What kind of data is this? Given that you mentioned splice junctions, is it RNA-seq?

2) Please give a screenshot of samtools view your_file.sam | head

3) What exactly is the problem you are experiencing? Is the sam file corrupted? Check with samtools quickcheck -qvvv your_file.sam

4) Given your file was not corrupted and the alignment was fine, what are the next steps you plan and why, so what is the question you want to answer.

Once the points are clear, we can try to put you back on track.

ADD COMMENTlink modified 9 months ago • written 9 months ago by ATpoint15k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 760 users visited in the last hour