I am currently working with highthroughput sequencing of the genome and have generated a SAM file.
However, all of my lines have the @sq and SN body tag, which, according to the SAMTools website is only supposed to be in the Header lines of the file.
Does anyone know why this happened and why my file does not have an alignment section even though I did align sequences using the BWA program?
Thank you in advance for your help!
UPDATE based on question:
- First, I concatenated two files together: a file with the reference genome and a file with the RNA splice junctions using the cat function.
- Second, I indexed this concatenated file using the
bwa index -a bwtsw reference.fa
command - Third, I took this indexed file and aligned it to the read sequence using the bwa aln function and
bwa samse -n4
function - Fourth, I applied the java code to find the start and end coordinates of the exons and apply them
- Fifth, I tried to use the Picard MarkDuplicates method but then got the error that I "cannot copy duplicates to the SAMdictionary." I opened the SAM file that was inputted into this method, and it had only @sq lines with SN tags which means that they are part of the header. There was no alignment section and I am not sure why.
This is the code from the fourth part - I am actually using someone else's source code for this and am not completely sure what this code means. So here it is
import java.util.ArrayList;
import java.util.Arrays;
public class convertCoordinates {
private static String convertLine(String line) {
// TODO Auto-generated method stub
//System.out.println(line);
String[] fields = line.split("\t");
//System.out.println(fields[2]);
String[] chrom = fields[2].split("[-]");
//System.out.println(Arrays.toString(chrom));
//System.out.println(line.charAt(1));
if(line.charAt(0) == '@'){
return line;
}
else if ( fields[5].equals("*") && ( fields[2].length() != 1 || Integer.parseInt(fields[3]) != 0 ) ){
// System.out.println("malformed read");
//return (fields[0] + "malformed read");
//System.exit(0);
fields[2] = "*";
fields[3] = "0";
String newline = Arrays.toString(fields).replace(", ", "\t").replaceAll("[\\[\\]]", "");
return newline;
}
//else if (chrom.length == 7 && )
else if (chrom.length == 1) {
return line;
}
//if(chrom.length != 6){
// return line;
//}
else if(chrom.length == 100){
int start1 = Integer.parseInt(chrom[1]);
int stop1 = Integer.parseInt(chrom[2]);
int start2 = Integer.parseInt(chrom[3]);
int beginning = Integer.parseInt(fields[3]);
int read1start = start1 + beginning - 1;
//System.out.println(read1start);
int read1length = stop1 - read1start + 1;
int Nlength = start2-stop1-1;
String cigar = fields[5];
String newcigar = "";
//System.out.println(cigarletters[1]);
if(cigar.equals("*")){
newcigar = "*";
}
else if (read1length <= 0) {
newcigar = cigar;
//read1start = start2 - (read1length + 1);
read1start = start2 + beginning - stop1 + start1 - 2;
//System.out.println(read1start);
}
else{
String[] cigarnumsTmp = cigar.split("[MIDNSHP]");
int[] cigarnums = new int[cigarnumsTmp.length];
for(int i = 0; i < cigarnumsTmp.length; i++){
cigarnums[i] = Integer.parseInt(cigarnumsTmp[i]);
}
String cigarlettersTmp[] = cigar.split("[0-9]+");//.toString().toCharArray();
String cigarletters[] = Arrays.copyOfRange(cigarlettersTmp, 1, cigarlettersTmp.length);
int currentpos = 0;
boolean found = false;
for(int i = 0; i < cigarnums.length; i++){
if(!cigarletters[i].equals("I") && !cigarletters[i].equals("S") && !cigarletters[i].equals("H")){
//System.out.println("adding cignum" + cigarnums[i] + " for " + );
currentpos += cigarnums[i];
}
if(currentpos > read1length && !found){
found = true;
int len1 = cigarnums[i] - (currentpos - read1length);
int len2 = cigarnums[i] - len1;
newcigar = newcigar + String.valueOf(len1) + cigarletters[i] + String.valueOf(Nlength) + "N" + String.valueOf(len2 + cigarletters[i]);
}
else{
newcigar = newcigar + String.valueOf(cigarnums[i]) + cigarletters[i];
}
}
}
String sep = "\t";
String newLine = fields[0] + sep + fields[1] + sep + chrom[0] + sep + read1start + sep + fields[4] + sep + newcigar + sep;
for(int i = 6; i < fields.length-1; i++){
newLine += fields[i]+sep;
}
newLine += fields[fields.length-1];
return newLine;
}
else{
String newline = parseMultiexon(fields,chrom);
return newline;
}
}
private static String parseMultiexon(String[] fields, String[] chrom) {
ArrayList<Integer> exonStarts = new ArrayList<Integer>();
ArrayList<Integer> exonEnds = new ArrayList<Integer>();
ArrayList<Integer> cmLength = new ArrayList<Integer>();
//ArrayList<Integer> exonLength = new ArrayList<Integer>();
int readStartRel = Integer.parseInt(fields[3]);
String newcigar = "";
/*
* get numbers and characters from cigar string
*/
String cigar = fields[5];
String[] cigarnumsTmp = cigar.split("[MIDNSHP]");
int[] cigarnums = new int[cigarnumsTmp.length];
int cigarsum = 0;
for(int i = 0; i < cigarnumsTmp.length; i++){
cigarnums[i] = Integer.parseInt(cigarnumsTmp[i]);
cigarsum += cigarnums[i];
}
String cigarlettersTmp[] = cigar.split("[0-9]+");//.toString().toCharArray();
String cigarletters[] = Arrays.copyOfRange(cigarlettersTmp, 1, cigarlettersTmp.length);
/*
* get exon starts and ends as well as individual and cumulative exon lengths
*/
int len = 0;
for(int i = 1; i < chrom.length; i++){
//System.err.println(i + "\t" + Arrays.toString(chrom));
if( i % 2 == 0){
exonEnds.add(Integer.parseInt(chrom[i]));
len += exonEnds.get(exonEnds.size()-1) - exonStarts.get(exonStarts.size()-1) +1;
//exonLength.add(new Integer(exonEnds.get(exonEnds.size()-1) - exonStarts.get(exonStarts.size()-1) +1));
cmLength.add(new Integer(len));
}
else{
exonStarts.add(Integer.parseInt(chrom[i]));
}
}
//System.err.println(cmLength);
/*
* find the exon that contains the read start
*/
int readStartIdx = -1;
int readStartAbs = -1;
if(readStartRel <= cmLength.get(0)){
readStartIdx = 0;
readStartAbs = exonStarts.get(0) + readStartRel - 1;
}
for(int i = 1; i < cmLength.size(); i++){
//System.err.println(readStartRel + "\t" + cmLength.get(i) + "\t" + readStartRel + "\t" + (cmLength.get(i)-cmLength.get(i-1)));
if(readStartRel <= cmLength.get(i) && readStartRel > cmLength.get(i-1)){
readStartIdx = i;
readStartAbs = exonStarts.get(i) + (readStartRel-cmLength.get(i-1))-1;
}
}
//System.err.println(readStartIdx);
if(cigar.equals("*")){
newcigar = "*";
}
else{
int currCigarPos = 0;
int currReadPos = exonEnds.get(readStartIdx) - readStartAbs + 1;
int readRemain = cigarsum;
boolean found = false;
for(int i = 0; i < cigarnums.length; i++){
if(!cigarletters[i].equals("I") && !cigarletters[i].equals("S") && !cigarletters[i].equals("H")){
//System.out.println("adding cignum" + cigarnums[i] + " for " + );
currCigarPos += cigarnums[i];
//readRemain -= cigarnums[i];
}
while(currCigarPos > currReadPos && !found && readStartIdx < exonEnds.size()-1){
//found = true;
int len1 = cigarnums[i] - (currCigarPos - currReadPos);
cigarnums[i] -= len1;
int Nlength = exonStarts.get(readStartIdx+1) - exonEnds.get(readStartIdx) - 1 ;
newcigar = newcigar + String.valueOf(len1) + cigarletters[i] + String.valueOf(Nlength) + "N";// + String.valueOf(len2 + cigarletters[i]);
readRemain -= len1;
//System.err.println(readStartIdx + "\t" + readRemain);
readStartIdx++;
//System.err.println(exonEnds.get(readStartIdx) - exonStarts.get(readStartIdx) + 1);
if(exonEnds.get(readStartIdx) - exonStarts.get(readStartIdx) + 1 < readRemain){
currReadPos += (exonEnds.get(readStartIdx) - exonStarts.get(readStartIdx) + 1);
//System.err.println("here");
}
else{
currReadPos += readRemain;
}
}
//else{
newcigar = newcigar + String.valueOf(cigarnums[i]) + cigarletters[i];
//}
}
}
// System.err.println(readStartRel);
// System.err.println(readStartIdx);
// System.err.println(readStartAbs);
//System.err.println(exonStarts);
//System.err.println(exonEnds);
// System.err.println(exonLength);
// System.err.println(cmLength);
// System.err.println(cigar);
// System.err.println(newcigar);
String sep = "\t";
String newLine = fields[0] + sep + fields[1] + sep + chrom[0] + sep + readStartAbs + sep + fields[4] + sep + newcigar + sep;
for(int i = 6; i < fields.length-1; i++){
newLine += fields[i]+sep;
}
newLine += fields[fields.length-1];
return newLine;
}
/**
* @param args
*/
public static void main(String[] args) {
try {
java.io.BufferedReader stdin = new java.io.BufferedReader(new java.io.InputStreamReaderSystem.in));
String line;
while((line = stdin.readLine()) !=null){
//System.out.println();
//System.err.println('>'+line);
String convertedLine = convertLine(line);
System.out.println(convertedLine);
}
}
catch (java.io.IOException e) { System.out.println(e); }
catch (NumberFormatException e) { System.out.println(e); }
}
}
Rule # 1: Don't reinvent the wheel. Are you sure that you had to write a custom script? That something like bedtools wouldn't do the trick? Because the simplest explanation is that your custom script has a bug.
do not reinvent the wheel , you're re-writing htsjdk : https://github.com/samtools/htsjdk/
I'm not sure I understand this java code but it looks like you're re-inventing bam2bed http://bedtools.readthedocs.io/en/latest/content/tools/bamtobed.html
Please post the command line args, especially
samtools view -H
would cause this.Please refer to my update above. Thanks!
"Fourth, I applied the java code to find the start and end coordinates of the exons and apply them" You attempted to filter the output of BWA to exon-only regions? And the output of this command was no data. It seems like your filter was wrong in some manner. If you don't post the actual commands you use we can only guess what was wrong though.
Thank you for your reply. sorry for not including code earlier. Can you please look above for the code for the fourth part. As for the specific commands, I am just using trial and error based on what I see online and what is working. I gave the basic functions of what I am using above as well.
What is the code above trying to do? Filter for certain regions? See
gatk PrintReads -L
orsamtools view -L
on how to do this using various region specifying files (beds vcfs etc).Also did the output of BWA contain data apart from the header? You only mention the output of your java code. Another possibility is that you ran this java code on some kind of compressed sam file (I wont speculate). That may have left the header intact and destroyed the data as the headers are sometimes uncompressed.
Make your life easier and just use STAR.
Hello ss,
Please use the formatting bar (especially the
code
option) to present your post better. I've done it for you this time.Thank you!
fin swimmer
I have adapted your title to make it more specific.
Additionally:
Why?