I have this bam file that originally contained paired sequences, after converting it into fastq sequences I got 2 different files. My idea is to make an alignment with a chromosome sequence, and from there obtain a VCF. My question is if the sequence that I will use will matter in my future results?, there is a way to know which sequence should I use?
I got the sequences using samtools:
bam2fastq -o blah_aligned_R#.fastq --no-unaligned Bamfilename.bam
Thank you very much for the comments, I will explain myself better (and sorry for any misunderstanding, but this is the first time that I am working in bioinformatics).
I am wanting to see the occurrence of KRAS-variant mutation (as described by Ratner et al, in http://www.nature.com/onc/journal/v31/n42/pdf/onc2011539a.pdf) in chromosome 12 in ovarian cancer cell lines. For this, I have downloaded some bam files from the nci60-project (http://watson.nci.nih.gov/projects/nci60/wes/BAMS/). These bam files are presented as paired sequences, so after extracting chromosome 12 information from them (using the bai files that are in the same webpage as index) by the code:
$ samtools view IGR-OV1_reord_mdups_ralgn_fmate_recal.bam chr12>chr12.bam
Then, I sorted the bam file, and instead of using the original code that I posted at the beginning, I converted them to fastq using:
$ bedtools bamtofastq -i chr12.qsort.bam \ -fq chr12.end1.fq \ -fq2 chr12.end2.fq
So now, I have two sequences in fastq format per each original bam file.
My following steps, that I have in mind, is to align these fastq files with the human assembly 37, convert then again to bam files, and make the variant calling with mpileup so from there start with the annotation using annovar or something alike. So my questions in fact are:
1) Is my sort of pipeline acceptable?
2) Should I use both sequences for the variant calling?
Thank you in advance for your help and comments