Hi all,
I want to turn my RNA mapping data into a library for further analysis. I have bowtie2 alignments in BAM files, and I am using bedtools (the bamtofastq function) to extract the mapped reads as FASTQ. However, the FASTQ produced by bedtools contains only the original sequence name, not where the read mapped (for example, which chromosome and at what position). I would like the mapped chromosome and position to be included in the read names in the FASTQ file. Is there a simple way to do this? I have tried replacing the names with sed, but without much success so far. Below are my sed attempt and some sample records from my BAM file.
My sed command:
while read p; do echo "$p"; sed -i -e $p mapped.fastq; done < replacename.txt
replacename.txt contains one expression per line, of the form 's/original_sequence_name/replaced_mapping_information/g'
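Two things may be working against the sed loop: every iteration rewrites the whole FASTQ, which gets slow with many reads, and if the lines in replacename.txt literally include the surrounding quotes, sed receives those quotes as part of the expression. Below is a minimal sketch of a one-pass alternative with awk. It assumes a hypothetical tab-separated file (called names.tsv here, not part of the original post) with the original read name in column 1 and the new name in column 2, and a single-end FASTQ with four lines per record.

```shell
# rename_fastq MAP FASTQ
#   MAP   : hypothetical two-column file, original_name<TAB>new_name
#   FASTQ : four-line-per-record FASTQ (e.g. from bedtools bamtofastq)
rename_fastq() {
    awk -F '\t' '
        NR == FNR { new[$1] = $2; next }   # first file: load the name map
        FNR % 4 == 1 {                     # header line of each FASTQ record
            name = substr($0, 2)           # drop the leading "@"
            if (name in new) { print "@" new[name]; next }
        }
        { print }                          # everything else passes through
    ' "$1" "$2"
}
```

Usage would look like `rename_fastq names.tsv mapped.fastq > renamed.fastq`. Reads missing from the map keep their original names.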
Sample records from the BAM file that I need to convert to FASTQ:
SRR1761528.1829103 16 CL1Contig15_2_sc_0.515159_l_155 337 1 50M * 0 0 TTGGAGTGTATTGGGTGCGTTCGTGGCAAAAAATCACTTCGTGATTCGCG BFHFEECHEIIIHDIIIIHDIIIIIHDHIHIIHEG;GFHFHHDD<FF@@@ AS:i:-5 XS:i:-5 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:32C17 YT:Z:UU
SRR1761528.942684 16 CL34Contig23_2_sc_0.402716_l_155 34 0 12M1D38M * 0 0 TTTGGACATATTGAATTTACTGGGTGCGTTCGTGGCAAAAACTCACTTCG JJJJJJJJJJJJIJJJJJIJJJJJJJJIJJJJJJJJJHGHFHEFDFFCCB AS:i:-26 XS:i:-26 XN:i:0 XM:i:3 XO:i:1 XG:i:1 NM:i:4 MD:Z:12^G2G1G2T30 YT:Z:UU
For example, I want CL1Contig15_2_sc_0.515159_l_155 to be included in the FASTQ sequence name, giving SRR1761528.1829103_CL1Contig15_2_sc_0.515159_l_155.
Does anyone have a good idea?
You should probably run this through samtools view with appropriate filtering flags first, to make sure all non-primary alignments are filtered out. Otherwise you might create "duplicate" entries in the FASTQ file.
Thanks for your suggestion. I filtered the alignments by flag first, then extracted the reads to FASTQ. The code I used in the end is:
$ samtools view test.bam | awk -F " " '{print "@"$1"_"$3"\n"$10"\n""+""\n"$11}' > output.fastq
Thanks very much!
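For anyone who also wants the mapping position in the read name, here is a minimal sketch along the same lines. It reads SAM text on standard input, skips unmapped (0x4), secondary (0x100) and supplementary (0x800) records using plain arithmetic on the FLAG column, and appends both the reference name and the 1-based position to the read name. The function name sam_to_named_fastq and the name_reference_position layout are illustrative choices, not something from the posts above.

```shell
# Read SAM records on stdin, write FASTQ with "name_reference_position" headers.
sam_to_named_fastq() {
    awk -F '\t' '
        /^@/ { next }                              # skip SAM header lines, if present
        {
            flag = $2
            unmapped      = int(flag / 4)    % 2   # 0x4
            secondary     = int(flag / 256)  % 2   # 0x100
            supplementary = int(flag / 2048) % 2   # 0x800
            if (unmapped || secondary || supplementary) next
            # $1 read name, $3 reference, $4 position, $10 sequence, $11 quality
            printf "@%s_%s_%s\n%s\n+\n%s\n", $1, $3, $4, $10, $11
        }
    '
}
```

Usage would look like `samtools view test.bam | sam_to_named_fastq > output.fastq`. The same filtering can instead be done on the samtools side with `samtools view -F 0x904 test.bam`, in which case the flag checks in awk are just a safety net.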
Thank you very much! This is very helpful. Since my data is in a BAM file, I ran it through samtools view, and it works like magic.
The code is:
$ samtools view test.bam | awk -F " " '{print "@"$1"_"$3"\n"$10"\n""+""\n"$11}' > output.fastq
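One caveat with printing columns 10 and 11 directly: for reads aligned to the reverse strand (FLAG bit 0x10, as in the two sample records with flag 16), SAM/BAM stores the sequence reverse-complemented and the qualities reversed, so the FASTQ will not match the original reads for those records. If the original orientation matters, something like the sketch below could restore it. The function name sam_to_original_fastq is illustrative, and turning any base other than A/C/G/T into N is a simplifying assumption.

```shell
# Read SAM records on stdin, write FASTQ in the original read orientation,
# with "name_reference" headers.
sam_to_original_fastq() {
    awk -F '\t' '
        BEGIN {
            comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"
        }
        # Reverse a string (used for the quality line).
        function reverse(s,    i, out) {
            out = ""
            for (i = length(s); i >= 1; i--) out = out substr(s, i, 1)
            return out
        }
        # Reverse-complement a DNA string; non-ACGT characters become N.
        function revcomp(s,    i, c, out) {
            out = ""
            s = toupper(s)
            for (i = length(s); i >= 1; i--) {
                c = substr(s, i, 1)
                out = out ((c in comp) ? comp[c] : "N")
            }
            return out
        }
        /^@/ { next }                      # skip SAM header lines, if present
        {
            seq = $10; qual = $11
            if (int($2 / 16) % 2) {        # 0x10: aligned to the reverse strand
                seq = revcomp(seq)
                qual = reverse(qual)
            }
            printf "@%s_%s\n%s\n+\n%s\n", $1, $3, seq, qual
        }
    '
}
```

Usage would look like `samtools view test.bam | sam_to_original_fastq > output.fastq`. If the reads are only going to be used in the orientation of the reference, the simpler one-liner is enough.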