BAM
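A BAM file is a binary format that stores sequence alignments and gene annotation information. SAW count adds custom tags to the optional fields of the BAM file to record each read's coordinates, CID, and MID; annotation information is also written to the BAM through tag fields.
Tag information
Custom tags added to the BAM optional fields: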
| Tag | Description |
|---|---|
| Cx:i | x coordinate of the Coordinate ID (CID). |
| Cy:i | y coordinate of the Coordinate ID (CID). |
| UR:Z | The hexadecimal representation of uncorrected binary-encoded MID. |
| XF:i | Mapping region on the reference genome. Valid value: 0=EXONIC, 1=INTRONIC, 2=INTERGENIC, 3=rRNA. |
| GI:Z | Annotated gene ID. |
| GE:Z | Annotated gene name. |
| GS:Z | ‘+’ or ‘-’, indicating forward/reverse strand respectively. |
| UB:Z | The hexadecimal representation of count corrected binary-encoded MID. |
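
The tags in the table follow the standard SAM `TAG:TYPE:VALUE` layout, so they can be read from the text form of a record with plain string handling. The following is a minimal sketch, not part of SAW: the helper name `parse_optional_fields` is hypothetical, and the tag values are copied from the annotated example record shown in this section.

```python
# Meaning of the XF codes, taken from the tag table.
XF_REGIONS = {0: "EXONIC", 1: "INTRONIC", 2: "INTERGENIC", 3: "rRNA"}


def parse_optional_fields(fields):
    """Turn SAM-style TAG:TYPE:VALUE strings into a {tag: value} dict.

    Hypothetical helper for illustration; only the 'i' (integer) and
    'Z' (string) types used by the tags in the table are handled.
    """
    tags = {}
    for field in fields:
        # Split at most twice so a value containing ':' stays intact.
        tag, type_code, value = field.split(":", 2)
        tags[tag] = int(value) if type_code == "i" else value
    return tags


fields = (
    "Cx:i:7767 Cy:i:18052 UR:Z:7AE49 XF:i:0 "
    "GI:Z:ENSMUSG00000051951 GE:Z:Xkr4 GS:Z:- UB:Z:79E49"
).split()

tags = parse_optional_fields(fields)
region = XF_REGIONS[tags["XF"]]
print(tags["Cx"], tags["Cy"], tags["GE"], tags["GS"], region)
```

For binary BAM input, a library such as pysam would normally be used to read the tags instead of splitting text.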
Example BAM record from the raw alignment output:
E100026571L1C009R00301275185 16 1 3000095 255 26M121066N74M * 0 0 GGCTTTTTTTTTTTTTTTTTTTTTTTTTTTCTAAATATTGGGTTTTATTAGCACCATGATAACTGTATATTAATTTGCACTGACTGTCATAACAAAATAC G+:GFFGGFGFFGFFGFGGFFGFFFFFCFGFCFGGGFGGFGFFFFGGFGGFGFFFGGFFGFFFGFGFGFFGFFGFGFFFFGFFFFFFFFGGFFGGFFGEF NH:i:1 HI:i:1 AS:i:88 nM:i:0 Cx:i:4826 Cy:i:11598 UR:Z:6FA29
Example BAM record after gene annotation:
E100026571L1C002R00703943265 1040 1 3082766 255 11M132671N89M * 0 0 CTGCTGCAGCTTTTTTTTCTTTGAGATTTATTTTTATGCTATGTGTATGGGTATTTTGCCTGCATATATGTCTATGCACCATGTGTGTGCAGTGCTTGAG FFFFFECGFDCFGDGDFEE@EEGIBFGGCGFFGACGFCGFFDGDGFFFFFFEGCDFCGFFGG@FFF=EFFDGGGGGFDGFFFGGGFGFFGGGFFGGGDFG NH:i:1 HI:i:1 AS:i:88 nM:i:0 Cx:i:7767 Cy:i:18052 UR:Z:7AE49 XF:i:0 GI:Z:ENSMUSG00000051951 GE:Z:Xkr4 GS:Z:- UB:Z:79E49
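
The second column of each example record is the SAM FLAG, a bit field defined by the SAM specification. As a rough sketch (the table `SAM_FLAGS` and the function `decode_flag` are illustrative names, and only a subset of the bits is listed), the two flag values in the examples can be decoded like this:

```python
# Subset of the FLAG bits defined in the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x4: "unmapped",
    0x10: "reverse_strand",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of the bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


raw_flags = decode_flag(16)          # flag of the raw alignment example
annotated_flags = decode_flag(1040)  # flag of the annotated example

print(raw_flags)        # ['reverse_strand']
print(annotated_flags)  # ['reverse_strand', 'duplicate']
```

In the annotated example, 1040 = 1024 + 16, so the record is both on the reverse strand and marked as a duplicate.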
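Alignment statistics
After the reads in the sequencing FASTQ files have been aligned, the statistics and detailed information for this step are saved under /STEREO_ANALYSIS_WORKFLOW/ALIGNMENT/<lane>.CIDMap.stat.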
| Metric | Description |
|---|---|
| Number of CID in chip mask | Number of CIDs in the chip mask file |
| Number of unique CID in FASTQ | Number of unique CIDs in FASTQs |
| Number of total reads | Number of total reads in FASTQs |
| Q10 in CID % | Ratio of Q10 CID bases |
| Q20 in CID % | Ratio of Q20 CID bases |
| Q30 in CID % | Ratio of Q30 CID bases |
| Number of mapped CID | Number of reads mapped to CID |
| % of mapped CID | Ratio of reads mapped to CID |
| Number of exactly mapped CID | Number of reads exactly mapped to CID |
| % of exactly mapped CID | Ratio of reads exactly mapped to CID |
| Number of CID with mismatch | Number of reads mapped to CID with mismatch |
| % of CID with mismatch | Ratio of reads mapped to CID with mismatch |
| Q10 in RNA % | Ratio of Q10 RNA bases |
| Q20 in RNA % | Ratio of Q20 RNA bases |
| Q30 in RNA % | Ratio of Q30 RNA bases |
| Number of reads with polyA | Number of reads with polyA sequence |
| % of reads with polyA | Ratio of reads with polyA sequence |
| Number of short reads (trim polyA) | Number of short reads after trimming polyA sequence |
| % of short reads (trim polyA) | Ratio of short reads after trimming polyA sequence |
| Number of reads with adapter | Number of reads with adapter sequence |
| % of reads with adapter | Ratio of reads with adapter sequence |
| Number of short reads (trim adapter) | Number of short reads after trimming adapter sequence |
| % of short reads (trim adapter) | Ratio of short reads after trimming adapter sequence |
| Number of reads filtered with DNB | Number of reads with DNB sequence |
| % of reads filtered with DNB | Ratio of reads with DNB sequence |
| Q10 in clean RNA % | Ratio of Q10 RNA bases after filtering |
| Q20 in clean RNA % | Ratio of Q20 RNA bases after filtering |
| Q30 in clean RNA % | Ratio of Q30 RNA bases after filtering |
| Q10 in MID % | Ratio of Q10 MID bases |
| Q20 in MID % | Ratio of Q20 MID bases |
| Q30 in MID % | Ratio of Q30 MID bases |
| Number of low quality MID | Number of MID with low quality bases |
| % of low quality MID | Ratio of MID with low quality bases |
| Number of MID with N | Number of MID with N base |
| % of MID with N | Ratio of MID with N base |
| Number of MID in specific sequence | Number of MID mapped to specific sequences |
| % of MID with specific sequence | Ratio of MID mapped to specific sequences |
| Q10 in clean MID % | Ratio of Q10 MID bases after filtering |
| Q20 in clean MID % | Ratio of Q20 MID bases after filtering |
| Q30 in clean MID % | Ratio of Q30 MID bases after filtering |
| Number of exact MID | Number of reads exactly mapped to MID |
| % of exact MID | Ratio of reads exactly mapped to MID |
| Number of inexact MID | Number of reads inexactly mapped to MID |
| % of inexact MID | Ratio of reads inexactly mapped to MID |
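
The Q10/Q20/Q30 rows report the share of bases at or above a quality threshold. The source does not say how SAW computes them, so the following is only a sketch under two assumptions: quality strings use the Phred+33 encoding, and "Qn" counts bases with a quality score of at least n. The function name `q_ratio` and the toy quality strings are hypothetical.

```python
def q_ratio(quality_strings, threshold, offset=33):
    """Percentage of bases whose Phred score is >= threshold.

    Assumes Phred+33 encoded quality strings (an assumption, not
    something the SAW documentation states).
    """
    total = 0
    passing = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            if ord(ch) - offset >= threshold:
                passing += 1
    return 100.0 * passing / total if total else 0.0


# Toy input: 'I' is Q40, '5' is Q20, '+' is Q10 under Phred+33.
toy_quals = ["II", "5+"]

q10 = q_ratio(toy_quals, 10)
q20 = q_ratio(toy_quals, 20)
q30 = q_ratio(toy_quals, 30)
print(q10, q20, q30)  # 100.0 75.0 50.0
```

The same function could be applied separately to the CID, MID, and RNA portions of each read to mirror the per-segment rows in the table.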
Annotation statistics
After the reads have been annotated with genes, the statistics and detailed information for this step are saved under /STEREO_ANALYSIS_WORKFLOW/ANNOTATION/*.bam.summary.stat.
| Metric | Description |
|---|---|
| Number of total reads | Number of total reads aligned to genome |
| Number of reads to be annotated | Number of reads that will be annotated with GTF/GFF annotation database |
| % of reads to be annotated | % of reads that will be annotated with GTF/GFF annotation database |
| Number of uniquely mapped reads to be annotated | Number of reads to be annotated which are uniquely mapped to genome |
| % of uniquely mapped reads to be annotated | Ratio of reads to be annotated which are uniquely mapped to genome |
| Number of multi-mapped reads to be annotated | Number of reads to be annotated which are multi-mapped to genome |
| % of multi-mapped reads to be annotated | Ratio of reads to be annotated which are multi-mapped to genome |
| Number of multi-mapped reads | Number of reads multi-mapped to genome |
| Number of reads mapped to transcriptome | Number of reads mapped to transcriptome, including exonic and intronic regions. |
| % of reads mapped to transcriptome | % of reads mapped to transcriptome, including exonic and intronic regions. |
| Number of unique captures (on CID, gene and MID) | Number of unique captures for reads, based on CID, gene and MID information |
| % of unique captures (on CID, gene and MID) | % of unique captures for reads, based on CID, gene and MID information |
| Number of duplicated reads | Number of duplicated captures for reads, based on CID, gene and MID information |
| % of duplicated reads | % of duplicated captures for reads, based on CID, gene and MID information |
| Number of reads mapped to exonic regions | Number of reads mapped to exonic regions |
| % of reads mapped to exonic regions | % of reads mapped to exonic regions |
| Number of reads mapped to intronic regions | Number of reads mapped to intronic regions |
| % of reads mapped to intronic regions | % of reads mapped to intronic regions |
| Number of reads mapped to intergenic regions | Number of reads mapped to intergenic regions |
| % of reads mapped to intergenic regions | % of reads mapped to intergenic regions |
| Number of reads mapped antisense to gene | Number of reads mapped antisense to gene |
| % of reads mapped antisense to gene | % of reads mapped antisense to gene |
| Number of reads mapped to rRNA | Number of reads mapped to rRNA regions |
| Number of rRNA reads in uniquely mapped | Number of uniquely mapped reads mapped to rRNA regions |
| % of rRNA reads in uniquely mapped | % of uniquely mapped reads mapped to rRNA regions |
| Number of rRNA reads in multi-mapped | Number of multi-mapped reads mapped to rRNA regions |
| % of rRNA reads in multi-mapped reads | % of multi-mapped reads mapped to rRNA regions |
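
The "unique captures" and "duplicated reads" rows are described as being based on CID, gene, and MID information. One way to picture this is to group reads by that triple, as in the sketch below. It assumes that reads sharing the same (CID, gene, MID) collapse into one unique capture and that every additional read in a group counts as duplicated; SAW's actual rules may differ. The function `capture_stats` and the toy records are hypothetical.

```python
from collections import Counter


def capture_stats(reads):
    """Count unique and duplicated captures keyed on (CID, gene, MID).

    Illustrative only: each read is a dict with 'cid' (an (x, y) tuple),
    'gene', and 'mid'. The first read of each key is treated as the
    unique capture and the rest as duplicates.
    """
    counts = Counter((r["cid"], r["gene"], r["mid"]) for r in reads)
    total = sum(counts.values())
    unique = len(counts)
    duplicated = total - unique
    return unique, duplicated


# Toy reads; coordinates, gene, and MID values are made up for the example.
toy_reads = [
    {"cid": (7767, 18052), "gene": "Xkr4", "mid": "79E49"},
    {"cid": (7767, 18052), "gene": "Xkr4", "mid": "79E49"},  # same capture
    {"cid": (7767, 18052), "gene": "Xkr4", "mid": "6FA29"},  # different MID
    {"cid": (4826, 11598), "gene": "Xkr4", "mid": "79E49"},  # different CID
]

unique, duplicated = capture_stats(toy_reads)
print(unique, duplicated)  # 3 1
```

Using the corrected MID (the `UB` tag) rather than the raw one (`UR`) as the key would be a natural choice here, though the source does not state which one SAW uses.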