Analysis

Clustering analysis

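The SAW count, realign, and reanalyze pipelines output spatial clustering results in AnnData H5AD format. The file records the results of data preprocessing, dimensionality reduction, clustering, and differential expression analysis.

The clustering results and UMAP embedding stored in the H5AD file can be visualized in StereoMap.

The contents recorded in an H5AD file are listed in detail below: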

$ h5dump -n <task id>/outs/analysis/<SN>.bin200_1.0.h5ad ## you can also check <SN>.cellbin_1.0.h5ad
HDF5 "<task id>/outs/analysis/<SN>.bin200_1.0.h5ad" {
FILE_CONTENTS {
 group      /
 dataset    /X
 group      /layers
 group      /obs
 dataset    /obs/_index
 group      /obs/leiden
 dataset    /obs/leiden/categories
 dataset    /obs/leiden/codes
 dataset    /obs/n_genes_by_counts
 group      /obs/orig.ident
 dataset    /obs/orig.ident/categories
 dataset    /obs/orig.ident/codes
 dataset    /obs/pct_counts_mt
 dataset    /obs/total_counts
 dataset    /obs/x
 dataset    /obs/y
 group      /obsm
 dataset    /obsm/X_pca
 dataset    /obsm/X_umap
 dataset    /obsm/spatial
 group      /obsp
 group      /obsp/connectivities
 dataset    /obsp/connectivities/data
 dataset    /obsp/connectivities/indices
 dataset    /obsp/connectivities/indptr
 group      /obsp/distances
 dataset    /obsp/distances/data
 dataset    /obsp/distances/indices
 dataset    /obsp/distances/indptr
 group      /raw
 group      /raw/X
 dataset    /raw/X/data
 dataset    /raw/X/indices
 dataset    /raw/X/indptr
 group      /raw/var
 dataset    /raw/var/_index
 dataset    /raw/var/mean_umi
 dataset    /raw/var/n_cells
 dataset    /raw/var/n_counts
 group      /raw/var/real_gene_name
 dataset    /raw/var/real_gene_name/categories
 dataset    /raw/var/real_gene_name/codes
 group      /raw/varm
 group      /uns
 dataset    /uns/bin_size
 dataset    /uns/bin_type
 group      /uns/gene_exp_leiden
 dataset    /uns/gene_exp_leiden/1
...
 dataset    /uns/gene_exp_leiden/_index
 group      /uns/hvg
 dataset    /uns/hvg/method
 group      /uns/hvg/params
 dataset    /uns/hvg/source
 dataset    /uns/leiden_resolution
 group      /uns/neighbors
 dataset    /uns/neighbors/connectivities_key
 dataset    /uns/neighbors/distance_key
 group      /uns/rank_genes_groups
 dataset    /uns/rank_genes_groups/logfoldchanges
 group      /uns/rank_genes_groups/mean_count
 dataset    /uns/rank_genes_groups/mean_count/1
...
 dataset    /uns/rank_genes_groups/mean_count/_index
 dataset    /uns/rank_genes_groups/names
 group      /uns/rank_genes_groups/params
 dataset    /uns/rank_genes_groups/params/corr_method
 dataset    /uns/rank_genes_groups/params/groupby
 dataset    /uns/rank_genes_groups/params/method
 dataset    /uns/rank_genes_groups/params/reference
 dataset    /uns/rank_genes_groups/params/use_raw
 group      /uns/rank_genes_groups/pts
 dataset    /uns/rank_genes_groups/pts/1
...
 dataset    /uns/rank_genes_groups/pts/_index
 group      /uns/rank_genes_groups/pts_rest
 dataset    /uns/rank_genes_groups/pts_rest/1
...
 dataset    /uns/rank_genes_groups/pts_rest/_index
 dataset    /uns/rank_genes_groups/pvals
 dataset    /uns/rank_genes_groups/pvals_adj
 dataset    /uns/rank_genes_groups/scores
 dataset    /uns/resolution
 group      /uns/sn
 dataset    /uns/sn/_index
 dataset    /uns/sn/batch
 dataset    /uns/sn/sn
 group      /var
 dataset    /var/_index
 dataset    /var/dispersions
 dataset    /var/dispersions_norm
 dataset    /var/highly_variable
 dataset    /var/mean_umi
 dataset    /var/means
 dataset    /var/n_cells
 dataset    /var/n_counts
 group      /var/real_gene_name
 dataset    /var/real_gene_name/categories
 dataset    /var/real_gene_name/codes
 group      /varm
 group      /varp
 }
}
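
To see how the paths in the listing map onto AnnData slots (X, obs, var, obsm, uns, and so on), the text printed by h5dump -n can be grouped by top-level group. The following is a minimal sketch using only the Python standard library; the function name group_by_slot and the shortened sample listing are illustrative and not part of SAW.

```python
from collections import defaultdict


def group_by_slot(listing):
    """Group the paths printed by `h5dump -n` by their top-level HDF5 group."""
    slots = defaultdict(list)
    for line in listing.splitlines():
        parts = line.split()
        # Keep only "group <path>" and "dataset <path>" rows;
        # header lines, braces, and "..." elisions are skipped.
        if len(parts) != 2 or parts[0] not in ("group", "dataset"):
            continue
        path = parts[1]
        if path == "/":
            continue  # the root group belongs to no slot
        top = path.strip("/").split("/")[0]
        slots[top].append(path)
    return dict(slots)


# Shortened excerpt of the listing above, used only for illustration.
sample = """\
FILE_CONTENTS {
 group      /
 dataset    /X
 group      /obs
 dataset    /obs/leiden/codes
 dataset    /obsm/X_umap
 dataset    /obsm/spatial
 group      /uns/rank_genes_groups
 }
"""

slots = group_by_slot(sample)
for slot, paths in slots.items():
    print(slot, len(paths))
```

Each key of the returned dictionary corresponds to one slot of the AnnData object described in the next section.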

Inspecting the dataset

You can inspect the data in a Python Jupyter Notebook. The AnnData package is typically used to examine the metadata stored in an H5AD file.

import anndata as ad

adata = ad.read_h5ad('./C4144D5.h5ad')

Entering the variable adata returns a brief summary of what the AnnData object contains.

# adata

AnnData object with n_obs × n_vars = 439383 × 29759
    obs: 'total_counts', 'n_genes_by_counts', 'pct_counts_mt', 'leiden', 'orig.ident', 'x', 'y'
    var: 'real_gene_name', 'n_cells', 'n_counts', 'mean_umi', 'means', 'dispersions', 'dispersions_norm', 'highly_variable'
    uns: 'bin_size', 'bin_type', 'gene_exp_leiden', 'hvg', 'leiden_resolution', 'neighbors', 'omics', 'pca_variance_ratio', 'rank_genes_groups', 'resolution', 'sn'
    obsm: 'X_pca', 'X_umap', 'spatial'
    obsp: 'connectivities', 'distances'
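  • obs: observation metadata (cell-level annotations). A table (DataFrame) that stores per-cell information, such as total_counts (the MID count), n_genes_by_counts (the number of gene types detected), the Leiden cluster label, and the spatial coordinates;
  • var: variable metadata (gene-level annotations). A table (DataFrame) that stores per-gene information, such as the gene name, whether the gene is highly variable, and the biological function of the gene;
  • adata.X: the preprocessed and normalized feature expression matrix, usually stored as a sparse matrix.

Enter adata.obs to view the cell metadata: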

# adata.obs    

        total_counts    n_genes_by_counts    pct_counts_mt    leiden    orig.ident    x    y
5583457496640    5    5    0.0    16    sample    1300    11840
5583457496660    24    19    0.0    16    sample    1300    11860
5583457496680    35    24    0.0    16    sample    1300    11880
5583457496700    26    20    0.0    16    sample    1300    11900
5583457496720    20    13    0.0    16    sample    1300    11920
...    ...    ...    ...    ...    ...    ...    ...
84782654432300    5    4    0.0    16    sample    19740    9260
84782654432320    73    36    0.0    2    sample    19740    9280
84782654432340    66    33    0.0    2    sample    19740    9300
84782654432360    73    39    0.0    2    sample    19740    9320
84782654432380    18    9    0.0    16    sample    19740    9340
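
In the rows shown above, the index value of each bin equals its x coordinate shifted left by 32 bits combined with its y coordinate. This is an observation drawn from these sample rows, not a documented guarantee, so treat the sketch below as an assumption to be checked against your own data. The helper names are hypothetical. Note that AnnData stores the index as strings, so convert with int() first.

```python
def decode_bin_id(bin_id):
    """Split a 64-bit bin ID into (x, y), assuming x sits in the high 32 bits."""
    return bin_id >> 32, bin_id & 0xFFFFFFFF


def encode_bin_id(x, y):
    """Inverse of decode_bin_id under the same assumption."""
    return (x << 32) | y


# (index, x, y) triples copied from the first and last rows of the table above.
rows = [
    (5583457496640, 1300, 11840),
    (84782654432380, 19740, 9340),
]
for bin_id, x, y in rows:
    print(bin_id, decode_bin_id(bin_id), encode_bin_id(x, y) == bin_id)
```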

The cluster assignments produced by the Leiden algorithm are stored in adata.obs['leiden'].
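
A common first check is the number of bins per cluster and their average MID count. The sketch below rebuilds a small table from the ten rows displayed above so that it runs without the H5AD file; with a loaded object you would apply the same groupby to adata.obs directly.

```python
import pandas as pd

# total_counts and leiden values copied from the ten rows displayed above.
obs = pd.DataFrame(
    {
        "total_counts": [5, 24, 35, 26, 20, 5, 73, 66, 73, 18],
        "leiden": pd.Categorical(
            ["16", "16", "16", "16", "16", "16", "2", "2", "2", "16"]
        ),
    }
)

# Number of bins and mean MID count per Leiden cluster.
summary = obs.groupby("leiden", observed=True)["total_counts"].agg(["count", "mean"])
print(summary)
```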

Enter adata.var to view the gene metadata:

# adata.var

        real_gene_name    n_cells    n_counts    mean_umi    means    dispersions    dispersions_norm    highly_variable
ENSMUSG00000000001    Gnai3    3641    4927    1.353200    0.218442    3.864523    0.787665    True
ENSMUSG00000000003    Pbsn    84    99    1.178571    0.004794    3.560551    -0.295258    False
ENSMUSG00000000028    Cdc45    279    335    1.200717    0.016009    3.475172    -0.599423    False
ENSMUSG00000000031    H19    11    13    1.181818    0.000698    3.654413    0.039135    False
ENSMUSG00000000037    Scml2    890    1142    1.283146    0.053320    3.649287    0.020871    False
...    ...    ...    ...    ...    ...    ...    ...    ...
ENSMUSG00000116984    CT030713.2    339    418    1.233038    0.020487    3.625916    -0.062388    False
ENSMUSG00000116987    AC150035.3    212    268    1.264151    0.013059    3.599458    -0.156647    False
ENSMUSG00000116988    AC164314.2    479    630    1.315240    0.029183    3.738448    0.338513    False
ENSMUSG00000116989    AC131339.4    67    77    1.149254    0.003788    3.486996    -0.557302    False
ENSMUSG00000116993    AC135964.2    210    276    1.314286    0.012649    3.561962    -0.290230    False
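
The gene table can be filtered like any pandas DataFrame. The sketch below rebuilds the first five rows shown above, selects the highly variable genes, and checks that mean_umi matches n_counts / n_cells. That relationship holds for the rows shown here; it is an observation from this sample rather than a documented definition.

```python
import numpy as np
import pandas as pd

# First five rows of adata.var as displayed above (subset of columns).
var = pd.DataFrame(
    {
        "real_gene_name": ["Gnai3", "Pbsn", "Cdc45", "H19", "Scml2"],
        "n_cells": [3641, 84, 279, 11, 890],
        "n_counts": [4927, 99, 335, 13, 1142],
        "mean_umi": [1.353200, 1.178571, 1.200717, 1.181818, 1.283146],
        "highly_variable": [True, False, False, False, False],
    },
    index=[
        "ENSMUSG00000000001",
        "ENSMUSG00000000003",
        "ENSMUSG00000000028",
        "ENSMUSG00000000031",
        "ENSMUSG00000000037",
    ],
)

# Names of the genes flagged as highly variable.
hvg_names = var.loc[var["highly_variable"], "real_gene_name"].tolist()
print(hvg_names)

# Compare mean_umi with n_counts / n_cells for the rows shown.
ratio = var["n_counts"] / var["n_cells"]
consistent = bool(np.allclose(ratio, var["mean_umi"], atol=1e-6))
print(consistent)
```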

You can also inspect the dimensionality reduction results:

# adata.obsm['X_pca']

array([[ -6.2587485 ,   4.520802  ,  -9.568217  , ...,  -1.9764128 ,
          1.8916789 ,  -1.5253358 ],
       [ -8.745589  ,  -0.8465148 ,  -5.714299  , ...,  -0.06065813,
          0.4646256 ,  -1.4891121 ],
       [ -9.6469    ,  -0.99549234,  -3.3434439 , ...,  -1.5173951 ,
          0.70049083,  -0.56164753],
       ...,
       [-10.14395   ,  -1.4017884 ,  -2.0154784 , ...,  -1.759757  ,
          0.12367768,  -0.3570788 ],
       [ -9.854649  ,  -0.06418456,  -1.2702237 , ...,   0.20150849,
         -0.12149827,  -0.20047918],
       [-11.477859  ,  -0.91366583,  -3.719183  , ...,  -1.4211301 ,
          0.35585332,  -0.44572574]], dtype=float32)

# adata.obsm['X_umap']

array([[-1.4992443,  1.803457 ],
       [ 1.5087016, -0.8629851],
       [ 2.234619 , -1.2892196],
       ...,
       [ 3.3134413, -1.8212844],
       [ 3.317021 , -1.615297 ],
       [ 1.4269315, -1.217469 ]], dtype=float32)
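
Because AnnData keeps the rows of obsm in the same order as the rows of obs, the UMAP coordinates can be joined with the Leiden labels, for example to compute one centroid per cluster for labelling a plot. The sketch below uses the six UMAP rows printed above and pairs them with the labels of the first three and last three rows of the adata.obs table, assuming both printouts come from the same object.

```python
import numpy as np
import pandas as pd

# First three and last three rows of adata.obsm['X_umap'] as printed above.
umap = np.array(
    [
        [-1.4992443, 1.803457],
        [1.5087016, -0.8629851],
        [2.234619, -1.2892196],
        [3.3134413, -1.8212844],
        [3.317021, -1.615297],
        [1.4269315, -1.217469],
    ],
    dtype=np.float32,
)
# Leiden labels of the corresponding rows in the adata.obs table.
leiden = ["16", "16", "16", "2", "2", "16"]

frame = pd.DataFrame(umap, columns=["UMAP1", "UMAP2"])
frame["leiden"] = leiden

# Mean UMAP position of each cluster.
centroids = frame.groupby("leiden")[["UMAP1", "UMAP2"]].mean()
print(centroids)
```

With a loaded object, the same table can be built from adata.obsm['X_umap'] and adata.obs['leiden'].to_numpy().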

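Differential expression analysis

The SAW count, realign, and reanalyze pipelines output the differential expression analysis results in CSV format.

Two CSV result files are produced: find_marker_genes.csv and <bin_size>_marker_features.csv.

  • find_marker_genes.csv is the raw output of the differential expression analysis.
  • <bin_size>_marker_features.csv contains the same information after it has been organized and reformatted to be more concise and readable.

For each cluster, the following metrics are computed per feature:

  • Mean MID count
  • Log2 fold change of expression
  • Adjusted p-value (the significance of the feature's expression in the current cluster relative to the other clusters)
  • Fraction of the cluster in which the gene is expressed (Cluster 1 % of expressed = 1 means the feature is expressed in all cells or bins of the cluster)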
Feature ID,Feature Name,Cluster 1 Mean MID Count,Cluster 1 Log2 fold change,Cluster 1 Adjusted p-value,Cluster 1 % of expressed, ... ,Cluster 20 Mean MID Count,Cluster 20 Log2 fold change,Cluster 20 Adjusted p-value,Cluster 20 % of expressed
ENSMUSG00000016559,H3f3b,67.1754386,42.00155933,1.76E-41,1, ... ,0.076923077,-63.19518177,0,0.076923077
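
Because the file is wide, with four columns per cluster, it is often convenient to pull out the columns of a single cluster and filter them. The sketch below does this with pandas, using the header and the example row above with the elided clusters left out. The function name cluster_markers and the two thresholds are illustrative assumptions, not SAW defaults.

```python
import io

import pandas as pd

# Header and example row from above, restricted to clusters 1 and 20.
csv_text = (
    "Feature ID,Feature Name,"
    "Cluster 1 Mean MID Count,Cluster 1 Log2 fold change,"
    "Cluster 1 Adjusted p-value,Cluster 1 % of expressed,"
    "Cluster 20 Mean MID Count,Cluster 20 Log2 fold change,"
    "Cluster 20 Adjusted p-value,Cluster 20 % of expressed\n"
    "ENSMUSG00000016559,H3f3b,67.1754386,42.00155933,1.76E-41,1,"
    "0.076923077,-63.19518177,0,0.076923077\n"
)
df = pd.read_csv(io.StringIO(csv_text))


def cluster_markers(table, cluster, max_padj=0.05, min_log2fc=1.0):
    """Return features that pass both thresholds for one cluster.

    The thresholds are example values chosen for this sketch.
    """
    prefix = f"Cluster {cluster} "
    cols = {
        prefix + "Mean MID Count": "mean_mid",
        prefix + "Log2 fold change": "log2fc",
        prefix + "Adjusted p-value": "padj",
        prefix + "% of expressed": "frac_expressed",
    }
    sub = table[["Feature ID", "Feature Name", *cols]].rename(columns=cols)
    keep = (sub["padj"] <= max_padj) & (sub["log2fc"] >= min_log2fc)
    return sub[keep]


print(cluster_markers(df, 1))
print(cluster_markers(df, 20))
```

For the example row, H3f3b passes the filter in cluster 1 and is excluded in cluster 20, where its log2 fold change is negative.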

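The differential expression results recorded in <bin_size>_marker_features.csv can be viewed in StereoMap or opened directly in Excel.

Multi-omics joint analysis: clustering

If you run SAW reanalyze on a Stereo-CITE T FF sample, the joint multi-omics clustering results are saved in an H5MU file.

The following is an example of the information recorded in an H5MU file: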

$ h5dump -n <task id>/outs/analysis/<SN>.bin20.h5mu
HDF5 "<task id>/outs/analysis/<SN>.bin20.h5mu" {
FILE_CONTENTS {
 group      /
 group      /mod
 group      /mod/multiomics
 dataset    /mod/multiomics/X
 group      /mod/multiomics/layers
 dataset    /mod/multiomics/layers/denoised_rna
 group      /mod/multiomics/obs
 dataset    /mod/multiomics/obs/_index
 group      /mod/multiomics/obs/leiden
 dataset    /mod/multiomics/obs/leiden/categories
 dataset    /mod/multiomics/obs/leiden/codes
 dataset    /mod/multiomics/obs/n_genes_by_counts
 group      /mod/multiomics/obs/orig.ident
 dataset    /mod/multiomics/obs/orig.ident/categories
 dataset    /mod/multiomics/obs/orig.ident/codes
 dataset    /mod/multiomics/obs/pct_counts_mt
 dataset    /mod/multiomics/obs/total_counts
 dataset    /mod/multiomics/obs/x
 dataset    /mod/multiomics/obs/y
 group      /mod/multiomics/obsm
 dataset    /mod/multiomics/obsm/X_totalVI
 dataset    /mod/multiomics/obsm/X_umap
 dataset    /mod/multiomics/obsm/spatial
 group      /mod/multiomics/obsp
 group      /mod/multiomics/obsp/connectivities
 dataset    /mod/multiomics/obsp/connectivities/data
 dataset    /mod/multiomics/obsp/connectivities/indices
 dataset    /mod/multiomics/obsp/connectivities/indptr
 group      /mod/multiomics/obsp/distances
 dataset    /mod/multiomics/obsp/distances/data
 dataset    /mod/multiomics/obsp/distances/indices
 dataset    /mod/multiomics/obsp/distances/indptr
 group      /mod/multiomics/raw
 group      /mod/multiomics/raw/X
 dataset    /mod/multiomics/raw/X/data
 dataset    /mod/multiomics/raw/X/indices
 dataset    /mod/multiomics/raw/X/indptr
 group      /mod/multiomics/raw/var
 dataset    /mod/multiomics/raw/var/_index
 dataset    /mod/multiomics/raw/var/mean_umi
 dataset    /mod/multiomics/raw/var/n_cells
 dataset    /mod/multiomics/raw/var/n_counts
 group      /mod/multiomics/raw/var/real_gene_name
 dataset    /mod/multiomics/raw/var/real_gene_name/categories
 dataset    /mod/multiomics/raw/var/real_gene_name/codes
 group      /mod/multiomics/raw/varm
 group      /mod/multiomics/uns
 dataset    /mod/multiomics/uns/bin_size
 dataset    /mod/multiomics/uns/bin_type
 group      /mod/multiomics/uns/gene_exp_leiden
 dataset    /mod/multiomics/uns/gene_exp_leiden/1
...
 dataset    /mod/multiomics/uns/gene_exp_leiden/9
 dataset    /mod/multiomics/uns/gene_exp_leiden/_index
 group      /mod/multiomics/uns/hvg
 dataset    /mod/multiomics/uns/hvg/method
 group      /mod/multiomics/uns/hvg/params
 dataset    /mod/multiomics/uns/hvg/source
 dataset    /mod/multiomics/uns/leiden_resolution
 group      /mod/multiomics/uns/neighbors
 dataset    /mod/multiomics/uns/neighbors/connectivities_key
 dataset    /mod/multiomics/uns/neighbors/distance_key
 dataset    /mod/multiomics/uns/omics
 dataset    /mod/multiomics/uns/resolution
 group      /mod/multiomics/uns/sn
 dataset    /mod/multiomics/uns/sn/_index
 dataset    /mod/multiomics/uns/sn/batch
 dataset    /mod/multiomics/uns/sn/sn
 group      /mod/multiomics/var
 dataset    /mod/multiomics/var/_index
 dataset    /mod/multiomics/var/highly_variable
 dataset    /mod/multiomics/var/highly_variable_nbatches
 dataset    /mod/multiomics/var/highly_variable_rank
 dataset    /mod/multiomics/var/mean_umi
 dataset    /mod/multiomics/var/means
 dataset    /mod/multiomics/var/n_cells
 dataset    /mod/multiomics/var/n_counts
 group      /mod/multiomics/var/real_gene_name
 dataset    /mod/multiomics/var/real_gene_name/categories
 dataset    /mod/multiomics/var/real_gene_name/codes
 dataset    /mod/multiomics/var/variances
 dataset    /mod/multiomics/var/variances_norm
 group      /mod/multiomics/varm
 group      /mod/multiomics/varp
 group      /mod/protein
...
 group      /mod/rna
...
 group      /obs
 dataset    /obs/_index
 dataset    /obs/_scvi_batch
 dataset    /obs/_scvi_labels
 dataset    /obs/_scvi_raw_norm_scaling
 group      /obsm
 dataset    /obsm/multiomics
 dataset    /obsm/protein
 dataset    /obsm/rna
 group      /obsmap
 dataset    /obsmap/multiomics
 dataset    /obsmap/protein
 dataset    /obsmap/rna
 group      /obsp
 group      /uns
 dataset    /uns/_scvi_manager_uuid
 dataset    /uns/_scvi_uuid
 group      /var
 dataset    /var/_index
 dataset    /var/mean_umi
 dataset    /var/n_cells
 dataset    /var/n_counts
 group      /var/real_gene_name
 dataset    /var/real_gene_name/categories
 dataset    /var/real_gene_name/codes
 group      /varm
 dataset    /varm/multiomics
 dataset    /varm/protein
 dataset    /varm/rna
 group      /varmap
 dataset    /varmap/multiomics
 dataset    /varmap/protein
 dataset    /varmap/rna
 group      /varp
 }
}
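
The listing shows three modalities under /mod: multiomics, protein, and rna. When such a file is loaded as a MuData object, each modality is available through the object's .mod mapping. The sketch below is a small helper that selects one modality and reports the available names if the requested one is missing. The helper name pick_modality is hypothetical, and a stand-in object replaces the real file so that the example runs on its own.

```python
from types import SimpleNamespace


def pick_modality(mdata, name="multiomics"):
    """Return one modality from a MuData-like object exposing a `.mod` mapping."""
    if name not in mdata.mod:
        available = ", ".join(sorted(mdata.mod))
        raise KeyError(f"modality {name!r} not found; available: {available}")
    return mdata.mod[name]


# Stand-in for a loaded H5MU file: only the `.mod` mapping is imitated,
# and the values are placeholders rather than AnnData objects.
fake = SimpleNamespace(mod={"multiomics": "joint", "protein": "adt", "rna": "rna"})
print(pick_modality(fake))
print(pick_modality(fake, "rna"))
```

With the mudata package installed, mudata.read_h5mu('<task id>/outs/analysis/<SN>.bin20.h5mu') returns an object that can be passed to this helper; the selected modality is then an AnnData object that can be inspected as described for the H5AD file.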
© 2025 STOmics Tech. All rights reserved. Modified: 2025-12-29 19:47:43
