HemI 2.0 is free and open to all users, and no login is required.


HemI Installation Packages
Windows (x86): 36.41 MB
Windows (x64): 37.26 MB
Linux/Unix (x86): 38.62 MB
Linux/Unix (x64): 39.04 MB
Mac OS X (x86): 18.95 MB
Mac OS X (x64): 18.87 MB
When publishing results, please cite the following articles:
HemI 2.0: an online service for heatmap illustration
Wanshan Ning, Yuxiang Wei, Letian Gao, Cheng Han, Yujie Gou, Shanshan Fu, Dan Liu, Chi Zhang, Xinhe Huang, Sicheng Wu, Di Peng, Chenwei Wang and Yu Xue.

Nucleic Acids Research 2022 Jun 7.

HemI: A Toolkit for Illustrating Heatmaps
Wankun Deng, Yongbo Wang, Zexian Liu, Han Cheng and Yu Xue.

PLoS One 2014 Nov 5;9(11):e111988.

New Features:

◈ An online platform that supports multiple browsers.

◈ A variety of enrichment analysis options.

◈ More clustering methods.

◈ More types of distance metrics.

◈ Visualization of numeric data in the heatmap.

◈ Five idioms for visualizing enrichment analysis results.

◈ An option to query the functions of individual genes.

◈ Support for more input formats.

◈ Support for more output figure formats.

◈ Downloadable enrichment analysis results.

◈ Downloadable clustering results.

Feature                 HemI 2.0                                    HemI 1.0
Online platform         Yes                                         No
Enrichment analysis     Yes                                         No
Clustering methods      7                                           3
Distance metrics        22                                          7
Numeric data display    Yes                                         No
Presentations           Heatmap, bubble chart, bar graph,           Heatmap
                        coxcomb chart, pie chart and word cloud


Options in HemI 2.0:

1. Data Loading

    HemI 2.0 supports loading files in 3 formats: Microsoft Excel workbooks (Excel 97-2003, *.xls; Excel 2007-2019, *.xlsx), comma-separated values (*.csv) and tab-delimited text files (*.txt). After the table is generated, click the row or column titles to change or add annotations.
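As a rough illustration of the table layout, the sketch below parses the two plain-text formats with Python's standard csv module. It assumes, without confirmation from this page, that the first row holds column titles, the first column holds row titles, and every other cell is numeric (empty cells become missing values). The names load_matrix and delimiter_for are hypothetical and not part of HemI 2.0; Excel workbooks would need a spreadsheet reader such as pandas.read_excel.

```python
import csv
import io


def load_matrix(text, delimiter=","):
    """Parse a delimited table into (row_labels, col_labels, values).

    Assumed layout (hypothetical): first row = column titles,
    first column = row titles, remaining cells numeric or empty.
    """
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=delimiter) if r]
    if not rows:
        raise ValueError("empty table")
    header = rows[0]
    col_labels = header[1:]
    row_labels, values = [], []
    for r in rows[1:]:
        if len(r) != len(header):
            raise ValueError(f"row {r[0]!r} has {len(r)} cells, expected {len(header)}")
        row_labels.append(r[0])
        # Empty cells are kept as None so they can be drawn as missing values.
        values.append([float(c) if c.strip() else None for c in r[1:]])
    return row_labels, col_labels, values


def delimiter_for(filename):
    """Choose the delimiter from the file extension: .csv -> comma, .txt -> tab."""
    name = filename.lower()
    if name.endswith(".csv"):
        return ","
    if name.endswith(".txt"):
        return "\t"
    raise ValueError("Excel files (.xls/.xlsx) need a spreadsheet reader")


if __name__ == "__main__":
    text = "gene\tS1\tS2\nTP53\t1.5\t-0.3\nMYC\t2.0\t\n"
    print(load_matrix(text, delimiter_for("example.txt")))
```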

2. Clustering

    To meet the varied data-analysis needs of heatmap users, we implemented 7 commonly used hierarchical clustering methods: single, complete, average, weighted, centroid, median and Ward linkage. We also included 22 distance metrics: Bray-Curtis, Canberra, Chebyshev, Manhattan, Correlation, Cosine, Dice, Euclidean (default), Hamming, Jaccard, Jensen-Shannon, Kulsinski, Mahalanobis, Matching, Minkowski, Rogers-Tanimoto, Russell-Rao, Standardized Euclidean, Sokal-Michener, Sokal-Sneath, Squared Euclidean and Yule.
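Several of the simpler metrics can be written out directly, which also shows what a condensed distance matrix (the input y in the method list) is: the upper triangle of the pairwise distance matrix, flattened row by row. The pure-Python sketch below is for illustration only and is not HemI's code; HemI's clustering is based on SciPy, where Manhattan distance is named 'cityblock'. All function names here are hypothetical.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):  # 'cityblock' in scipy.spatial.distance
    return sum(abs(a - b) for a, b in zip(u, v))


def chebyshev(u, v):
    return max(abs(a - b) for a, b in zip(u, v))


def canberra(u, v):
    # Coordinates where both values are 0 contribute 0 (the 0/0 term is skipped).
    return sum(abs(a - b) / (abs(a) + abs(b)) for a, b in zip(u, v) if abs(a) + abs(b) > 0)


def braycurtis(u, v):
    return sum(abs(a - b) for a, b in zip(u, v)) / sum(abs(a + b) for a, b in zip(u, v))


def condensed(rows, metric):
    """Upper triangle of the pairwise matrix in the order
    d(0,1), d(0,2), ..., d(0,n-1), d(1,2), ... (the same order as scipy's pdist)."""
    n = len(rows)
    return [metric(rows[i], rows[j]) for i in range(n) for j in range(i + 1, n)]


def condensed_index(n, i, j):
    """Position of the pair (i, j), i != j, inside a condensed vector of n observations."""
    if i > j:
        i, j = j, i
    return n * i - i * (i + 1) // 2 + (j - i - 1)
```

For three points (0, 0), (3, 4) and (6, 8), condensed(rows, euclidean) gives [5.0, 10.0, 5.0], and condensed_index(3, 0, 2) points at the middle entry, d(0, 2).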

Clustering methods:
single(y): perform single/min/nearest linkage on the condensed distance matrix y.
complete(y): perform complete/max/farthest point linkage on the condensed distance matrix y.
average(y): perform average/UPGMA linkage on the condensed distance matrix y.
weighted(y): perform weighted/WPGMA linkage on the condensed distance matrix y.
centroid(y): perform centroid/UPGMC linkage.
median(y): perform median/WPGMC linkage.
ward(y): perform Ward’s linkage on the condensed distance matrix y.


1. For the ‘single’ method, an optimized algorithm based on the minimum spanning tree is implemented, with time complexity O(n^2). For the ‘complete’, ‘average’, ‘weighted’ and ‘ward’ methods, the nearest-neighbor chain algorithm is implemented, also with time complexity O(n^2). For the other methods, a naive algorithm with O(n^3) time complexity is implemented. All algorithms use O(n^2) memory.
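The O(n^3) bound of the naive algorithm comes from rescanning every remaining pair of clusters at each of the n - 1 merges. The sketch below illustrates that naive scheme in plain Python for the single, complete and average methods only, using the Lance-Williams update rules. It is a teaching aid, not SciPy's implementation, and naive_linkage is a hypothetical name; its output rows follow SciPy's numbering convention, where the k-th merge creates cluster id n + k.

```python
def naive_linkage(dist, method="single"):
    """Naive agglomerative clustering on a full symmetric distance matrix.

    Each of the n-1 steps scans all active pairs (O(n^2)), so the total
    time is O(n^3). Returns rows (id_a, id_b, distance, size).
    """
    n = len(dist)
    d = {}  # (smaller id, larger id) -> distance between active clusters
    for i in range(n):
        for j in range(i + 1, n):
            d[(i, j)] = dist[i][j]
    size = {i: 1 for i in range(n)}
    merges = []
    for step in range(n - 1):
        # Closest pair; ties are broken by the smaller pair of ids.
        (a, b), dab = min(d.items(), key=lambda kv: (kv[1], kv[0]))
        new = n + step
        for c in [c for c in size if c not in (a, b)]:
            dac = d.pop((min(a, c), max(a, c)))
            dbc = d.pop((min(b, c), max(b, c)))
            # Lance-Williams updates for three linkage methods.
            if method == "single":
                dnew = min(dac, dbc)
            elif method == "complete":
                dnew = max(dac, dbc)
            elif method == "average":
                dnew = (size[a] * dac + size[b] * dbc) / (size[a] + size[b])
            else:
                raise ValueError(f"unsupported method: {method}")
            d[(c, new)] = dnew  # c < new always holds
        del d[(a, b)]
        size[new] = size.pop(a) + size.pop(b)
        merges.append((a, b, dab, size[new]))
    return merges


if __name__ == "__main__":
    pts = [0, 1, 5, 6, 20]
    D = [[abs(p - q) for q in pts] for p in pts]
    for row in naive_linkage(D, "average"):
        print(row)
```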

2. The ‘centroid’, ‘median’ and ‘ward’ methods are correctly defined only if the Euclidean pairwise metric is used. If y is passed as a precomputed pairwise distance matrix, users should ensure that the distances are in fact Euclidean; otherwise the result will be incorrect.

    The clustering function of HemI 2.0 is based on the SciPy clustering package (scipy.cluster). You can choose different clustering methods and distance metrics, for example the ‘weighted’ method with the ‘euclidean’ distance metric. See the SciPy documentation (https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html) for details of the parameters.
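A minimal sketch of the corresponding SciPy calls, assuming NumPy and SciPy are installed; the data values are made up. It builds a condensed distance vector with pdist, clusters the rows with the ‘weighted’ method, and uses leaves_list to obtain the dendrogram leaf order that a heatmap would use to reorder its rows. For ‘ward’, the raw observations are passed so that SciPy computes Euclidean distances itself, in line with note 2 above.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import pdist

# Toy matrix: rows are genes, columns are samples (made-up values).
data = np.array([
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 2.9],
    [8.0, 1.0, 0.5],
    [7.9, 1.2, 0.4],
])

# Condensed Euclidean distances between rows, then WPGMA ('weighted') linkage.
y = pdist(data, metric="euclidean")
Z = linkage(y, method="weighted")

# Dendrogram leaf order, used to reorder the heatmap rows.
row_order = leaves_list(Z)
reordered = data[row_order]

# 'ward' assumes Euclidean geometry: pass observations, not custom distances.
Z_ward = linkage(data, method="ward")
```

Columns can be clustered the same way on the transposed matrix, for example linkage(data.T, method="average", metric="cosine").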

3. Enrichment analysis

    For further analysis, we also implemented enrichment analysis for 12 model species: Homo sapiens, Mus musculus, Rattus norvegicus, Saccharomyces cerevisiae, Drosophila melanogaster, Arabidopsis thaliana, Sus scrofa, Canis lupus familiaris, Bos taurus, Gallus gallus, Caenorhabditis elegans and Danio rerio. For organism selection, we recommend the most likely organism based on the UniProt IDs or gene symbols entered by the user.
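The page does not describe how the organism recommendation is made. One simple approach, assumed here purely for illustration, is to count how many of the input identifiers occur in each species' identifier set and suggest the species with the largest overlap. The function recommend_organism and the species_ids mapping are hypothetical.

```python
def recommend_organism(query_ids, species_ids):
    """Return (species, fraction of query matched) for the best-covering species.

    species_ids maps a species name to the set of UniProt accessions and
    gene symbols known for it (hypothetical input). Matching is
    case-sensitive, so human 'TP53' and mouse 'Trp53' stay distinct.
    Ties are broken alphabetically so the result is stable.
    """
    query = {q.strip() for q in query_ids if q.strip()}
    if not query:
        raise ValueError("no identifiers supplied")
    scores = {sp: len(query & ids) for sp, ids in species_ids.items()}
    best = max(sorted(scores), key=lambda sp: scores[sp])
    return best, scores[best] / len(query)


if __name__ == "__main__":
    species = {
        "Homo sapiens": {"TP53", "BRCA1", "EGFR"},
        "Mus musculus": {"Trp53", "Brca1", "Egfr"},
    }
    print(recommend_organism(["TP53", "BRCA1", "Trp53"], species))
```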

    We compiled 15 types of functional annotations: 4 sets of Gene Ontology (GO) annotations (all, biological processes, molecular functions and cellular components), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, Disease Ontology (DO) terms, and 9 sets of gene sets taken from the Molecular Signatures Database (MSigDB) (Positional Gene Sets, Curated Gene Sets, Regulatory Target Gene Sets, Computational Gene Sets, Ontology Gene Sets, Oncogenic Signature Gene Sets, Immunologic Signature Gene Sets, Cell Type Signature Gene Sets, and All Gene Sets mixed). For each annotation term, an enrichment ratio (E-ratio) is calculated and a P value is computed with the hypergeometric test. To visualize the enrichment results intuitively, we provide 5 idioms: bubble chart, bar graph, coxcomb chart, pie chart and word cloud.
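Before either statistic can be computed, the four counts defined in the FAQ below (N, n, M, m) have to be collected for every annotation term. A minimal sketch, assuming a hypothetical input in which annotations map each gene or protein to its set of terms and the foreground (the user's list) is a subset of the background; term_counts is an illustrative name, not part of HemI 2.0. Genes without any annotation are excluded, following the "annotated by at least one term" wording.

```python
from collections import defaultdict


def term_counts(annotations, foreground, background):
    """Return (N, M, {term: (n, m)}).

    N = background genes annotated by at least one term
    M = foreground genes annotated by at least one term
    n = background genes annotated by the term
    m = foreground genes annotated by the term
    """
    bg = {g for g in background if annotations.get(g)}
    fg = {g for g in foreground if g in bg}
    n_by_term = defaultdict(int)
    m_by_term = defaultdict(int)
    for g in bg:
        for t in annotations[g]:
            n_by_term[t] += 1
            if g in fg:
                m_by_term[t] += 1
    per_term = {t: (n_by_term[t], m_by_term[t]) for t in n_by_term}
    return len(bg), len(fg), per_term
```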

Gene Ontology

    An ontology is a formal representation of a body of knowledge within a given domain. Ontologies usually consist of a set of classes (or terms or concepts) with relations that operate between them. The Gene Ontology (GO) describes our knowledge of the biological domain with respect to three aspects:

Molecular Function: Molecular-level activities performed by gene products. Molecular function terms describe activities that occur at the molecular level, such as “catalysis” or “transport”. GO molecular function terms represent activities rather than the entities (molecules or complexes) that perform the actions, and do not specify where, when, or in what context the action takes place. Molecular functions generally correspond to activities that can be performed by individual gene products (i.e. a protein or RNA), but some activities are performed by molecular complexes composed of multiple gene products. Examples of broad functional terms are catalytic activity and transporter activity; examples of narrower functional terms are adenylate cyclase activity and Toll-like receptor binding. To avoid confusion between gene product names and their molecular functions, GO molecular functions are often appended with the word “activity” (a protein kinase would have the GO molecular function protein kinase activity).
Cellular Component: The locations relative to cellular structures in which a gene product performs a function, either cellular compartments (e.g., mitochondrion) or stable macromolecular complexes of which they are parts (e.g., the ribosome). Unlike the other aspects of GO, cellular component classes refer not to processes but rather to cellular anatomy.
Biological Process: The larger processes, or ‘biological programs’, accomplished by multiple molecular activities. Examples of broad biological process terms are DNA repair and signal transduction. Examples of more specific terms are pyrimidine nucleobase biosynthetic process and glucose transmembrane transport. Note that a biological process is not equivalent to a pathway. At present, the GO does not try to represent the dynamics or dependencies that would be required to fully describe a pathway.
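GO terms are linked into a directed acyclic graph, and under the GO "true path rule" a gene annotated to a term is implicitly annotated to all of that term's ancestors. This page does not say whether HemI 2.0 propagates annotations this way, so the sketch below only illustrates the idea. The toy hierarchy is simplified and is not taken from the real GO graph, and the function names are hypothetical.

```python
def ancestors(term, parents):
    """All terms reachable from `term` through parent links (e.g. is_a, part_of),
    excluding the term itself. `parents` maps term -> iterable of parent terms."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def propagate(direct, parents):
    """Extend each gene's direct annotations with every ancestor term."""
    return {g: set(ts).union(*(ancestors(t, parents) for t in ts)) for g, ts in direct.items()}


if __name__ == "__main__":
    # Simplified toy chain, not the actual GO structure.
    parents = {
        "protein kinase activity": ["kinase activity"],
        "kinase activity": ["transferase activity"],
        "transferase activity": ["catalytic activity"],
    }
    print(propagate({"gene1": ["protein kinase activity"]}, parents))
```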

Disease Ontology

    The design of the Disease Ontology (DO) will enable a greater understanding of disease states by placing heritable disorders in the context of infectious and other related diseases. The structure of the DO and its external references to other terminologies will enable the integration of disparate datasets through the concept of disease.

Kyoto Encyclopedia of Genes and Genomes

    KEGG is a database resource for understanding high-level functions and utilities of biological systems, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies.

Gene Set Enrichment Analysis

    Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotypes). The Molecular Signatures Database (MSigDB) is a collection of annotated gene sets for use with GSEA.
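MSigDB distributes its collections as GMT (Gene Matrix Transposed) files: tab-delimited text with one gene set per line, giving the set name, a description or URL, and then the member genes. A small parser sketch follows, with hypothetical function names; gene_to_sets inverts the result into a gene-to-sets mapping, which is the shape an enrichment test usually needs.

```python
def parse_gmt(lines):
    """Parse GMT lines into {set_name: (description, [genes])}."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, description, *genes = fields
        gene_sets[name] = (description, [g for g in genes if g])
    return gene_sets


def gene_to_sets(gene_sets):
    """Invert {set_name: (description, genes)} into {gene: {set_name, ...}}."""
    index = {}
    for name, (_, genes) in gene_sets.items():
        for g in genes:
            index.setdefault(g, set()).add(name)
    return index


if __name__ == "__main__":
    import sys
    # Usage: python gmt.py path/to/downloaded.gmt
    with open(sys.argv[1]) as fh:
        sets = parse_gmt(fh)
    print(len(sets), "gene sets,", len(gene_to_sets(sets)), "distinct genes")
```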

H: hallmark gene sets (50 gene sets)
Hallmark gene sets summarize and represent specific well-defined biological states or processes and display coherent expression. These gene sets were generated by a computational methodology based on identifying overlaps between gene sets in other MSigDB collections and retaining genes that display coordinate expression.
C1: positional gene sets (278 gene sets)
Gene sets corresponding to each human chromosome and each cytogenetic band.
C2: curated gene sets (6290 gene sets)
Gene sets in this collection are curated from various sources, including online pathway databases and the biomedical literature. Many sets are also contributed by individual domain experts. The gene set page for each gene set lists its source. The C2 collection is divided into two sub-collections: chemical and genetic perturbations (CGP) and canonical pathways (CP).
CGP: chemical and genetic perturbations (3368 gene sets)
Gene sets that represent expression signatures of genetic and chemical perturbations. A number of these gene sets come in pairs: an xxx_UP (or xxx_DN) gene set represents genes induced (or repressed) by the perturbation.
CP: canonical pathways (2922 gene sets)
Gene sets from pathway databases. Usually, these gene sets are canonical representations of a biological process compiled by domain experts.
BIOCARTA subset of CP (292 gene sets)
Canonical pathway gene sets derived from the BioCarta pathway database.
KEGG subset of CP (186 gene sets)
Canonical pathway gene sets derived from the KEGG pathway database.
PID subset of CP (196 gene sets)
Canonical pathway gene sets derived from the PID pathway database.
REACTOME subset of CP (1604 gene sets)
Canonical pathway gene sets derived from the Reactome pathway database.
WikiPathways subset of CP (615 gene sets)
Canonical pathway gene sets derived from the WikiPathways pathway database.
C3: regulatory target gene sets (3731 gene sets)
Gene sets representing potential targets of regulation by transcription factors or microRNAs. The sets consist of genes grouped by elements they share in their non-protein-coding regions. The elements represent known or likely cis-regulatory elements in promoters and 3'-UTRs. The C3 collection is divided into two sub-collections: microRNA targets (MIR) and transcription factor targets (TFT).
MIR: microRNA targets (2598 gene sets)
All miRNA target prediction gene sets: a combined superset of the miRDB prediction sets and the legacy sets.
miRDB subset of MIR (2377 gene sets)
Gene sets containing high-confidence gene-level predictions of human miRNA targets as catalogued by the miRDB v6.0 algorithm (Chen and Wang, 2020).
MIR_Legacy subset of MIR (221 gene sets)
Older gene sets that contain genes sharing putative target sites (seed matches) of human mature miRNAs in their 3'-UTRs.
TFT: transcription factor targets (1133 gene sets)
All transcription factor target prediction gene sets: a combined superset of the GTRD prediction sets and the legacy sets.
GTRD subset of TFT (523 gene sets)
Genes that share GTRD (Kolmykov et al. 2021) predicted transcription factor binding sites in the region from -1000 to +100 bp around the transcription start site (TSS) for the indicated transcription factor.
TFT_Legacy subset of TFT (610 gene sets)
Older gene sets that share upstream cis-regulatory motifs which can function as potential transcription factor binding sites, based on work by Xie et al. 2005.
C4: computational gene sets (858 gene sets)
Computational gene sets defined by mining large collections of cancer-oriented microarray data. The C4 collection is divided into two sub-collections: CGN and CM.
CGN: cancer gene neighborhoods (427 gene sets)
Gene sets defined by expression neighborhoods centered on 380 cancer-associated genes. This collection is described in Subramanian, Tamayo et al. 2005.
CM: cancer modules (431 gene sets)
Gene sets defined by Segal et al. 2004. Briefly, the authors compiled gene sets ('modules') from a variety of resources such as KEGG, GO, and others. By mining a large compendium of cancer-related microarray data, they identified 456 such modules as significantly changed in a variety of cancer conditions.
C5: ontology gene sets (14998 gene sets)
Gene sets that contain genes annotated by the same ontology term. The C5 collection is divided into two sub-collections: the first is derived from the Gene Ontology (GO) resource and contains the BP, CC and MF components; the second is derived from the Human Phenotype Ontology (HPO).
GO: Gene Ontology gene sets (10185 gene sets)
All gene sets derived from the Gene Ontology.
BP: subset of GO (7481 gene sets)
Gene sets derived from the GO Biological Process ontology.
CC: subset of GO (996 gene sets)
Gene sets derived from the GO Cellular Component ontology.
MF: subset of GO (1708 gene sets)
Gene sets derived from the GO Molecular Function ontology.
HPO: Human Phenotype Ontology (4813 gene sets)
Gene sets derived from the Human Phenotype Ontology.
C6: oncogenic signature gene sets (189 gene sets)
Gene sets that represent signatures of cellular pathways which are often dysregulated in cancer. The majority of signatures were generated directly from microarray data from NCBI GEO or from internal unpublished profiling experiments involving perturbation of known cancer genes.
C7: immunologic signature gene sets (5219 gene sets)
Gene sets that represent cell states and perturbations within the immune system.
ImmuneSigDB subset of C7 (4872 gene sets)
Gene sets representing chemical and genetic perturbations of the immune system, generated by manual curation of published studies in human and mouse immunology.
VAX: vaccine response gene sets (347 gene sets)
Gene sets curated by the Human Immunology Project Consortium (HIPC) describing human transcriptomic immune responses to vaccinations.
C8: cell type signature gene sets (671 gene sets)
Gene sets that contain curated cluster markers for cell types identified in single-cell sequencing studies of human tissue.

Frequently Asked Questions:

1. Q: What is the difference between HemI 2.0 and HemI 1.0?

A: HemI 2.0 provides a more convenient web service, integrates an enrichment analysis function, and offers a variety of chart types, including heatmap, bar graph, bubble chart, coxcomb chart, pie chart and word cloud.

2. Q: How do I use HemI 2.0?

A: First, visit HemI 2.0 at https://hemi.biocuckoo.org/. In the HEATMAP ILLUSTRATOR section, you can upload your dataset and customize the Heatmap Settings or Clustering Settings. To learn more about HemI 2.0, please read the manual first. We have also prepared 5 examples that you can browse online.

3. Q: What are the examples for?

A: We prepared 5 datasets from papers published in high-profile journals and reproduced their heatmaps in HemI 2.0. A good way to get started is to modify one of the examples into your own figure.

4. Q: Can I put two or more heatmaps in a single figure, or edit a heatmap by adding other figures, with HemI?

A: At the current stage, HemI 2.0 cannot draw more than one heatmap in a single figure. HemI also cannot edit figures directly, apart from changing the custom settings. If you need other functions, please do not hesitate to contact us. We will add new options based on feedback from experimentalists in later versions.

5. Q: I have questions that are not listed above. How can I contact the authors of HemI 2.0?

A: Please contact Dr. Yu Xue (Email: xueyu@hust.edu.cn) or Dr. Wanshan Ning (Email: ningwanshan@hust.edu.cn) for details.

6. Q: What do n, m, N, M, E-ratio and P value mean in the enrichment results?

A: N = number of proteins in background annotated by at least one GO term

    n = number of proteins in background annotated by the GO term t

    M = number of proteins in foreground annotated by at least one GO term

    m = number of proteins in foreground annotated by the GO term t

    E-ratio = (m/M) / (n/N)

    The P value was calculated with the hypergeometric distribution.
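A minimal sketch of both statistics from the four counts, assuming E-ratio = (m/M) / (n/N) and a one-sided (over-representation) hypergeometric test, i.e. the probability of observing at least m annotated proteins when M foreground proteins are drawn from N background proteins, n of which carry term t. The one-sided choice is an assumption; the answer above only states that the hypergeometric distribution was used. Function names are hypothetical.

```python
from math import comb


def e_ratio(m, M, n, N):
    """Fraction of foreground proteins carrying term t divided by
    the same fraction in the background: (m/M) / (n/N)."""
    return (m / M) / (n / N)


def hypergeom_p(m, M, n, N):
    """P(X >= m) for X ~ Hypergeometric(population N, n with term t, M drawn).

    Sums C(n, k) * C(N - n, M - k) / C(N, M) for k = m .. min(n, M).
    Exact integer arithmetic is used until the final division.
    """
    upper = min(n, M)
    hits = sum(comb(n, k) * comb(N - n, M - k) for k in range(m, upper + 1))
    return hits / comb(N, M)


if __name__ == "__main__":
    # 10 background proteins, 4 with term t; 3 in the foreground, 2 with term t.
    print(e_ratio(2, 3, 4, 10), hypergeom_p(2, 3, 4, 10))
```

With SciPy available, scipy.stats.hypergeom.sf(m - 1, N, n, M) should give the same P value; note that SciPy names its parameters differently (there, M is the population size).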

Browser compatibility
OS        Version        Chrome          Firefox   Microsoft Edge   Safari
Linux     Ubuntu 18.04   96.0.4664.110   92.0      N/A              N/A
macOS     High Sierra    96.0.4664.101   95.0      N/A              12.1
Windows   10             96.0.4664.110   95.0      96.0.1054.62     N/A