Documentation for hypercluster¶

Contents:

Indices and tables¶

Installation and logistics¶

Installation¶

Available via pip:

pip install hypercluster

Or bioconda:

conda install hypercluster
 # or
conda install -c conda-forge -c bioconda hypercluster

If you are having problems installing with conda, try changing your channel priority. Priority of conda-forge > bioconda > defaults is recommended.

To check channel priority: conda config --get channels

It should look like:

--add channels 'defaults'   # lowest priority
--add channels 'bioconda'
--add channels 'conda-forge'   # highest priority

If it doesn’t look like that, try:

conda config --add channels bioconda
conda config --add channels conda-forge

Quick reference for clustering and evaluation¶

Clustering algorithms¶
Clusterer	Type
KMeans/MiniBatch KMeans	Partitioner
Affinity Propagation	Partitioner
Mean Shift	Partitioner
DBSCAN	Clusterer
OPTICS	Clusterer
Birch	Partitioner
OPTICS	Clusterer
HDBSCAN	Clusterer
NMF	Partitioner
LouvainCluster	Partitioner
LeidenCluster	Partitioner

Evaluations¶
Metric	Type
adjusted_rand_score	Needs ground truth
adjusted_mutual_info_score	Needs ground truth
homogeneity_score	Needs ground truth
completeness_score	Needs ground truth
fowlkes_mallows_score	Needs ground truth
mutual_info_score	Needs ground truth
v_measure_score	Needs ground truth
silhouette_score	Inherent metric
calinski_harabasz_score	Inherent metric
davies_bouldin_score	Inherent metric
smallest_largest_clusters_ratio	Inherent metric
number_of_clusters	Inherent metric
smallest_cluster_size	Inherent metric
largest_cluster_size	Inherent metric

Quickstart and examples¶

With snakemake:¶

snakemake -s hypercluster.smk --configfile config.yml --config input_data_files=test_data input_data_folder=.

With python:¶

import pandas as pd
from sklearn.datasets import make_blobs
import hypercluster

data, labels = make_blobs()
data = pd.DataFrame(data)
labels = pd.Series(labels, index=data.index, name='labels')

# With a single clustering algorithm
clusterer = hypercluster.AutoClusterer()
clusterer.fit(data).evaluate(
  methods = hypercluster.constants.need_ground_truth+hypercluster.constants.inherent_metrics,
  gold_standard = labels
  )

clusterer.visualize_evaluations()

# With a range of algorithms

clusterer = hypercluster.MultiAutoClusterer()
clusterer.fit(data).evaluate(
  methods = hypercluster.constants.need_ground_truth+hypercluster.constants.inherent_metrics,
  gold_standard = labels
  )

clusterer.visualize_evaluations()

Example work flows for both python and snakemake are here

Source code is available here