Clustering
Clustering algorithms for data grouping and mixture modeling.
Clustering and mixture reduction algorithms.
This module provides Gaussian mixture operations and clustering algorithms commonly used in multi-target tracking for hypothesis management and track clustering.
- class pytcl.clustering.GaussianComponent(weight, mean, covariance)[source]
Bases:
NamedTupleA single Gaussian component in a mixture.
- mean
Mean vector, shape (n,).
- Type:
ndarray
- covariance
Covariance matrix, shape (n, n).
- Type:
ndarray
- class pytcl.clustering.MergeResult(component, cost)[source]
Bases:
NamedTupleResult of merging two Gaussian components.
- component
The merged Gaussian component.
- Type:
- component: GaussianComponent
Alias for field number 0
- class pytcl.clustering.ReductionResult(components, n_original, n_reduced, total_cost)[source]
Bases:
NamedTupleResult of mixture reduction.
- components
Reduced set of components.
- Type:
- components: List[GaussianComponent]
Alias for field number 0
- class pytcl.clustering.GaussianMixture(components=None)[source]
Bases:
objectGaussian Mixture representation with reduction operations.
Provides an object-oriented interface for Gaussian mixture operations including density evaluation, sampling, and reduction.
- Parameters:
components (list of GaussianComponent, optional) – Initial components. If None, creates empty mixture.
- components
Current mixture components.
- Type:
Examples
>>> import numpy as np >>> gm = GaussianMixture() >>> gm.add_component(0.5, np.array([0., 0.]), np.eye(2) * 0.1) >>> gm.add_component(0.5, np.array([2., 2.]), np.eye(2) * 0.1) >>> len(gm) 2 >>> gm.mean # Mixture mean array([1., 1.])
- components: List[GaussianComponent]
- add_component(weight, mean, covariance)[source]
Add a component to the mixture.
- Parameters:
weight (float) – Component weight.
mean (array_like) – Mean vector.
covariance (array_like) – Covariance matrix.
- property covariances: List[ndarray[tuple[Any, ...], dtype[floating]]]
List of component covariances.
- property covariance: ndarray[tuple[Any, ...], dtype[floating]]
Mixture covariance (moment-matched).
- pdf(x)[source]
Evaluate mixture probability density at x.
- Parameters:
x (array_like) – Point at which to evaluate density.
- Returns:
density – Mixture probability density.
- Return type:
- sample(n_samples=1, rng=None)[source]
Draw samples from the mixture.
- Parameters:
n_samples (int) – Number of samples to draw.
rng (numpy.random.Generator, optional) – Random number generator.
- Returns:
samples – Samples, shape (n_samples, n) or (n,) if n_samples=1.
- Return type:
ndarray
- prune(weight_threshold=1e-05)[source]
Remove low-weight components.
- Parameters:
weight_threshold (float) – Minimum weight to retain.
- Returns:
pruned – New mixture with low-weight components removed.
- Return type:
- reduce_runnalls(max_components)[source]
Reduce mixture using Runnalls’ algorithm.
- Parameters:
max_components (int) – Maximum number of components.
- Returns:
reduced – Reduced mixture.
- Return type:
- pytcl.clustering.moment_match(weights, means, covariances)[source]
Compute moment-matched mean and covariance from mixture components.
Given a Gaussian mixture, computes a single Gaussian that matches the first two moments (mean and covariance) of the mixture.
- Parameters:
- Returns:
mean (ndarray) – Moment-matched mean, shape (n,).
covariance (ndarray) – Moment-matched covariance, shape (n, n).
- Return type:
Tuple[ndarray[tuple[Any, …], dtype[floating]], ndarray[tuple[Any, …], dtype[floating]]]
Examples
>>> import numpy as np >>> weights = [0.5, 0.5] >>> means = [np.array([0., 0.]), np.array([2., 0.])] >>> covs = [np.eye(2) * 0.1, np.eye(2) * 0.1] >>> m, P = moment_match(weights, means, covs) >>> m # Should be [1., 0.] array([1., 0.])
- pytcl.clustering.runnalls_merge_cost(c1, c2)[source]
Compute Runnalls’ KL-divergence-based merge cost for two components.
The cost approximates the increase in KL-divergence when merging components c1 and c2 into a single moment-matched Gaussian.
- Parameters:
c1 (GaussianComponent) – First component.
c2 (GaussianComponent) – Second component.
- Returns:
cost – Merge cost (non-negative). Lower cost means components are more similar and merging causes less information loss.
- Return type:
Notes
- The cost function is:
cost = 0.5 * [w_m * log|P_m| - w_1 * log|P_1| - w_2 * log|P_2|]
where w_m = w_1 + w_2, P_m is the moment-matched covariance.
Examples
>>> c1 = GaussianComponent(0.3, np.array([0., 0.]), np.eye(2) * 0.1) >>> c2 = GaussianComponent(0.2, np.array([0.1, 0.]), np.eye(2) * 0.1) # Close >>> c3 = GaussianComponent(0.2, np.array([5., 5.]), np.eye(2) * 0.1) # Far >>> cost_close = runnalls_merge_cost(c1, c2) >>> cost_far = runnalls_merge_cost(c1, c3) >>> cost_close < cost_far # Closer components have lower merge cost True
- pytcl.clustering.merge_gaussians(c1, c2)[source]
Merge two Gaussian components via moment matching.
- Parameters:
c1 (GaussianComponent) – First component.
c2 (GaussianComponent) – Second component.
- Returns:
result – Merged component and merge cost.
- Return type:
Examples
>>> c1 = GaussianComponent(0.3, np.array([0., 0.]), np.eye(2) * 0.1) >>> c2 = GaussianComponent(0.2, np.array([1., 0.]), np.eye(2) * 0.1) >>> result = merge_gaussians(c1, c2) >>> result.component.weight 0.5
- pytcl.clustering.prune_mixture(components, weight_threshold=1e-05)[source]
Remove components with weights below threshold and renormalize.
- Parameters:
components (list of GaussianComponent) – Input components.
weight_threshold (float) – Components with weight below this are removed.
- Returns:
pruned – Pruned and renormalized components.
- Return type:
Examples
>>> comps = [ ... GaussianComponent(0.9, np.array([0.]), np.array([[1.]])), ... GaussianComponent(1e-6, np.array([10.]), np.array([[1.]])), ... ] >>> pruned = prune_mixture(comps, weight_threshold=1e-5) >>> len(pruned) 1 >>> pruned[0].weight # Renormalized 1.0
- pytcl.clustering.reduce_mixture_runnalls(components, max_components, weight_threshold=1e-05)[source]
Reduce mixture using Runnalls’ greedy algorithm.
Iteratively merges the pair of components with the smallest KL-divergence merge cost until the target number is reached.
- Parameters:
components (list of GaussianComponent) – Input mixture components.
max_components (int) – Maximum number of components in output.
weight_threshold (float) – Components below this weight are pruned first.
- Returns:
result – Reduced mixture with cost information.
- Return type:
References
Examples
>>> import numpy as np >>> # 4 components, reduce to 2 >>> comps = [ ... GaussianComponent(0.25, np.array([0., 0.]), np.eye(2) * 0.1), ... GaussianComponent(0.25, np.array([0.1, 0.]), np.eye(2) * 0.1), ... GaussianComponent(0.25, np.array([5., 5.]), np.eye(2) * 0.1), ... GaussianComponent(0.25, np.array([5.1, 5.]), np.eye(2) * 0.1), ... ] >>> result = reduce_mixture_runnalls(comps, max_components=2) >>> len(result.components) 2
- pytcl.clustering.west_merge_cost(c1, c2)[source]
Compute West’s merge cost for two components.
West’s algorithm uses a simpler cost based on weighted Mahalanobis distance between component means.
- Parameters:
c1 (GaussianComponent) – First component.
c2 (GaussianComponent) – Second component.
- Returns:
cost – Merge cost based on weighted mean separation.
- Return type:
Examples
>>> c1 = GaussianComponent(0.3, np.array([0., 0.]), np.eye(2) * 0.1) >>> c2 = GaussianComponent(0.2, np.array([0.1, 0.]), np.eye(2) * 0.1) # Close >>> c3 = GaussianComponent(0.2, np.array([5., 5.]), np.eye(2) * 0.1) # Far >>> cost_close = west_merge_cost(c1, c2) >>> cost_far = west_merge_cost(c1, c3) >>> cost_close < cost_far # Closer components have lower merge cost True
- pytcl.clustering.reduce_mixture_west(components, max_components, weight_threshold=1e-05)[source]
Reduce mixture using West’s algorithm.
Similar to Runnalls’ but uses a simpler cost function based on weighted Mahalanobis distance between means.
- Parameters:
components (list of GaussianComponent) – Input mixture components.
max_components (int) – Maximum number of components in output.
weight_threshold (float) – Components below this weight are pruned first.
- Returns:
result – Reduced mixture with cost information.
- Return type:
References
[1] M. West, “Approximating posterior distributions by mixture,” Journal of the Royal Statistical Society, Series B, 1993.
Examples
>>> import numpy as np >>> comps = [ ... GaussianComponent(0.25, np.array([0., 0.]), np.eye(2) * 0.1), ... GaussianComponent(0.25, np.array([0.1, 0.]), np.eye(2) * 0.1), ... GaussianComponent(0.25, np.array([5., 5.]), np.eye(2) * 0.1), ... GaussianComponent(0.25, np.array([5.1, 5.]), np.eye(2) * 0.1), ... ] >>> result = reduce_mixture_west(comps, max_components=2) >>> len(result.components) 2
- class pytcl.clustering.KMeansResult(labels, centers, inertia, n_iter, converged)[source]
Bases:
NamedTupleResult of K-means clustering.
- labels
Cluster assignment for each point, shape (n_samples,).
- Type:
ndarray
- centers
Cluster centroids, shape (n_clusters, n_features).
- Type:
ndarray
- pytcl.clustering.kmeans_plusplus_init(X, n_clusters, rng=None)[source]
K-means++ initialization for cluster centers.
Selects initial centers by sampling proportional to squared distance from nearest existing center, providing better initialization than random selection.
- Parameters:
X (array_like) – Data points, shape (n_samples, n_features).
n_clusters (int) – Number of clusters.
rng (numpy.random.Generator, optional) – Random number generator.
- Returns:
centers – Initial cluster centers, shape (n_clusters, n_features).
- Return type:
ndarray
Examples
>>> import numpy as np >>> X = np.array([[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11]]) >>> centers = kmeans_plusplus_init(X, n_clusters=2, rng=np.random.default_rng(42)) >>> centers.shape (2, 2)
- pytcl.clustering.assign_clusters(X, centers)[source]
Assign each point to its nearest cluster center.
- Parameters:
X (array_like) – Data points, shape (n_samples, n_features).
centers (array_like) – Cluster centers, shape (n_clusters, n_features).
- Returns:
labels (ndarray) – Cluster assignment for each point, shape (n_samples,).
inertia (float) – Sum of squared distances to assigned centers.
- Return type:
Examples
>>> X = np.array([[0, 0], [1, 0], [10, 10], [11, 10]]) >>> centers = np.array([[0.5, 0], [10.5, 10]]) >>> labels, inertia = assign_clusters(X, centers) >>> labels array([0, 0, 1, 1])
- pytcl.clustering.update_centers(X, labels, n_clusters)[source]
Update cluster centers as mean of assigned points.
- Parameters:
X (array_like) – Data points, shape (n_samples, n_features).
labels (array_like) – Cluster assignments, shape (n_samples,).
n_clusters (int) – Number of clusters.
- Returns:
centers – Updated cluster centers, shape (n_clusters, n_features). Empty clusters retain their previous position (zeros).
- Return type:
ndarray
Examples
>>> X = np.array([[0, 0], [1, 0], [10, 10], [11, 10]]) >>> labels = np.array([0, 0, 1, 1]) >>> centers = update_centers(X, labels, n_clusters=2) >>> centers array([[ 0.5, 0. ], [10.5, 10. ]])
- pytcl.clustering.kmeans(X, n_clusters, init='kmeans++', max_iter=300, tol=0.0001, n_init=10, rng=None)[source]
K-means clustering algorithm.
Partitions data into k clusters by minimizing within-cluster sum of squared distances.
- Parameters:
X (array_like) – Data points, shape (n_samples, n_features).
n_clusters (int) – Number of clusters.
init ({'kmeans++', 'random'} or array_like) – Initialization method or explicit initial centers.
max_iter (int) – Maximum number of iterations.
tol (float) – Convergence tolerance. Algorithm stops when center movement is below this threshold.
n_init (int) – Number of random initializations. Best result is returned.
rng (numpy.random.Generator, optional) – Random number generator.
- Returns:
result – Clustering result with labels, centers, and diagnostics.
- Return type:
Examples
>>> import numpy as np >>> # Two well-separated clusters >>> rng = np.random.default_rng(42) >>> X1 = rng.normal(0, 0.5, (50, 2)) >>> X2 = rng.normal(5, 0.5, (50, 2)) >>> X = np.vstack([X1, X2]) >>> result = kmeans(X, n_clusters=2, rng=rng) >>> result.converged True >>> len(np.unique(result.labels)) 2
- pytcl.clustering.kmeans_elbow(X, k_range=None, **kwargs)[source]
Compute K-means for a range of k values for elbow method.
- Parameters:
- Returns:
results – Dictionary with keys ‘k_values’ and ‘inertias’.
- Return type:
Examples
>>> import numpy as np >>> rng = np.random.default_rng(42) >>> X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))]) >>> results = kmeans_elbow(X, k_range=range(1, 6), rng=rng) >>> len(results['inertias']) 5
- class pytcl.clustering.DBSCANResult(labels, n_clusters, core_sample_indices, n_noise)[source]
Bases:
NamedTupleResult of DBSCAN clustering.
- labels
Cluster labels for each point, shape (n_samples,). -1 indicates noise points.
- Type:
ndarray
- core_sample_indices
Indices of core samples.
- Type:
ndarray
- pytcl.clustering.compute_neighbors(X, eps)[source]
Compute neighbors within eps distance for all points.
- Parameters:
X (ndarray) – Data points, shape (n_samples, n_features).
eps (float) – Maximum distance between neighbors.
- Returns:
neighbors – neighbors[i] contains indices of points within eps of point i.
- Return type:
list of ndarray
Examples
>>> X = np.array([[0.0, 0.0], [0.5, 0.0], [3.0, 0.0]]) >>> neighbors = compute_neighbors(X, eps=1.0) >>> 0 in neighbors[1] and 1 in neighbors[0] # Points 0 and 1 are neighbors True >>> 2 in neighbors[0] # Point 2 is far from point 0 False
- pytcl.clustering.dbscan(X, eps=0.5, min_samples=5)[source]
DBSCAN clustering algorithm.
Finds core samples of high density and expands clusters from them. Points that are in low-density regions are marked as noise.
- Parameters:
- Returns:
result – Clustering result with labels and diagnostics.
- Return type:
Examples
>>> import numpy as np >>> # Two dense clusters with noise >>> rng = np.random.default_rng(42) >>> cluster1 = rng.normal(0, 0.3, (30, 2)) >>> cluster2 = rng.normal(3, 0.3, (30, 2)) >>> noise = rng.uniform(-2, 5, (5, 2)) >>> X = np.vstack([cluster1, cluster2, noise]) >>> result = dbscan(X, eps=0.5, min_samples=5) >>> result.n_clusters 2
Notes
A point is a core point if it has at least min_samples points within distance eps (including itself). A point is reachable from a core point if it is within eps distance. Noise points are not reachable from any core point.
- pytcl.clustering.dbscan_predict(X_new, X_train, labels_train, eps)[source]
Predict cluster labels for new points based on trained DBSCAN.
Assigns new points to the cluster of the nearest core point within eps distance, or -1 if no core point is within range.
- Parameters:
X_new (array_like) – New data points, shape (n_new, n_features).
X_train (array_like) – Training data points, shape (n_train, n_features).
labels_train (array_like) – Cluster labels from training, shape (n_train,).
eps (float) – Maximum distance threshold.
- Returns:
labels – Predicted cluster labels, shape (n_new,). -1 indicates no cluster assignment.
- Return type:
ndarray
Examples
>>> # After running dbscan on X_train >>> X_new = np.array([[0.1, 0.1], [10.0, 10.0]]) >>> labels_new = dbscan_predict(X_new, X_train, result.labels, eps=0.5)
- class pytcl.clustering.LinkageType(value)[source]
Bases:
EnumLinkage methods for hierarchical clustering.
- SINGLE = 'single'
- COMPLETE = 'complete'
- AVERAGE = 'average'
- WARD = 'ward'
- class pytcl.clustering.DendrogramNode(left, right, distance, count)[source]
Bases:
NamedTupleA node in the dendrogram (merge tree).
- class pytcl.clustering.HierarchicalResult(labels, n_clusters, linkage_matrix, dendrogram)[source]
Bases:
NamedTupleResult of hierarchical clustering.
- labels
Cluster labels for each point, shape (n_samples,).
- Type:
ndarray
- linkage_matrix
Linkage matrix of shape (n_samples-1, 4). Each row [i, j, dist, count] represents a merge.
- Type:
ndarray
- dendrogram
List of merge operations.
- Type:
- dendrogram: List[DendrogramNode]
Alias for field number 3
- pytcl.clustering.compute_distance_matrix(X)[source]
Compute pairwise Euclidean distance matrix.
- Parameters:
X (ndarray) – Data points, shape (n_samples, n_features).
- Returns:
distances – Distance matrix, shape (n_samples, n_samples).
- Return type:
ndarray
Examples
>>> X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]) >>> D = compute_distance_matrix(X) >>> D.shape (3, 3) >>> D[0, 1] # Distance between points 0 and 1 1.0
- pytcl.clustering.agglomerative_clustering(X, n_clusters=None, distance_threshold=None, linkage='ward')[source]
Agglomerative hierarchical clustering.
Recursively merges pairs of clusters that minimize the linkage criterion until the desired number of clusters is reached.
- Parameters:
X (array_like) – Data points, shape (n_samples, n_features).
n_clusters (int, optional) – Number of clusters to find. If None, uses distance_threshold.
distance_threshold (float, optional) – Distance threshold for cluster merging. Merging stops when the minimum linkage distance exceeds this. If None, uses n_clusters.
linkage ({'single', 'complete', 'average', 'ward'}) – Linkage criterion. Default ‘ward’.
- Returns:
result – Clustering result with labels and dendrogram.
- Return type:
Examples
>>> import numpy as np >>> rng = np.random.default_rng(42) >>> X = np.vstack([rng.normal(0, 0.5, (20, 2)), ... rng.normal(3, 0.5, (20, 2))]) >>> result = agglomerative_clustering(X, n_clusters=2) >>> result.n_clusters 2
Notes
Linkage methods: - ‘single’: Distance between closest points (can create chains) - ‘complete’: Distance between farthest points (tends to create compact clusters) - ‘average’: Average distance between all pairs - ‘ward’: Minimizes within-cluster variance (requires Euclidean distance)
- pytcl.clustering.cut_dendrogram(linkage_matrix, n_samples, n_clusters=None, distance_threshold=None)[source]
Cut a dendrogram to obtain cluster labels.
- Parameters:
- Returns:
labels – Cluster labels, shape (n_samples,).
- Return type:
ndarray
Examples
>>> import numpy as np >>> rng = np.random.default_rng(42) >>> X = np.vstack([rng.normal(0, 0.5, (10, 2)), rng.normal(3, 0.5, (10, 2))]) >>> result = agglomerative_clustering(X) >>> labels = cut_dendrogram(result.linkage_matrix, n_samples=20, n_clusters=2) >>> len(np.unique(labels)) 2
- pytcl.clustering.fcluster(linkage_matrix, n_samples, t, criterion='distance')[source]
Form flat clusters from hierarchical clustering.
Compatible interface with scipy.cluster.hierarchy.fcluster.
- Parameters:
- Returns:
labels – Cluster labels (1-indexed for scipy compatibility).
- Return type:
ndarray
Examples
>>> import numpy as np >>> rng = np.random.default_rng(42) >>> X = np.vstack([rng.normal(0, 0.5, (10, 2)), rng.normal(3, 0.5, (10, 2))]) >>> result = agglomerative_clustering(X) >>> labels = fcluster(result.linkage_matrix, n_samples=20, t=2, criterion='maxclust') >>> labels.min() # 1-indexed 1
K-Means
K-means clustering algorithm.
K-means clustering for tracking applications.
This module provides K-means clustering with K-means++ initialization, commonly used in track clustering and data association scenarios.
References
D. Arthur and S. Vassilvitskii, “k-means++: The Advantages of Careful Seeding,” SODA 2007.
- class pytcl.clustering.kmeans.KMeansResult(labels, centers, inertia, n_iter, converged)[source]
Bases:
NamedTupleResult of K-means clustering.
- labels
Cluster assignment for each point, shape (n_samples,).
- Type:
ndarray
- centers
Cluster centroids, shape (n_clusters, n_features).
- Type:
ndarray
- pytcl.clustering.kmeans.kmeans_plusplus_init(X, n_clusters, rng=None)[source]
K-means++ initialization for cluster centers.
Selects initial centers by sampling proportional to squared distance from nearest existing center, providing better initialization than random selection.
- Parameters:
X (array_like) – Data points, shape (n_samples, n_features).
n_clusters (int) – Number of clusters.
rng (numpy.random.Generator, optional) – Random number generator.
- Returns:
centers – Initial cluster centers, shape (n_clusters, n_features).
- Return type:
ndarray
Examples
>>> import numpy as np >>> X = np.array([[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11]]) >>> centers = kmeans_plusplus_init(X, n_clusters=2, rng=np.random.default_rng(42)) >>> centers.shape (2, 2)
- pytcl.clustering.kmeans.assign_clusters(X, centers)[source]
Assign each point to its nearest cluster center.
- Parameters:
X (array_like) – Data points, shape (n_samples, n_features).
centers (array_like) – Cluster centers, shape (n_clusters, n_features).
- Returns:
labels (ndarray) – Cluster assignment for each point, shape (n_samples,).
inertia (float) – Sum of squared distances to assigned centers.
- Return type:
Examples
>>> X = np.array([[0, 0], [1, 0], [10, 10], [11, 10]]) >>> centers = np.array([[0.5, 0], [10.5, 10]]) >>> labels, inertia = assign_clusters(X, centers) >>> labels array([0, 0, 1, 1])
- pytcl.clustering.kmeans.update_centers(X, labels, n_clusters)[source]
Update cluster centers as mean of assigned points.
- Parameters:
X (array_like) – Data points, shape (n_samples, n_features).
labels (array_like) – Cluster assignments, shape (n_samples,).
n_clusters (int) – Number of clusters.
- Returns:
centers – Updated cluster centers, shape (n_clusters, n_features). Empty clusters retain their previous position (zeros).
- Return type:
ndarray
Examples
>>> X = np.array([[0, 0], [1, 0], [10, 10], [11, 10]]) >>> labels = np.array([0, 0, 1, 1]) >>> centers = update_centers(X, labels, n_clusters=2) >>> centers array([[ 0.5, 0. ], [10.5, 10. ]])
- pytcl.clustering.kmeans.kmeans(X, n_clusters, init='kmeans++', max_iter=300, tol=0.0001, n_init=10, rng=None)[source]
K-means clustering algorithm.
Partitions data into k clusters by minimizing within-cluster sum of squared distances.
- Parameters:
X (array_like) – Data points, shape (n_samples, n_features).
n_clusters (int) – Number of clusters.
init ({'kmeans++', 'random'} or array_like) – Initialization method or explicit initial centers.
max_iter (int) – Maximum number of iterations.
tol (float) – Convergence tolerance. Algorithm stops when center movement is below this threshold.
n_init (int) – Number of random initializations. Best result is returned.
rng (numpy.random.Generator, optional) – Random number generator.
- Returns:
result – Clustering result with labels, centers, and diagnostics.
- Return type:
Examples
>>> import numpy as np >>> # Two well-separated clusters >>> rng = np.random.default_rng(42) >>> X1 = rng.normal(0, 0.5, (50, 2)) >>> X2 = rng.normal(5, 0.5, (50, 2)) >>> X = np.vstack([X1, X2]) >>> result = kmeans(X, n_clusters=2, rng=rng) >>> result.converged True >>> len(np.unique(result.labels)) 2
- pytcl.clustering.kmeans.kmeans_elbow(X, k_range=None, **kwargs)[source]
Compute K-means for a range of k values for elbow method.
- Parameters:
- Returns:
results – Dictionary with keys ‘k_values’ and ‘inertias’.
- Return type:
Examples
>>> import numpy as np >>> rng = np.random.default_rng(42) >>> X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))]) >>> results = kmeans_elbow(X, k_range=range(1, 6), rng=rng) >>> len(results['inertias']) 5
DBSCAN
Density-based spatial clustering (DBSCAN).
DBSCAN (Density-Based Spatial Clustering of Applications with Noise).
DBSCAN is a density-based clustering algorithm that groups points that are closely packed together, marking points in low-density regions as outliers.
References
M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise,” KDD 1996.
- class pytcl.clustering.dbscan.DBSCANResult(labels, n_clusters, core_sample_indices, n_noise)[source]
Bases:
NamedTupleResult of DBSCAN clustering.
- labels
Cluster labels for each point, shape (n_samples,). -1 indicates noise points.
- Type:
ndarray
- core_sample_indices
Indices of core samples.
- Type:
ndarray
- pytcl.clustering.dbscan.compute_neighbors(X, eps)[source]
Compute neighbors within eps distance for all points.
- Parameters:
X (ndarray) – Data points, shape (n_samples, n_features).
eps (float) – Maximum distance between neighbors.
- Returns:
neighbors – neighbors[i] contains indices of points within eps of point i.
- Return type:
list of ndarray
Examples
>>> X = np.array([[0.0, 0.0], [0.5, 0.0], [3.0, 0.0]]) >>> neighbors = compute_neighbors(X, eps=1.0) >>> 0 in neighbors[1] and 1 in neighbors[0] # Points 0 and 1 are neighbors True >>> 2 in neighbors[0] # Point 2 is far from point 0 False
- pytcl.clustering.dbscan.dbscan(X, eps=0.5, min_samples=5)[source]
DBSCAN clustering algorithm.
Finds core samples of high density and expands clusters from them. Points that are in low-density regions are marked as noise.
- Parameters:
- Returns:
result – Clustering result with labels and diagnostics.
- Return type:
Examples
>>> import numpy as np >>> # Two dense clusters with noise >>> rng = np.random.default_rng(42) >>> cluster1 = rng.normal(0, 0.3, (30, 2)) >>> cluster2 = rng.normal(3, 0.3, (30, 2)) >>> noise = rng.uniform(-2, 5, (5, 2)) >>> X = np.vstack([cluster1, cluster2, noise]) >>> result = dbscan(X, eps=0.5, min_samples=5) >>> result.n_clusters 2
Notes
A point is a core point if it has at least min_samples points within distance eps (including itself). A point is reachable from a core point if it is within eps distance. Noise points are not reachable from any core point.
- pytcl.clustering.dbscan.dbscan_predict(X_new, X_train, labels_train, eps)[source]
Predict cluster labels for new points based on trained DBSCAN.
Assigns new points to the cluster of the nearest core point within eps distance, or -1 if no core point is within range.
- Parameters:
X_new (array_like) – New data points, shape (n_new, n_features).
X_train (array_like) – Training data points, shape (n_train, n_features).
labels_train (array_like) – Cluster labels from training, shape (n_train,).
eps (float) – Maximum distance threshold.
- Returns:
labels – Predicted cluster labels, shape (n_new,). -1 indicates no cluster assignment.
- Return type:
ndarray
Examples
>>> # After running dbscan on X_train >>> X_new = np.array([[0.1, 0.1], [10.0, 10.0]]) >>> labels_new = dbscan_predict(X_new, X_train, result.labels, eps=0.5)
Hierarchical Clustering
Agglomerative hierarchical clustering algorithms.
Hierarchical (Agglomerative) Clustering.
Hierarchical clustering builds a tree of clusters by iteratively merging the closest pairs. This is useful for track fusion and understanding cluster relationships.
References
S. C. Johnson, “Hierarchical clustering schemes,” Psychometrika, 1967.
- class pytcl.clustering.hierarchical.LinkageType(value)[source]
Bases:
EnumLinkage methods for hierarchical clustering.
- SINGLE = 'single'
- COMPLETE = 'complete'
- AVERAGE = 'average'
- WARD = 'ward'
- class pytcl.clustering.hierarchical.DendrogramNode(left, right, distance, count)[source]
Bases:
NamedTupleA node in the dendrogram (merge tree).
- class pytcl.clustering.hierarchical.HierarchicalResult(labels, n_clusters, linkage_matrix, dendrogram)[source]
Bases:
NamedTupleResult of hierarchical clustering.
- labels
Cluster labels for each point, shape (n_samples,).
- Type:
ndarray
- linkage_matrix
Linkage matrix of shape (n_samples-1, 4). Each row [i, j, dist, count] represents a merge.
- Type:
ndarray
- dendrogram
List of merge operations.
- Type:
- dendrogram: List[DendrogramNode]
Alias for field number 3
- pytcl.clustering.hierarchical.compute_distance_matrix(X)[source]
Compute pairwise Euclidean distance matrix.
- Parameters:
X (ndarray) – Data points, shape (n_samples, n_features).
- Returns:
distances – Distance matrix, shape (n_samples, n_samples).
- Return type:
ndarray
Examples
>>> X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]) >>> D = compute_distance_matrix(X) >>> D.shape (3, 3) >>> D[0, 1] # Distance between points 0 and 1 1.0
- pytcl.clustering.hierarchical.agglomerative_clustering(X, n_clusters=None, distance_threshold=None, linkage='ward')[source]
Agglomerative hierarchical clustering.
Recursively merges pairs of clusters that minimize the linkage criterion until the desired number of clusters is reached.
- Parameters:
X (array_like) – Data points, shape (n_samples, n_features).
n_clusters (int, optional) – Number of clusters to find. If None, uses distance_threshold.
distance_threshold (float, optional) – Distance threshold for cluster merging. Merging stops when the minimum linkage distance exceeds this. If None, uses n_clusters.
linkage ({'single', 'complete', 'average', 'ward'}) – Linkage criterion. Default ‘ward’.
- Returns:
result – Clustering result with labels and dendrogram.
- Return type:
Examples
>>> import numpy as np >>> rng = np.random.default_rng(42) >>> X = np.vstack([rng.normal(0, 0.5, (20, 2)), ... rng.normal(3, 0.5, (20, 2))]) >>> result = agglomerative_clustering(X, n_clusters=2) >>> result.n_clusters 2
Notes
Linkage methods: - ‘single’: Distance between closest points (can create chains) - ‘complete’: Distance between farthest points (tends to create compact clusters) - ‘average’: Average distance between all pairs - ‘ward’: Minimizes within-cluster variance (requires Euclidean distance)
- pytcl.clustering.hierarchical.cut_dendrogram(linkage_matrix, n_samples, n_clusters=None, distance_threshold=None)[source]
Cut a dendrogram to obtain cluster labels.
- Parameters:
- Returns:
labels – Cluster labels, shape (n_samples,).
- Return type:
ndarray
Examples
>>> import numpy as np >>> rng = np.random.default_rng(42) >>> X = np.vstack([rng.normal(0, 0.5, (10, 2)), rng.normal(3, 0.5, (10, 2))]) >>> result = agglomerative_clustering(X) >>> labels = cut_dendrogram(result.linkage_matrix, n_samples=20, n_clusters=2) >>> len(np.unique(labels)) 2
- pytcl.clustering.hierarchical.fcluster(linkage_matrix, n_samples, t, criterion='distance')[source]
Form flat clusters from hierarchical clustering.
Compatible interface with scipy.cluster.hierarchy.fcluster.
- Parameters:
- Returns:
labels – Cluster labels (1-indexed for scipy compatibility).
- Return type:
ndarray
Examples
>>> import numpy as np >>> rng = np.random.default_rng(42) >>> X = np.vstack([rng.normal(0, 0.5, (10, 2)), rng.normal(3, 0.5, (10, 2))]) >>> result = agglomerative_clustering(X) >>> labels = fcluster(result.linkage_matrix, n_samples=20, t=2, criterion='maxclust') >>> labels.min() # 1-indexed 1
Gaussian Mixtures
Gaussian mixture models and moment matching.
Gaussian Mixture operations for tracking applications.
This module provides Gaussian mixture representation, moment matching, and mixture reduction algorithms (Runnalls’, West’s) commonly used in multi-target tracking for hypothesis pruning.
References
A. R. Runnalls, “Kullback-Leibler Approach to Gaussian Mixture Reduction,” IEEE Trans. Aerospace and Electronic Systems, vol. 43, no. 3, 2007.
M. West, “Approximating posterior distributions by mixture,” Journal of the Royal Statistical Society, Series B, vol. 55, no. 2, 1993.
- class pytcl.clustering.gaussian_mixture.GaussianComponent(weight, mean, covariance)[source]
Bases:
NamedTupleA single Gaussian component in a mixture.
- mean
Mean vector, shape (n,).
- Type:
ndarray
- covariance
Covariance matrix, shape (n, n).
- Type:
ndarray
- class pytcl.clustering.gaussian_mixture.MergeResult(component, cost)[source]
Bases:
NamedTupleResult of merging two Gaussian components.
- component
The merged Gaussian component.
- Type:
- component: GaussianComponent
Alias for field number 0
- class pytcl.clustering.gaussian_mixture.ReductionResult(components, n_original, n_reduced, total_cost)[source]
Bases:
NamedTupleResult of mixture reduction.
- components
Reduced set of components.
- Type:
- components: List[GaussianComponent]
Alias for field number 0
- pytcl.clustering.gaussian_mixture.moment_match(weights, means, covariances)[source]
Compute moment-matched mean and covariance from mixture components.
Given a Gaussian mixture, computes a single Gaussian that matches the first two moments (mean and covariance) of the mixture.
- Parameters:
- Returns:
mean (ndarray) – Moment-matched mean, shape (n,).
covariance (ndarray) – Moment-matched covariance, shape (n, n).
- Return type:
Tuple[ndarray[tuple[Any, …], dtype[floating]], ndarray[tuple[Any, …], dtype[floating]]]
Examples
>>> import numpy as np >>> weights = [0.5, 0.5] >>> means = [np.array([0., 0.]), np.array([2., 0.])] >>> covs = [np.eye(2) * 0.1, np.eye(2) * 0.1] >>> m, P = moment_match(weights, means, covs) >>> m # Should be [1., 0.] array([1., 0.])
- pytcl.clustering.gaussian_mixture.runnalls_merge_cost(c1, c2)[source]
Compute Runnalls’ KL-divergence-based merge cost for two components.
The cost approximates the increase in KL-divergence when merging components c1 and c2 into a single moment-matched Gaussian.
- Parameters:
c1 (GaussianComponent) – First component.
c2 (GaussianComponent) – Second component.
- Returns:
cost – Merge cost (non-negative). Lower cost means components are more similar and merging causes less information loss.
- Return type:
Notes
- The cost function is:
cost = 0.5 * [w_m * log|P_m| - w_1 * log|P_1| - w_2 * log|P_2|]
where w_m = w_1 + w_2, P_m is the moment-matched covariance.
Examples
>>> c1 = GaussianComponent(0.3, np.array([0., 0.]), np.eye(2) * 0.1) >>> c2 = GaussianComponent(0.2, np.array([0.1, 0.]), np.eye(2) * 0.1) # Close >>> c3 = GaussianComponent(0.2, np.array([5., 5.]), np.eye(2) * 0.1) # Far >>> cost_close = runnalls_merge_cost(c1, c2) >>> cost_far = runnalls_merge_cost(c1, c3) >>> cost_close < cost_far # Closer components have lower merge cost True
- pytcl.clustering.gaussian_mixture.merge_gaussians(c1, c2)[source]
Merge two Gaussian components via moment matching.
- Parameters:
c1 (GaussianComponent) – First component.
c2 (GaussianComponent) – Second component.
- Returns:
result – Merged component and merge cost.
- Return type:
Examples
>>> c1 = GaussianComponent(0.3, np.array([0., 0.]), np.eye(2) * 0.1) >>> c2 = GaussianComponent(0.2, np.array([1., 0.]), np.eye(2) * 0.1) >>> result = merge_gaussians(c1, c2) >>> result.component.weight 0.5
- pytcl.clustering.gaussian_mixture.prune_mixture(components, weight_threshold=1e-05)[source]
Remove components with weights below threshold and renormalize.
- Parameters:
components (list of GaussianComponent) – Input components.
weight_threshold (float) – Components with weight below this are removed.
- Returns:
pruned – Pruned and renormalized components.
- Return type:
Examples
>>> comps = [ ... GaussianComponent(0.9, np.array([0.]), np.array([[1.]])), ... GaussianComponent(1e-6, np.array([10.]), np.array([[1.]])), ... ] >>> pruned = prune_mixture(comps, weight_threshold=1e-5) >>> len(pruned) 1 >>> pruned[0].weight # Renormalized 1.0
- pytcl.clustering.gaussian_mixture.reduce_mixture_runnalls(components, max_components, weight_threshold=1e-05)[source]
Reduce mixture using Runnalls’ greedy algorithm.
Iteratively merges the pair of components with the smallest KL-divergence merge cost until the target number is reached.
- Parameters:
components (list of GaussianComponent) – Input mixture components.
max_components (int) – Maximum number of components in output.
weight_threshold (float) – Components below this weight are pruned first.
- Returns:
result – Reduced mixture with cost information.
- Return type:
References
[1] A. R. Runnalls, “Kullback-Leibler Approach to Gaussian Mixture Reduction,” IEEE Trans. Aerospace and Electronic Systems, 2007.
Examples
>>> import numpy as np >>> # 4 components, reduce to 2 >>> comps = [ ... GaussianComponent(0.25, np.array([0., 0.]), np.eye(2) * 0.1), ... GaussianComponent(0.25, np.array([0.1, 0.]), np.eye(2) * 0.1), ... GaussianComponent(0.25, np.array([5., 5.]), np.eye(2) * 0.1), ... GaussianComponent(0.25, np.array([5.1, 5.]), np.eye(2) * 0.1), ... ] >>> result = reduce_mixture_runnalls(comps, max_components=2) >>> len(result.components) 2
- pytcl.clustering.gaussian_mixture.west_merge_cost(c1, c2)[source]
Compute West’s merge cost for two components.
West’s algorithm uses a simpler cost based on weighted Mahalanobis distance between component means.
- Parameters:
c1 (GaussianComponent) – First component.
c2 (GaussianComponent) – Second component.
- Returns:
cost – Merge cost based on weighted mean separation.
- Return type:
Examples
>>> c1 = GaussianComponent(0.3, np.array([0., 0.]), np.eye(2) * 0.1) >>> c2 = GaussianComponent(0.2, np.array([0.1, 0.]), np.eye(2) * 0.1) # Close >>> c3 = GaussianComponent(0.2, np.array([5., 5.]), np.eye(2) * 0.1) # Far >>> cost_close = west_merge_cost(c1, c2) >>> cost_far = west_merge_cost(c1, c3) >>> cost_close < cost_far # Closer components have lower merge cost True
- pytcl.clustering.gaussian_mixture.reduce_mixture_west(components, max_components, weight_threshold=1e-05)[source]
Reduce mixture using West’s algorithm.
Similar to Runnalls’ but uses a simpler cost function based on weighted Mahalanobis distance between means.
- Parameters:
components (list of GaussianComponent) – Input mixture components.
max_components (int) – Maximum number of components in output.
weight_threshold (float) – Components below this weight are pruned first.
- Returns:
result – Reduced mixture with cost information.
- Return type:
References
[1] M. West, “Approximating posterior distributions by mixture,” Journal of the Royal Statistical Society, Series B, 1993.
Examples
>>> import numpy as np >>> comps = [ ... GaussianComponent(0.25, np.array([0., 0.]), np.eye(2) * 0.1), ... GaussianComponent(0.25, np.array([0.1, 0.]), np.eye(2) * 0.1), ... GaussianComponent(0.25, np.array([5., 5.]), np.eye(2) * 0.1), ... GaussianComponent(0.25, np.array([5.1, 5.]), np.eye(2) * 0.1), ... ] >>> result = reduce_mixture_west(comps, max_components=2) >>> len(result.components) 2
- class pytcl.clustering.gaussian_mixture.GaussianMixture(components=None)[source]
Bases:
objectGaussian Mixture representation with reduction operations.
Provides an object-oriented interface for Gaussian mixture operations including density evaluation, sampling, and reduction.
- Parameters:
components (list of GaussianComponent, optional) – Initial components. If None, creates empty mixture.
- components
Current mixture components.
- Type:
Examples
>>> import numpy as np >>> gm = GaussianMixture() >>> gm.add_component(0.5, np.array([0., 0.]), np.eye(2) * 0.1) >>> gm.add_component(0.5, np.array([2., 2.]), np.eye(2) * 0.1) >>> len(gm) 2 >>> gm.mean # Mixture mean array([1., 1.])
- components: List[GaussianComponent]
- add_component(weight, mean, covariance)[source]
Add a component to the mixture.
- Parameters:
weight (float) – Component weight.
mean (array_like) – Mean vector.
covariance (array_like) – Covariance matrix.
- property covariances: List[ndarray[tuple[Any, ...], dtype[floating]]]
List of component covariances.
- property covariance: ndarray[tuple[Any, ...], dtype[floating]]
Mixture covariance (moment-matched).
- pdf(x)[source]
Evaluate mixture probability density at x.
- Parameters:
x (array_like) – Point at which to evaluate density.
- Returns:
density – Mixture probability density.
- Return type:
- sample(n_samples=1, rng=None)[source]
Draw samples from the mixture.
- Parameters:
n_samples (int) – Number of samples to draw.
rng (numpy.random.Generator, optional) – Random number generator.
- Returns:
samples – Samples, shape (n_samples, n) or (n,) if n_samples=1.
- Return type:
ndarray
- prune(weight_threshold=1e-05)[source]
Remove low-weight components.
- Parameters:
weight_threshold (float) – Minimum weight to retain.
- Returns:
pruned – New mixture with low-weight components removed.
- Return type:
- reduce_runnalls(max_components)[source]
Reduce mixture using Runnalls’ algorithm.
- Parameters:
max_components (int) – Maximum number of components.
- Returns:
reduced – Reduced mixture.
- Return type: