{ "cells": [ { "cell_type": "markdown", "id": "cell-0", "metadata": {}, "source": [ "# GPU Acceleration: CuPy Integration for Tracking\n", "\n", "This notebook demonstrates GPU acceleration techniques for tracking algorithms using CuPy. We cover:\n", "\n", "1. **CuPy Basics** - GPU array operations and NumPy compatibility\n", "2. **Accelerating Matrix Operations** - Core operations for Kalman filters\n", "3. **Batch Processing** - Processing multiple tracks in parallel\n", "4. **Particle Filter Acceleration** - GPU-based resampling and weight computation\n", "5. **Performance Comparison** - Benchmarking CPU vs GPU\n", "\n", "## Prerequisites\n", "\n", "```bash\n", "# GPU-enabled version\n", "pip install cupy-cuda12x # For CUDA 12.x\n", "# or pip install cupy-cuda11x for CUDA 11.x\n", "\n", "pip install nrl-tracker plotly numpy\n", "```\n", "\n", "**Note:** This notebook requires an NVIDIA GPU with CUDA support. If running without a GPU, code examples will demonstrate concepts with NumPy fallbacks." ] }, { "cell_type": "code", "execution_count": 16, "id": "cell-1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CuPy not available. Running in CPU-only mode.\n", "Install with: pip install cupy-cuda12x\n" ] } ], "source": [ "import numpy as np\n", "import plotly.graph_objects as go\n", "from plotly.subplots import make_subplots\n", "import time\n", "\n", "# Check for GPU availability\n", "try:\n", " import cupy as cp\n", " GPU_AVAILABLE = True\n", " print(f\"CuPy version: {cp.__version__}\")\n", " print(f\"CUDA version: {cp.cuda.runtime.runtimeGetVersion()}\")\n", " print(f\"GPU: {cp.cuda.Device().name}\")\n", " print(f\"GPU memory: {cp.cuda.Device().mem_info[1] / 1e9:.1f} GB\")\n", "except ImportError:\n", " GPU_AVAILABLE = False\n", " print(\"CuPy not available. Running in CPU-only mode.\")\n", " print(\"Install with: pip install cupy-cuda12x\")\n", "\n", "# Use appropriate array module\n", "xp = cp if GPU_AVAILABLE else np\n", "\n", "np.random.seed(42)\n", "\n", "# Plotly dark theme template\n", "dark_template = go.layout.Template()\n", "dark_template.layout = go.Layout(\n", " paper_bgcolor='#0d1117',\n", " plot_bgcolor='#0d1117',\n", " font=dict(color='#e6edf3'),\n", " xaxis=dict(gridcolor='#30363d', zerolinecolor='#30363d'),\n", " yaxis=dict(gridcolor='#30363d', zerolinecolor='#30363d'),\n", ")" ] }, { "cell_type": "markdown", "id": "cell-2", "metadata": {}, "source": [ "## 1. CuPy Basics\n", "\n", "CuPy provides a NumPy-compatible interface for GPU computing. Arrays and operations mirror NumPy exactly, making migration straightforward.\n", "\n", "### Key Concepts\n", "\n", "| NumPy | CuPy | Location |\n", "|-------|------|----------|\n", "| `numpy.array()` | `cupy.array()` | GPU |\n", "| `numpy.zeros()` | `cupy.zeros()` | GPU |\n", "| `numpy.linalg.inv()` | `cupy.linalg.inv()` | GPU |\n", "| Transfer to GPU | `cupy.asarray(np_array)` | CPU→GPU |\n", "| Transfer to CPU | `cupy.asnumpy(cp_array)` | GPU→CPU |" ] }, { "cell_type": "code", "execution_count": 17, "id": "cell-3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a + b = [6 6 6 6 6]\n", "a · b = 35\n" ] } ], "source": [ "# Basic array operations\n", "if GPU_AVAILABLE:\n", " # Create arrays on GPU\n", " a_gpu = cp.array([1, 2, 3, 4, 5])\n", " b_gpu = cp.array([5, 4, 3, 2, 1])\n", " \n", " # Operations happen on GPU\n", " c_gpu = a_gpu + b_gpu\n", " d_gpu = cp.dot(a_gpu, b_gpu)\n", " \n", " print(f\"a + b = {c_gpu}\")\n", " print(f\"a · b = {d_gpu}\")\n", " print(f\"Array type: {type(c_gpu)}\")\n", " \n", " # Transfer back to CPU when needed\n", " c_cpu = cp.asnumpy(c_gpu)\n", " print(f\"Transferred type: {type(c_cpu)}\")\n", "else:\n", " # NumPy fallback\n", " a_cpu = np.array([1, 2, 3, 4, 5])\n", " b_cpu = np.array([5, 4, 3, 2, 1])\n", " print(f\"a + b = {a_cpu + b_cpu}\")\n", " print(f\"a · b = {np.dot(a_cpu, b_cpu)}\")" ] }, { "cell_type": "code", "execution_count": 18, "id": "cell-4", "metadata": {}, "outputs": [], "source": [ "# Memory management\n", "if GPU_AVAILABLE:\n", " # Check memory before\n", " mempool = cp.get_default_memory_pool()\n", " print(f\"Memory used before: {mempool.used_bytes() / 1e6:.2f} MB\")\n", " \n", " # Create large array\n", " large_array = cp.random.randn(10000, 10000)\n", " print(f\"Memory after 10000x10000 float64: {mempool.used_bytes() / 1e6:.2f} MB\")\n", " \n", " # Free memory explicitly\n", " del large_array\n", " mempool.free_all_blocks()\n", " print(f\"Memory after free: {mempool.used_bytes() / 1e6:.2f} MB\")" ] }, { "cell_type": "markdown", "id": "cell-5", "metadata": {}, "source": [ "## 2. Accelerating Matrix Operations\n", "\n", "Kalman filters involve repeated matrix operations that benefit from GPU parallelization:\n", "\n", "- Matrix multiplication: $P \\cdot H^T$\n", "- Matrix inversion: $(HPH^T + R)^{-1}$\n", "- Cholesky decomposition: $P = LL^T$\n", "\n", "Let's implement core Kalman filter operations for both CPU and GPU." ] }, { "cell_type": "code", "execution_count": 19, "id": "cell-6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Kalman filter functions defined.\n" ] } ], "source": [ "def kalman_predict(x, P, F, Q, xp=np):\n", " \"\"\"\n", " Kalman filter prediction step.\n", " \n", " Parameters\n", " ----------\n", " x : array_like\n", " State estimate [n, 1].\n", " P : array_like\n", " State covariance [n, n].\n", " F : array_like\n", " State transition matrix [n, n].\n", " Q : array_like\n", " Process noise covariance [n, n].\n", " xp : module\n", " Array module (numpy or cupy).\n", " \n", " Returns\n", " -------\n", " x_pred : array\n", " Predicted state.\n", " P_pred : array\n", " Predicted covariance.\n", " \"\"\"\n", " x_pred = xp.dot(F, x)\n", " P_pred = xp.dot(xp.dot(F, P), F.T) + Q\n", " return x_pred, P_pred\n", "\n", "\n", "def kalman_update(x, P, z, H, R, xp=np):\n", " \"\"\"\n", " Kalman filter update step.\n", " \n", " Parameters\n", " ----------\n", " x : array_like\n", " Predicted state [n, 1].\n", " P : array_like\n", " Predicted covariance [n, n].\n", " z : array_like\n", " Measurement [m, 1].\n", " H : array_like\n", " Measurement matrix [m, n].\n", " R : array_like\n", " Measurement noise covariance [m, m].\n", " xp : module\n", " Array module (numpy or cupy).\n", " \n", " Returns\n", " -------\n", " x_upd : array\n", " Updated state.\n", " P_upd : array\n", " Updated covariance.\n", " \"\"\"\n", " # Innovation\n", " y = z - xp.dot(H, x)\n", " \n", " # Innovation covariance\n", " S = xp.dot(xp.dot(H, P), H.T) + R\n", " \n", " # Kalman gain\n", " K = xp.dot(xp.dot(P, H.T), xp.linalg.inv(S))\n", " \n", " # State update\n", " x_upd = x + xp.dot(K, y)\n", " \n", " # Covariance update (Joseph form for numerical stability)\n", " I = xp.eye(P.shape[0])\n", " IKH = I - xp.dot(K, H)\n", " P_upd = xp.dot(xp.dot(IKH, P), IKH.T) + xp.dot(xp.dot(K, R), K.T)\n", " \n", " return x_upd, P_upd\n", "\n", "\n", "print(\"Kalman filter functions defined.\")" ] }, { "cell_type": "code", "execution_count": 20, "id": "cell-7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "System matrices defined.\n", "State dimension: 4\n", "Measurement dimension: 2\n" ] } ], "source": [ "# Test Kalman filter on CPU and GPU\n", "state_dim = 4 # [x, vx, y, vy]\n", "meas_dim = 2 # [x, y]\n", "\n", "dt = 1.0\n", "\n", "# State transition (constant velocity)\n", "F = np.array([\n", " [1, dt, 0, 0],\n", " [0, 1, 0, 0],\n", " [0, 0, 1, dt],\n", " [0, 0, 0, 1]\n", "])\n", "\n", "# Measurement matrix (observe position only)\n", "H = np.array([\n", " [1, 0, 0, 0],\n", " [0, 0, 1, 0]\n", "])\n", "\n", "# Process noise\n", "q = 0.1\n", "Q = q * np.array([\n", " [dt**3/3, dt**2/2, 0, 0],\n", " [dt**2/2, dt, 0, 0],\n", " [0, 0, dt**3/3, dt**2/2],\n", " [0, 0, dt**2/2, dt]\n", "])\n", "\n", "# Measurement noise\n", "R = 10 * np.eye(meas_dim)\n", "\n", "# Initial state and covariance\n", "x = np.array([0, 1, 0, 1]).reshape(-1, 1)\n", "P = 100 * np.eye(state_dim)\n", "\n", "# Measurement\n", "z = np.array([5, 5]).reshape(-1, 1)\n", "\n", "print(\"System matrices defined.\")\n", "print(f\"State dimension: {state_dim}\")\n", "print(f\"Measurement dimension: {meas_dim}\")" ] }, { "cell_type": "code", "execution_count": 21, "id": "cell-8", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU Results:\n", "Predicted state: [1. 1. 1. 1.]\n", "Updated state: [4.80955404 2.90541184 4.80955404 2.90541184]\n" ] } ], "source": [ "# Run on CPU\n", "x_pred_cpu, P_pred_cpu = kalman_predict(x, P, F, Q, xp=np)\n", "x_upd_cpu, P_upd_cpu = kalman_update(x_pred_cpu, P_pred_cpu, z, H, R, xp=np)\n", "\n", "print(\"CPU Results:\")\n", "print(f\"Predicted state: {x_pred_cpu.flatten()}\")\n", "print(f\"Updated state: {x_upd_cpu.flatten()}\")\n", "\n", "if GPU_AVAILABLE:\n", " # Transfer to GPU\n", " x_gpu = cp.asarray(x)\n", " P_gpu = cp.asarray(P)\n", " F_gpu = cp.asarray(F)\n", " Q_gpu = cp.asarray(Q)\n", " H_gpu = cp.asarray(H)\n", " R_gpu = cp.asarray(R)\n", " z_gpu = cp.asarray(z)\n", " \n", " # Run on GPU\n", " x_pred_gpu, P_pred_gpu = kalman_predict(x_gpu, P_gpu, F_gpu, Q_gpu, xp=cp)\n", " x_upd_gpu, P_upd_gpu = kalman_update(x_pred_gpu, P_pred_gpu, z_gpu, H_gpu, R_gpu, xp=cp)\n", " \n", " print(\"\\nGPU Results:\")\n", " print(f\"Predicted state: {cp.asnumpy(x_pred_gpu).flatten()}\")\n", " print(f\"Updated state: {cp.asnumpy(x_upd_gpu).flatten()}\")\n", " \n", " # Verify results match\n", " error = np.max(np.abs(cp.asnumpy(x_upd_gpu) - x_upd_cpu))\n", " print(f\"\\nMax difference CPU vs GPU: {error:.2e}\")" ] }, { "cell_type": "markdown", "id": "cell-9", "metadata": {}, "source": [ "## 3. Batch Processing Multiple Tracks\n", "\n", "The real power of GPU acceleration comes from processing many tracks simultaneously. Instead of looping over tracks, we can stack them into batched arrays." ] }, { "cell_type": "code", "execution_count": 22, "id": "cell-10", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Batch Kalman functions defined.\n" ] } ], "source": [ "def batch_kalman_predict(x_batch, P_batch, F, Q, xp=np):\n", " \"\"\"\n", " Batch Kalman filter prediction.\n", " \n", " Parameters\n", " ----------\n", " x_batch : array_like\n", " Batch of states [batch_size, n, 1].\n", " P_batch : array_like\n", " Batch of covariances [batch_size, n, n].\n", " F : array_like\n", " State transition matrix [n, n] (shared).\n", " Q : array_like\n", " Process noise [n, n] (shared).\n", " xp : module\n", " Array module.\n", " \n", " Returns\n", " -------\n", " x_pred : array\n", " Predicted states [batch_size, n, 1].\n", " P_pred : array\n", " Predicted covariances [batch_size, n, n].\n", " \"\"\"\n", " batch_size = x_batch.shape[0]\n", " \n", " # Vectorized prediction\n", " # x_pred = F @ x for each batch element\n", " x_pred = xp.einsum('ij,bjk->bik', F, x_batch)\n", " \n", " # P_pred = F @ P @ F.T + Q for each batch element\n", " FP = xp.einsum('ij,bjk->bik', F, P_batch)\n", " FPFt = xp.einsum('bij,kj->bik', FP, F)\n", " P_pred = FPFt + Q\n", " \n", " return x_pred, P_pred\n", "\n", "\n", "def batch_kalman_update(x_batch, P_batch, z_batch, H, R, xp=np):\n", " \"\"\"\n", " Batch Kalman filter update.\n", " \n", " Parameters\n", " ----------\n", " x_batch : array_like\n", " Batch of predicted states [batch_size, n, 1].\n", " P_batch : array_like\n", " Batch of predicted covariances [batch_size, n, n].\n", " z_batch : array_like\n", " Batch of measurements [batch_size, m, 1].\n", " H : array_like\n", " Measurement matrix [m, n] (shared).\n", " R : array_like\n", " Measurement noise [m, m] (shared).\n", " xp : module\n", " Array module.\n", " \n", " Returns\n", " -------\n", " x_upd : array\n", " Updated states [batch_size, n, 1].\n", " P_upd : array\n", " Updated covariances [batch_size, n, n].\n", " \"\"\"\n", " batch_size = x_batch.shape[0]\n", " n = x_batch.shape[1]\n", " \n", " # Innovation: y = z - H @ x\n", " Hx = xp.einsum('ij,bjk->bik', H, x_batch)\n", " y = z_batch - Hx\n", " \n", " # Innovation covariance: S = H @ P @ H.T + R\n", " HP = xp.einsum('ij,bjk->bik', H, P_batch)\n", " HPHt = xp.einsum('bij,kj->bik', HP, H)\n", " S = HPHt + R\n", " \n", " # Kalman gain: K = P @ H.T @ S^{-1}\n", " PHt = xp.einsum('bij,kj->bik', P_batch, H)\n", " S_inv = xp.linalg.inv(S)\n", " K = xp.einsum('bij,bjk->bik', PHt, S_inv)\n", " \n", " # State update: x_upd = x + K @ y\n", " Ky = xp.einsum('bij,bjk->bik', K, y)\n", " x_upd = x_batch + Ky\n", " \n", " # Covariance update (simplified form)\n", " I = xp.eye(n)\n", " KH = xp.einsum('bij,jk->bik', K, H)\n", " IKH = I - KH\n", " P_upd = xp.einsum('bij,bjk->bik', IKH, P_batch)\n", " \n", " return x_upd, P_upd\n", "\n", "\n", "print(\"Batch Kalman functions defined.\")" ] }, { "cell_type": "code", "execution_count": 23, "id": "cell-11", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Batch size 10: CPU= 0.043ms\n", "Batch size 100: CPU= 0.146ms\n", "Batch size 1000: CPU= 1.205ms\n" ] } ], "source": [ "# Benchmark batch processing\n", "batch_sizes = [10, 100, 1000, 5000]\n", "\n", "if not GPU_AVAILABLE:\n", " batch_sizes = [10, 100, 1000] # Smaller sizes for CPU-only\n", "\n", "cpu_times = []\n", "gpu_times = []\n", "\n", "for batch_size in batch_sizes:\n", " # Create batch data\n", " x_batch_np = np.random.randn(batch_size, state_dim, 1)\n", " P_batch_np = np.tile(P.reshape(1, state_dim, state_dim), (batch_size, 1, 1))\n", " z_batch_np = np.random.randn(batch_size, meas_dim, 1)\n", " \n", " # CPU timing\n", " n_iterations = 100\n", " start = time.time()\n", " for _ in range(n_iterations):\n", " x_pred, P_pred = batch_kalman_predict(x_batch_np, P_batch_np, F, Q, xp=np)\n", " x_upd, P_upd = batch_kalman_update(x_pred, P_pred, z_batch_np, H, R, xp=np)\n", " cpu_time = (time.time() - start) / n_iterations\n", " cpu_times.append(cpu_time)\n", " \n", " if GPU_AVAILABLE:\n", " # Transfer to GPU\n", " x_batch_gpu = cp.asarray(x_batch_np)\n", " P_batch_gpu = cp.asarray(P_batch_np)\n", " z_batch_gpu = cp.asarray(z_batch_np)\n", " \n", " # Warm up\n", " x_pred, P_pred = batch_kalman_predict(x_batch_gpu, P_batch_gpu, F_gpu, Q_gpu, xp=cp)\n", " x_upd, P_upd = batch_kalman_update(x_pred, P_pred, z_batch_gpu, H_gpu, R_gpu, xp=cp)\n", " cp.cuda.Stream.null.synchronize()\n", " \n", " # GPU timing\n", " start = time.time()\n", " for _ in range(n_iterations):\n", " x_pred, P_pred = batch_kalman_predict(x_batch_gpu, P_batch_gpu, F_gpu, Q_gpu, xp=cp)\n", " x_upd, P_upd = batch_kalman_update(x_pred, P_pred, z_batch_gpu, H_gpu, R_gpu, xp=cp)\n", " cp.cuda.Stream.null.synchronize()\n", " gpu_time = (time.time() - start) / n_iterations\n", " gpu_times.append(gpu_time)\n", " \n", " speedup = cpu_time / gpu_time\n", " print(f\"Batch size {batch_size:5d}: CPU={cpu_time*1e3:7.3f}ms, \"\n", " f\"GPU={gpu_time*1e3:7.3f}ms, Speedup={speedup:.1f}x\")\n", " else:\n", " print(f\"Batch size {batch_size:5d}: CPU={cpu_time*1e3:7.3f}ms\")" ] }, { "cell_type": "code", "execution_count": 24, "id": "cell-12", "metadata": {}, "outputs": [ { "data": { "application/vnd.plotly.v1+json": { "config": { "plotlyServerURL": "https://plot.ly" }, "data": [ { "line": { "color": "#00d4ff", "width": 2 }, "marker": { "size": 8 }, "mode": "lines+markers", "name": "CPU (NumPy)", "type": "scatter", "x": [ 10, 100, 1000 ], "y": { "bdata": "AAAAACAupj8AAAAAKMDCPwAAAACvR/M/", "dtype": "f8" } } ], "layout": { "height": 400, "template": { "layout": { "font": { "color": "#e6edf3" }, "paper_bgcolor": "#0d1117", "plot_bgcolor": "#0d1117", "xaxis": { "gridcolor": "#30363d", "zerolinecolor": "#30363d" }, "yaxis": { "gridcolor": "#30363d", "zerolinecolor": "#30363d" } } }, "title": { "text": "Kalman Filter CPU Performance" }, "xaxis": { "title": { "text": "Batch Size" }, "type": "log" }, "yaxis": { "title": { "text": "Time (ms)" }, "type": "log" } } } }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Visualize performance\n", "if GPU_AVAILABLE and len(gpu_times) > 0:\n", " fig = make_subplots(rows=1, cols=2, \n", " subplot_titles=['Kalman Filter Batch Performance', 'GPU Speedup Factor'])\n", " \n", " fig.add_trace(\n", " go.Scatter(x=batch_sizes, y=np.array(cpu_times)*1e3, mode='lines+markers',\n", " name='CPU (NumPy)', line=dict(color='#00d4ff', width=2),\n", " marker=dict(size=8)),\n", " row=1, col=1\n", " )\n", " fig.add_trace(\n", " go.Scatter(x=batch_sizes, y=np.array(gpu_times)*1e3, mode='lines+markers',\n", " name='GPU (CuPy)', line=dict(color='#ff4757', width=2),\n", " marker=dict(size=8, symbol='square')),\n", " row=1, col=1\n", " )\n", " \n", " speedups = np.array(cpu_times) / np.array(gpu_times)\n", " fig.add_trace(\n", " go.Scatter(x=batch_sizes, y=speedups, mode='lines+markers',\n", " name='Speedup', line=dict(color='#00ff88', width=2),\n", " marker=dict(size=10, symbol='triangle-up')),\n", " row=1, col=2\n", " )\n", " fig.add_trace(\n", " go.Scatter(x=[batch_sizes[0], batch_sizes[-1]], y=[1, 1], mode='lines',\n", " line=dict(color='white', dash='dash', width=1),\n", " showlegend=False),\n", " row=1, col=2\n", " )\n", " \n", " fig.update_xaxes(type='log', title_text='Batch Size (number of tracks)', row=1, col=1)\n", " fig.update_yaxes(type='log', title_text='Time per iteration (ms)', row=1, col=1)\n", " fig.update_xaxes(type='log', title_text='Batch Size (number of tracks)', row=1, col=2)\n", " fig.update_yaxes(title_text='Speedup (CPU time / GPU time)', row=1, col=2)\n", " \n", " fig.update_layout(template=dark_template, height=400)\n", " fig.show()\n", "else:\n", " fig = go.Figure()\n", " fig.add_trace(\n", " go.Scatter(x=batch_sizes, y=np.array(cpu_times)*1e3, mode='lines+markers',\n", " name='CPU (NumPy)', line=dict(color='#00d4ff', width=2),\n", " marker=dict(size=8))\n", " )\n", " fig.update_layout(\n", " template=dark_template,\n", " title='Kalman Filter CPU Performance',\n", " xaxis_title='Batch Size',\n", " xaxis_type='log',\n", " yaxis_title='Time (ms)',\n", " yaxis_type='log',\n", " height=400\n", " )\n", " fig.show()" ] }, { "cell_type": "markdown", "id": "cell-13", "metadata": {}, "source": [ "## 4. GPU-Accelerated Particle Filter\n", "\n", "Particle filters involve operations on large numbers of particles that parallelize well:\n", "\n", "1. **Particle propagation**: Apply dynamics to each particle\n", "2. **Weight computation**: Evaluate likelihood for each particle\n", "3. **Resampling**: Select particles based on weights" ] }, { "cell_type": "code", "execution_count": 25, "id": "cell-14", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Particle filter functions defined.\n" ] } ], "source": [ "def particle_filter_step(particles, weights, z, f_dynamics, h_meas, Q_std, R_std, xp=np):\n", " \"\"\"\n", " GPU-accelerated particle filter step.\n", " \n", " Parameters\n", " ----------\n", " particles : array_like\n", " Particle states [n_particles, state_dim].\n", " weights : array_like\n", " Particle weights [n_particles].\n", " z : array_like\n", " Measurement [meas_dim].\n", " f_dynamics : callable\n", " Dynamics function f(x) -> x_next.\n", " h_meas : callable\n", " Measurement function h(x) -> z_pred.\n", " Q_std : float\n", " Process noise standard deviation.\n", " R_std : float\n", " Measurement noise standard deviation.\n", " xp : module\n", " Array module.\n", " \n", " Returns\n", " -------\n", " particles : array\n", " Updated particles.\n", " weights : array\n", " Updated weights.\n", " \"\"\"\n", " n_particles = particles.shape[0]\n", " state_dim = particles.shape[1]\n", " \n", " # Prediction: propagate particles through dynamics + noise\n", " particles_pred = f_dynamics(particles, xp)\n", " particles_pred += Q_std * xp.random.randn(n_particles, state_dim)\n", " \n", " # Update: compute likelihood weights\n", " z_pred = h_meas(particles_pred, xp)\n", " innovation = z - z_pred\n", " \n", " # Gaussian likelihood (vectorized)\n", " log_likelihood = -0.5 * xp.sum(innovation**2, axis=1) / R_std**2\n", " \n", " # Update weights\n", " log_weights = xp.log(weights + 1e-300) + log_likelihood\n", " log_weights = log_weights - xp.max(log_weights) # Numerical stability\n", " weights = xp.exp(log_weights)\n", " weights = weights / xp.sum(weights) # Normalize\n", " \n", " # Compute effective sample size\n", " ess = 1.0 / xp.sum(weights**2)\n", " \n", " # Resample if ESS is too low\n", " if float(ess) < n_particles / 2:\n", " # Systematic resampling\n", " indices = systematic_resample(weights, xp)\n", " particles_pred = particles_pred[indices]\n", " weights = xp.ones(n_particles) / n_particles\n", " \n", " return particles_pred, weights, float(ess)\n", "\n", "\n", "def systematic_resample(weights, xp=np):\n", " \"\"\"\n", " Systematic resampling for particle filters.\n", " \n", " Parameters\n", " ----------\n", " weights : array_like\n", " Normalized weights [n_particles].\n", " xp : module\n", " Array module.\n", " \n", " Returns\n", " -------\n", " indices : array\n", " Resampled indices.\n", " \"\"\"\n", " n = len(weights)\n", " \n", " # Generate systematic points\n", " positions = (xp.arange(n) + xp.random.rand()) / n\n", " \n", " # Cumulative sum of weights\n", " cumsum = xp.cumsum(weights)\n", " \n", " # Find indices using searchsorted\n", " indices = xp.searchsorted(cumsum, positions)\n", " indices = xp.clip(indices, 0, n - 1)\n", " \n", " return indices\n", "\n", "\n", "print(\"Particle filter functions defined.\")" ] }, { "cell_type": "code", "execution_count": 26, "id": "cell-15", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Benchmarking particle filter...\n", "N= 1000: CPU= 2.7ms\n", "N= 5000: CPU= 9.8ms\n", "N=10000: CPU= 18.5ms\n" ] } ], "source": [ "# Define nonlinear dynamics and measurement\n", "def f_nonlinear(x, xp=np):\n", " \"\"\"Nonlinear dynamics: x_next = x/2 + 25*x/(1+x^2) + 8*cos(1.2*k)\"\"\"\n", " return x / 2 + 25 * x / (1 + x**2 + 1e-10)\n", "\n", "def h_nonlinear(x, xp=np):\n", " \"\"\"Nonlinear measurement: z = x^2 / 20\"\"\"\n", " return x**2 / 20\n", "\n", "# Parameters\n", "n_particles_list = [1000, 5000, 10000, 50000]\n", "if not GPU_AVAILABLE:\n", " n_particles_list = [1000, 5000, 10000]\n", "\n", "state_dim_pf = 1\n", "Q_std = 3.0\n", "R_std = 1.0\n", "\n", "n_steps = 50\n", "n_trials = 5\n", "\n", "cpu_pf_times = []\n", "gpu_pf_times = []\n", "\n", "print(\"Benchmarking particle filter...\")\n", "\n", "for n_particles in n_particles_list:\n", " # Generate true trajectory and measurements\n", " np.random.seed(42)\n", " true_state = [0.0]\n", " measurements = []\n", " for k in range(n_steps):\n", " x_new = f_nonlinear(np.array([[true_state[-1]]]))[0, 0] + Q_std * np.random.randn()\n", " true_state.append(x_new)\n", " z = h_nonlinear(np.array([[x_new]]))[0, 0] + R_std * np.random.randn()\n", " measurements.append(z)\n", " \n", " # CPU timing\n", " cpu_times_trial = []\n", " for _ in range(n_trials):\n", " particles = np.random.randn(n_particles, state_dim_pf) * 5\n", " weights = np.ones(n_particles) / n_particles\n", " \n", " start = time.time()\n", " for z in measurements:\n", " z_arr = np.array([[z]])\n", " particles, weights, ess = particle_filter_step(\n", " particles, weights, z_arr, f_nonlinear, h_nonlinear, Q_std, R_std, xp=np\n", " )\n", " cpu_times_trial.append(time.time() - start)\n", " cpu_pf_times.append(np.mean(cpu_times_trial))\n", " \n", " if GPU_AVAILABLE:\n", " # GPU timing\n", " gpu_times_trial = []\n", " for _ in range(n_trials):\n", " particles_gpu = cp.random.randn(n_particles, state_dim_pf) * 5\n", " weights_gpu = cp.ones(n_particles) / n_particles\n", " \n", " # Warm up\n", " z_arr = cp.array([[measurements[0]]])\n", " _, _, _ = particle_filter_step(\n", " particles_gpu, weights_gpu, z_arr, f_nonlinear, h_nonlinear, Q_std, R_std, xp=cp\n", " )\n", " cp.cuda.Stream.null.synchronize()\n", " \n", " particles_gpu = cp.random.randn(n_particles, state_dim_pf) * 5\n", " weights_gpu = cp.ones(n_particles) / n_particles\n", " \n", " start = time.time()\n", " for z in measurements:\n", " z_arr = cp.array([[z]])\n", " particles_gpu, weights_gpu, ess = particle_filter_step(\n", " particles_gpu, weights_gpu, z_arr, f_nonlinear, h_nonlinear, Q_std, R_std, xp=cp\n", " )\n", " cp.cuda.Stream.null.synchronize()\n", " gpu_times_trial.append(time.time() - start)\n", " gpu_pf_times.append(np.mean(gpu_times_trial))\n", " \n", " speedup = cpu_pf_times[-1] / gpu_pf_times[-1]\n", " print(f\"N={n_particles:5d}: CPU={cpu_pf_times[-1]*1e3:7.1f}ms, \"\n", " f\"GPU={gpu_pf_times[-1]*1e3:7.1f}ms, Speedup={speedup:.1f}x\")\n", " else:\n", " print(f\"N={n_particles:5d}: CPU={cpu_pf_times[-1]*1e3:7.1f}ms\")" ] }, { "cell_type": "code", "execution_count": 27, "id": "cell-16", "metadata": {}, "outputs": [], "source": [ "# Visualize particle filter performance\n", "if GPU_AVAILABLE and len(gpu_pf_times) > 0:\n", " fig = make_subplots(rows=1, cols=2, \n", " subplot_titles=[f'Particle Filter ({n_steps} steps)', 'GPU Speedup for Particle Filter'])\n", " \n", " fig.add_trace(\n", " go.Scatter(x=n_particles_list, y=np.array(cpu_pf_times)*1e3, mode='lines+markers',\n", " name='CPU', line=dict(color='#00d4ff', width=2),\n", " marker=dict(size=8)),\n", " row=1, col=1\n", " )\n", " fig.add_trace(\n", " go.Scatter(x=n_particles_list, y=np.array(gpu_pf_times)*1e3, mode='lines+markers',\n", " name='GPU', line=dict(color='#ff4757', width=2),\n", " marker=dict(size=8, symbol='square')),\n", " row=1, col=1\n", " )\n", " \n", " speedups = np.array(cpu_pf_times) / np.array(gpu_pf_times)\n", " fig.add_trace(\n", " go.Scatter(x=n_particles_list, y=speedups, mode='lines+markers',\n", " name='Speedup', line=dict(color='#00ff88', width=2),\n", " marker=dict(size=10, symbol='triangle-up')),\n", " row=1, col=2\n", " )\n", " fig.add_trace(\n", " go.Scatter(x=[n_particles_list[0], n_particles_list[-1]], y=[1, 1], mode='lines',\n", " line=dict(color='white', dash='dash', width=1),\n", " showlegend=False),\n", " row=1, col=2\n", " )\n", " \n", " fig.update_xaxes(type='log', title_text='Number of Particles', row=1, col=1)\n", " fig.update_yaxes(type='log', title_text='Time (ms)', row=1, col=1)\n", " fig.update_xaxes(type='log', title_text='Number of Particles', row=1, col=2)\n", " fig.update_yaxes(title_text='Speedup Factor', row=1, col=2)\n", " \n", " fig.update_layout(template=dark_template, height=400)\n", " fig.show()" ] }, { "cell_type": "markdown", "id": "cell-17", "metadata": {}, "source": [ "## 5. Memory Management Best Practices\n", "\n", "Efficient GPU memory management is crucial for large-scale tracking applications." ] }, { "cell_type": "code", "execution_count": 28, "id": "cell-18", "metadata": {}, "outputs": [], "source": [ "if GPU_AVAILABLE:\n", " print(\"GPU Memory Management Best Practices\")\n", " print(\"=\" * 50)\n", " \n", " # 1. Memory pools\n", " print(\"\\n1. Memory Pools\")\n", " mempool = cp.get_default_memory_pool()\n", " pinned_mempool = cp.get_default_pinned_memory_pool()\n", " \n", " print(f\" GPU memory pool used: {mempool.used_bytes() / 1e6:.2f} MB\")\n", " print(f\" Pinned memory pool used: {pinned_mempool.n_free_blocks()} free blocks\")\n", " \n", " # 2. Pre-allocation\n", " print(\"\\n2. Pre-allocation for Repeated Operations\")\n", " \n", " n = 1000\n", " \n", " # Bad: Allocate new arrays each iteration\n", " start = time.time()\n", " for _ in range(1000):\n", " temp = cp.zeros((n, n))\n", " result = cp.dot(temp, temp)\n", " cp.cuda.Stream.null.synchronize()\n", " time_alloc = time.time() - start\n", " \n", " # Good: Reuse pre-allocated arrays\n", " temp = cp.zeros((n, n))\n", " result = cp.zeros((n, n))\n", " start = time.time()\n", " for _ in range(1000):\n", " cp.dot(temp, temp, out=result)\n", " cp.cuda.Stream.null.synchronize()\n", " time_reuse = time.time() - start\n", " \n", " print(f\" With allocation each time: {time_alloc*1e3:.1f} ms\")\n", " print(f\" With pre-allocated arrays: {time_reuse*1e3:.1f} ms\")\n", " print(f\" Speedup: {time_alloc/time_reuse:.1f}x\")\n", " \n", " # 3. Async transfers\n", " print(\"\\n3. Asynchronous Data Transfers\")\n", " \n", " data_cpu = np.random.randn(10000, 1000)\n", " \n", " # Synchronous\n", " start = time.time()\n", " data_gpu = cp.asarray(data_cpu)\n", " cp.cuda.Stream.null.synchronize()\n", " time_sync = time.time() - start\n", " \n", " # With pinned memory (faster transfers)\n", " data_pinned = cp.cuda.alloc_pinned_memory(data_cpu.nbytes)\n", " data_view = np.frombuffer(data_pinned, dtype=data_cpu.dtype).reshape(data_cpu.shape)\n", " np.copyto(data_view, data_cpu)\n", " \n", " start = time.time()\n", " data_gpu2 = cp.asarray(data_view)\n", " cp.cuda.Stream.null.synchronize()\n", " time_pinned = time.time() - start\n", " \n", " print(f\" Standard transfer: {time_sync*1e3:.2f} ms\")\n", " print(f\" Pinned memory transfer: {time_pinned*1e3:.2f} ms\")\n", " \n", " # Clean up\n", " del data_gpu, data_gpu2, temp, result\n", " mempool.free_all_blocks()" ] }, { "cell_type": "markdown", "id": "cell-19", "metadata": {}, "source": [ "## 6. Integration with pyTCL\n", "\n", "Here's how to integrate GPU acceleration with existing pyTCL code." ] }, { "cell_type": "code", "execution_count": 29, "id": "cell-20", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "GPU-accelerated tracking function defined.\n" ] } ], "source": [ "# Import from pyTCL\n", "from pytcl.dynamic_estimation.kalman import kf_predict, kf_update\n", "\n", "def gpu_accelerated_tracking(measurements, F, H, Q, R, x0, P0, use_gpu=True):\n", " \"\"\"\n", " Run Kalman filter tracking with optional GPU acceleration.\n", " \n", " Parameters\n", " ----------\n", " measurements : array_like\n", " List of measurements [n_timesteps, meas_dim].\n", " F, H, Q, R : array_like\n", " Kalman filter matrices.\n", " x0, P0 : array_like\n", " Initial state and covariance.\n", " use_gpu : bool\n", " Whether to use GPU acceleration.\n", " \n", " Returns\n", " -------\n", " estimates : array\n", " State estimates [n_timesteps, state_dim].\n", " covariances : array\n", " Covariance estimates [n_timesteps, state_dim, state_dim].\n", " \"\"\"\n", " if use_gpu and GPU_AVAILABLE:\n", " xp = cp\n", " F = cp.asarray(F)\n", " H = cp.asarray(H)\n", " Q = cp.asarray(Q)\n", " R = cp.asarray(R)\n", " x = cp.asarray(x0)\n", " P = cp.asarray(P0)\n", " measurements = cp.asarray(measurements)\n", " else:\n", " xp = np\n", " x = x0.copy()\n", " P = P0.copy()\n", " \n", " estimates = []\n", " covariances = []\n", " \n", " for z in measurements:\n", " # Predict\n", " x, P = kalman_predict(x, P, F, Q, xp)\n", " \n", " # Update\n", " z = z.reshape(-1, 1)\n", " x, P = kalman_update(x, P, z, H, R, xp)\n", " \n", " estimates.append(x.flatten())\n", " covariances.append(P.copy())\n", " \n", " if use_gpu and GPU_AVAILABLE:\n", " estimates = cp.asnumpy(cp.array(estimates))\n", " covariances = cp.asnumpy(cp.array(covariances))\n", " else:\n", " estimates = np.array([xp.asnumpy(e) if hasattr(xp, 'asnumpy') else e for e in estimates])\n", " covariances = np.array([xp.asnumpy(c) if hasattr(xp, 'asnumpy') else c for c in covariances])\n", " \n", " return np.array(estimates), np.array(covariances)\n", "\n", "print(\"GPU-accelerated tracking function defined.\")" ] }, { "cell_type": "code", "execution_count": 30, "id": "cell-21", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Tracking complete: 100 time steps\n" ] } ], "source": [ "# Run tracking example\n", "# Generate trajectory\n", "n_timesteps = 100\n", "true_state = []\n", "measurements_list = []\n", "\n", "x_true = np.array([0, 1, 0, 0.5]) # Start at origin, moving NE\n", "for t in range(n_timesteps):\n", " true_state.append(x_true.copy())\n", " \n", " # Propagate\n", " x_true = F @ x_true + np.random.multivariate_normal(np.zeros(4), Q * 0.1)\n", " \n", " # Generate noisy measurement\n", " z = H @ x_true + np.random.multivariate_normal(np.zeros(2), R)\n", " measurements_list.append(z)\n", "\n", "true_state = np.array(true_state)\n", "measurements_arr = np.array(measurements_list)\n", "\n", "# Run tracking\n", "x0 = np.array([0, 0, 0, 0]).reshape(-1, 1)\n", "P0 = 100 * np.eye(4)\n", "\n", "estimates_cpu, _ = gpu_accelerated_tracking(\n", " measurements_arr, F, H, Q, R, x0, P0, use_gpu=False\n", ")\n", "\n", "if GPU_AVAILABLE:\n", " estimates_gpu, _ = gpu_accelerated_tracking(\n", " measurements_arr, F, H, Q, R, x0, P0, use_gpu=True\n", " )\n", "\n", "print(f\"Tracking complete: {n_timesteps} time steps\")" ] }, { "cell_type": "code", "execution_count": 31, "id": "cell-22", "metadata": {}, "outputs": [ { "data": { "application/vnd.plotly.v1+json": { "config": { "plotlyServerURL": "https://plot.ly" }, "data": [ { "line": { "color": "#00ff88", "width": 2 }, "mode": "lines", "name": "True", "type": "scatter", "x": { "bdata": "AAAAAAAAAACLX4AOJW3uP63OL6zr5fw/2bxHyvw6BUADo//Og2sMQIo5+yUAexFAC7EOAz54FEA/fGzXEDcXQK2t6dLryBpA0so2X2EYH0D7yTNr860hQGKXdo+S6yNATKq2IscoJkCRtyaYjVkoQHNq7YXGdypAmX69UQ+JLECyF4RxN7guQG/1L857lTBAhqd5bbXnMUDlbgWbMTozQDLFUDPNgDRADiR2me3jNUC7rjtJZ0o3QCaD19bQiThA3NWnC0q/OUB2pGvMYfE6QMR0f/2xCzxACDcWiwT6PEAX38XT68E9QKOGVeJufz5Aew8sl+VZP0BxvbcyEiNAQEvabvolnUBAHCRvNVkRQUDw5JvthYRBQKAHkVJ5+UFAgsUjvEtvQkCYwHwf2t9CQHOBIKW2U0NAuIoOoSHJQ0DqosGUpjlEQCG51Qlro0RArcXriZX+RECKmL7lW3FFQC0PT86K+EVAoesSIzqBRkAcoPmKmwxHQPG/k7lCnkdARMPv/78mSEBgFNdNP6NIQM4b3kFuIUlAeHNv9JuaSUCDmTv+vgVKQBJNHwcpd0pAuDS7pYzxSkALnF2XAV9LQC5C9gA+zktAbIgLRQQ7TEC4fvyqT69MQEbYBd41LE1A0NtsS0iPTUACOTAzJuZNQM/louKeNU5AiPm6pBKSTkAFTTjQhvFOQMoDWESBZk9AM+y9OBTST0BA1kUFVxxQQK4czi8WUlBA/wSVk9WIUEAsYA3SLcBQQHerIBgS9VBADld/nrslUUDYGzTVvFBRQKk5M4AqeFFAamRPBmWiUUAMSYuDstBRQOpSU/uBClJAvoERJ6xFUkCer86TR4dSQHccqRBKz1JACWpsgCkbU0B8s0UTZ2tTQDC03SOjwVNAUFFdYJUUVECVD2nksGVUQINbQUbQtFRABOiha/UIVUAnsoNd+mNVQKJVzdzww1VAs7Yj8NkeVkDapa+6Qm9WQCajkp2qvlZAnh4izPcKV0CPi/vno1JXQPDOpc5Ol1dAXFfuyk3kV0Du4TZaCTNYQNx6WEc4flhA0VelPIzJWEA=", "dtype": "f8" }, "xaxis": "x", "y": { "bdata": "AAAAAAAAAAAC0g3pQ07iP1nBSsnQw/E/mSkFj0Op+D8XPGW6gC8BQFWzGPcO5wVAp2X0TXkrC0CetIEWDKAQQAu153xrhxNAZ8Qrefu7FkA893KNyaUaQDONaJpJ8h5Ae34La33PIUBPRyKj1wUkQP4taSkaNCZAg6qa/15oKEDorMWqvMwqQFhHTd3IOS1ATANKSlB+L0DdSQNnC90wQEekmqDY6jFAgWL49p/xMkCbzDk71eYzQNRdSahS0DRAhJxGuRrGNUBI5+ebAMM2QPCszUCi4DdALGmdIen/OEAVCcERcgo6QPKUdrjRGTtAma3xfBEmPECDPC0Orx49QFCddoHtDD5A0wvnfH4EP0DtxKJ/0+4/QD3LfJw6fUBA8v62YOD+QECrYR3T539BQIZPHblx+EFAOI4jd4xzQkCQVE/6pPNCQETNgQYKeENAhd3PSd78Q0AXdEqL+YhEQJ7x+fGUD0VABr3f5l+aRUBTd1TeazFGQPsiRBjR10ZAPlhQezF6R0C1fgUzfBdIQHyygU3ns0hAN7ppQDNKSUBXMnzLZudJQDmpYIxikkpADLGBQrhVS0AE1OQTSRxMQNmiyvDU2kxA3M1+gZySTUCMlmfIREZOQIlDkr1j7E5AZK/JpMaPT0DxnSuS4xhQQGLLYd+AYlBATTALSTunUECtXs43uO5QQKD/rFGtLVFAYIbPvJFqUUC++6hCCadRQCyvDcKD21FAOSJu4lYNUkCEHaV16T9SQBtRbFl0dVJAYFA0QdqxUkCWbglrDepSQFlBE7ryH1NAwuguMk9YU0DmBjCsSoxTQDUq1ljFwFNAGMXL36j8U0Alv3yYsjFUQGcfQuNNYlRAeAmjORqPVECdW6xjk7hUQMfsJoQ551RA7Du0YaMaVUC1ro7HQEtVQHhcbXBqc1VAeYoG9iuaVUBzHiDyaMNVQMYFrrou71VAX39UKjAVVkClXHkOez5WQBzHIkaFY1ZAzhvlQlaGVkAUHZ9QOaJWQJkL8kcExlZAJAX/neXoVkC40H44lwZXQNM6Axu9H1dAkr1duf05V0A=", "dtype": "f8" }, "yaxis": "y" }, { "marker": { "color": "gray", "opacity": 0.5, "size": 4 }, "mode": "markers", "name": "Measurements", "type": "scatter", "x": { "bdata": "xyyiUHtpDcAo44dHQk8ZQOCveE30r+w/osioooenFUDe6D2hAggKQLZtSPzr8yVAxg3x0vMzKED+0lzV5rgWQKhJvSDv6BdAXBW3BU1mLUCJpoecvYEkQAl6dHu5lyZAECxr+gmwMEBIz1LILWAgQGiuwVspBDNAwXaQhz9fLUCTMdpZTUAuQO45cBrcVDFACYESakxeMkDODj9QizIvQOjXnmhtiTJATKbLoqhcN0C1MJmv2dc1QJRFb4gBqDxAPmsn8HpYPkDKzeFcDig9QJJ9jxdRtT5AEKs+mEumOkD9vtRqRfQ/QM5swI/qwjxA7jmV7kOtQUBu0NgCMAdAQA0JZM8tjkBAq0t+tNbQQEAsSmVIOQxDQNGWvR10oEJAmjV9QC28QUDSU9lwHuZCQHTg7zAYj0FADWWH7zApRkAGY2FF03FEQMMmzzy0/0JAmlBDXT6/RkDvjKq2ielFQN5IovRz1EVA+IWVFsxPR0DDdYMgBvVFQPcFZj91HUlAOtMPVm4HSkC+4Lb1jTNKQNvxEUgN1kpA+EpWDH32SkCH9Ogl3fFKQGqj+pdoLUlAS6Hm1b5vS0DXt6eKJ9lLQCCeuAs1zktAVk/hb/zsTECCSl3Q1eNOQDUGdsWIUE9ASZaQcB6NTUD2AW1bBv9PQLIRHAjmpk9AYKpTN4+HUEBwt4gLGzBPQOheqqTH/U5Aj2CfBt5HUEBMsiw0oFFQQISC8x8N31BALZfGpL8cUUCC88JxhYFQQLZvEeQZV1FAKoitnzP8UECKYykPsvtQQNpl8QVH3VFAylgPEb6PUEDTinBPVfRSQAiTO5wtMVRAcnkcv6ZxUkDUr9+G0jlSQCGFJqul+1JAb0atNBfTVEBs2tfVzh9TQJ9rglnW8FRAxtqLu5FNVEC99o1PtnFVQIEu9WeWdVZAQqP9JADsVECx6NuWbSpXQCeWNk8F6VRAgR07mPksVkCltax+MrFXQH44JWrEjFhAQq+WLd6zVkBdmIdgcHtYQHGo52F8M1dAsWt21XfxV0DVz50hRCxXQH0U2TCQXFpAm4nFXkeeWEA=", "dtype": "f8" }, "xaxis": "x", "y": { "bdata": "QDLyrQINqL/eCltZk2oJQAXqqX033hpA/KClDLmpFUChyotq0kcSQFzzn8v9HQtASqqvMwpIB0DflrJeNRUUQJzdcktdmxZAT5BhqHuWB0CCEq33vjAgQAQ7UP2OtBVAJKpOwqF0IUC346b0GusqQEDqLZM/uiRAr9jNKg81KkDIK2h/ayMqQBQzT2lvuS5AUAFwcRaELEA/dRbpjB8uQIUKdJqaYTRAVIP53jhIM0Cku+grX6Y2QO4HS7fvyzBAEzooP3q5NUAGgNRgtos5QAPAJcASfzdAWh8p2drTP0BudjoZ/zU2QEkznZthIkBA403In/bIQUCXkBZlAeBCQLw27NRSoTZAKnqyWz0GPkCP732Tt8k9QMO0wtrVaEFAuKSSrGkIQUCdmBFo82ZBQMcPlkF6ST1AqHKEs9byQ0CQtrEQjmhEQH6W0xNu8UFApjn4tfc5Q0CmZm+JQZFDQF3/QMuCikNAF1LptQMsRUAuaeH+N/FGQAvCwFyliEdASB1QWfwwSUAW6QE3CT9KQDa3P3nOSklA4Aj1jg4nSkCm/7JYKNJLQFzrZqqmEUxAoF1/ot7pS0DMX4315JhOQKqmTWHPSExAEJ0GrI6EUED1M51ado1QQMOXR0F0FU9AsogpQsddT0AI+cjG2UpQQJBBWlx7oU9A4F5qpA2vT0AMvRsRYmxSQNM1jN66r1FA1QVAbRIlUUDAYHsTfNNSQKmmyha+kVJAFNvXiF/hUkCME9Sd369SQCbQj8N+EFNAG+iGJOLsUkAPCfYQ+dJSQKlB46W271RALlV8hIpyU0A75145xzJTQPVzVbkCyFNAGCcL9LdsVEDg0hEOeRNVQExnhEIislRAtDEDlss/VEBmvySNKhNUQJMuigCLRlVAJNM/zLBYVUDISw0FrZ1UQKAptdWCeFVAq8++ZZd9VkA6V7G0Y89UQMc6lFdTlFZATQ2sQhLXVUDn54FgyipWQJIRIHMzvVZAckGom6/ZVEDLARTJhMpWQAJgGYTKXFZAJ5D/sCN/V0AorcGpKSFWQIUfndicR1dAHpBwocioV0A=", "dtype": "f8" }, "yaxis": "y" }, { "line": { "color": "#00d4ff", "dash": "dash", "width": 1.5 }, "mode": "lines", "name": "CPU Estimate", "type": "scatter", "x": { "bdata": "yGkw6v0CDMC0WNvzuaATQAxgfeAlkglAH5t5JkRDFUC4SiDen74TQOzoudI8USFAM+oUIG6WJkDhImoQPhskQPIVgZ24XCJAy1EhWSKqJ0BY6ti/PwMoQB97JS+gwChAUSYSfb79LECwO0HKn98pQGl4NEbw9S5ASus0ftelL0A1UevTeBwwQHN0IqgKAzFAhLeWOr78MUCS8+9nW6QxQIjXe9VaUTJAvnArWf6BNEDb7BvZmpY1QK2Ys6SwwThA1RIiEQy0O0AxXk3brV49QJG+bENa+j5AiFYkm2SFPkDg1AtVCdU/QGiC7Ulqhz9A/7Czz9W+QEDeHwKhRtpAQAbTJ4WeC0FA1I15FS05QUDH7XPQ/xtCQFRq2z5hmUJA2GgB1cyXQkA8tgrtqfBCQDmfSe/yrEJAkbKkC5sTREBuy/9LK4VEQOH1muzFRkRAHIqmpLleRUBpTgRW2+BFQI7H2LUnLUZApQce66LfRkCqKkgZr91GQFeKsJFY7UdAOr2CP7wHSUCPnNx7p+BJQG0SKACUrEpAawM5Q/s9S0B+38u4kZNLQL8ROHI2GktA8rpK78B1S0Dhw3Cc0NVLQL64RYqOD0xAQFhUd6mWTECRk7l5RqlNQOJQgec2mU5AjgJACYSeTkAmCSuDvW1PQBMSusYU3k9APwPSRLJRUEDGfgd9uD5QQNnhwscvHFBAHRmAhv9BUEBpHTvNKV5QQDNx86cqolBAyDZJvMPoUEDMnKywFuJQQNPpc/8cI1FASKTK8BEwUUDgXctTBjRRQC4D4Vxkg1FALAk8yA5FUUDILPrnmOtRQMmXqzmB3VJAfAepJfjxUkDv0amQqeBSQB1QRI8lDlNAT+qUxb/TU0C8nRdpjslTQB5hcqomXVRAXwIs8HWMVEATiyC0Bg9VQORAdGgcyFVAt1yUO2e+VUBUjPBhgXZWQFPKdJZmKlZAG5x1/9pUVkAb7IVXzfhWQPH5QgaNv1dA/X2zxYSkV0BkunkXRCRYQJjnJYOpBlhA1EVgWlonWECqc+pn1vBXQH6r5ISc5FhAoC8ERZ79WEA=", "dtype": "f8" }, "xaxis": "x", "y": { "bdata": "mVqus93npr/gvEcBnDsGQCTAg88KYRlAkGHZkjrJGkBN4mx5ofEYQNHQ24ZCLBVA+62S0NPVEUCPqRZ2uz8TQMYtxNyxKRVA/Jy/T046EkCacU6+GbUXQHqt0Y4N2RdAIVkeOAqXHECV1SIpWHojQBrAXKnDIiVA7JQPb/8jKECZnRcYkjMqQNybdDkYKy1AKG+IPn1kLkCgqsdy0ZovQEpQumm1DDJAj/2n6LhGM0BupkeXjUY1QP1dlHiFjTRALLhWgUiRNUAIiOtLWJs3QBKJzzWnUzhA57we25K5O0A2E6AGD8I6QMFHAuQnZz1AfB6dScJBQEBTdi7nQsRBQCTCvas2GEBAFkgg1+zzP0BGZZRs8I8/QMMxEFUOfkBADh/Y7HziQEA4hJnG00dBQFgxtnx1jEBALLGAwl/XQUAmPeJ0jABDQO97c/rF90JAgkhaC6dTQ0Dtuo+L0qtDQDQhA3mk30NA03Yxf4SQREBJ6WX5d7FFQIvlJeBlukZAex7uMk0NSEBrwbB/s15JQDqOEQVP8ElAgIxXA5mPSkD4ciEXGYdLQFIc0rs/QkxA1L6wGK+nTECQhbeAGdFNQNkjHcW+yU1AyiTr9tlcT0AujoVWokNQQPg33h2lS1BA8DJ7cmFOUECNBbBnnXtQQFViMl6baFBAm+XQmsFSUECqQ7X0OilRQETqjSguiFFA1QV0RA6WUUDiO2u0GDFSQDA1MYWCiVJAnYOvHmrfUkBtukVwpgRTQB1S+FiFOFNAMipuRKVJU0Doh5Co1UNTQLgI/a4U+lNAfxfz8c34U0DTRxaKQtZTQNszZkzp6FNAJkVFQZktVEDX1wgEsppUQAwclsQCx1RAh5KyknK4VEC4yDYlZZVUQPjxhQFo41RA3R0pKMIjVUBvbVLPsw1VQK0I3z6NRVVAwP5kBHvLVUDPK0gUO5RVQPQuLhN9BFZAKZoreYYTVkAfMVqHfjZWQEoXIlS2gFZA5hCFN6cGVkBk0ZHzsVNWQIsezxjkZlZAsUwE+Q3bVkBfg5OpQLRWQN5To1Hb+VZAvTA/bJ5PV0A=", "dtype": "f8" }, "yaxis": "y" }, { "line": { "color": "#00d4ff", "width": 1.5 }, "mode": "lines", "name": "CPU Error", "type": "scatter", "xaxis": "x2", "y": { "bdata": "1AelwJMDDEDuB/jO2h4SQA68GxXqqRVAhynAeRs1F0Af6VqFLEMRQPh8JZr69xNAI5F+jogRGUDUw1Re9DIRQN1PoE1XJQRATJqgBgzZEEB7EQEhngEKQJuXou2O+wdAfUEBQ7y7DkCkgPuhO+TpP695ZErJeAJAvOiGa679+D8fH8Moh+HpP+ljZedzc9s/toIHwybP4T9F1rd0qIX+P+Idk9jDgwFAna8D11XA9j+SAvw8MoABQFBvU6+mxdU/5yNFoZN4/z/vi+3SFo8EQGTulkhYuwdA2u//CEIPCUA4EZWzqo8BQK4ZIJYRLgRAgksTTvNxE0CE2y9s6UgaQPsYh9YwcwJAZxzIEKyK7z+mhUHmbdfzPxObew4P/fM/UUFHaye72D85tXAcpUXdP3jVHD/pBQlADl4/17ug9T957ARGOyfjP5kDw7k0x/M/kC3Z+tlT+D/4hqqAa/X+Px4QIPxZRwNAD36oovWhAUDUdPFWIgnxPxPxUDZPGOU/hQllDvnMAEBQy4Wu630MQAgYePtwow9ABhnS3nWWEEC+g/WOEvsRQDQLfK3p2QxAjlVoD26uBkCTsLqH7koMQIib2UbG9f4/fOtwHgg1DUARQW3NiKYTQIeOk8RZjRFAk6/NFu7fB0B7fgis4mgLQOgyfBMtigpAgW+1twhdEUCzJW1fjM0JQFmgUJ1IUgFAuaiyKTPB+D/TAIF9XR4DQMMlbfY98QdAKAxKhCTeDEB2zOjFc/QIQGkwAK2mDQlARGZBhqAEA0ASYzkZwJD3PxlXP3d7TQtAABgvzzA1B0A48GKDS63zPzYBb2wA2QpA4aFPSZhjBkBSvhFcfzwBQNcN/+Knrf0/cjLfkiSlB0AY6mSOviD5P+zf4sjwcQNAW+kI4E8O/j8uFTyAloUGQBxT7IZ3cRFAmCOw186AB0BHljWs1GgRQBx6deG0Kfo/GyVnWc4D6z+PWe55vTgBQFxoXnyaKBBAffxNLXj2CEAtAe51Q/sLQEyJXLaETgJAdha6dKId8T83pdvilWn6P4uRpGcrS/s/UFnHuRMx7D8=", "dtype": "f8" }, "yaxis": "y2" } ], "layout": { "annotations": [ { "font": { "size": 16 }, "showarrow": false, "text": "Tracking Results", "x": 0.225, "xanchor": "center", "xref": "paper", "y": 1, "yanchor": "bottom", "yref": "paper" }, { "font": { "size": 16 }, "showarrow": false, "text": "Tracking Error", "x": 0.775, "xanchor": "center", "xref": "paper", "y": 1, "yanchor": "bottom", "yref": "paper" } ], "height": 450, "template": { "layout": { "font": { "color": "#e6edf3" }, "paper_bgcolor": "#0d1117", "plot_bgcolor": "#0d1117", "xaxis": { "gridcolor": "#30363d", "zerolinecolor": "#30363d" }, "yaxis": { "gridcolor": "#30363d", "zerolinecolor": "#30363d" } } }, "xaxis": { "anchor": "y", "domain": [ 0, 0.45 ], "title": { "text": "X Position" } }, "xaxis2": { "anchor": "y2", "domain": [ 0.55, 1 ], "title": { "text": "Time Step" } }, "yaxis": { "anchor": "x", "domain": [ 0, 1 ], "scaleanchor": "x", "scaleratio": 1, "title": { "text": "Y Position" } }, "yaxis2": { "anchor": "x2", "domain": [ 0, 1 ], "title": { "text": "Position Error" } } } } }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "CPU RMSE: 2.9987\n" ] } ], "source": [ "# Visualize results\n", "fig = make_subplots(rows=1, cols=2, subplot_titles=['Tracking Results', 'Tracking Error'])\n", "\n", "# Trajectory plot\n", "fig.add_trace(\n", " go.Scatter(x=true_state[:, 0], y=true_state[:, 2], mode='lines',\n", " name='True', line=dict(color='#00ff88', width=2)),\n", " row=1, col=1\n", ")\n", "fig.add_trace(\n", " go.Scatter(x=measurements_arr[:, 0], y=measurements_arr[:, 1], mode='markers',\n", " name='Measurements', marker=dict(color='gray', size=4, opacity=0.5)),\n", " row=1, col=1\n", ")\n", "fig.add_trace(\n", " go.Scatter(x=estimates_cpu[:, 0], y=estimates_cpu[:, 2], mode='lines',\n", " name='CPU Estimate', line=dict(color='#00d4ff', width=1.5, dash='dash')),\n", " row=1, col=1\n", ")\n", "if GPU_AVAILABLE:\n", " fig.add_trace(\n", " go.Scatter(x=estimates_gpu[:, 0], y=estimates_gpu[:, 2], mode='lines',\n", " name='GPU Estimate', line=dict(color='#ff4757', width=1.5, dash='dot')),\n", " row=1, col=1\n", " )\n", "\n", "# Error plot\n", "error_cpu = np.sqrt((estimates_cpu[:, 0] - true_state[:, 0])**2 + \n", " (estimates_cpu[:, 2] - true_state[:, 2])**2)\n", "fig.add_trace(\n", " go.Scatter(y=error_cpu, mode='lines', name='CPU Error',\n", " line=dict(color='#00d4ff', width=1.5)),\n", " row=1, col=2\n", ")\n", "if GPU_AVAILABLE:\n", " error_gpu = np.sqrt((estimates_gpu[:, 0] - true_state[:, 0])**2 + \n", " (estimates_gpu[:, 2] - true_state[:, 2])**2)\n", " fig.add_trace(\n", " go.Scatter(y=error_gpu, mode='lines', name='GPU Error',\n", " line=dict(color='#ff4757', width=1.5, dash='dash')),\n", " row=1, col=2\n", " )\n", "\n", "fig.update_xaxes(title_text='X Position', row=1, col=1)\n", "fig.update_yaxes(title_text='Y Position', row=1, col=1, scaleanchor='x', scaleratio=1)\n", "fig.update_xaxes(title_text='Time Step', row=1, col=2)\n", "fig.update_yaxes(title_text='Position Error', row=1, col=2)\n", "\n", "fig.update_layout(template=dark_template, height=450)\n", "fig.show()\n", "\n", "print(f\"CPU RMSE: {np.sqrt(np.mean(error_cpu**2)):.4f}\")\n", "if GPU_AVAILABLE:\n", " print(f\"GPU RMSE: {np.sqrt(np.mean(error_gpu**2)):.4f}\")\n", " print(f\"Max difference: {np.max(np.abs(estimates_cpu - estimates_gpu)):.2e}\")" ] }, { "cell_type": "markdown", "id": "cell-23", "metadata": {}, "source": [ "## Summary\n", "\n", "Key takeaways for GPU acceleration:\n", "\n", "1. **CuPy provides NumPy compatibility** - Easy migration with minimal code changes\n", "2. **Batch processing is key** - GPU shines when processing many tracks/particles simultaneously\n", "3. **Memory management matters** - Pre-allocate, use pinned memory, free unused memory\n", "4. **Transfer overhead** - Minimize CPU↔GPU transfers for best performance\n", "5. **Problem size determines speedup** - Small problems may not benefit from GPU\n", "\n", "### Performance Guidelines\n", "\n", "| Scenario | Recommendation |\n", "|----------|---------------|\n", "| < 100 tracks | CPU (transfer overhead dominates) |\n", "| 100-1000 tracks | GPU beneficial for complex filters |\n", "| > 1000 tracks | GPU strongly recommended |\n", "| < 1000 particles | CPU usually faster |\n", "| > 10000 particles | GPU provides significant speedup |\n", "\n", "## Exercises\n", "\n", "1. Implement GPU-accelerated UKF with sigma point generation on GPU\n", "2. Add GPU memory pooling for repeated filter operations\n", "3. Benchmark different matrix sizes to find the CPU/GPU crossover point\n", "4. Implement GPU-accelerated JPDA likelihood computation\n", "\n", "## References\n", "\n", "1. CuPy Documentation: https://docs.cupy.dev/\n", "2. CUDA Programming Guide: https://docs.nvidia.com/cuda/\n", "3. Gustafsson, F. (2010). *Particle Methods for Tracking*." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.0" } }, "nbformat": 4, "nbformat_minor": 5 }