Reference¶
All neighbor implementations can be invoked via the main NearestNeighbors
class, which exhibits a similar behaviour as the corresponding class from the scikitlearn package:

class
bufferkdtree.neighbors.
NearestNeighbors
(n_neighbors=5, algorithm='buffer_kd_tree', tree_depth=None, leaf_size=30, splitting_type='cyclic', n_train_chunks=1, plat_dev_ids={0: [0]}, allowed_train_mem_percent_chunk=0.2, allowed_test_mem_percent=0.8, n_jobs=1, verbose=0, **kwargs)¶ The ‘NearestNeighbors’ provides access to all nearest neighbor implementations. It has simmilar parameters as the corresponding implementation of the scikitlearn package.
The main method is the “buffer_kd_tree”, which can be seen as mix between the “brute” and the “kd_tree” implementations.
Parameters: n_neighbors : int (default 5)
Number of neighbors used
algorithm : {“brute”, “kd_tree”, “buffer_kd_tree”}, optional (default=”buffer_kd_tree”)
The algorithm that shall be used to compute the nearest neighbors. One of  ‘brute’: bruteforce search  ‘kd_tree’: kd tree based search  ‘buffer_kd_tree’: buffer kd tree based search (with GPUs)
tree_depth : int or None, optional (default=None)
Passed to the ‘kd_tree’ and ‘buffer_kd_tree’ implementation. In case ‘tree_depth’ is specified, a tree of such a depth is built (‘tree_depth’ has priority over ‘leaf_size’).
leaf_size : int, optional (default=30)
Passed to the ‘kd_tree’ and ‘buffer_kd_tree’ implementation. In case ‘leaf_size’ is set, the corresponding tree depth is computed (is ignored in case tree_depth is not None).
splitting_type : {‘cyclic’}, optional (default=’cyclic’)
Passed to the ‘kd_tree’ and ‘buffer_kd_tree’ implementation. The splitting rule that shall be used to construct the kd tree. Currently, only “cyclic” is supported.
n_train_chunks : int, optional (default=1)
Passed to the ‘buffer_kd_tree’ implementation. The number of chunks the training patterns shall be processed in; only needed in case the training patterns do not fit on the GPU (in case n_train_chunks is too small, it is increased automatically).
plat_dev_ids : dict, optional (default={0:[0]})
Passed to the ‘brute’ and the ‘buffer_kd_tree’ implementation. The platforms and devices that shall be used. E.g., plat_dev_ids={0:[0,1]} makes use of platform 0 and the first two devices.
allowed_train_mem_percent_chunk : float, optional (default=0.2)
Passed to the ‘buffer_kd_tree’ implementation. The amount of memory (OpenCL) used for the training patterns (in percent).
allowed_test_mem_percent : float, optional (default=0.8)
Passed to the ‘buffer_kd_tree’ implementation. The amount of memory (OpenCL) used for the test/query patterns (in percent).
n_jobs : int, optional (default=1)
Passed to the ‘kd_tree’ implementation. The number of threads used for the querying phase.
verbose : int, optional (default=0)
The verbosity level (0=no output, 1=output)
Notes
The bruteforce implementatation is only used for comparison in relatively lowdimensional spaces; the performance is suboptimal for higher dimensional feature spaces (but even superior over other matrix based implementations making use e.g., CUBLAS).
The performance of the GPU implementations depends on the corresponding architecture. An important ingredient is the support of automatic hardware caches.
Only singleprecision is supported until now.
Examples
>>> import numpy >>> from bufferkdtree.neighbors.base import NearestNeighbors >>> X = numpy.random.uniform(low=1,high=1,size=(10000,10)) >>> nbrs = NearestNeighbors(n_neighbors=10, algorithm="buffer_kd_tree", tree_depth=9, plat_dev_ids={0:[0]}) >>> nbrs.fit(X) >>> dists, inds = nbrs.kneighbors(X)

fit
(X)¶ Fit the model to the given data.
Parameters: X : arraylike, shape (n_samples, n_features)
The set of training/reference points, where ‘n_samples’ is the number points and ‘n_features’ the number of features.
Returns: self : instance of NearestNeighbors
The object itself

kneighbors
(X=None, n_neighbors=None, return_distance=True)¶ Finds the nearest neighbors for a given set of points.
Parameters: X : arraylike, shape (n_samples, n_features)
The set of query points. If not provided, the neighbors of each point in the training data are returned (in this case, the query point itself is not considered its own neighbor.
n_neighbors : int or None, optional (default=None)
The number of nearest neighbors that shall be returned for each query points. If ‘None’, then the default values provided to the constructor is used.
return_distance : bool, optional (default=True)
If False, then the distances associated with each query will not be returned. Otherwise, they will be returned.
Returns: dist : array
The array containing the distances
idx : array
The array containing the indices

compute_optimal_tree_depth
(Xtrain, Xtest, target='test', tree_depths=None)¶ Computes the optimal tree depth for the treebased implementations. The method tests various assignments of the parameters and simply measures the time needed for the approach tp finish.
Parameters: Xtrain : arraylike, shape (n_samples, n_features)
The set of training/reference points, where ‘n_samples’ is the number points and ‘n_features’ the number of features.
Xtest : arraylike, shape (n_samples, n_features)
The set of testing/querying points, where ‘n_samples’ is the number points and ‘n_features’ the number of features.
target : {‘train’, ‘test’, ‘both’}, optional (default=’test’)
The runtime target, i.e., which phase shall be optimized. Three choices:  ‘train’ : The training phase  ‘test’ : The testing phase  ‘both’ : Both phases
tree_depths : list or None, optional
The range of different tree depths that shall be tested. If None, then the default ranges are used by the different implementations:
 buffer_kd_tree : range(2, max_depth  1)
 kd_tree : range(4, max_depth  1)
where max_depth = int(math.floor(math.log(len(Xtrain), 2)))
Returns: opt_height : int
The optimal tree depth
