
Top-k scoring across multiple dimensions

In machine learning, for example in Natural Language Processing, a beam-search is often used to predict and rank the next objects to append to a sequence. A key part of the beam-search is the top-k operation: given a list of N choices with probability scores, return the k highest-scoring items. This is as simple as sorting the list and taking the top values.
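As a minimal sketch of that basic operation (made-up scores, NumPy for concreteness):

```python
import numpy as np

# Hypothetical probability scores for N=5 candidate next tokens.
scores = np.array([0.05, 0.40, 0.10, 0.30, 0.15])
k = 2

# Sort (descending) and take the first k indices; np.argpartition would do the
# same selection in O(N) if the full ordering of the k items is not needed.
top_k_idx = np.argsort(-scores)[:k]
print(top_k_idx, scores[top_k_idx])   # [1 3] [0.4 0.3]
```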

Referring to a visual example https://www.researchgate.net/figure/A-partially-completed-beam-search-procedure-with-a-beam-width-of-5-for-an-example-input_fig2_317377611 of a beam-search (in that case k=5, and a ‘top’ score is a minimal value): at each iteration, each node selects the top k items from its list of N choices, giving k^2 potential paths in total. From these paths, the top k overall are kept, and they form the nodes for the next iteration. The linked figure shows only the filtered nodes at each time-step; https://d2l.ai/_images/beam-search.svg expands the case of k=2, N=5 comprehensively.
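A rough sketch of one expansion step (using summed log-probabilities with a "larger is better" score, which is one common formulation; all numbers and names here are made up):

```python
import numpy as np

k, N = 2, 5
beam_scores = np.array([-0.2, -0.9])            # current score of each of the k beams
step_logprobs = np.log(np.array([                # shape (k, N): next-token log-probs per beam
    [0.5, 0.2, 0.1, 0.1, 0.1],
    [0.3, 0.3, 0.2, 0.1, 0.1],
]))

# Each beam keeps its own top-k extensions -> k*k candidate paths ...
per_beam_top = np.argsort(-step_logprobs, axis=1)[:, :k]
cand_scores = beam_scores[:, None] + np.take_along_axis(step_logprobs, per_beam_top, axis=1)

# ... and the top k of those candidates become the beams for the next step.
flat = np.argsort(-cand_scores, axis=None)[:k]
beam_idx, choice_idx = np.unravel_index(flat, cand_scores.shape)
new_scores = cand_scores[beam_idx, choice_idx]
print(list(zip(beam_idx, per_beam_top[beam_idx, choice_idx], new_scores)))
```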

Imagine that, instead of optimizing one choice from N for each branch/node, you had to choose multiple values: when exploring from a node, you have a set of choices of shape (N, q) from which you want to select q values, one from each of the q columns. To find the highest-scoring sets of choices, you then need to consider combinations of the values across these columns. For example, for a matrix of choices with N=5, q=4:

If k=5, this top-k function should return the following:

  1. 3.6376 = q0[0] + q1[2] + q2[4] + q3[1]
  2. 3.6301 = q0[0] + q1[4] + q2[4] + q3[1]
  3. 3.6181 = q0[0] + q1[2] + q2[4] + q3[3]
  4. 3.6106 = q0[0] + q1[4] + q2[4] + q3[3]
  5. 3.4755 = q0[2] + q1[2] + q2[4] + q3[1]

which are the largest possible sums, using one value from each column.

Solving this for arbitrary N and q, the naive approach would be to calculate all N^q sums, sort them, and take the top k results. A first optimization is to sort each column and only calculate the sums of combinations drawn from the top k values of each column, reducing the complexity to k^q.
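For reference, a sketch of that first optimization (a k^q brute force over the top k values of each column; the function name is just a placeholder):

```python
import heapq
from itertools import product

import numpy as np


def top_k_sums_bruteforce(matrix, k):
    """k^q approach: keep the top k values of each column, form every
    combination (one value per column) and return the k largest sums."""
    top_per_col = -np.sort(-matrix, axis=0)[:k]       # shape (min(k, N), q)
    candidate_sums = (sum(combo) for combo in product(*top_per_col.T))
    return heapq.nlargest(k, candidate_sums)
```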

However, given that this top-scoring function must be called k times at every time-step of the beam-search, every possible speedup is vital if one wishes to scale to high k or high q. The best solution I’ve come up with (condensed to a minimum example below, assuming matrix is a NumPy array of shape (N, q), and taking q to be 4) works as follows:

This method creates partitions of each integer p into a given width q (ordered splits of p into q non-negative parts), for integers 0 ≤ p ≤ k, e.g. for q=4:

  p=0: (0, 0, 0, 0)
  p=1: (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
  p=2: (2, 0, 0, 0), (1, 1, 0, 0), (1, 0, 1, 0), (0, 1, 1, 0), ...

and so on.
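A condensed, illustrative sketch of the approach (the helper names here are placeholders rather than my exact code):

```python
import heapq
from itertools import combinations

import numpy as np


def partitions(p, q):
    """Yield every ordered split of the integer p into q non-negative parts
    (stars-and-bars construction)."""
    for dividers in combinations(range(p + q - 1), q - 1):
        prev = -1
        parts = []
        for d in dividers:
            parts.append(d - prev - 1)
            prev = d
        parts.append(p + q - 2 - prev)
        yield parts


def top_k_sums(matrix, k):
    """Return the k largest sums formed by taking one value from each column of
    `matrix` (shape (N, q)), considering only index tuples whose offsets into
    the descending-sorted columns sum to at most k."""
    n, q = matrix.shape
    cols = -np.sort(-matrix, axis=0)             # each column sorted, largest first
    sums = []
    for p in range(k + 1):                       # 0 <= p <= k, as described above
        for offsets in partitions(p, q):
            if max(offsets) >= n:                # offset runs past the end of a column
                continue
            sums.append(sum(cols[i, j] for j, i in enumerate(offsets)))
    return heapq.nlargest(k, sums)


# Example: N=5, q=4, k=5 (random data standing in for the matrix above)
rng = np.random.default_rng(0)
print(top_k_sums(rng.random((5, 4)), k=5))
```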

These partitions are then used to index the column-sorted input matrix, selecting one combination of values for each summation. The number of width-4 partitions of each integer p follows the triangular pyramidal sequence (https://oeis.org/A000292), so the search space is reduced to the sum over all p = 0...k, which is the binomial coefficient C(k, 4) = k(k-1)(k-2)(k-3)/24 (https://oeis.org/A000332). This is a vast improvement over the k^4 solution for small k (for k < 30 it is less than k^3), but it still grows on the order of k^4. Does there exist a solution to the arbitrary case with complexity below O(k^q)?
