Yury Malkov - Staff Engineer, Twitter - Author of the most adopted ANN algorithm HNSW

Yury Malkov - Staff Engineer, Twitter - Author of the most adopted ANN algorithm HNSW

Vector Podcast · 2022-01-31

Topics:

00:00 Introduction

01:04 Yury’s background in laser physics, computer vision and startups

05:14 How Yury entered the field of nearest neighbor search and his impression of it

09:03 “Not all Small Worlds are Navigable”

10:10 Gentle introduction into the theory of Small World Navigable Graphs and related concepts

13:55 Further clarification on the input constraints for the NN search algorithm design

15:03 What did not work in NSW algorithm and how did Yury set up to invent new algorithm called HNSW

24:06 Collaboration with Leo Boytsov on integrating HNSW in nmslib

26:01 Differences between HNSW and NSW

27:55 Does algorithm always converge?

31:56 How FAISS’s implementation is different from the original HNSW

33:13 Could Yury predict that his algorithm would be implemented in so many frameworks and vector databases in languages like Go and Rust?

36:51 How our perception of high-dimensional spaces change compared to 3D?

38:30 ANN Benchmarks

41:33 Feeling proud of the invention and publication process during 2,5 years!

48:10 Yury’s effort to maintain HNSW and its GitHub community and the algorithm’s design principles

53:29 Dmitry’s ANN algorithm KANNDI, which uses HNSW as a building block

1:02:16 Java / Python Virtual Machines, profiling and benchmarking. “Your analysis of performance contradicts the profiler”

1:05:36 What are Yury’s hopes and goals for HNSW and role of symbolic filtering in ANN in general

1:13:05 The future of ANN field: search inside a neural network, graph ANN

1:15:14 Multistage ranking with graph based nearest neighbor search

1:18:18 Do we have the “best” ANN algorithm? How ANN algorithms influence each other

1:21:27 Yury’s plans on publishing his ideas

1:23:42 The intriguing question of Why

Show notes:

- HNSW library: https://github.com/nmslib/hnswlib/

- HNSW paper Malkov, Y. A., & Yashunin, D. A. (2018). Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. TPAMI, 42(4), 824-836. (arxiv:1603.09320)

- NSW paper Malkov, Y., Ponomarenko, A., Logvinov, A., & Krylov, V. (2014). Approximate nearest neighbor algorithm based on navigable small world graphs. Information Systems, 45, 61-68.

- Yury Lifshits’s paper: https://yury.name/papers/lifshits2009combinatorial.pdf

- Sergey Brin’s work in nearest neighbour search: GNAT - Geometric Near-neighbour Access Tree: [CiteSeerX — Near neighbor search in large metric spaces](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.173.8156)

- Podcast with Leo Boytsov: https://rare-technologies.com/rrp-4-leo-boytsov-knn-search/

- Million-Scale ANN Benchmarks: http://ann-benchmarks.com/

- Billion Scale ANN Benchmarks: https://github.com/harsha-simhadri/big-ann-benchmarks

- FALCONN algorithm: https://github.com/falconn-lib/falconn

- Mentioned navigable small world papers:

Kleinberg, J. M. (2000). Navigation in a small world. Nature, 406(6798), 845-845.;

Boguna, M., Krioukov, D., & Claffy, K. C. (2009). Navigability of complex networks. Nature Physics, 5(1), 74-80.

Vector Podcast

Vector Podcast is here to bring you the depth and breadth of Search Engine Technology, Product, Marketing, Business. In the podcast we talk with engineers, entrepreneurs, thinkers and tinkerers, who put their soul into search. Depending on your interest, you should find a matching topic for you -- whether it is deep algorithmic aspect of search engines and information retrieval field, or examples of products offering deep tech to its users. "Vector" -- because it aims to cover an emerging field of vector similarity search, giving you the ability to search content beyond text: audio, video, images and more. "Vector" also because it is all about vector in your profession, product, marketing and business.

Podcast website: https://www.vectorpodcast.com/

Dmitry is blogging on https://dmitry-kan.medium.com/

Waar kun je luisteren?

Apple Podcasts Logo Spotify Logo Podtail Logo Google Podcasts Logo RSS

Afleveringen