I am interested in adding RocksDB as a persistent backend to this implementation.
After that, I would love to add a consensus layer, most likely Raft, to support sharding in this system.
I also really need to clean up the code; it's quite disgusting at the moment.
My main concern about this project at the moment, however, is efficient fuzzy search.
Obviously, generating every word within edit distance n of the query is not a viable solution, since the number of candidates grows exponentially with n.
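One simple alternative to generating candidates is to go the other way: compute the edit distance between the query and each term already in the index, and keep the terms within the threshold. The sketch below is only an illustration of that idea, not the approach this project has settled on. The names `edit_distance` and `fuzzy_matches` are hypothetical, and a linear scan over every term would probably be too slow for a large vocabulary.

```rust
/// Levenshtein distance between two strings, computed over Unicode scalar
/// values with the classic two-row dynamic programme.
/// (Hypothetical helper, not part of the existing code base.)
fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();

    // prev[j] holds the distance between the first i chars of `a`
    // and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0usize; b.len() + 1];

    for (i, &ca) in a.iter().enumerate() {
        curr[0] = i + 1;
        for (j, &cb) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != cb);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr[j + 1] = substitution.min(deletion).min(insertion);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Returns the indexed terms within `max_distance` edits of `query`.
/// The length check is a cheap pre-filter: two strings whose lengths differ
/// by more than `max_distance` cannot be within `max_distance` edits.
fn fuzzy_matches<'a>(query: &str, terms: &[&'a str], max_distance: usize) -> Vec<&'a str> {
    let query_len = query.chars().count();
    terms
        .iter()
        .copied()
        .filter(|term| {
            let term_len = term.chars().count();
            query_len.abs_diff(term_len) <= max_distance
                && edit_distance(query, term) <= max_distance
        })
        .collect()
}
```

The cost here is proportional to the number of indexed terms rather than to the number of possible misspellings, which is why it avoids the blow-up described above. It is still a full scan, so it is best read as a baseline to benchmark against.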
I have started rewriting the search engine in Rust. So far the rewrite already has considerably more functionality than the C++ implementation.
The system is extremely fast at writes according to my benchmarks. It doesn't handle concurrent writes well, however, so I will most likely expose a channel from which the Inverted Index can read write jobs. I think this will reduce thread contention and improve performance by a significant margin.
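The channel idea could look roughly like the sketch below: a single writer thread owns the index and drains write jobs from a `std::sync::mpsc` channel, so producers never take a lock on the index itself. Everything here is an assumption made for illustration. `WriteJob`, `InvertedIndex`, `spawn_writer`, and the whitespace tokenizer are hypothetical stand-ins, not the project's actual types.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

/// A unit of work sent to the writer thread (hypothetical shape).
struct WriteJob {
    doc_id: u64,
    text: String,
}

/// A deliberately tiny stand-in for the real inverted index:
/// maps each lowercased token to the ids of documents containing it.
#[derive(Default)]
struct InvertedIndex {
    postings: HashMap<String, Vec<u64>>,
}

impl InvertedIndex {
    fn insert(&mut self, doc_id: u64, text: &str) {
        for token in text.split_whitespace() {
            let ids = self.postings.entry(token.to_lowercase()).or_default();
            // Jobs are applied one at a time, so a repeated token within the
            // same document always sees its own id as the last entry.
            if ids.last() != Some(&doc_id) {
                ids.push(doc_id);
            }
        }
    }
}

/// Spawns the single writer thread. The index lives only on that thread,
/// so no mutex is needed. The thread exits, returning the index, once every
/// `Sender` clone has been dropped.
fn spawn_writer() -> (mpsc::Sender<WriteJob>, thread::JoinHandle<InvertedIndex>) {
    let (tx, rx) = mpsc::channel::<WriteJob>();
    let handle = thread::spawn(move || {
        let mut index = InvertedIndex::default();
        for job in rx {
            index.insert(job.doc_id, &job.text);
        }
        index
    });
    (tx, handle)
}
```

Each producer thread would hold its own `tx.clone()` and call `send` with a `WriteJob`. This trades lock contention for queueing: writes are serialized on one thread, so whether it actually improves throughput is something the benchmarks would need to confirm.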
On average, I was able to achieve 15ms for inserting a 148KB document. Including all deserialization and serialization costs, the average insertion time rose to about 50ms for this document. The most important factor was thread contention: as concurrency increased, insertion time increased significantly.