For non-commercial use? To answer your question: fine-tune a LLaMA-based instruction model, perhaps using the lit-llama repo. For this you will need to rent a fairly beefy cloud instance, and to add new data later you will need to resume the fine-tuning (or train a LoRA). Then host the result on a cheaper server with a llama.cpp frontend.
But what you really might want is vector search, which seems like a better fit for retrieving specific documents on demand.
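To illustrate the idea, here is a minimal sketch of vector search using a toy bag-of-words embedding and cosine similarity; a real system would swap in a learned embedding model and a proper vector store, but the retrieval logic is the same shape:

```python
# Minimal vector-search sketch. The embed() function here is a toy
# bag-of-words term-frequency vector; in practice you would use a
# learned embedding model and a vector database instead.
import math
from collections import Counter

def embed(text):
    # Toy embedding: sparse term-frequency vector over whitespace tokens.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse vectors represented as Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search(query, docs, k=2):
    # Rank all documents by similarity to the query, return the top k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "how to fine-tune a llama model",
    "hosting an inference server with llama.cpp",
    "recipe for sourdough bread",
]
print(search("fine-tune llama", docs, k=1))
```

The query is embedded the same way as the documents, so "nearest" means "most semantically similar" to whatever extent the embedding captures; that is the whole trick behind retrieval over a private corpus.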
You might want both: finetuning provides what is loosely analogous to “understanding” across the whole corpus, while vector search provides exact recall of the most relevant specific items.