Every research paper on RAG I’ve gone through so far: the crux is they all assume the embedding model understands the language-semantic chunking, hierarchical all break when it down to nomic-embed-text maps every thaana word to the same region.
hallu-own
#dhivehi model.