As large language models (LLMs) evolve from experimental tools to valuable assets, transparency in their data sourcing is rapidly declining. Initially, datasets were openly shared, allowing the public ...