Cheshireai (Greenlighter; joined Dec 8, 2023; 2 messages)
I'm an amateur AI enthusiast with an interest in training open source LLMs. To simplify it for anyone who's not familiar, open source LLMs are chatbot models (like a program) similar to ChatGPT, but they can be run on a personal computer without any internet access. So you can ask a question without worrying about your information or chat history being sent back to some company mining your data for god knows what. It never has to leave your computer or phone.
What I need to do is create a set of questions and answers that will be used to train the model: factual questions and answers about the effects of drugs and their routes of administration, corrections of common myths or misconceptions, and emotional support for dealing with isolation, paranoia, or other crises. So it could basically be a personal, judgment-free companion that can answer drug-related questions with infinite patience and empathy.
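To make that a bit more concrete, here's a rough sketch of how one question/answer entry could be stored and checked before it goes into the training set. This is only an illustration under my own assumptions: the field names (`category`, `source`, `reviewed`), the three category labels, and the JSONL output format are all hypothetical and not a standard, and the example text is a placeholder rather than real content.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical category labels, mirroring the three kinds of content
# described above. Rename or extend as needed.
CATEGORIES = {"factual", "myth_correction", "emotional_support"}


@dataclass
class QARecord:
    question: str
    answer: str
    category: str
    source: str             # where the answer comes from (citation text)
    reviewed: bool = False  # set to True once a person has checked it


def validate(record: QARecord) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if not record.question.strip():
        problems.append("empty question")
    if not record.answer.strip():
        problems.append("empty answer")
    if record.category not in CATEGORIES:
        problems.append("unknown category")
    if not record.source.strip():
        problems.append("missing source")
    if not record.reviewed:
        problems.append("not reviewed")
    return problems


def to_jsonl(records: list[QARecord]) -> str:
    """Serialize only the records that pass validation, one JSON object per line."""
    return "\n".join(
        json.dumps(asdict(r), ensure_ascii=False)
        for r in records
        if not validate(r)
    )


if __name__ == "__main__":
    # Placeholder text only; real entries would come from vetted sources.
    example = QARecord(
        question="<question text>",
        answer="<answer text>",
        category="factual",
        source="<citation for the answer>",
        reviewed=True,
    )
    print(to_jsonl([example]))
```

The point of the `source` and `reviewed` fields is that nothing gets into the dataset without a traceable origin and a human check, and anyone who forks the dataset can see where each answer came from.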
Obviously it's going to be a lot of work; I need to make sure that it's not hallucinating dangerously wrong information. But something like this seems inevitable (other people build models for general therapy, medical information, finance, etc.), and it could potentially do some good eventually. I also want to open source the training datasets, so if for some reason I can't continue with the project, anyone else with beginner-level machine learning skills could easily pick up where I left off, or even just fork the project if they hate my guts and think they can do it better.
What I'm looking for is some feedback on the concept, and maybe some direction for collecting the kind of data I need. The main thing for something like this is that quality is insanely important. The accuracy and trustworthiness of the information need to be beyond reproach. Scraping random forum threads and hoping that they're mostly good quality is not really an option. If anyone has leads on textbook-quality source data that revolves around harm reduction, coping with addiction, or supporting loved ones with addiction, I'd greatly appreciate them. Also, I didn't really know where to post this, so if there's some other place I should be asking, or even other platforms, I'm open to any and all suggestions.
Thank you for reading.