
Training help: hybrid model that integrates contextual and numerical features for a classification problem

I am working on a critical production risk analysis problem. Based on each record, I want to assign it a risk rank from 0 to 5. The training set is fairly imbalanced; these are the counts per risk rank:

> "0.0 964
> 1.0 393
> 2.0 396
> 3.0 286
> 4.0 109
> 5.0 44"
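
To put the imbalance in numbers, here is a minimal sketch that uses only the counts quoted above. It prints each class's share, the accuracy a model would get by always predicting the most common rank, and "balanced" class weights. The weight formula, total / (number of classes * class count), is the heuristic scikit-learn uses for `class_weight="balanced"`; whether it is the right choice for this data is an assumption.

```python
# Class counts copied from the distribution quoted above.
counts = {0: 964, 1: 393, 2: 396, 3: 286, 4: 109, 5: 44}

total = sum(counts.values())
n_classes = len(counts)

# Share of each class; the largest share equals the accuracy of a model
# that always predicts the most common rank.
shares = {label: n / total for label, n in counts.items()}
majority_baseline = max(shares.values())

# "Balanced" weights: total / (n_classes * class_count). Rare ranks get
# larger weights, so their errors count more during training.
balanced_weights = {label: total / (n_classes * n) for label, n in counts.items()}

for label in counts:
    print(f"rank {label}: share={shares[label]:.3f}, weight={balanced_weights[label]:.2f}")
print(f"majority-class baseline accuracy: {majority_baseline:.3f}")
```

Any accuracy figure is best read against that majority-class baseline, and per-class metrics are likely more informative than accuracy alone for ranks 4 and 5.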

Now, this is what the current training set looks like:

2 Risk Rank float64
3 a_weights int64
4 b_weights float64
5 c_weights float64
6 d_weights float64
7 e_weights float64
8 f_weights float64
9 g_weights float64
10 FinalDesc object

The FinalDesc column contains a string (a description of the job order).
For example:
"HVAC REPALCEMENT TOOLS EULDUE TO HARSH ENVIROMENT. Please fix with caution"
I also have weights for key words in FinalDesc that will help with the ranking.

The problem right now is that my supervisor gave me job- and environment-specific context that might help with the predictions. For example:
"
Records for firewatch are considered lower risk,
Valve 4/5 on Autoclave are generally lower risk due to higher stocking levels.
REL records to review PM details do not present immediate risks.
"
There is more context like this. What is the best way to do these rankings? Should I leverage the power of LLMs? Please let me know the best way to incorporate this context.
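
One possible way to incorporate context like the rules above, short of an LLM, is to turn each rule into a binary feature derived from FinalDesc. The sketch below is only an illustration: the feature names and the regular expressions are hypothetical, since how "firewatch", "Valve 4/5 on Autoclave" or "REL ... PM" actually appear in the descriptions is an assumption.

```python
import re

# Hypothetical patterns: the exact wording in FinalDesc is an assumption.
CONTEXT_RULES = {
    # "Records for firewatch are considered lower risk"
    "ctx_firewatch": re.compile(r"\bfire\s*watch\b", re.IGNORECASE),
    # "Valve 4/5 on Autoclave": both terms must appear somewhere in the text
    "ctx_autoclave_valve_4_5": re.compile(
        r"(?=.*\bautoclave\b)(?=.*\bvalve\s*[45]\b)", re.IGNORECASE | re.DOTALL
    ),
    # "REL records to review PM details": both tokens must appear
    "ctx_rel_pm_review": re.compile(
        r"(?=.*\brel\b)(?=.*\bpm\b)", re.IGNORECASE | re.DOTALL
    ),
}


def context_flags(description):
    """Return one 0/1 flag per context rule for a single description."""
    return {
        name: int(bool(pattern.search(description)))
        for name, pattern in CONTEXT_RULES.items()
    }


print(context_flags("Replace valve 4/5 on autoclave"))
print(context_flags("REL record: review PM details"))
```

These flags could be appended as extra numeric columns next to the existing weight columns, which lets the classifier learn how strongly each rule shifts the rank instead of hard-coding the adjustment.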

My current approach is:
1) Vectorize the description and add the vectors to the dataframe.
2) Use a random forest classifier to rank the work orders (train, predict), using both the numerical features and the description.
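
The two steps above can be sketched roughly as follows with scikit-learn. This is a minimal sketch, not the actual code: the choice of TF-IDF as the vectorizer, the hyperparameters, `class_weight="balanced"` and the randomly generated toy data are all assumptions made for illustration. Only the column names come from the training set listing.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

NUMERIC_COLS = [f"{c}_weights" for c in "abcdefg"]
TEXT_COL = "FinalDesc"
TARGET_COL = "Risk Rank"


def build_model(random_state=0):
    """TF-IDF on the description plus raw numeric columns, fed to a random forest."""
    features = ColumnTransformer([
        # A single column name (not a list) passes 1-D text to the vectorizer.
        ("text", TfidfVectorizer(ngram_range=(1, 2)), TEXT_COL),
        ("num", "passthrough", NUMERIC_COLS),
    ])
    # class_weight="balanced" is an assumption, added because of the imbalance.
    clf = RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=random_state
    )
    return Pipeline([("features", features), ("clf", clf)])


# Toy data with the same column layout; the values are random placeholders.
rng = np.random.default_rng(0)
n_rows = 60
toy = pd.DataFrame(rng.random((n_rows, len(NUMERIC_COLS))), columns=NUMERIC_COLS)
toy[TEXT_COL] = rng.choice(
    ["hvac replacement tools", "firewatch standby", "valve leak on autoclave"],
    size=n_rows,
)
toy[TARGET_COL] = rng.integers(0, 6, size=n_rows).astype(float)

X = toy[NUMERIC_COLS + [TEXT_COL]]
model = build_model()
model.fit(X, toy[TARGET_COL])
preds = model.predict(X)
print(preds[:5])
```

Keeping the vectorizer inside the pipeline means it is fitted only on the training split during cross-validation, which avoids leaking vocabulary from the held-out rows.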

It gets an accuracy of 66%. I would like to use more advanced AI/ML techniques to improve on this.