I have a number of java projects that use similar libraries. To be specific I have a number of build.gradle files. They all have java dependencies in this format:
implementation group: "com.google.guava", name: "guava", version: "30.0-jre"
implementation group: 'com.fasterxml.jackson.core', name: 'jackson-core', version: '2.13.4'
implementation group: 'com.fasterxml.jackson.core', name: 'jackson-databind', version: '18.104.22.168'
We have to keep updating these files to address security vulnerabilities. When we apply a fix in one file, the same fix needs to be applied to several other projects/microservices. I want to automate these code fixes, which mostly consist of updating jar versions, e.g. 2.13.4 => 2.14.2.
If a model can learn this pattern, it can suggest the changes in other files. What type of model should I look for?
To me it looks like a matter of feeding in all the files along with their historical changes (which usually touch only a few lines each time). What algorithm can learn computer code and suggest edits?
I am not sure an ML model is the best way to solve this. I don’t see much prediction here, nor the need to uncover trends or patterns. This is a very deterministic, rule-based challenge.
Have you considered building a dictionary, and perhaps even a script generator based on that dictionary?
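To make the dictionary idea concrete, here is a minimal sketch in Python. It assumes every dependency uses the map-style `group: ..., name: ..., version: ...` declaration shown in the question; the names `TARGET_VERSIONS` and `update_versions` are made up for illustration, and other Gradle notations (such as the short `'group:name:version'` string) are not handled.

```python
import re

# Hypothetical dictionary of fixes: (group, name) -> version to pin.
TARGET_VERSIONS = {
    ("com.fasterxml.jackson.core", "jackson-core"): "2.14.2",
}

# Matches the map-style declaration from the question, with either quote style.
DEP_LINE = re.compile(
    r"""group:\s*['"](?P<group>[^'"]+)['"]\s*,\s*name:\s*['"](?P<name>[^'"]+)['"]"""
    r"""\s*,\s*version:\s*['"](?P<version>[^'"]+)['"]"""
)


def update_versions(gradle_text, targets):
    """Return gradle_text with the version replaced for every dependency in targets."""

    def bump(match):
        new_version = targets.get((match.group("group"), match.group("name")))
        if new_version is None:
            return match.group(0)  # not in the dictionary: leave the line untouched
        # Splice the new version into the matched text, keeping quotes and spacing.
        start = match.start("version") - match.start()
        end = match.end("version") - match.start()
        whole = match.group(0)
        return whole[:start] + new_version + whole[end:]

    return DEP_LINE.sub(bump, gradle_text)
```

Running `update_versions` over the text of each project's build.gradle would apply the same fix everywhere, and the dictionary is the only thing that has to change when a new vulnerability is published.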
I feel like you are right. This is rule-based, but the rules are dynamic: vulnerabilities are updated daily and we fix them following certain rules. I thought a machine could learn those rules, at least the easy ones, and do automatic check-ins, so that we fix only the hard ones. I was also thinking that the model would be constantly refreshed with human updates and would predict that projects Y and Z can be updated based on changes in project X.
Maybe I am wrong, but I thought that with neural networks, the data plays the role of rules: rules that, instead of being coded as programs, rely on data and change over time.
Now that I think more about it, I feel like I can read the build.gradle file and create dependency vectors from it like this:
ChildAa, ChildAb, ChildAc
ChildBa, ChildBb, ChildBc
Parent C (no children)
Parent D (no children)
Parent Aa (added due to childAa)
Parent Ab (added due to childAb)
Parent Ac (added due to childAc)
and so on
Keeping it abstract, for brainstorming, you can see that my training data is hierarchical. I have history of changes in Git where:
- Some parents like C and D were upgraded
- children were added to specific parents (not just blindly to any parent)
- new parents (e.g. Parent E, F, etc.) had to be added because of the children added in the second scenario
I guess my rules data is going to be like:
parent C => updated to => parent C’
parent D => updated to => parent D’
parent D’ => updated to => parent D’’
childAd added under Parent A and Parent Ad added
childBd, childBe added under Parent B and Parent Bd and Parent Be added
From this rules data I have to predict that, if:
- parent C, D or D’ exists in any other project, it should be updated as per the training data
- parent A or B exists in any project, similar children and their corresponding parents should be added
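The rules above can be written down as plain data and applied without any learning step. The following is only a sketch under assumptions: a project is modelled as a dict from parent name to a set of child names, the names `UPGRADES`, `ADDITIONS`, `resolve_upgrade` and `apply_rules` are invented here, and following a chain of upgrades (D to D’ to D’’) to its end is my reading of the rules, not something stated explicitly.

```python
# Upgrade rules taken from the abstract example: old parent -> new parent.
UPGRADES = {
    "Parent C": "Parent C'",
    "Parent D": "Parent D'",
    "Parent D'": "Parent D''",
}

# Addition rules: existing parent -> list of (child to add, parent the child brings in).
ADDITIONS = {
    "Parent A": [("childAd", "Parent Ad")],
    "Parent B": [("childBd", "Parent Bd"), ("childBe", "Parent Be")],
}


def resolve_upgrade(parent, upgrades):
    """Follow chained upgrades to the newest known name (assumption: chains are followed)."""
    seen = set()
    while parent in upgrades and parent not in seen:  # 'seen' guards against cycles
        seen.add(parent)
        parent = upgrades[parent]
    return parent


def apply_rules(project, upgrades, additions):
    """Return an updated copy of project (dict: parent name -> set of child names)."""
    result = {}
    for parent, children in project.items():
        result.setdefault(resolve_upgrade(parent, upgrades), set()).update(children)
    for parent, extras in additions.items():
        if parent in result:  # only add children under the specific parent named in the rule
            for child, new_parent in extras:
                result[parent].add(child)
                result.setdefault(new_parent, set())
    return result
```

For a project containing only Parent A (with childAa) and Parent D, `apply_rules` would add childAd under Parent A, add Parent Ad, and replace Parent D with Parent D''.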
I can see that this is strict rule-based logic. I thought that because the rules would be dynamic (derived from Git), a neural network made sense. But it looks like the rules can still be dynamic (read from Git) without using a neural network.
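As a rough illustration of reading the rules from Git, the sketch below extracts version-bump rules from unified diff text, such as the output of `git log -p -- build.gradle` captured into a string. The function name `rules_from_diff` is hypothetical, and the parsing assumes the map-style dependency declarations used at the top of this thread.

```python
import re

# Same map-style declaration as in the question, with either quote style.
DEP = re.compile(
    r"""group:\s*['"](?P<group>[^'"]+)['"]\s*,\s*name:\s*['"](?P<name>[^'"]+)['"]"""
    r"""\s*,\s*version:\s*['"](?P<version>[^'"]+)['"]"""
)


def rules_from_diff(diff_text):
    """Map (group, name) -> (old version, new version) for dependencies changed in the diff."""
    removed, added = {}, {}
    for line in diff_text.splitlines():
        # Skip the '--- a/file' and '+++ b/file' header lines of a unified diff.
        if line.startswith(("---", "+++")):
            continue
        if not line.startswith(("-", "+")):
            continue  # context lines and hunk headers carry no change
        match = DEP.search(line)
        if match is None:
            continue
        bucket = removed if line[0] == "-" else added
        bucket[(match.group("group"), match.group("name"))] = match.group("version")
    # A rule exists only where the same dependency was removed and re-added with a new version.
    return {
        key: (removed[key], added[key])
        for key in removed
        if key in added and removed[key] != added[key]
    }
```

The resulting dictionary is refreshed every time someone commits a manual fix, so the rules stay dynamic while the logic that applies them to other projects stays deterministic.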