Model Wizard #4

Open
richard-churchman opened this issue Dec 29, 2022 · 3 comments
Assignees
Labels
enhancement New feature or request

Comments

@richard-churchman
Contributor

richard-churchman commented Dec 29, 2022

Creating a machine learning model in Jube can be a convoluted process: create a model, specify the fields to be extracted, specify tags, and then load data via an HTTP endpoint before the data is available for training in the embedded Exhaustive machine learning algorithm. By contrast, other products achieve the same result from a single CSV file. It follows that, despite having more advanced capabilities, Jube may see lower adoption than those products. While Jube was not designed as an automated machine learning wizard, there appears to be increasing overlap.

It is proposed that a Model Wizard be created to take a CSV file, parse both its metadata and its data, and automatically create all the configuration elements that are otherwise created manually. The data will be parsed to identify the universe of values for each categorical variable, with each value being created as a Boolean XPath expression (a process which is currently done, typically, outside of Jube).

Task: Ensure JSON Path Expression returns a Boolean value

As categorical data pivoting will be done in Jube, JSON Path must be available in the Request XPath Model Configuration and must return a Boolean value based on an expression, for example, $.[?(@.=='Politician')].

Task: Create a new page to parse the CSV file

The new page, called Model Wizard and located under the Models menu item, will accept a CSV file as an upload and parse its headers. For each header, the data will be inspected:

  • If all values are numeric, the field will be treated as Float for the purpose of model configuration.
  • If any string data is present, the field will be treated as String for the purpose of model configuration.
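
The header inspection described above could be sketched as follows. This is a minimal Python illustration under stated assumptions, not Jube's implementation: the function name infer_column_types is hypothetical, empty cells are assumed not to influence the inferred type, and "numeric" is taken to mean anything Python's float() accepts.

```python
import csv
import io


def infer_column_types(csv_text):
    """Return {header: "Float" | "String"} for a CSV supplied as text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    # Assume every column is numeric, then demote on the first non-numeric cell.
    types = {header: "Float" for header in headers}
    for row in reader:
        for header in headers:
            value = (row[header] or "").strip()
            if value == "":
                continue  # Assumption: empty cells do not affect the type.
            try:
                float(value)
            except ValueError:
                types[header] = "String"
    return types


sample = "Age,Occupation,Fraud\n42,Politician,Yes\n37.5,Engineer,No\n"
print(infer_column_types(sample))
```

A single non-numeric cell is enough to classify the whole column as String, which matches the "presence of string data" rule above.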

In keeping with the stateless nature of the design, the parsing results will be stored in database tables for recall by the user interface. At this stage, the model will not be created.

Task: Allocate Dependent Variable

With the metadata established, the page must accept further configuration parameters, in particular the dependent variable, which will become a Tag value with a corresponding Exhaustive model and Activation Rule.

Task: Create Model

Based on the metadata and configuration, create the model in Jube as follows:

  • Headers will be transposed to Request XPath configuration elements.
  • For each distinct String value in a Categorical variable, the header will be transposed as an expression (i.e. Categorical Data Pivoting).
  • For each distinct String value in the Categorical variable specified as the Dependent Variable, a Tag element will be created.
  • An Exhaustive configuration element will be created to target the Tag disposition for machine learning.
  • For good measure, an Activation Rule element will be created targeting the return value from the Exhaustive model, where a value > 0.5 will drive activation. The Activation Rule is not strictly necessary, as the Exhaustive recall values are available in their raw form on recall.
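
As a rough illustration of the categorical pivoting and activation threshold listed above, the sketch below collects the universe of values for each String column and emits one Boolean expression per value. It is written in Python for brevity and is not Jube code: the names categorical_expressions and activates are hypothetical, the expression format simply copies the working example quoted later in this thread, and values are assumed to contain no single quotes.

```python
import csv
import io


def categorical_expressions(csv_text, string_headers):
    """Map each String header to one Boolean expression per distinct value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    universe = {header: set() for header in string_headers}
    for row in reader:
        for header in string_headers:
            value = (row.get(header) or "").strip()
            if value:
                universe[header].add(value)
    # Assumption: values contain no single quotes, so nothing is escaped.
    return {
        header: [f"$..[?(@.{header} == '{value}')]" for value in sorted(values)]
        for header, values in universe.items()
    }


def activates(score, threshold=0.5):
    """Activation Rule from the list above: fire when the score exceeds 0.5."""
    return score > threshold


sample = "Brand,Amount\nZTE,10\nNokia,5\nZTE,7\n"
print(categorical_expressions(sample, ["Brand"]))
```

Sorting the values keeps the generated configuration deterministic between runs on the same file.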

Task: Load Data from CSV into JSON for storage in the Archive

Transpose the CSV file to a JSON representation and store it in the Archive table, which makes the data available for Exhaustive training.
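
A minimal sketch of the CSV to JSON transposition, in Python and under assumptions: one JSON document is produced per CSV row, columns previously inferred as Float are converted to numbers, and the function name csv_to_json_records is hypothetical. The layout of the Archive table is not described in this issue, so the storage step is left out.

```python
import csv
import io
import json


def csv_to_json_records(csv_text, float_headers):
    """Return one JSON string per CSV row, converting Float columns to numbers."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = {}
        for header, value in row.items():
            if header in float_headers and value not in (None, ""):
                record[header] = float(value)
            else:
                record[header] = value  # Strings and empty cells pass through.
        records.append(json.dumps(record))
    return records


sample = "Brand,Amount\nZTE,10\nNokia,5.5\n"
for document in csv_to_json_records(sample, {"Amount"}):
    print(document)
```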

Task: Synchronise Model

Insert data to cause the model to synchronise and thus start Exhaustive training.

@richard-churchman
Contributor Author

Created branch.

richard-churchman added a commit that referenced this issue Jan 3, 2023
…when used in conjunction with Boolean data types in the Request XPath definitions. See #4 for specification.
@richard-churchman
Contributor Author

Completed task to refactor JSON parsing to support JSON Path expressions. Committed to branch.

A working example of a JSON Path expression is:

$..[?(@.Brand == 'ZTE')]

Note the double full stop (..), which signifies that the JSON Path is to be tested against a single object and not an array. This JSON Path does not work on all parsers but seems fine in Json.NET.

richard-churchman added a commit that referenced this issue Jan 5, 2023
…gration which inserts a new Permission Specification and allocates to Administrator in Role Registry Permission. Created a page called EntityAnalysisModelWizard and included the validation of the new Permission Specification in code behind. Included the page location in the shared layout menu also on validation of the new Permission Specification. See #4 for specification.
@richard-churchman richard-churchman self-assigned this Dec 26, 2023
@richard-churchman richard-churchman added the enhancement New feature or request label Dec 26, 2023
richard-churchman added a commit that referenced this issue Dec 26, 2023
…when used in conjunction with Boolean data types in the Request XPath definitions. See #4 for specification.
richard-churchman added a commit that referenced this issue Dec 26, 2023
…gration which inserts a new Permission Specification and allocates to Administrator in Role Registry Permission. Created a page called EntityAnalysisModelWizard and included the validation of the new Permission Specification in code behind. Included the page location in the shared layout menu also on validation of the new Permission Specification. See #4 for specification.
@richard-churchman
Contributor Author

richard-churchman commented Dec 26, 2023

Picking this issue back up after a period of inactivity. One design change is that categorical variables will be laid out in the Inline Function rather than through JSON Path processing in the Request XPath. This means that the original file format is fully respected as JSON, and the user does not need to worry about breaking out categorical variables on recall. Achieving the categorical variable pivoting in the Request XPath was too convoluted, and this is what the Inline Function is for anyway.
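
To illustrate the idea of pivoting at the function level rather than in the request path, the Python sketch below derives one Boolean flag per known categorical value from an unmodified JSON record. It does not show Jube's Inline Function syntax, which is not described in this issue; the name pivot_categorical and the Field_Value flag naming are assumptions made for the example.

```python
def pivot_categorical(record, universe):
    """Return a Boolean flag per (field, value) pair without altering `record`."""
    flags = {}
    for field, values in universe.items():
        for value in values:
            # Hypothetical naming scheme: "<field>_<value>".
            flags[f"{field}_{value}"] = record.get(field) == value
    return flags


request = {"Brand": "ZTE", "Amount": 10.0}
universe = {"Brand": ["Nokia", "ZTE"]}
print(pivot_categorical(request, universe))
```

The caller keeps sending the record in its original shape, and the flags are computed on the receiving side.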

Status: 🏗 In progress