Data Science: Visual Programming with Orange Tool

  • Training data for building a model
  • Validation data for testing which parameters and which model to use
  • Test data for estimating the accuracy of the model

Here, we load the data set using the Browse documentation data sets option in the File widget. It contains 303 patients, each diagnosed either with blood vessel narrowing (1) or as healthy (0).

  1. Drag the Data Sampler widget to the canvas.
  2. On the right side of the File widget there is a semi-circular shape. Press the mouse button on it and drag to the Data Sampler widget.
  3. A link labeled Data now connects the two widgets.

Now we will split the data into two parts: 85% for training and 15% for testing. The 85% sample is sent onwards to build a model.

In Data Sampler, we chose the Fixed proportion of data option and set it to 85%, which is 258 of the 303 patients. The remaining 45 patients are held back for testing.
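The arithmetic behind the 85% split can be sketched in plain Python. This is only an illustration, not how Orange implements Data Sampler: the function name `fixed_proportion_split`, the seed, and the integer stand-in patient IDs are all hypothetical, and rounding to the nearest whole row is an assumption that happens to reproduce the 258 figure.

```python
import random


def fixed_proportion_split(rows, proportion, seed=0):
    """Shuffle rows and split them into (sample, remaining).

    Hypothetical helper for illustration; rounding to the nearest
    whole row is an assumption, not Orange's documented behaviour.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_sample = round(len(shuffled) * proportion)
    return shuffled[:n_sample], shuffled[n_sample:]


# Stand-in IDs for the 303 patients (not the real data set).
patients = list(range(303))
train, test = fixed_proportion_split(patients, 0.85)
print(len(train), len(test))
```

Every patient ends up in exactly one of the two parts, which is what lets the remaining 15% serve as unseen test data later.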

Now send the sampled data from Data Sampler to the Test and Score widget.

Next, we add three learners (Naive Bayes, Logistic Regression, and Tree) and connect each of them to the Test and Score widget. Using cross-validation on the training sample, we found that Logistic Regression achieves the highest AUC.
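To make the cross-validation step more concrete, here is a minimal sketch of how k-fold cross-validation partitions the training sample. The helper `k_fold_indices` is hypothetical, and the choice of 5 folds is an assumption, since the article does not state how many folds were used. The sketch also skips the shuffling and stratification that a real tool may apply.

```python
def k_fold_indices(n_rows, k):
    """Yield (training, validation) index lists for k-fold cross-validation.

    Hypothetical helper: folds are contiguous and unshuffled, which keeps
    the example simple but is not necessarily what Orange does.
    """
    indices = list(range(n_rows))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size


# 258 training rows, 5 folds (the fold count is an assumption).
folds = list(k_fold_indices(258, 5))
for training, validation in folds:
    print(len(training), len(validation))
```

Each row is used for validation exactly once and for training in the other folds, so every model is scored only on rows it did not see while being fitted.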

Now it is time to bring in our test data (the remaining 15%). Connect Data Sampler to Test and Score once again and set the connection to Remaining Data → Test Data.

To compare the three algorithms on the test data, double-click the Test and Score widget and select the Test on test data option. The widget then reports the scores for all three algorithms.
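One of the scores used to compare the models is AUC. It can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The sketch below computes it by counting pairs; the function `auc` and the six labels and scores are made up for illustration and are not taken from the patient data or from Orange's implementation.

```python
def auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Illustrative helper only.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = blood vessel narrowing, 0 = healthy.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
print(auc(labels, scores))
```

In this toy example 8 of the 9 positive-negative pairs are ordered correctly, so the AUC is 8/9. A value of 1.0 would mean perfect ranking and 0.5 would be no better than chance.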

We have now learned how to split our data into training and test sets in Orange.


