Commit b5b4a082 authored by ahmedaj

explanation for encoding

parent d02744cc
------ Report ------
2 2 0.174078 0.7 70.0851 0.975621
------ Report ------
2 2 0.267261 0.7 75.0396 1.65572
------ Report ------
2 2 0.174078 0.7 73.8829 1.14562
------ Report ------
9 12 0.174078 0.7 71.8901 8.30705
------ Report ------
2 2 0.0980581 0.7 73.5956 1.20595
------ Report ------
6 10 0.0980581 0.7 71.7169 5.98302
------ Report ------
6 8 0.0980581 0.7 72.5274 5.74115
------ Report ------
6 4 0.0980581 0.7 72.9768 5.06558
---------- Report--------------
Testing accuracy for forest with 2 trees depth 2 and feature selection weight 0.174078
Total tested data is 9049
The accuracy of the tree is 77.8208% with 7042 correct predictions and 2007 incorrect predictions
Label <=50K was predicted right 6080 times
Label >50K was predicted right 962 times
Label <=50K was predicted wrong 755 times
Label >50K was predicted wrong 1252 times
---------- Report--------------
Testing accuracy for forest with 2 trees depth 2 and feature selection weight 0.174078
Total tested data is 9049
The accuracy of the tree is 70.1514% with 6348 correct predictions and 2701 incorrect predictions
Label <=50K was predicted right 4644 times
Label >50K was predicted right 1704 times
Label <=50K was predicted wrong 2190 times
Label >50K was predicted wrong 511 times
---------- Report--------------
Testing accuracy for forest with 2 trees depth 2 and feature selection weight 0.174078
Total tested data is 9049
The accuracy of the tree is 72.8478% with 6592 correct predictions and 2457 incorrect predictions
Label <=50K was predicted right 4932 times
Label >50K was predicted right 1660 times
Label <=50K was predicted wrong 1898 times
Label >50K was predicted wrong 559 times
---------- Report--------------
Testing accuracy for forest with 2 trees depth 2 and feature selection weight 0.174078
Total tested data is 9049
The accuracy of the tree is 65.6979% with 5945 correct predictions and 3104 incorrect predictions
Label <=50K was predicted right 4079 times
Label >50K was predicted right 1866 times
Label <=50K was predicted wrong 2683 times
Label >50K was predicted wrong 421 times
---------- Report--------------
Testing accuracy for forest with 2 trees depth 2 and feature selection weight 0.174078
Total tested data is 9049
The accuracy of the tree is 71.7096% with 6489 correct predictions and 2560 incorrect predictions
Label <=50K was predicted right 4753 times
Label >50K was predicted right 1736 times
Label <=50K was predicted wrong 2086 times
Label >50K was predicted wrong 474 times
---------- Report--------------
Testing accuracy for forest with 2 trees depth 2 and feature selection weight 0.267261
Total tested data is 9049
The accuracy of the tree is 82.1969% with 7438 correct predictions and 1611 incorrect predictions
Label <=50K was predicted right 6416 times
Label >50K was predicted right 1022 times
Label <=50K was predicted wrong 347 times
Label >50K was predicted wrong 1264 times
---------- Report--------------
Testing accuracy for forest with 2 trees depth 2 and feature selection weight 0.267261
Total tested data is 9049
The accuracy of the tree is 70.3503% with 6366 correct predictions and 2683 incorrect predictions
Label <=50K was predicted right 4477 times
Label >50K was predicted right 1889 times
Label <=50K was predicted wrong 2357 times
Label >50K was predicted wrong 326 times
---------- Report--------------
Testing accuracy for forest with 2 trees depth 2 and feature selection weight 0.267261
Total tested data is 9049
The accuracy of the tree is 72.5716% with 6567 correct predictions and 2482 incorrect predictions
Label <=50K was predicted right 4699 times
Label >50K was predicted right 1868 times
Label <=50K was predicted wrong 2052 times
Label >50K was predicted wrong 430 times
---------- Report--------------
Testing accuracy for forest with 2 trees depth 2 and feature selection weight 0.174078
Total tested data is 9049
The accuracy of the tree is 70.6266% with 6391 correct predictions and 2658 incorrect predictions
Label <=50K was predicted right 4471 times
Label >50K was predicted right 1920 times
Label <=50K was predicted wrong 2353 times
Label >50K was predicted wrong 305 times
---------- Report--------------
Testing accuracy for forest with 2 trees depth 2 and feature selection weight 0.174078
Total tested data is 9049
The accuracy of the tree is 79.5226% with 7196 correct predictions and 1853 incorrect predictions
Label <=50K was predicted right 5984 times
Label >50K was predicted right 1212 times
Label <=50K was predicted wrong 868 times
Label >50K was predicted wrong 985 times
---------- Report--------------
Testing accuracy for forest with 2 trees depth 2 and feature selection weight 0.174078
Total tested data is 9049
The accuracy of the tree is 71.4996% with 6470 correct predictions and 2579 incorrect predictions
Label <=50K was predicted right 4552 times
Label >50K was predicted right 1918 times
Label <=50K was predicted wrong 2263 times
Label >50K was predicted wrong 316 times
---------- Report--------------
Testing accuracy for forest with 9 trees depth 12 and feature selection weight 0.174078
Total tested data is 9049
The accuracy of the tree is 72.008% with 6516 correct predictions and 2533 incorrect predictions
Label <=50K was predicted right 4564 times
Label >50K was predicted right 1952 times
Label <=50K was predicted wrong 2197 times
Label >50K was predicted wrong 336 times
---------- Report--------------
Testing accuracy for forest with 9 trees depth 12 and feature selection weight 0.174078
Total tested data is 9049
The accuracy of the tree is 71.0244% with 6427 correct predictions and 2622 incorrect predictions
Label <=50K was predicted right 4422 times
Label >50K was predicted right 2005 times
Label <=50K was predicted wrong 2377 times
Label >50K was predicted wrong 245 times
---------- Report--------------
Testing accuracy for forest with 9 trees depth 12 and feature selection weight 0.174078
Total tested data is 9049
The accuracy of the tree is 72.6379% with 6573 correct predictions and 2476 incorrect predictions
Label <=50K was predicted right 4602 times
Label >50K was predicted right 1971 times
Label <=50K was predicted wrong 2153 times
Label >50K was predicted wrong 323 times
---------- Report--------------
Testing accuracy for forest with 2 trees depth 2 and feature selection weight 0.0980581
Total tested data is 9049
The accuracy of the tree is 77.1245% with 6979 correct predictions and 2070 incorrect predictions
Label <=50K was predicted right 5848 times
Label >50K was predicted right 1131 times
Label <=50K was predicted wrong 918 times
Label >50K was predicted wrong 1152 times
---------- Report--------------
Testing accuracy for forest with 2 trees depth 2 and feature selection weight 0.0980581
Total tested data is 9049
The accuracy of the tree is 74.0966% with 6705 correct predictions and 2344 incorrect predictions
Label <=50K was predicted right 4824 times
Label >50K was predicted right 1881 times
Label <=50K was predicted wrong 2001 times
Label >50K was predicted wrong 343 times
---------- Report--------------
Testing accuracy for forest with 2 trees depth 2 and feature selection weight 0.0980581
Total tested data is 9049
The accuracy of the tree is 69.5657% with 6295 correct predictions and 2754 incorrect predictions
Label <=50K was predicted right 4412 times
Label >50K was predicted right 1883 times
Label <=50K was predicted wrong 2341 times
Label >50K was predicted wrong 413 times
---------- Report--------------
Testing accuracy for forest with 6 trees depth 10 and feature selection weight 0.0980581
Total tested data is 9049
The accuracy of the tree is 71.9195% with 6508 correct predictions and 2541 incorrect predictions
Label <=50K was predicted right 4713 times
Label >50K was predicted right 1795 times
Label <=50K was predicted wrong 2090 times
Label >50K was predicted wrong 451 times
---------- Report--------------
Testing accuracy for forest with 6 trees depth 10 and feature selection weight 0.0980581
Total tested data is 9049
The accuracy of the tree is 72.0522% with 6520 correct predictions and 2529 incorrect predictions
Label <=50K was predicted right 4526 times
Label >50K was predicted right 1994 times
Label <=50K was predicted wrong 2259 times
Label >50K was predicted wrong 270 times
---------- Report--------------
Testing accuracy for forest with 6 trees depth 10 and feature selection weight 0.0980581
Total tested data is 9049
The accuracy of the tree is 71.1791% with 6441 correct predictions and 2608 incorrect predictions
Label <=50K was predicted right 4511 times
Label >50K was predicted right 1930 times
Label <=50K was predicted wrong 2300 times
Label >50K was predicted wrong 308 times
---------- Report--------------
Testing accuracy for forest with 6 trees depth 8 and feature selection weight 0.0980581
Total tested data is 9049
The accuracy of the tree is 73.4667% with 6648 correct predictions and 2401 incorrect predictions
Label <=50K was predicted right 4942 times
Label >50K was predicted right 1706 times
Label <=50K was predicted wrong 1901 times
Label >50K was predicted wrong 500 times
---------- Report--------------
Testing accuracy for forest with 6 trees depth 8 and feature selection weight 0.0980581
Total tested data is 9049
The accuracy of the tree is 72.5384% with 6564 correct predictions and 2485 incorrect predictions
Label <=50K was predicted right 4668 times
Label >50K was predicted right 1896 times
Label <=50K was predicted wrong 2138 times
Label >50K was predicted wrong 347 times
---------- Report--------------
Testing accuracy for forest with 6 trees depth 8 and feature selection weight 0.0980581
Total tested data is 9049
The accuracy of the tree is 71.577% with 6477 correct predictions and 2572 incorrect predictions
Label <=50K was predicted right 4573 times
Label >50K was predicted right 1904 times
Label <=50K was predicted wrong 2186 times
Label >50K was predicted wrong 386 times
---------- Report--------------
Testing accuracy for forest with 6 trees depth 4 and feature selection weight 0.0980581
Total tested data is 9049
The accuracy of the tree is 73.4667% with 6648 correct predictions and 2401 incorrect predictions
Label <=50K was predicted right 4786 times
Label >50K was predicted right 1862 times
Label <=50K was predicted wrong 2021 times
Label >50K was predicted wrong 380 times
---------- Report--------------
Testing accuracy for forest with 6 trees depth 4 and feature selection weight 0.0980581
Total tested data is 9049
The accuracy of the tree is 72.1295% with 6527 correct predictions and 2522 incorrect predictions
Label <=50K was predicted right 4793 times
Label >50K was predicted right 1734 times
Label <=50K was predicted wrong 2003 times
Label >50K was predicted wrong 519 times
---------- Report--------------
Testing accuracy for forest with 6 trees depth 4 and feature selection weight 0.0980581
Total tested data is 9049
The accuracy of the tree is 73.3341% with 6636 correct predictions and 2413 incorrect predictions
Label <=50K was predicted right 4770 times
Label >50K was predicted right 1866 times
Label <=50K was predicted wrong 2048 times
Label >50K was predicted wrong 365 times
@@ -29,7 +29,6 @@ void encodeData(vector <vector<string>> datasetAsString, vector <vector<string>>
for (featUniqueItr = featureUniqueValues.begin(); featUniqueItr != featureUniqueValues.end(); featUniqueItr++) {
int featIdx = featUniqueItr ->first;
cout<<featIdx<<endl;
encodedFeatures.erase(encodedFeatures.begin()+featIdx);
encodedFeatureTypes.erase(encodedFeatureTypes.begin()+featIdx);
map<string, int> unique = featUniqueItr->second;
......
@@ -65,8 +65,14 @@ int main(int argc, char *argv[]) {
encodedFeatureTypes = featureTypes;
vector<int> featuresToEncode;
featuresToEncode.push_back(1);
featuresToEncode.push_back(3);
featuresToEncode.push_back(5);
featuresToEncode.push_back(6);
featuresToEncode.push_back(7);
featuresToEncode.push_back(8);
featuresToEncode.push_back(9);
featuresToEncode.push_back(13);
std::sort(featuresToEncode.begin(), featuresToEncode.end(), std::greater<int>());
@@ -141,7 +147,7 @@ int main(int argc, char *argv[]) {
cout << endl;
randomForest->predict("HARD",testData, randomForest, features);
randomForest->predict("HARD",testData, randomForest, encodedfeatures);
for (int i = 0; i<randomForest->trees.size(); i++){
......