
# Machine Learning Advanced Certification | March 22 - April 9th

#### Partha Chaudhuri

##### Active Member
Hello friends,

This is a great video on Bayes' theorem:

https://youtu.be/HZGCoVF3YvM

#### Ravindra Palli

##### Customer
Horse - Random forest.

#### Attachments

• horses_random_forest..pdf
100.7 KB · Views: 9

#### PAVITRA HUGAR

##### Member
I am getting an error when finding the columns whose variance equals zero.

#### Attachments

• error in finding variance equals to zero.JPG
106.9 KB · Views: 19

#### Partha Chaudhuri

##### Active Member
I created a list of the columns whose variance is zero:

# Create a list from the index of the zero-variance columns
column_to_delete = list(data.var()[data.var() == 0].index)

Then I call the drop function and pass it the whole list of columns to delete.
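For anyone following along, here is a minimal runnable sketch of the step described above. The toy DataFrame and its values are made up for illustration; only `column_to_delete` and `data` come from the post. Passing `numeric_only=True` is an assumption that the real data also contains text columns, which `var()` cannot handle.

```python
import pandas as pd

# Toy frame standing in for the course dataset (hypothetical values).
data = pd.DataFrame({
    "X0": ["a", "b", "c", "d"],  # text column, skipped by numeric_only=True
    "X1": [0, 0, 0, 0],          # constant, variance 0
    "X2": [1, 0, 1, 0],
    "X3": [5, 5, 5, 5],          # constant, variance 0
})

# Compute the variances once and reuse them for the boolean mask.
variances = data.var(numeric_only=True)
column_to_delete = list(variances[variances == 0].index)

# drop() returns a new frame, so the result has to be assigned back.
data = data.drop(columns=column_to_delete)
print(column_to_delete)
print(list(data.columns))
```

A frequent slip here is calling `data.drop(...)` without assigning the result, which leaves `data` unchanged.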

#### PAVITRA HUGAR

##### Member
Thank you for the help, @Partha Chaudhuri. There was a minor error, which is now sorted out.

#### _77862

##### Member
Alumni
I am facing an issue while fitting LogisticRegression and LinearRegression after PCA.
Please check and let me know where the error is in my code.

#### Attachments

• Error_doc.pdf
264.2 KB · Views: 14

#### Partha Chaudhuri

##### Active Member
Question:

The feature data set has a lot of null values. Which strategy is recommended?

a. Impute and then do the train/test split
b. Do the train/test split and then impute the train data and the test data separately

When we impute with the most-frequent option, options a and b do not give the same result.
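As a rough sketch of one commonly used variant of option b: split first, fit the imputer on the train data only, then apply that same fitted imputer to both train and test. This keeps information from the test rows out of the fill values. The column `f` and its values are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Hypothetical single-feature data set with missing values.
X = pd.DataFrame({"f": [1.0, 1.0, 2.0, np.nan, 2.0, 2.0, np.nan, 1.0]})

# Split first, so the test rows cannot influence the fill value.
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

imputer = SimpleImputer(strategy="most_frequent")
X_train_imp = imputer.fit_transform(X_train)  # learn the mode from train only
X_test_imp = imputer.transform(X_test)        # reuse the train mode on test

print("fill value learned from train:", imputer.statistics_)
```

The two options differ because the most frequent value of the full data set need not be the most frequent value of the train subset.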

#### Partha Chaudhuri

##### Active Member
I have shared my Assisted Practice on the Random Forest classifier.

Question: I need help understanding and comparing:

a. Confusion Matrix
b. Classification Report
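A small sketch may help with the comparison: the confusion matrix gives the raw counts, and the classification report derives precision, recall and F1 per class from those same counts. The labels below are made up and are not taken from the attached notebook.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical true and predicted labels for a binary problem.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

# The report's numbers for class 1 can be rebuilt from the matrix.
precision_1 = tp / (tp + fp)  # of everything predicted 1, how much was right
recall_1 = tp / (tp + fn)     # of everything actually 1, how much was found

report = classification_report(y_true, y_pred, output_dict=True)
print(cm)
print(precision_1, report["1"]["precision"])
print(recall_1, report["1"]["recall"])
```

In short, the matrix shows where the errors are, while the report summarises them as rates per class.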

#### Attachments

• Partha_RandomForest.pdf
45.2 KB · Views: 6

#### mdwajtech

##### Well-Known Member
"I am facing an issue while fitting LogisticRegression and LinearRegression after PCA.
Please check and let me know where the error is in my code."
ID is a column that should not go into the model. You need to drop it from the feature set.
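A minimal sketch of that advice, assuming a frame with an `ID` column, a few numeric features and a binary target. The names `f1` to `f4` and `target`, and the random data, are hypothetical since the original code is only in the attachment.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: an identifier, four features and a binary target.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(40, 4)), columns=["f1", "f2", "f3", "f4"])
df.insert(0, "ID", range(40))
df["target"] = (df["f1"] + df["f2"] > 0).astype(int)

# The identifier carries no signal, so it is dropped along with the target.
X = df.drop(columns=["ID", "target"])
y = df["target"]

# Scale, reduce with PCA, then classify.
model = make_pipeline(StandardScaler(), PCA(n_components=2), LogisticRegression())
model.fit(X, y)
print(model.score(X, y))
```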

#### HARSHIT PALIWAL

##### Member
ACF & PACF Plots by Harshit

#### Attachments

• ACF_PACF_Plots_Exc_Harshit.pdf
449.5 KB · Views: 6

beer production

#### Attachments

• beerProduction.pdf
231.1 KB · Views: 6

##### Member
In Project 1 (Mercedes-Benz Greener Manufacturing), I found the accuracy for the train dataset. Since there is no dependent variable in the test data, how do I find the accuracy on the test data?
Could anyone please help with this?

#### Raghavendra B M

##### Moderator
Staff member
Simplilearn Support
"In Project 1 (Mercedes-Benz Greener Manufacturing), I found the accuracy for the train dataset. Since there is no dependent variable in the test data, how do I find the accuracy on the test data?
Could anyone please help with this?"

Please download and go through the Project Mentoring recording for the "Mercedes-Benz Greener Manufacturing" project from the link below. The project is discussed in detail in the recording:

# Project Name - Mercedes-Benz Greener Manufacturing - Project Mentoring :

Regards,
Raghavendra
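Independently of the recording, one general way to estimate performance when the test file has no target is to hold out part of the labelled train data as a validation set. The sketch below uses synthetic data, a plain `LinearRegression` as a stand-in model, and R² as the score; all of these are assumptions and not the project's actual setup.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the labelled train data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Keep 20% of the labelled rows aside; the model never sees them in fit().
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_tr, y_tr)

# The validation score approximates how the model does on unseen rows.
val_score = r2_score(y_val, model.predict(X_val))
print(val_score)
```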

##### Member

Thank you @Raghavendra B M sir for your valuable response.

#### Partha Chaudhuri

##### Active Member
Hi Wajahat,

I need clarity on the following:

1. Multicollinearity
2. Encoders, when the test dataset contains labels that are not in the train dataset
3. How do I choose between LDA and PCA? In LDA, do X and y need to be of the same data type?
4. fit, fit_transform, transform: when to use which
5. We use PCA to reduce dimensions. How do we translate the components back to the underlying features?
6. In Project 1, I have been able to create the xgboost model. How do I now set up an optimal testing practice?
7. All the libraries have a lot of parameters. Any suggestions on where to look to understand them? The documentation has so many details that it is confusing.
8. Which active forums can we use to keep developing our ML skills?
9. What are efficient ways to debug code errors? Any tips or suggestions?
10. Standardize and normalize routines. Eg
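Points 4, 5 and 10 can be illustrated together in a short sketch. It assumes made-up train and test frames with columns `a`, `b`, `c`; the pattern is to call `fit_transform` on train, `transform` on test, and to read `components_` to see how each principal component weights the original features.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical train and test frames.
rng = np.random.default_rng(0)
train = pd.DataFrame(rng.normal(size=(50, 3)), columns=["a", "b", "c"])
test = pd.DataFrame(rng.normal(size=(10, 3)), columns=["a", "b", "c"])

# Standardize: fit_transform learns mean/std from train and applies them;
# transform reuses those train statistics on test without refitting.
scaler = StandardScaler()
train_std = scaler.fit_transform(train)
test_std = scaler.transform(test)

# Normalize: MinMaxScaler rescales each column to the [0, 1] range instead.
train_norm = MinMaxScaler().fit_transform(train)

# Each row of components_ holds the weights of one principal component
# on the original features.
pca = PCA(n_components=2).fit(train_std)
loadings = pd.DataFrame(pca.components_, columns=train.columns, index=["PC1", "PC2"])
print(loadings)
```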

#### Partha Chaudhuri

##### Active Member
Hi Wajahat,

This is regarding my Assessment Project submission. I did Project 1. In the label encoder section, the label encoder's transform was failing on the test data because some label values in test were absent from train. I solved it in a way that is a bit different from what you shared. I found this approach on some data science forums. It has some cons, but I did the best I could.

Here are the details

___________________________
# Some values in test are not represented in train, hence the concatenation
le.fit(pd.concat([feature['X0'], feature_test['X0']], axis=0, sort=False))
feature['X0'] = le.transform(feature['X0'])
feature_test['X0'] = le.transform(feature_test['X0'])
_________________________

feature is the whole train data set from train.csv minus the "y" column.

feature_test is the whole test data set from test.csv.

I would like to know your thoughts.
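For reference, a runnable sketch of the approach above on toy data, next to one possible alternative. The alternative assumes scikit-learn 0.24 or later, where `OrdinalEncoder` accepts `handle_unknown="use_encoded_value"`; it fits on train only and maps unseen test labels to -1. The sample values are invented.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Toy stand-ins for the frames in the post; "d" appears only in test.
feature = pd.DataFrame({"X0": ["a", "b", "a", "c"]})
feature_test = pd.DataFrame({"X0": ["b", "d", "a"]})

# Approach from the post: fit on train and test labels together.
# Con: the encoder has seen the test data, which is a mild form of leakage.
le = LabelEncoder()
le.fit(pd.concat([feature["X0"], feature_test["X0"]], axis=0))
train_codes = le.transform(feature["X0"])
test_codes = le.transform(feature_test["X0"])

# Alternative (assumes scikit-learn >= 0.24): fit on train only and
# send any label not seen in train to a reserved code.
oe = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
train_ord = oe.fit_transform(feature[["X0"]])
test_ord = oe.transform(feature_test[["X0"]])

print(list(test_codes))
print(test_ord.ravel().tolist())
```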

Thanks
Regards
Partha

#### Abhilash Sah

##### New Member
Alumni
ValueError Traceback (most recent call last)
<ipython-input-28-a85277fe1e0e> in <module>
----> 1 regressor.fit(x_train, y_train)
2 regressor.predict(x_test)

/opt/anaconda3/lib/python3.8/site-packages/sklearn/tree/_classes.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
1240 """
1241
-> 1242 super().fit(
1243 X, y,
1244 sample_weight=sample_weight,

/opt/anaconda3/lib/python3.8/site-packages/sklearn/tree/_classes.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
270
271 if len(y) != n_samples:
--> 272 raise ValueError("Number of labels=%d does not match "
273 "number of samples=%d" % (len(y), n_samples))
274 if not 0 <= self.min_weight_fraction_leaf <= 0.5:

ValueError: Number of labels=4 does not match number of samples=6

Why am I receiving this error while executing the code?

Thanks.

#### Partha Chaudhuri

##### Active Member
This error indicates that x_train and y_train do not have the same length: according to the traceback, x_train has 6 samples but y_train has only 4 labels. The number of rows in X must equal the number of labels in y before you call fit.
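The original code is not shown, so this is only a guess: a common cause of exactly this mismatch is unpacking the outputs of `train_test_split` in the wrong order. The sketch below uses made-up data and assumes a `DecisionTreeRegressor`, based on the `sklearn/tree` path in the traceback. It shows the correct order and a quick length check before fitting.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: 10 samples with 2 features each.
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10, dtype=float)

# The return order is X_train, X_test, y_train, y_test.
# Writing "x_train, y_train, x_test, y_test = ..." would silently put
# the 4 test rows of X into y_train and trigger the error above.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)

# Cheap sanity check before fitting.
if len(x_train) != len(y_train):
    raise ValueError(f"{len(x_train)} samples vs {len(y_train)} labels")

regressor = DecisionTreeRegressor(random_state=0)
regressor.fit(x_train, y_train)
predictions = regressor.predict(x_test)
print(x_train.shape, y_train.shape, predictions.shape)
```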