Machine Learning Advanced Certification | March 22 - April 9th

_77862

Member
Alumni
I am facing an issue while fitting LogisticRegression and LinearRegression using PCA.
Please check and let me know where the error is in my code.
 

Attachments

  • Error_doc.pdf
    264.2 KB · Views: 13

Partha Chaudhuri

Active Member
Question.

The feature data set has a lot of null values. Which strategy is recommended?

a. Impute first, then do the train/test split.
b. Do the train/test split first, then impute the train data and the test data separately.

When we impute with the most_frequent option, options a and b do not give the same result.

Please recommend.
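
A minimal sketch of a common variant of option b, for comparison: split first, fit the imputer on the train rows only, then reuse those fitted fill values on the test rows. Imputing before the split (option a) lets test rows influence the fill value, which is why the two options can differ. The column names and values below are hypothetical toy data, not the course data set.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Hypothetical toy features with missing values (assumption, not the course data).
X = pd.DataFrame({
    "age":   [25.0, np.nan, 30.0, 25.0, 40.0, 25.0, np.nan, 30.0],
    "rooms": [2.0, 3.0, np.nan, 2.0, 3.0, 3.0, 3.0, np.nan],
})
y = pd.Series([0, 1, 0, 1, 0, 1, 0, 1])

# Split first so the test rows stay unseen during imputation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

imputer = SimpleImputer(strategy="most_frequent")
X_train_imp = imputer.fit_transform(X_train)  # learn fill values from train only
X_test_imp = imputer.transform(X_test)        # reuse the train fill values on test

print(imputer.statistics_)  # the per-column fill values learned from train
```
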
 

Partha Chaudhuri

Active Member
I have shared my Assisted Practice on the Random Forest classifier.

Question: I need help to understand and compare

a. Confusion Matrix
b. Classification Report
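
A small sketch that may help with the comparison, using hypothetical labels rather than the attached notebook: the confusion matrix gives raw counts per actual/predicted pair, while the classification report derives precision, recall and F1 per class from those same counts.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical true and predicted labels for a binary problem (assumption).
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 1, 0, 1, 0]

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

# The report's numbers for class 1 can be recomputed from the matrix.
precision_1 = tp / (tp + fp)
recall_1 = tp / (tp + fn)

print(cm)
print(classification_report(y_true, y_pred))
```
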
 

Attachments

  • Partha_RandomForest.pdf
    45.2 KB · Views: 6

mdwajtech

Well-Known Member
Replying to _77862's question about LogisticRegression and LinearRegression with PCA:
ID is a column that should not go into the model. You need to drop it from the feature set.
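
A rough sketch of that suggestion, assuming a pandas DataFrame with an ID column and a target column y. The other column names and all values are made up for illustration; the attached code may be organised differently.

```python
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical frame: "ID" is an identifier, "y" is the target (assumption).
df = pd.DataFrame({
    "ID": [101, 102, 103, 104],
    "f1": [0.5, 1.5, 2.5, 3.5],
    "f2": [1.0, 0.0, 1.0, 0.0],
    "y":  [10.0, 12.0, 15.0, 11.0],
})

# Drop the identifier and the target before building the feature set.
features = df.drop(columns=["ID", "y"])
target = df["y"]

# PCA now sees only real features, not the arbitrary ID values.
pca = PCA(n_components=1)
components = pca.fit_transform(features)
```
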
 
In Project 1 (Mercedes-Benz Greener Manufacturing), I computed the accuracy on the train dataset. Since the test data has no dependent variable, how do I find the accuracy on the test data?
Could anyone please help with this?
 

Raghavendra B M

Moderator
Staff member
Simplilearn Support
Hi SHRILAVANYA H ADIGOPULA,

Please download and go through the Project Mentoring recording for the "Mercedes-Benz Greener Manufacturing" project from the link below. The project is discussed in detail in the recording:

# Project Name - Mercedes-Benz Greener Manufacturing - Project Mentoring :

Regards,
Raghavendra
 
Thank you @Raghavendra B M sir for your valuable response.
 

Partha Chaudhuri

Active Member
Hi Wajahat,

I would need clarity on

1. Multicollinearity
2. Encoders, where the test dataset contains some new labels
3. How do I choose between LDA and PCA? In LDA, do X and y need to be of the same data type?
4. fit, fit_transform, transform: when to use which, and where.
5. We use PCA to reduce dimensions. How do we translate the components back to the underlying features?
6. In Project 1, I have been able to create the XGBoost model. Now how do I set up an optimal testing practice?
7. All the libraries have a lot of parameters. Any suggestions on where to look to understand them? The documentation has so many details that it is confusing.
8. Which active forums can we use to keep developing ML skills?
9. What are efficient ways to debug code errors? Any tips or suggestions?
10. Standardize and normalize routines. Eg
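
For points 4 and 5, a minimal sketch on random toy data (the column names a, b, c are hypothetical): fit_transform is called on the train split so the estimator learns its parameters there, transform reuses those parameters on the test split, and pca.components_ shows how much each original feature contributes to each component.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical toy features (assumption, not the project data).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(20, 3)), columns=["a", "b", "c"])
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

# fit_transform: learn mean/std on train and scale train in one step.
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
# transform: apply the train mean/std to test without refitting.
X_test_s = scaler.transform(X_test)

# Same pattern for PCA: fit on train, only transform test.
pca = PCA(n_components=2)
Z_train = pca.fit_transform(X_train_s)
Z_test = pca.transform(X_test_s)

# Each row is a component, each column the weight of an original feature.
loadings = pd.DataFrame(pca.components_, columns=X.columns, index=["PC1", "PC2"])
print(loadings)
```
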
 

Partha Chaudhuri

Active Member
Hi Wajahat,

This is regarding my Assessment Project submission. I did Project 1. In the label encoder section, the label encoder's transform on the test set was failing because some label values in the test set were absent from the train set. I solved it in a way that is a bit different from what you shared. I found this approach on some data science forums. There are some cons to it. However, I did the best I could.

Here are the details


___________________________
import pandas as pd
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
le.fit(pd.concat([feature['X0'], feature_test['X0']], axis=0, sort=False))  # some values appear only in test, hence the concatenation
feature['X0'] = le.transform(feature['X0'])
feature_test['X0'] = le.transform(feature_test['X0'])
_________________________


feature is the whole train data set from train.csv, minus the "y" column.

feature_test is the whole test data set from test.csv.
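
One possible alternative, sketched here with made-up X0 values: fit an encoder on the train column only and map unseen test labels to a sentinel code. This assumes a scikit-learn version (0.24 or later) where OrdinalEncoder accepts handle_unknown="use_encoded_value"; it is not the approach used in the submission above.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical stand-ins for the X0 column of feature and feature_test.
feature = pd.DataFrame({"X0": ["a", "b", "c", "a"]})
feature_test = pd.DataFrame({"X0": ["b", "d"]})  # "d" never appears in train

# Unseen categories are encoded as -1 instead of raising an error.
enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
train_codes = enc.fit_transform(feature[["X0"]])   # fit on train only
test_codes = enc.transform(feature_test[["X0"]])   # no refit on test

print(train_codes.ravel(), test_codes.ravel())
```

The trade-off is that every unseen label collapses to the same code, whereas fitting on the concatenated data gives each label its own code but lets the test set shape the encoding.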



I would like to know your thoughts.

Thanks
Regards
Partha
 

Abhilash Sah

New Member
Alumni
ValueError Traceback (most recent call last)
<ipython-input-28-a85277fe1e0e> in <module>
----> 1 regressor.fit(x_train, y_train)
2 regressor.predict(x_test)

/opt/anaconda3/lib/python3.8/site-packages/sklearn/tree/_classes.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
1240 """
1241
-> 1242 super().fit(
1243 X, y,
1244 sample_weight=sample_weight,

/opt/anaconda3/lib/python3.8/site-packages/sklearn/tree/_classes.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
270
271 if len(y) != n_samples:
--> 272 raise ValueError("Number of labels=%d does not match "
273 "number of samples=%d" % (len(y), n_samples))
274 if not 0 <= self.min_weight_fraction_leaf <= 0.5:

ValueError: Number of labels=4 does not match number of samples=6

Why am I receiving this error when executing the code?

Thanks.
 

Partha Chaudhuri

Active Member
This error indicates that the number of labels passed to fit does not match the number of samples: here y_train has 4 entries while x_train has 6 rows. The Kaggle thread below describes the same error for a Random Forest classifier trained on 1459 samples with 1460 labels.

source: https://www.kaggle.com/questions-and-answers/104027

How did I find this out?

I searched Google for the pattern: Number of labels does not match number of samples

Hope this helps
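
One guess at the cause, since the original code is not shown: a 6 versus 4 mismatch is what you would get if the outputs of train_test_split were unpacked in the wrong order on 10 rows with a 40% test split. The sketch below uses toy data and a DecisionTreeRegressor as an assumption based on the traceback path; it shows the expected unpacking order and a quick length check before fitting.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data: 10 samples, 2 features (assumption).
x = np.arange(20).reshape(10, 2)
y = np.arange(10)

# train_test_split returns X_train, X_test, y_train, y_test in that order.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.4, random_state=0
)

# Check the lengths before fitting; a mismatch here reproduces the ValueError.
print(x_train.shape, y_train.shape)
assert len(x_train) == len(y_train)

regressor = DecisionTreeRegressor(random_state=0)
regressor.fit(x_train, y_train)
predictions = regressor.predict(x_test)
```
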
 