
Machine Learning | Jul 13 - Jul 31 | Aayushi (2020)

Hello Aayushi,

In a linear regression problem, we find the model coefficients by using model.coef_, and
1) do we then drop the independent variables (columns) that are not correlated to the target?
2) If true, do we start the regression again in a new ipynb file, or can we continue in the same file where we found model.coef_?
 

Aayushi_6

Well-Known Member

Hi Pritam,

You can continue the regression within the same file; there is no need to create another notebook.
Hope it helps!
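The workflow discussed above can be sketched in a single notebook: fit, inspect the coefficients, drop weak features, and refit. The data, column names, and the 0.5 coefficient threshold below are illustrative assumptions, not from the course files:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Illustrative data: one informative column, one pure-noise column.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "useful": rng.normal(size=100),
    "noise": rng.normal(size=100),
})
y = 3 * X["useful"] + rng.normal(scale=0.1, size=100)

model = LinearRegression().fit(X, y)
coefs = pd.Series(model.coef_, index=X.columns)

# Drop columns whose coefficient magnitude is negligible, then refit
# in the same notebook (threshold 0.5 is a made-up example).
keep = coefs[coefs.abs() > 0.5].index
model2 = LinearRegression().fit(X[keep], y)
print(keep.tolist())
```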
 

Naveen B

Member

Hi, I have a couple of doubts regarding scaling, standardization, and normalization:

When should we use a log transformation?
When do we need to transform the skewed columns in the data?
How do we transform left-skewed data into normally distributed data?
After transforming data with a log transformation, do we still need to scale?
Which algorithms require the data to be normally distributed?
While performing scaling, do we need to scale every feature in the data?
If so, do we need to scale the one-hot encoded features too?
When should we perform normalization and when standardization?
Does transforming or scaling change the characteristics of the data?
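As a hedged sketch touching a few of these questions: a right-skewed column can be log-transformed and then standardized. The `income` column and its data are made up purely for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# A right-skewed column (illustrative data, not from the course).
df = pd.DataFrame({"income": np.random.default_rng(1).lognormal(size=1000)})

# log1p compresses the long right tail toward a more normal shape.
df["income_log"] = np.log1p(df["income"])

# Scaling is still usually applied afterwards for distance-based
# models; one-hot (0/1) columns are typically left unscaled.
scaled = StandardScaler().fit_transform(df[["income_log"]])
print(scaled.mean().round(6), scaled.std().round(6))
```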
 
Hi Aayushi,
I am getting a not-so-good scatter plot output for project one. Can you please look at the code where I am getting stuck?
Regards
Ajit Kumar
 

Attachments

  • Mercedes-Benz_Greener_Manufacturing_Ajit_Kumar.zip
    49.5 KB · Views: 43
Hello Ayushi,

user_id = 'A3R5OBKS7OM2IR'
muvee_id = 'Movie1'
r_rating = 5.0
formula.predict(user_id, muvee_id, r_rating = r_rating, verbose=True)

I have built this model for the Amazon recommender system project.
Could you please suggest how I can iterate over the entire column to get a prediction for each and every row?

Br,
Nilesh Meher
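Without the actual `formula` object at hand, here is a hypothetical sketch of iterating a recommender's `predict()` over every row of a ratings dataframe; `StubModel` and all IDs below are placeholders, not the real model or data:

```python
import pandas as pd

class StubModel:
    """Stand-in for the fitted recommender (`formula` in the post above)."""
    def predict(self, user_id, item_id, r_rating=None, verbose=False):
        return 4.0  # placeholder score

model = StubModel()
ratings = pd.DataFrame({
    "user_id": ["u1", "u2", "u3"],       # illustrative IDs
    "movie_id": ["m1", "m2", "m3"],
    "rating": [5.0, 3.0, 4.0],
})

# One predict() call per row, collected into a new column.
ratings["predicted"] = [
    model.predict(row.user_id, row.movie_id, r_rating=row.rating)
    for row in ratings.itertuples()
]
print(ratings["predicted"].tolist())
```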
 

Aayushi_6

Well-Known Member

Hi Ajit,

I saw your script. I think it is just because of the large number of data points. Can you please try increasing the Matplotlib figure size, then check whether the visualization looks better?
Anyway, don't worry about this much. The rest is all good.
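A minimal sketch of the figure-size suggestion on made-up data; the `(12, 8)` size, marker size, and alpha are just example values to tune:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

# Dense illustrative data standing in for the project's points.
x = np.random.default_rng(0).normal(size=5000)
y = x + np.random.default_rng(1).normal(size=5000)

# A larger canvas plus small, semi-transparent markers make
# crowded scatter plots readable.
fig, ax = plt.subplots(figsize=(12, 8))
ax.scatter(x, y, s=4, alpha=0.3)
fig.savefig("scatter.png")
print(fig.get_size_inches())
```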
 
Hi Aayushi,

In the Mercedes-Benz problem there are a lot of categorical variables with many unique values, some of which occur very rarely, e.g. ab, ag, p, ae. Should we combine these values into another value, say "other", and then perform one-hot encoding?

Does converting these categorical columns (X0, X1, X2, ...) into multiple one-hot encoded columns sound like a good approach? If not, please suggest an alternative.

upload_2020-7-27_21-51-49.png
 

Aayushi_6

Well-Known Member

Hi Harshvardhan,

Yes, you should combine these rare values into an "other" category and then perform one-hot encoding.
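A small sketch of that grouping on made-up values; the threshold of 2 here is illustrative (the project would use a larger cutoff):

```python
import pandas as pd

# Illustrative column; X0, X1, ... in the project would be handled the same way.
df = pd.DataFrame({"X0": ["k", "k", "az", "k", "ab", "ag", "az", "az"]})

# Group categories seen fewer than `min_count` times under "other",
# then one-hot encode the result.
min_count = 2
counts = df["X0"].value_counts()
rare = counts[counts < min_count].index
df["X0"] = df["X0"].where(~df["X0"].isin(rare), "other")

encoded = pd.get_dummies(df, columns=["X0"])
print(sorted(encoded.columns.tolist()))
```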
 


Dr. K. Rajiv

Customer
Ma'am, how can we create an automatic question-answering system in NLP?
For example, suppose we take data from Wikipedia about a certain topic.
Then our model should create questions about that topic, store them in a certain format,
and keep track of the answers.
We could also use it to feed our chatbot,
or just keep a structure like a pandas DataFrame where the questions and answers are stored.
How can we do it?
Can you help me convert the idea into a program?
Can you share sample code for it?
 

Dr. K. Rajiv

Customer
Hi Learners,

This thread is for you to discuss the queries and concepts related to Machine Learning only.

# Google Drive Link :
https://drive.google.com/drive/folders/12L3iJyPJ72W9wyuz7Ku0knys48yN42gB

# Batch Specific Folder (in Google Drive) :
ML | Jul 13 - Jul 31 | 2020

Happy Learning !!

Regards,
Team Simplilearn
 
Hi Aayushi,

For the Mercedes-Benz project:
1. I have combined the test and train data.
2. Replaced all categorical values occurring fewer than 150 times with the value "others".
3. After that, when trying to apply label encoding, I am getting the error below; could you assist?
Should I not combine the train and test data? Or should I not replace less frequent values with "others"?
 

Attachments

  • pic1.jpg
    pic1.jpg
    381.9 KB · Views: 33
  • pic2.jpg
    pic2.jpg
    412.7 KB · Views: 39

Aayushi_6

Well-Known Member

Hi Harshvardhan,

You should not concatenate the train and test dataframes themselves before label encoding. Instead, fit the label encoder on the combined values of each column, then transform train and test separately.

Here are sample lines of code for your help:

for var in categorical_columns:
    lb = LabelEncoder()
    # fit on the union of train and test values so no label is unseen
    full_var_data = pd.concat((train[var], test[var]), axis=0).astype('str')
    lb.fit(full_var_data)
    train[var] = lb.transform(train[var].astype('str'))
    test[var] = lb.transform(test[var].astype('str'))

Hope it helps!
Happy Learning :)
 

Aayushi_6

Well-Known Member

Hi Rajiv,

In your use case, first of all you need to perform data preparation. Let's say you prepare an Excel sheet of 100+ questions along with their corresponding answers.
Since it's an Excel sheet, it can easily be read into a pandas DataFrame.
To build a simple QnA bot, you can use cosine similarity to match the user query against the question–answer pairs in your dataset: whichever stored question is most similar to the user query, you retrieve its corresponding answer.
Please note: you also need to perform the usual text-mining steps before computing cosine similarity, as similarity metrics need vectors as input.

This is how you can start.
Great question!
Hope you understand the approach.
Happy Learning :)
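A minimal sketch of the cosine-similarity retrieval described above, using TF-IDF for the text-mining step; the QnA pairs are invented for illustration (in practice they would come from your prepared sheet):

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative QnA pairs standing in for the prepared Excel sheet.
qa = pd.DataFrame({
    "question": [
        "What is machine learning?",
        "How do I install pandas?",
        "What is overfitting?",
    ],
    "answer": [
        "A field where models learn patterns from data.",
        "Run pip install pandas.",
        "When a model memorizes training data and generalizes poorly.",
    ],
})

# Vectorize the stored questions; the user query is transformed
# with the same fitted vectorizer.
vec = TfidfVectorizer()
question_vecs = vec.fit_transform(qa["question"])

def answer(query: str) -> str:
    """Return the answer whose stored question is most similar to the query."""
    sims = cosine_similarity(vec.transform([query]), question_vecs)
    return qa["answer"].iloc[sims.argmax()]

print(answer("what does overfitting mean"))
```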
 
Thanks Aayushi, that worked!
 

Renuka Badduri

Active Member
Hi All,
Has anyone tried combining the train and test data sets of the Mercedes project in order to find the zero-variance columns?

Thanks,
Renuka.
 
Hi Aayushi,

In the Recommendation Model project (building a user-based recommendation model for Amazon), how do I derive X and y before splitting into train and test?
It is a sparse matrix.
Do I need to pivot first and then derive the X and y variables?
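The pivot step can be sketched like this on made-up ratings; how X and y are then derived depends on the algorithm you choose:

```python
import pandas as pd

# Illustrative ratings, standing in for the Amazon dataset.
ratings = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3"],
    "item_id": ["i1", "i2", "i1", "i3"],
    "rating":  [5.0, 3.0, 4.0, 2.0],
})

# Pivot into a user x item utility matrix; unrated pairs become NaN,
# which is why the result is naturally sparse.
matrix = ratings.pivot(index="user_id", columns="item_id", values="rating")
print(matrix.shape)
```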
 
Hello Ma'am,

I am facing problems with the KMeans model, as I am not able to get a mean_dist similar to your example shown in class. I have attached your sample model below, where you first get 4 values for mean_dist, and then on the next line, when you enter mean_dist again, there are 29 different values (len(mean_dist)). I am not able to reproduce that, although I am trying to replicate your code. Kindly help me understand my mistake.
 

Attachments

  • Screenshot (92).png
    Screenshot (92).png
    112.8 KB · Views: 27

Aayushi_6

Well-Known Member

Hi Pritam,

Sorry for the confusion.
If you run the elbow test for range(1, 5), you will get 4 values in mean_dist.
If you run the elbow test for range(1, 30), you will get 29 values in mean_dist.

I ran both cases one after the other, so you are seeing two different mean_dist outputs.

Hope it helps!
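A runnable sketch of the elbow test on illustrative data, showing why range(1, 5) yields 4 values in mean_dist:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Illustrative data; the course notebook's data would replace this.
X, _ = make_blobs(n_samples=200, centers=3, random_state=0)

# Elbow test: one mean within-cluster distance per candidate k,
# so range(1, 5) yields 4 values and range(1, 30) would yield 29.
mean_dist = []
for k in range(1, 5):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    mean_dist.append(km.inertia_ / len(X))

print(len(mean_dist))
```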
 

Naveen B

Member
Hi Ma'am,

Have you uploaded the additional XGBRegressor example and the end-to-end ML project example (like loan detection) to the drive?
If so, under which folder should I look, ma'am @Aayushi_6?
 

kashinath Adkine

Member
Simplilearn Support
Alumni
Hello Aayushi Ma'am,

This is Kashinath Adkine.

As discussed earlier in the class, I am facing some issues with the project: when I try to run the code below, it gives me an error. Please find the code below and the attached snap.

# define object preserving 95% variance
pca = PCA(n_components=0.95)
# fit your training data
pca.fit(X_train)
# transform X_train, X_test and new_data
X_train = pca.transform(X_train)
X_test = pca.transform(X_test)
new_data = pca.transform(new_data)
 

Attachments

  • dimetionaly reduction code.PNG
    dimetionaly reduction code.PNG
    37.6 KB · Views: 20
  • error.PNG
    error.PNG
    110.7 KB · Views: 19

Naveen B

Member

Hi Kashinath,

Try the following code and check whether it helps:

pca = PCA(0.95, svd_solver="full")
pca.fit(X_train)

Here 0.95 is the fraction of variance to preserve. After fitting the object:

you can check the number of components needed for 95% variance with: pca.n_components_

you can check the variance of each component with: pca.explained_variance_

you can also check the variance ratio with: pca.explained_variance_ratio_

Finally, with the variance expressed by all the components, you can plot them in a bar chart and choose the number of components that exhibit the most variance:

pd.DataFrame(pca.explained_variance_ratio_).plot(kind="bar", color="red", title="Variance exhibited by N components");

Then you can use that number in pca = PCA(n_components=...) and re-iterate as you did.

The problem I see in your code is that you are passing a variance fraction in place of a number of components.

Hope it helps!
Cheers!
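A self-contained sketch of the variance-preserving PCA call above, on illustrative random data standing in for X_train:

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative data standing in for X_train.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 10))
X_train[:, 0] *= 10  # give one direction much more variance

# Keep enough components to preserve 95% of the variance.
pca = PCA(0.95, svd_solver="full")
pca.fit(X_train)

print(pca.n_components_)                    # components kept
print(pca.explained_variance_ratio_.sum())  # at least 0.95 by construction
```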





 

kashinath Adkine

Member
Simplilearn Support
Alumni
Please share the screen snap of it.
# Perform dimensionality reduction
from sklearn.decomposition import PCA
# define your object preserving 95% variance
pca = PCA(0.95, svd_solver="full")
# fit training data
pca.fit(X_train)
# transform X_train, X_test and new_data
X_train = pca.transform(X_train)
X_test = pca.transform(X_test)
new_data = pca.transform(new_data)


Error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-ccaebc730fe1> in <module>
     14
     15 X_train = pca.transform(X_train)
---> 16 X_test = pca.transform(X_test)
     17 new_data = pca.transform(new_data)

/usr/local/lib/python3.7/site-packages/sklearn/decomposition/_base.py in transform(self, X)
    125         check_is_fitted(self)
    126
--> 127         X = check_array(X)
    128         if self.mean_ is not None:
    129             X = X - self.mean_

/usr/local/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    576     if force_all_finite:
    577         _assert_all_finite(array,
--> 578                            allow_nan=force_all_finite == 'allow-nan')
    579
    580     if ensure_min_samples > 0:

/usr/local/lib/python3.7/site-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
     58             msg_err.format
     59             (type_err,
---> 60              msg_dtype if msg_dtype is not None else X.dtype)
     61         )
     62     # for object dtype data, we only check for NaNs (GH-13254)

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
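The ValueError above means the array passed to pca.transform() contains NaN or infinite values. A hedged sketch of checking for NaNs and imputing before PCA; the array is illustrative, and in practice the imputer should be fitted on the training data only:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Illustrative X_test with a NaN, standing in for the project data.
X_test = np.array([[1.0, 2.0],
                   [np.nan, 4.0],
                   [5.0, 6.0]])

# Check first, then impute (or drop) before calling pca.transform().
print(np.isnan(X_test).sum())  # number of NaNs found

imputer = SimpleImputer(strategy="mean")  # fit on train data in practice
X_test_clean = imputer.fit_transform(X_test)
print(np.isnan(X_test_clean).sum())
```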
 

Hi,

How did you get this? I am stuck at the splitting step; after splitting, when I apply SVD, it gives an error.
Regards
Nikita Singh
 