Machine Learning | Feb 3 - Feb 25 | Prashant Nair

msvkrishna

Member
Alumni
Hi All

PFA the assignment for today's session

Happy Learning


Hi Prashant, I am doing the Cars assignment and have a question about label encoding and OneHotEncoder for the categorical column "Type", which has 5 distinct values (sedan, etc.). From what I recall from the class: a) I should use Categorical_values as [5]. Is that correct? b) When I use that, I expected 5 new columns to be added to the features, making the feature shape 804 x 12 (7 + 5 for Type). Instead I get 804 x 9. Can you please advise? Am I doing this wrong?
 

msvkrishna

Member
Alumni
Hi Prashant,

PFA the assignment of 4th February.

Thanks
Priyavart

Hi Prashant, could you please look at my question on the Cars problem from Day 4? Thanks, Krishna
 

Prashant_Nair

Well-Known Member
Simplilearn Support
Alumni
Trainer
a) I should use Categorical_values as [5]. Is that correct?
No. The syntax is categorical_values[col_index_no]: pass the index of the categorical column, not the number of distinct values.

b) When I use that, I expected 5 new columns to be added to the features, making the feature shape 804 x 12 (7 + 5 for Type). Instead I get 804 x 9. Can you please advise? Am I doing this wrong?

Change it as per the solution in a) and it will work.
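
A minimal sketch of the point above, assuming a recent scikit-learn. In older releases the OneHotEncoder argument was named categorical_features and took column indices; it was removed in scikit-learn 0.22, and ColumnTransformer is the current way to encode one column by index. The rows, the category names and the column layout are made up for illustration and are not the actual cars data.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Made-up stand-in for the cars features: price, Type, cylinders.
features = np.array([
    [8221.0, "Sedan", 6],
    [9135.0, "Coupe", 4],
    [13196.0, "Wagon", 6],
    [16336.0, "Hatchback", 4],
    [19832.0, "Convertible", 8],
    [22236.0, "Sedan", 6],
], dtype=object)

type_col_index = 1  # index of the categorical column, NOT the number of categories

encoder = ColumnTransformer(
    [("type", OneHotEncoder(), [type_col_index])],
    remainder="passthrough",   # keep the other columns unchanged
    sparse_threshold=0,        # always return a dense array
)
encoded = encoder.fit_transform(features)

# 3 original columns - 1 encoded column + 5 categories = 7 columns
print(features.shape, encoded.shape)
```

Passing [5] to an index-based argument would ask the encoder to encode whatever column sits at index 5, so the number of added columns would depend on that column's distinct values rather than on Type.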
 

Charan Telapalli

Member
Alumni
Hi Prashant,

I am unable to upload my 4th day assignment. Please assist me.

I have attached the screen shot for your reference.

Regards,
Charan
 

Attachments

  • Error message.png
    32.7 KB · Views: 1

msvkrishna

Member
Alumni
Hi Prashant, please find attached the Day 4 assignments (Cars, Physical). A couple of points: 1) In Cars, the test score came out at 62% compared to 69% on train (for the regular model; the R2 value from the final model after backward elimination almost matches). Is this overfitting, and if so, what can be done? 2) In Physical, the test score is 72% and the train score is 99% (with a test size of 40% and random_state 5). I am not sure whether I have done anything incorrect. Thank you, Krishna
 

Attachments

  • Day4_Assgnmnts_Krishna.zip
    9.8 KB · Views: 6

Kulasekhar M V

Member
Alumni
Hi Prashant,
I started the Housing project. At step 6 (perform linear regression), I ran the linear regression and observed:
model.score for train data is 64.71
model.score for test data is 63.8

I have two questions.
Question 1: Is this not a very low percentage in terms of accuracy? Is it due to the data, or is there any way I can improve the score?
Question 2: Overfitting seems to be happening, as I see it. If you can confirm that the score is due to the data and the accuracy cannot be improved, I can work on removing the overfitting.
Awaiting your response.

The source code is below:
# Create the training and testing sets
# (sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=0)

# Create and fit the model
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# R^2 on the training and test sets
print(model.score(X_train, y_train))
print(model.score(X_test, y_test))
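
For a regressor, model.score returns the coefficient of determination (R^2), not a classification accuracy. The sketch below computes it by hand on made-up numbers to show what the value means; r2_score_manual is a hypothetical helper name, not a scikit-learn function.

```python
import numpy as np

def r2_score_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot (hypothetical helper for illustration)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # error left after the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # error of always predicting the mean
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]            # made-up labels
good = r2_score_manual(y_true, [1.1, 1.9, 3.2, 3.8])
bad = r2_score_manual(y_true, [4.0, 3.0, 2.0, 1.0])
print(good, bad)
```

A value near 1 means the predictions explain most of the variance, 0 means the model is no better than predicting the mean, and the value goes negative when the model is worse than that baseline.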
 

Charan Telapalli

Member
Alumni
Hi Prashant,

Please find my 4th day car model assignment.

Regards,
Charan Telapalli
 

Attachments

  • Car Model - Assigment 1.zip
    813 bytes · Views: 4

Kulasekhar M V

Member
Alumni
Hi Prashant,
I started the Housing project and have two questions:
1) What syntax should be used for root mean squared error? I am not sure we covered it in class.
2) When I applied decision tree regression, I got the error below:
ValueError: Length of feature_names, 9 does not match number of features, 13
The code I prepared is included here. Could you let me know what I am doing wrong? The housing CSV has 9 feature columns, and the 10th is the label, so I am not sure where the 13 comes from.
features1 = list(dataset.columns[1:])
dot_data = StringIO()
export_graphviz(model.fit(X_train, y_train), out_file=dot_data, feature_names=features1, filled=True)
graph = pydot.graph_from_dot_data(dot_data.getvalue())
Image(graph[0].create_png())
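
On the first question, root mean squared error is the square root of the mean of the squared residuals. A minimal sketch with NumPy on made-up values follows; y_test and y_pred here are illustrative arrays, not the housing data. scikit-learn's mean_squared_error from sklearn.metrics returns the same MSE, so taking np.sqrt of its result is an equivalent route.

```python
import numpy as np

# Made-up values purely for illustration
y_test = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

mse = np.mean((y_test - y_pred) ** 2)  # mean squared error
rmse = np.sqrt(mse)                    # root mean squared error, in the label's units
print(rmse)
```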
 

Charan Telapalli

Member
Alumni
Hi Prashant,

My pydot is still not working. Please assist.

I am getting the below error message.

"ModuleNotFoundError: No module named 'pydot'"
 

Sanjay Joshi_4

Member
Alumni
I was getting the same error and resolved it by doing the following. I am not sure whether it is right or wrong.
list(dataset.columns[1:])
shows the column names. Update the count after the ':' and then it works:
features1 = list(dataset.columns[1:2])
I had also posted my issue, but resolved it through the above.
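
A "9 names versus 13 features" mismatch is consistent with a categorical column having been one-hot encoded before training, so the model sees more columns than the original CSV has. That is an assumption, since the full preprocessing code is not shown. Under it, one sketch of a fix is to take the names from the encoded data rather than slicing the original column list. The rows and every column name except ocean_proximity are made up.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_graphviz

# Made-up rows standing in for the housing data
dataset = pd.DataFrame({
    "median_income": [8.3, 7.2, 5.6, 3.8, 4.0, 3.1],
    "housing_median_age": [41, 21, 52, 52, 30, 15],
    "ocean_proximity": ["NEAR BAY", "INLAND", "INLAND", "NEAR BAY", "ISLAND", "INLAND"],
    "median_house_value": [452600, 358500, 341300, 342200, 269700, 150000],
})

# One-hot encode the text column; this is what increases the column count
features = pd.get_dummies(
    dataset.drop(columns="median_house_value"), columns=["ocean_proximity"]
)
labels = dataset["median_house_value"]

model = DecisionTreeRegressor(random_state=0).fit(features, labels)

# Take the names from the encoded frame so the lengths always match
feature_names = list(features.columns)
dot_data = export_graphviz(
    model, out_file=None, feature_names=feature_names, filled=True
)
print(len(feature_names), features.shape[1])
```

With out_file=None, export_graphviz returns the DOT source as a string, which can then be passed to pydot.graph_from_dot_data as in the original snippet.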
 

Kulasekhar M V

Member
Alumni
Sanjay,
I did not get it.
 

msvkrishna

Member
Alumni
Hi All

PFA Day6 Notes and Codes

---------
Hi Prashant, a couple of points/observations on the project that I have come across. I need your input.
1) Linear regression: completed.
Just an FYI: if standard scaling is used as asked in Step 5 of the project guide, the test score becomes negative, with a magnitude greater than 100%.
Once I skipped feature scaling and used random_state = 5, I got a test score of 65% compared to 63% on train.
(I found a good YouTube video on R2, SSE, SST, SSR, RMSE, etc. that I wanted to share with everyone.)
2) Decision tree regression:
As I recall, once the missing values are preprocessed, this method does NOT need any conversion of categorical features, such as OHE or label encoding, so I skipped that step.
But I get a "cannot convert string to float 'INLAND'" error for the ocean_proximity column.
=> I tried a numeric conversion to see if it helps, but I still get the same error.
I researched online, and many have stated that OHE is not needed/good for decision trees, in line with what you stated in class.
Besides, if we do use OHE, the added categorical variables cause the feature_names mismatch of 9 vs 13 when drawing the graph.
I need your guidance, please.

Thanks
Krishna
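
On the scaling observation in point 1: for ordinary linear regression, standardizing the features should leave the test R^2 essentially unchanged, provided the scaler is fitted on the training set and the same fitted scaler is applied to the test set. A negative score after scaling may therefore point to the scaler being fitted separately on the test data, or to the labels being scaled inconsistently. That is a guess, since the project code is not shown. The sketch below uses synthetic data to show the expected behaviour.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with features on very different scales (illustration only)
rng = np.random.RandomState(5)
features = rng.normal(loc=[0.0, 50.0, 1000.0], scale=[1.0, 10.0, 300.0], size=(300, 3))
labels = features @ np.array([3.0, 0.2, 0.01]) + rng.normal(size=300)

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=5
)

plain = LinearRegression().fit(X_train, y_train)

# The pipeline fits the scaler on X_train only and reuses it for X_test
scaled = make_pipeline(StandardScaler(), LinearRegression()).fit(X_train, y_train)

plain_score = plain.score(X_test, y_test)
scaled_score = scaled.score(X_test, y_test)
print(plain_score, scaled_score)
```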
 

_19584

Member
Alumni
Hello Prashant,

PFA the assignments for 10th and 11th Feb (Day 3 and 4 of the class):
1. cars
2. physical

For the physical dataset, the model I obtained is clearly overfitting. However, changing the random_state variable removes the overfitting to a great extent. Is changing it recommended, or is there another way to approach the problem?

Please do let me know in case of any mistakes.

Thanks and Regards,
Swapna
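
When the train/test gap changes a lot with random_state, the score from any single split is a noisy estimate. One common alternative to picking a favourable seed is k-fold cross-validation, which averages the score over several splits. A minimal sketch on synthetic data follows; the data and the coefficients are made up and are not the physical dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression data (illustration only)
rng = np.random.RandomState(0)
features = rng.normal(size=(200, 3))
labels = features @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=200)

# Five different train/test partitions, one R^2 score per fold
scores = cross_val_score(LinearRegression(), features, labels, cv=5, scoring="r2")
print(scores.mean(), scores.std())
```

A large spread across folds suggests the dataset is small or the model is unstable, which is usually more informative than the result of one particular seed.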
 

Attachments

  • Swapna_Assignment_day3-4.zip
    12.2 KB · Views: 3

Priyavart Thakur

Member
Alumni
Hi Prashant,

While practicing the class examples, a few questions came to mind, which I have listed below.

1. Any book/blog for machine learning interview questions?
2. Any pandas and NumPy books/blogs?
3. Ways to identify outliers?
4. What is the difference between fit and fit_transform? -- Already discussed
5. How do we use the model on real data? Is it predict? -- Already discussed
6. Standard scaling and min-max scaling (how to see the scaled values, like 1cm=100km, etc.) -- Already discussed
7. In the basic pandas example, the graph is shown without using %matplotlib inline in Jupyter Notebook, so when should this be used? -- Already discussed; will cross-check at my end
8. If we have questions after this training is complete, how can we reach you? -- Already discussed
9. When running a Jupyter notebook program as a complete run, it takes some time. Is this the behaviour of the notebook or of the Python interpreter? Are we saying that Python is slower than other languages like C# or Java? (By the way, my current experience is in .NET.) -- Already discussed

I will wait for your answers to the above questions.

I have marked the points already discussed as 'Already discussed'. Please reply to the rest of the points.

Thanks,
Priyavart

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
Hey Prashant, @Prasanth Nair: I have a column with a lot of text in it, and I need to preprocess the data by giving a particular number to each word so that it can be categorized automatically next time. When a new word comes in, it should automatically be assigned a number.

My question is: which tool or technique should I use? Please share a link to an example if you can.

Regards
Desh
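
One way to picture the behaviour described above is a vocabulary that maps each distinct word to an integer and grows when an unseen word appears. This is a plain-Python sketch, not a recommendation of a specific tool: Vocabulary is a hypothetical class name, and the sample sentences are made up. Library tokenizers such as scikit-learn's CountVectorizer build a similar word-to-index mapping, but fix it at fit time.

```python
class Vocabulary:
    """Maps each distinct word to a stable integer id, growing as new words appear."""

    def __init__(self):
        self.word_to_id = {}

    def encode(self, text):
        ids = []
        for word in text.lower().split():
            if word not in self.word_to_id:
                # Unseen word: assign the next free integer
                self.word_to_id[word] = len(self.word_to_id)
            ids.append(self.word_to_id[word])
        return ids


vocab = Vocabulary()
first = vocab.encode("engine noise on start")
second = vocab.encode("engine stalls on start")  # "stalls" is new and gets a new id
print(first, second)
```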