
Machine Learning Advanced Certification | Nov 9-27

Sahana_19

Member
Hi,
1. When should we use () brackets, for example: df["CATEGORY*"].unique()?
and
2. What is the difference between the outputs, and how do we read that output?
 

Sahana_19

Member
2. What if I want to access rows 10 through 20 plus only the 25th row, and the 2nd and 5th columns, using iloc?
df.iloc[[10:20,25],[3,5]] is my code. Please correct me if I am wrong.
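A hedged sketch of one way such a selection could be written (np.r_ is used to combine a slice with a single position, because a slice literal cannot appear inside a list, which is why the line above raises a SyntaxError; the DataFrame here is purely illustrative):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(300).reshape(30, 10))  # hypothetical 30x10 frame

# rows 10-19 plus row 25, and the columns at positions 3 and 5 (all positional)
subset = df.iloc[np.r_[10:20, 25], [3, 5]]
print(subset)

Adjust the end of the slice (e.g. 10:21) if row 20 should be included as well.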
 
Hi
Could you please help me with two things. I started using the practice labs but couldn't proceed further.

1) To find the squares of numbers in the list.
numbers_out = []
numbers = [1, 2, 3, 4, 5]
for num in numbers:
    num_out = num * num
    numbers_out.append(num_out)
    print(numbers_out)

The output is as follows:

[1]
[1, 4]
[1, 4, 9]
[1, 4, 9, 16]
[1, 4, 9, 16, 25]

Kindly point out where I went wrong.

2)
import pandas as pd
df = pd.read_csv("UberDrives2016.csv")

This throws the following error, which stopped me right away:

FileNotFoundError Traceback (most recent call last)
<ipython-input-6-6e8892c48356> in <module>
1 import pandas as pd
----> 2 df = pd.read_csv("UberDrives2016.csv")

/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
674 )
675
--> 676 return _read(filepath_or_buffer, kwds)
677
678 parser_f.__name__ = name

/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
446
447 # Create the parser.
--> 448 parser = TextFileReader(fp_or_buf, **kwds)
449
450 if chunksize or iterator:

/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
878 self.options["has_index_names"] = kwds["has_index_names"]
879
--> 880 self._make_engine(self.engine)
881
882 def close(self):

/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
1112 def _make_engine(self, engine="c"):
1113 if engine == "c":
-> 1114 self._engine = CParserWrapper(self.f, **self.options)
1115 else:
1116 if engine == "python":

/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
1889 kwds["usecols"] = self.usecols
1890
-> 1891 self._reader = parsers.TextReader(src, **kwds)
1892 self.unnamed_cols = self._reader.unnamed_cols
1893

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()



pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: [Errno 2] File UberDrives2016.csv does not exist: 'UberDrives2016.csv'
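Not an answer from the thread, but a small hedged check that usually narrows this down: the FileNotFoundError means the notebook's working directory does not contain UberDrives2016.csv, so listing that directory (and, if needed, passing the file's full path) shows where the file actually has to be. The commented path is only an illustration.

import os
import pandas as pd

print(os.getcwd())       # where the notebook is actually running
print(os.listdir("."))   # which files it can see there

# if the CSV lives somewhere else, pass its full path (illustrative path only)
# df = pd.read_csv("/home/labuser/data/UberDrives2016.csv")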
 

Sahana_19

Member
Hi
Could you please help me with two things. I started using the practice labs but couldn't proceed further.

1) To find the squares of numbers in the list.
numbers_out = []
numbers = [1, 2, 3, 4, 5]
for num in numbers:
    num_out = num * num
    numbers_out.append(num_out)
    print(numbers_out)  # indentation error: print runs inside the loop

The output is as follows:

[1]
[1, 4]
[1, 4, 9]
[1, 4, 9, 16]
[1, 4, 9, 16, 25]

==============================
numbers_out = []
numbers = [1, 2, 3, 4, 5]
for num in numbers:
    num_out = num * num
    numbers_out.append(num_out)
print(numbers_out)

output:
[1, 4, 9, 16, 25]
 

Dhilip prabhakaran

Customer
Unable to execute RMSE code

from sklearn.metrics import mean_squared_error
import numpy as np
print("RMSE value of the test dataset is :")
print(np.sqrt(mean_squared_error(y_test , y_test_predicted_value)))

Getting the below error

--------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-17-cdf2035deddb> in <module>
2 import numpy as np
3 print("RMSE value of the test dataset is :")
----> 4 print(np.sqrt(mean_squared_error(y_test, y_test_predicted_value)))

/usr/local/lib/python3.7/site-packages/sklearn/metrics/_regression.py in mean_squared_error(y_true, y_pred, sample_weight, multioutput, squared)
250 """
251 y_type, y_true, y_pred, multioutput = _check_reg_targets(
--> 252 y_true, y_pred, multioutput)
253 check_consistent_length(y_true, y_pred, sample_weight)
254 output_errors = np.average((y_true - y_pred) ** 2, axis=0,

/usr/local/lib/python3.7/site-packages/sklearn/metrics/_regression.py in _check_reg_targets(y_true, y_pred, multioutput, dtype)
82
83 """
---> 84 check_consistent_length(y_true, y_pred)
85 y_true = check_array(y_true, ensure_2d=False, dtype=dtype)
86 y_pred = check_array(y_pred, ensure_2d=False, dtype=dtype)

/usr/local/lib/python3.7/site-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
210 if len(uniques) > 1:
211 raise ValueError("Found input variables with inconsistent numbers of"
--> 212 " samples: %r" % [int(l) for l in lengths])
213
214

ValueError: Found input variables with inconsistent numbers of samples: [102, 506]
 
Hi Vaishali & team
I am unable to join today's session through Webex. It shows that the session is locked, with the below error: "AT=JM&ST=FAIL&RS=MeetingLocked&MK=1700632879". Please help.
 

_78636

New Member
Hi Vaishali,

I'm unable to access today's class as a message popped up from WebEx Meetings showing "The meeting host has restricted access to this meeting. Please contact the meeting host".
 
Hello Vaishali/TA,

I am not able to attend today's live session on Machine Learning.

After clicking the "Join Now" button,
an error pops up with the following message:

AT=JM&ST=FAIL&RS=MeetingLocked&MK=1700632879

I also tried joining using my phone; it says:
"The meeting host has restricted access to this meeting. Please contact the meeting host."

PFA screenshot for your reference.
 

Attachments

  • Capture.PNG (14.4 KB)

_85870

New Member
Hi Vaishali,

I am unable to attend today's class (16th Nov 2020); the mobile app shows the error "The meeting host has restricted access to this meeting. Please contact the meeting host". In the desktop browser I got the error "The WebEx session is locked. The host has restricted training access to those currently in attendance." I hope you have noted my attendance; my class attendance is also marked with a tick mark in the LMS portal. I don't understand this error. I am still willing to attend the class as there is still time, but I would like you to resolve this at least for my next class.

Regards
Rexaline Infancia Xavier Raj
 

Naveen Krishnan Raja

Customer
Hi Vaishali,

I am unable to attend today's class (11/16/2020) due to the attached error. I waited for 30 minutes, logged out from my PC, and then logged in via mobile, but I still could not join the session and received this error message: "The meeting host has restricted access to this meeting. Please contact the meeting host". Please help to resolve this issue for the next class. I have raised ticket #00748807 for this issue. Also, please clarify the attendance for today's class (11/16/2020).

Thanks,
Naveen Krishna Raja
 

Attachments

  • ML live class.JPG (52.3 KB)

Support Simplilearn(4685)

Moderator
Staff member
Alumni
Hey guys,

Do not worry. Due to the LMS issue yesterday, many of you were not able to join the session, so we had informed Vaishali to take the session for only an hour, and it was only a Q&A session; no new topics were covered.

I hope that this information was useful.

Regards,
Team Simplilearn
 

Keerthi T V

Active Member
Staff member
Alumni
Hi Vaishali,

I'm unable to access today's class as a message popped up from WebEx Meetings showing "The meeting host has restricted access to this meeting. Please contact the meeting host".
Hi Hanumanth,

My sincere apologies for the discomfort caused to you.
This was caused by an unexpected technical glitch; we are looking into it and will update you regarding the course of action within 24-48 hours.
Thank you for your patience in the meantime.

Feel free to contact us for any further assistance or clarifications.

Regards,
Keerthi
Global Teaching Assistant
Simplilearn
 
Hi Vaishali,

I am getting the below error while importing an Excel file in Python. Can you please help?
module 'pandas' has no attribute 'read_xlsx'
 

Raghavendra B M

Well-Known Member
Staff member
Simplilearn Support
Hi Vaishali,

I am getting the below error while importing an Excel file in Python. Can you please help?
module 'pandas' has no attribute 'read_xlsx'

Hi Anisha Patel,

Please use the pandas read_excel() function to import the Excel file.

Regards,
Raghavendra
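A minimal hedged sketch of that suggestion (the file and sheet names are assumptions, not from the course material; for .xlsx files the openpyxl package may also need to be installed):

import pandas as pd

# pandas has read_excel(); there is no read_xlsx attribute
df = pd.read_excel("UberDrives2016.xlsx", sheet_name=0)  # hypothetical file name
print(df.head())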
 

Sahana_19

Member
Please, anyone who knows this, do answer.
======================================================
RMSE value of the train dataset is :
4.268579123832707

RMSE value of the test dataset is :
6.226198168856468

So, can I say that it's an appropriate model? The difference between them is about 2.

Please answer this question; this doubt is stuck in my mind and I need to clear it.


Thanks
 
Hi,
My name is Anushka Mitra and I have joined the Data Analyst course. I registered for the evening classes, which should start from 21 November, 7:30 PM, but I was unable to view any live classes even after receiving the joining-session mail.
 
Hi Vaishali,
In the discussion of the Decision Tree model with the Horse dataset, we first performed encoding using pd.get_dummies and then went for imputation.
The code below was used there for encoding:

X = horse_df.drop('outcome' , axis =1)
y = horse_df['outcome']
horse_encode_new = pd.get_dummies(X)

Now, due to the default settings of the parameters columns=None and dummy_na=False, all the columns with `object` or `category` dtype will be converted/encoded, with the NaN values represented by all-zero rows only (0000...).
Thus, if we check after encoding, we actually do not have any missing/NaN values left for any categorical/object-type feature.
I also checked it with the code below:

list_indices = []
series1 = horse_encode_new.isnull().sum()
for i in series1.index:
    if series1[i] != 0:
        list_indices.append(i)
print(list_indices)

for which the output is:
['rectal_temp', 'pulse', 'respiratory_rate', 'nasogastric_reflux_ph', 'packed_cell_volume', 'total_protein', 'abdomo_protein']

which are all features with numerical values only... thus, after encoding, there are no NaN values left for the categorical features in our dataframe.
So, when we performed imputation (with the code below) after the encoding step, it actually imputed NaN values only in the numerical features.
Is this a correct way to apply these two transformers/preprocessors? (What I gather is that, for the categorical variables, we are not doing the desired imputation at all this way.)

import numpy as np
from sklearn.impute import SimpleImputer
imp = SimpleImputer(missing_values=np.nan,strategy="most_frequent")
X_train = imp.fit_transform(X_train)
X_test = imp.fit_transform(X_test)

Please help me with this (I know there is a lot to read... that's why I wanted to discuss it verbally after the live class session :))
 
Hi Vaishali,
My Jupyter lab is not opening up; it says "Your monthly lab usage time of 3000 minutes has reached the monthly allocated time of 3000 minutes", and the message comes with a blank screen in Jupyter lab. We were not aware of this usage-time allocation for the lab, so I raised support request #00753480 to extend the usage time until the live classes and project completion/submission are done. I sometimes left the lab open even when I was not using it, as I was not aware of this monthly allocated time limit.
 

_78394

New Member
Hello Vaishali,
I want to develop a model for spelling-mistake correction for names (proper nouns). I have a dictionary of all the proper nouns. Could you please let me know which ML algorithm can be used and how to proceed?
 

Aliyah_1

Member
I'm working on Project 3 (Recommender Systems) and am stuck on the last part which asks to make predictions on the test data. So far I've been able to split my data in train and test, and come up with the user similarity matrix on the train set. Vaishali confirmed in our class yesterday that the data was only to be split into X_train and X_test as I was unclear on that, so no y_train/ y_test. But when it comes to predicting, the models seem to require a y. Maybe I'm using the wrong model? Anyone have any suggestions? Thanks.
 
Hi, I am working on the Income Qualification project and need clarification on the below:
  • Set the poverty level of the members and the head of the house within a family - as per my understanding, we need to set the poverty level of the members of the household to be the same as that of the head of the family. Is this correct?
  • I am more comfortable with SQL than with the DataFrame group-by functions, hence I used pysqldf to answer questions like whether all members of the same household have the same poverty level and whether there is a house without a family head. Is this acceptable? In real-time scenarios, are DataFrame operations preferred over SQL? Does performance get affected by using SQL? Please confirm.
  • Since this was a classification problem, I tried LDA, but the accuracy reduced with LDA. Also, the accuracy of the model was higher without cross-validation than with cross-validation. Is this possible, or am I missing something?
 

Vaishali_26

Well-Known Member
Alumni
Hi,
1. When should we use () brackets, for example: df["CATEGORY*"].unique()?
and
2. What is the difference between the outputs, and how do we read that output?
Hi Sahana,

We use round brackets when we call the built-in methods of a data structure such as a DataFrame, a list, etc.
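A small hedged illustration of the distinction (the column name and data are assumptions): methods such as unique() are called with parentheses, while attributes such as shape are accessed without them.

import pandas as pd

df = pd.DataFrame({"CATEGORY*": ["A", "B", "A", "C"]})  # hypothetical data

print(df["CATEGORY*"].unique())   # method call -> () needed; array of distinct values
print(df["CATEGORY*"].nunique())  # method call -> () needed; count of distinct values
print(df.shape)                   # attribute -> no (); (rows, columns) tuple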
 

Vaishali_26

Well-Known Member
Alumni
Hi, I am working on the Income Qualification project and need clarification on the below:
  • Set the poverty level of the members and the head of the house within a family - as per my understanding, we need to set the poverty level of the members of the household to be the same as that of the head of the family. Is this correct?
  • I am more comfortable with SQL than with the DataFrame group-by functions, hence I used pysqldf to answer questions like whether all members of the same household have the same poverty level and whether there is a house without a family head. Is this acceptable? In real-time scenarios, are DataFrame operations preferred over SQL? Does performance get affected by using SQL? Please confirm.
  • Since this was a classification problem, I tried LDA, but the accuracy reduced with LDA. Also, the accuracy of the model was higher without cross-validation than with cross-validation. Is this possible, or am I missing something?
Hi Shivani ,

1. Yes, your understanding is correct.
2. Yes, it is acceptable :)
3. Have you treated the null values and outliers in your dataset? Please check once.
 

Vaishali_26

Well-Known Member
Alumni
I'm working on Project 3 (Recommender Systems) and am stuck on the last part which asks to make predictions on the test data. So far I've been able to split my data in train and test, and come up with the user similarity matrix on the train set. Vaishali confirmed in our class yesterday that the data was only to be split into X_train and X_test as I was unclear on that, so no y_train/ y_test. But when it comes to predicting, the models seem to require a y. Maybe I'm using the wrong model? Anyone have any suggestions? Thanks.
Hi Aliyah,

Try reading this research paper for more understanding :)
 

Attachments

  • EvaluationOfRecommenderSystems.pdf (301.8 KB)

Vaishali_26

Well-Known Member
Alumni
Hi Vaishali,

Can you please provide some inputs regarding how ML models are deployed/reused?

Thank you!

Hi Shivani,

PFB the link that will help you understand deploying ML models using Flask.

https://towardsdatascience.com/how-to-easily-deploy-machine-learning-models-using-flask-b95af8fe34d4
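A minimal hedged sketch of the idea in that article: persist a trained model, then reuse it behind an HTTP endpoint with Flask. Every name below (the model file, dataset, route, port) is an illustrative assumption, not from the course material.

# train_and_save.py -- fit and persist a model (illustrative)
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)
joblib.dump(model, "model.pkl")

# app.py -- reuse the saved model behind a /predict endpoint (illustrative)
import joblib
from flask import Flask, request, jsonify

app = Flask(__name__)
model = joblib.load("model.pkl")

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]        # e.g. [[5.1, 3.5, 1.4, 0.2]]
    prediction = model.predict(features).tolist()
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    app.run(port=5000)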
 
Hi Shivani ,

1. Yes, your understanding is correct.
2. Yes, it is acceptable :)
3. Have you treated the null values and outliers in your dataset? Please check once.


Thank you for confirming !
Regarding the 3rd point, I did treat the null values but did not look for outliers.
In datasets like this one, where there are many features, how do we look for outliers? Do we need to do it individually, or are there any functions to do so?
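Not an official answer from the thread, but one common hedged approach is an IQR rule applied to every numeric column at once, so the features do not have to be inspected one by one (income_df below is a placeholder for the project DataFrame):

import pandas as pd

def iqr_outlier_counts(df: pd.DataFrame) -> pd.Series:
    """Count, per numeric column, how many values fall outside 1.5 * IQR."""
    num = df.select_dtypes(include="number")
    q1, q3 = num.quantile(0.25), num.quantile(0.75)
    iqr = q3 - q1
    mask = (num < q1 - 1.5 * iqr) | (num > q3 + 1.5 * iqr)
    return mask.sum().sort_values(ascending=False)

# usage (hypothetical frame):
# print(iqr_outlier_counts(income_df))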
 

Vaishali_26

Well-Known Member
Alumni
Unable to execute RMSE code

from sklearn.metrics import mean_squared_error
import numpy as np
print("RMSE value of the test dataset is :")
print(np.sqrt(mean_squared_error(y_test , y_test_predicted_value)))

Getting the below error

--------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-17-cdf2035deddb> in <module>
2 import numpy as np
3 print("RMSE value of the test dataset is :")
----> 4 print(np.sqrt(mean_squared_error(y_test, y_test_predicted_value)))

/usr/local/lib/python3.7/site-packages/sklearn/metrics/_regression.py in mean_squared_error(y_true, y_pred, sample_weight, multioutput, squared)
250 """
251 y_type, y_true, y_pred, multioutput = _check_reg_targets(
--> 252 y_true, y_pred, multioutput)
253 check_consistent_length(y_true, y_pred, sample_weight)
254 output_errors = np.average((y_true - y_pred) ** 2, axis=0,

/usr/local/lib/python3.7/site-packages/sklearn/metrics/_regression.py in _check_reg_targets(y_true, y_pred, multioutput, dtype)
82
83 """
---> 84 check_consistent_length(y_true, y_pred)
85 y_true = check_array(y_true, ensure_2d=False, dtype=dtype)
86 y_pred = check_array(y_pred, ensure_2d=False, dtype=dtype)

/usr/local/lib/python3.7/site-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
210 if len(uniques) > 1:
211 raise ValueError("Found input variables with inconsistent numbers of"
--> 212 " samples: %r" % [int(l) for l in lengths])
213
214



ValueError: Found input variables with inconsistent numbers of samples: [102, 506]



Hi Dhilip,

I hope that you were able to rectify this error based on the numerous discussions we have had about the train-test split method in our classes.
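For reference, a minimal hedged sketch of that pattern: the error "inconsistent numbers of samples: [102, 506]" usually means RMSE is being computed between y_test and predictions made on the full X rather than on X_test. The data below is synthetic and the variable names are assumptions; only the shape of the workflow matters.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# synthetic stand-in for the course data (506 rows, 3 features)
rng = np.random.default_rng(0)
X = rng.normal(size=(506, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=506)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

model = LinearRegression().fit(X_train, y_train)

# predict on X_test (not on the full X) so the length matches y_test
y_test_predicted_value = model.predict(X_test)

print("RMSE value of the test dataset is :")
print(np.sqrt(mean_squared_error(y_test, y_test_predicted_value)))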
 

Vaishali_26

Well-Known Member
Alumni
Hi Vaishali,
In the discussion of the Decision Tree model with the Horse dataset, we first performed encoding using pd.get_dummies and then went for imputation.
The code below was used there for encoding:

X = horse_df.drop('outcome' , axis =1)
y = horse_df['outcome']
horse_encode_new = pd.get_dummies(X)

Now, due to the default settings of the parameters columns=None and dummy_na=False, all the columns with `object` or `category` dtype will be converted/encoded, with the NaN values represented by all-zero rows only (0000...).
Thus, if we check after encoding, we actually do not have any missing/NaN values left for any categorical/object-type feature.
I also checked it with the code below:

list_indices = []
series1 = horse_encode_new.isnull().sum()
for i in series1.index:
    if series1[i] != 0:
        list_indices.append(i)
print(list_indices)

for which the output is:
['rectal_temp', 'pulse', 'respiratory_rate', 'nasogastric_reflux_ph', 'packed_cell_volume', 'total_protein', 'abdomo_protein']

which are all features with numerical values only... thus, after encoding, there are no NaN values left for the categorical features in our dataframe.
So, when we performed imputation (with the code below) after the encoding step, it actually imputed NaN values only in the numerical features.
Is this a correct way to apply these two transformers/preprocessors? (What I gather is that, for the categorical variables, we are not doing the desired imputation at all this way.)

import numpy as np
from sklearn.impute import SimpleImputer
imp = SimpleImputer(missing_values=np.nan,strategy="most_frequent")
X_train = imp.fit_transform(X_train)
X_test = imp.fit_transform(X_test)

Please help me with this (I know there is a lot to read... that's why I wanted to discuss it verbally after the live class session :))
Hi Abhishek,

It's okay. It was an interesting read :) Yes, you are right; only the numerical features get imputed. So, if you want to manually impute the categorical
columns, you can impute them using methods like fillna() and then encode the values using the pd.get_dummies function :)
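A minimal hedged sketch of that order of operations (the tiny frame below only stands in for horse_df; the point is to mode-fill the object columns before pd.get_dummies so categorical NaNs are actually imputed rather than encoded as all-zero rows):

import numpy as np
import pandas as pd

# tiny stand-in for horse_df, purely illustrative
horse_df = pd.DataFrame({
    "surgery": ["yes", np.nan, "no", "yes"],
    "pulse":   [66.0, 88.0, np.nan, 40.0],
    "outcome": ["lived", "died", "lived", "lived"],
})

X = horse_df.drop("outcome", axis=1)

# 1) impute categorical (object) columns with their mode BEFORE encoding
cat_cols = X.select_dtypes(include="object").columns
for col in cat_cols:
    X[col] = X[col].fillna(X[col].mode()[0])

# 2) then encode; the categorical NaNs were imputed, not silently zeroed out
X_encoded = pd.get_dummies(X)
print(X_encoded)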
 

_76991

Member
I was using my own installed version of Jupyter Notebook but had an issue installing xgboost, hence I have now switched to Simplilearn's Jupyter lab. The CSV read which was working in Notebook is not working in Jupyter lab and gives an error. If any of you have faced this issue and resolved it, can you tell me the solution? I have attached the snapshot. (Unfortunately the support team could not resolve it and just shared some material.) I also tried df = pd.read_csv(r'd:\UberDrives2016.csv'), but that also did not work. Getting the error:
File "<ipython-input-10-116da616a7eb>", line 1
df = pd.read_csv ('d:\UberDrives2016.csv')
^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-5: truncated \UXXXXXXXX escape
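Not from the thread, but a hedged note on the SyntaxError itself: in a normal Python string, \U starts a unicode escape, so a Windows path like 'd:\UberDrives2016.csv' needs a raw string, forward slashes, or escaped backslashes. This only helps when the file really is on a local d: drive; inside the hosted Simplilearn lab the CSV would first have to be uploaded into the lab's own file system.

import pandas as pd

# any of these avoid the "unicodeescape" SyntaxError on a local Windows machine
df = pd.read_csv(r"d:\UberDrives2016.csv")      # raw string
# df = pd.read_csv("d:/UberDrives2016.csv")     # forward slashes
# df = pd.read_csv("d:\\UberDrives2016.csv")    # escaped backslash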
 

Attachments

  • upload_2020-12-1_22-18-58.png (14.6 KB)

_76991

Member
To Aayushi ma'am:
For the class material of 6th Dec, I tried to read the file in Simplilearn's Jupyter lab, but I am getting an error. I have attached the snapshot. Can you help me clear it?
 

Attachments

  • AirQuality_CSV_Rea_error.JPG (127.5 KB)

gilbert cane

New Member
ValueError: Found input variables with inconsistent numbers of samples: [102, 506]

Sounds like the shapes of your labels and predictions are not in alignment. I faced a similar problem while fitting a linear regression model. The problem in my case was that the number of rows in X was not equal to the number of rows in y. In most cases, X is your feature matrix and y is your target, and the feature matrix should not be 1D. So check the shape of X, and if it is 1D, convert it from 1D to 2D:

x.reshape(-1,1)

Also, you likely get problems if you remove rows containing nulls in X_train and y_train independently of each other. y_train probably has few or no nulls and X_train probably has some, so when you remove a row in X_train and the same row is not removed in y_train, your data becomes unsynced and has different lengths. Instead, you should remove nulls before you separate X and y.
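A brief hedged sketch of that advice (the column names and data are placeholders): drop nulls on the whole frame first, then separate X and y, and keep X 2D (or reshape it) before fitting.

import pandas as pd
from sklearn.linear_model import LinearRegression

# placeholder data; "feature" and "target" stand in for the real columns
df = pd.DataFrame({"feature": [1.0, 2.0, None, 4.0, 5.0],
                   "target":  [2.1, 3.9, 6.2, 8.1, None]})

# 1) drop nulls BEFORE separating X and y, so their rows stay aligned
df = df.dropna()

X = df[["feature"]].values   # double brackets keep X 2D: shape (n_samples, 1)
y = df["target"].values      # 1D target is fine

# equivalently, a 1D array can be reshaped: X = df["feature"].values.reshape(-1, 1)
model = LinearRegression().fit(X, y)
print(model.predict(X).shape, y.shape)   # lengths now match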
 