Machine learning | Vighnesh

Discussion in 'Big Data and Analytics' started by Vikas Kumar_18, Jun 29, 2019.

  1. Vikas Kumar_18

    Vikas Kumar_18 Well-Known Member
    Simplilearn Support Alumni

    Joined:
    Dec 17, 2018
    Messages:
    205
    Likes Received:
    37
    #1
  2. Vigneshwar V

    Vigneshwar V Customer
    Customer

    Joined:
    Jun 22, 2019
    Messages:
    12
    Likes Received:
    3
Hi everyone!
I have uploaded all the artifacts discussed during the session at the Google Drive link below.

https://drive.google.com/open?id=1qv6xFERPzXKYHIYdKk9hxiJvCkvU3skm

Kindly download and practice the code in the notebook, and be prepared for the next session.

Happy learning!

    Thanks and Regards
    Vigneshwar V
     
    #2
  3. srinivas nallur

    srinivas nallur New Member

    Joined:
    Jul 4, 2019
    Messages:
    1
    Likes Received:
    0
I could not find any description of what needs to be done with the unassisted practice datasets.
I only see two datasets, mtcars and Salaries.
Is there a description for them, or should we just play around with them?
     
    #3
  4. PRASHANT NAMDEO SHELARE

    Joined:
    Apr 5, 2019
    Messages:
    2
    Likes Received:
    0
Respected Sir,
You forgot to discuss the topic of GroupBy under Data Manipulation.
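For anyone revising GroupBy in the meantime, here is a minimal sketch of the idea (the DataFrame below is invented purely for illustration):

```python
import pandas as pd

# Toy sales data, made up for illustration only
df = pd.DataFrame({
    "city": ["Pune", "Pune", "Delhi", "Delhi"],
    "sales": [100, 150, 200, 50],
})

# Group rows by city, then aggregate each group
totals = df.groupby("city")["sales"].sum()
print(totals)  # Delhi -> 250, Pune -> 250
```

The same pattern works with .mean(), .count(), or .agg() for multiple aggregations at once.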
     
    #4
  5. DEVANURU YADA KISHORE

    Joined:
    Apr 17, 2019
    Messages:
    4
    Likes Received:
    0
    Hi Vignesh,

    Greetings for the Day,

Can you please explain y = mx + c? What are the m and c values in a linear regression graph?
I have seen some graphs on Google that use values such as y = 5x + 0.03. How were the m and c values chosen there?
     

    Attached Files:

    #5
  6. Vikas Kumar_18

    Vikas Kumar_18 Well-Known Member
    Simplilearn Support Alumni

    Joined:
    Dec 17, 2018
    Messages:
    205
    Likes Received:
    37
Hi Srinivas,

You need to explore the dataset and perform the operations yourself; that is why it is called unassisted practice. You can practice the operations taught in class. I hope this clarifies things.
     
    #6
  7. Vikas Kumar_18

    Vikas Kumar_18 Well-Known Member
    Simplilearn Support Alumni

    Joined:
    Dec 17, 2018
    Messages:
    205
    Likes Received:
    37

Hi Devanuru,

Y = mx + c, where m is the slope and c is the intercept, the constant that fixes where the straight line crosses the y-axis. These values are estimated by the algorithm. Any line drawn in a plane has an equation determined by its coordinates; linear regression follows the same idea, fitting the line that satisfies almost all the points.
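To make this concrete, here is a minimal sketch (with made-up points) of recovering m and c from data using numpy; this is the same idea a regression algorithm applies at scale:

```python
import numpy as np

# Made-up points lying exactly on y = 5x + 0.03
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 5 * x + 0.03

# Fit a degree-1 polynomial: returns [slope m, intercept c]
m, c = np.polyfit(x, y, 1)
print(m, c)  # approximately 5.0 and 0.03
```

On real, noisy data the fitted m and c are the values that minimize the squared distance between the line and the points.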
     
    #7
  8. Ambika Thippareddy

    Joined:
    Jun 12, 2019
    Messages:
    2
    Likes Received:
    0
Hello Vignesh,
I am not able to download the session recording from the Simplilearn app. I really need the recordings on my cell phone.
Do you have any suggestions?
     
    #8
  9. DEVANURU YADA KISHORE

    Joined:
    Apr 17, 2019
    Messages:
    4
    Likes Received:
    0
    Hi Vignesh,

    Greetings for the day,

In my CSV file, two columns contain categorical data.
How do I use LabelEncoder to convert both categorical columns to numerical format?

from sklearn.preprocessing import LabelEncoder
aaa = df.iloc[:, columns].values  # how to take multiple columns and use LabelEncoder on them?
gender_encoder = LabelEncoder()
y = gender_encoder.fit_transform(aaa)
y



The Excel data is here:

upload_2019-7-15_21-37-35.png


Please help me; it is important. If most of my columns look like this, how do I use LabelEncoder on them?
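One common way to apply LabelEncoder column by column is DataFrame.apply, which fits a fresh encoder per column; a minimal sketch with an invented two-column frame:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Invented categorical data for illustration
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "City": ["Pune", "Delhi", "Pune", "Delhi"],
})

cat_cols = ["Gender", "City"]
# Fit and apply a fresh encoder on each categorical column
encoded = df[cat_cols].apply(lambda col: LabelEncoder().fit_transform(col))
print(encoded)
```

LabelEncoder assigns integers in sorted order of the unique values, so here Female=0/Male=1 and Delhi=0/Pune=1.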
     

    Attached Files:

    #9
    Last edited: Jul 20, 2019
  10. DEVANURU YADA KISHORE

    Joined:
    Apr 17, 2019
    Messages:
    4
    Likes Received:
    0
How do I visualize the xtrain and ytrain data in a graph?
Can you please help me with this, using 5 records from xtrain and ytrain?
     
    #10
  11. _42147

    _42147 New Member

    Joined:
    Sep 28, 2018
    Messages:
    1
    Likes Received:
    0
    Time Series Model:

    Code:

    dateparse = lambda dates: pd.to_datetime.strptime(dates, '%Y-%m')
    data = pd.read_csv('AirPassengers.csv', parse_dates=['Month'], index_col ='Month', date_parser=dateparse)

    Getting Error message below:

    ---------------------------------------------------------------------------
    AttributeError Traceback (most recent call last)
    C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in converter(*date_cols)
    3177 result = tools.to_datetime(
    -> 3178 date_parser(*date_cols), errors='ignore')
    3179 if isinstance(result, datetime.datetime):

    <ipython-input-31-8edd126540ad> in <lambda>(dates)
    ----> 1 dateparse = lambda dates: pd.to_datetime.strptime(dates, '%Y-%m')
    2
    3

    AttributeError: 'function' object has no attribute 'strptime'

    During handling of the above exception, another exception occurred:

    AttributeError Traceback (most recent call last)
    C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in converter(*date_cols)
    3186 parser=date_parser,
    -> 3187 dayfirst=dayfirst),
    3188 errors='ignore')

    pandas\_libs\tslibs\parsing.pyx in pandas._libs.tslibs.parsing.try_parse_dates()

    pandas\_libs\tslibs\parsing.pyx in pandas._libs.tslibs.parsing.try_parse_dates()

    <ipython-input-31-8edd126540ad> in <lambda>(dates)
    ----> 1 dateparse = lambda dates: pd.to_datetime.strptime(dates, '%Y-%m')
    2
    3

    AttributeError: 'function' object has no attribute 'strptime'

    ----Gunasekaran Thukkaram
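The traceback arises because pd.to_datetime is itself a function, so pd.to_datetime.strptime does not exist. One possible fix is to call pd.to_datetime with a format string after reading, which also avoids version differences around the date_parser argument. A minimal sketch, with the CSV contents invented for illustration:

```python
import io
import pandas as pd

# Stand-in for AirPassengers.csv, invented for illustration
csv = io.StringIO("Month,Passengers\n1949-01,112\n1949-02,118\n")

data = pd.read_csv(csv)
# Parse 'YYYY-MM' strings with an explicit format, then index by them
data["Month"] = pd.to_datetime(data["Month"], format="%Y-%m")
data = data.set_index("Month")
print(data.index[0])
```

Equivalently, the original lambda could use datetime.strptime from the standard library, but calling pd.to_datetime directly is simpler.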
     
    #11
  12. DEVANURU YADA KISHORE

    Joined:
    Apr 17, 2019
    Messages:
    4
    Likes Received:
    0
    Hi Vignesh,
    Greetings for the day,

I have a doubt about Lesson 4.
Lasso and Ridge belong to the regression algorithm family,
while logistic regression belongs to the classification family.
So in Lesson 4, why do you use the two algorithms together, one inside the other?


    lr = LogisticRegression(penalty='l1') ## L1-lasso, L2-Ridge
    lr.fit(X_train_transformed,y_train)
    y_predict = lr.predict(X_test_transformed)
    from sklearn.metrics import accuracy_score
    accuracy = accuracy_score(y_predict,y_test)
    print(accuracy)


Can you help me understand why Lasso is used inside logistic regression? What is the reason for using it?
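On the mechanics: penalty='l1' adds lasso-style (L1) regularization to logistic regression's loss; the model is still a classifier, the penalty just shrinks unhelpful coefficients toward zero. A minimal sketch on synthetic data (note that recent scikit-learn versions require a solver that supports L1, e.g. liblinear):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.randn(100, 5)
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # only 2 of the 5 features matter

# L1 (lasso-style) penalty pushes irrelevant coefficients toward zero
lr = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
lr.fit(X, y)
print(lr.coef_)
```

So "Lasso inside logistic" means regularized classification, not mixing a regression algorithm into a classifier.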
     
    #12
  13. Taheare Basha Doodekula

    Joined:
    Apr 16, 2019
    Messages:
    1
    Likes Received:
    0
    Hi Vignesh,

By using the code below I am getting an error.
    from statsmodels.tsa.stattools import adfuller
    def test_stationarity(timeseries):

    File "<ipython-input-9-90286eb131f2>", line 2 def test_stationarity(timeseries):
    ^ SyntaxError: unexpected EOF while parsing


I have tried multiple ways but keep getting the same error.

Please advise me on this.


    Thanks,
    Taheare Basha
     
    #13
  14. sukant_1

    sukant_1 Member

    Joined:
    Mar 24, 2016
    Messages:
    6
    Likes Received:
    0
    Hi Vignesh,

In the Mercedes project, when executing the code to get the prediction result, it keeps showing a spinning symbol but never displays the result.
Please refer to my attached notebook and let me know if you have any suggestions. I will try another project and submit it by next week.
     

    Attached Files:

    #14
  15. Mukunda Prasad Jena(2034)

    Alumni

    Joined:
    Jul 3, 2011
    Messages:
    1
    Likes Received:
    0
    Hi Vignesh,

Random Forest regression for the California housing project is giving a MemoryError. Could you please help with this?

    Code:
    from sklearn.ensemble import RandomForestClassifier
    rfc = RandomForestClassifier(n_estimators=600)
    rfc.fit(xtrain, ytrain)

    Output:
    ---------------------------------------------------------------------------
    MemoryError Traceback (most recent call last)
    <ipython-input-248-9fc3985fbf99> in <module>
    1 #8. Perform Random Forest Regression :
    2 # Training Random Forest Regression model
----> 3 rfc.fit(xtrain, ytrain)

    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\ensemble\forest.py in fit(self, X, y, sample_weight)
    331 t, self, X, y, sample_weight, i, len(trees),
    332 verbose=self.verbose, class_weight=self.class_weight)
    --> 333 for i, t in enumerate(trees))
    334
    335 # Collect newly grown trees

    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py in __call__(self, iterable)
    918 self._iterating = self._original_iterator is not None
    919
--> 920 while self.dispatch_one_batch(iterator):
    921 pass
    922

    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py in dispatch_one_batch(self, iterator)
    757 return False
    758 else:
--> 759 self._dispatch(tasks)
    760 return True
    761

    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py in _dispatch(self, batch)
    714 with self._lock:
    715 job_idx = len(self._jobs)
--> 716 job = self._backend.apply_async(batch, callback=cb)
    717 # A job can complete so quickly than its callback is
    718 # called before we get here, causing self._jobs to

    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py in apply_async(self, func, callback)
    180 def apply_async(self, func, callback=None):
    181 """Schedule a func to be run"""
--> 182 result = ImmediateResult(func)
    183 if callback:
    184 callback(result)

    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py in __init__(self, batch)
    547 # Don't delay the application, to avoid keeping the input
    548 # arguments in memory
--> 549 self.results = batch()
    550
    551 def get(self):

    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py in __call__(self)
    223 with parallel_backend(self._backend, n_jobs=self._n_jobs):
    224 return [func(*args, **kwargs)
    --> 225 for func, args, kwargs in self.items]
    226
    227 def __len__(self):

    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py in <listcomp>(.0)
    223 with parallel_backend(self._backend, n_jobs=self._n_jobs):
    224 return [func(*args, **kwargs)
    --> 225 for func, args, kwargs in self.items]
    226
    227 def __len__(self):

    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\ensemble\forest.py in _parallel_build_trees(tree, forest, X, y, sample_weight, tree_idx, n_trees, verbose, class_weight)
    117 curr_sample_weight *= compute_sample_weight('balanced', y, indices)
    118
--> 119 tree.fit(X, y, sample_weight=curr_sample_weight, check_input=False)
    120 else:
    121 tree.fit(X, y, sample_weight=sample_weight, check_input=False)

    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\tree\tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    799 sample_weight=sample_weight,
    800 check_input=check_input,
    --> 801 X_idx_sorted=X_idx_sorted)
    802 return self
    803

    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\tree\tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    364 min_impurity_split)
    365
--> 366 builder.build(self.tree_, X, y, sample_weight, X_idx_sorted)
    367
    368 if self.n_outputs_ == 1:

    sklearn\tree\_tree.pyx in sklearn.tree._tree.DepthFirstTreeBuilder.build()

    sklearn\tree\_tree.pyx in sklearn.tree._tree.DepthFirstTreeBuilder.build()

    sklearn\tree\_tree.pyx in sklearn.tree._tree.Tree._add_node()

    sklearn\tree\_tree.pyx in sklearn.tree._tree.Tree._resize_c()

    sklearn\tree\_utils.pyx in sklearn.tree._utils.safe_realloc()

    MemoryError: could not allocate 931135488 bytes


    Regards,
Mukunda.
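A likely cause, judging from the snippet: RandomForestClassifier is being used on a regression target (continuous house values), so every distinct price becomes its own class and the trees balloon in memory. Using RandomForestRegressor, with a modest n_estimators, is far lighter. A minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X = rng.randn(200, 4)
y = X[:, 0] * 3.0 + rng.randn(200) * 0.1  # continuous target

# A regressor handles continuous y; 100 trees is often plenty,
# and n_jobs=1 keeps peak memory low
rfr = RandomForestRegressor(n_estimators=100, n_jobs=1, random_state=0)
rfr.fit(X, y)
print(rfr.score(X, y))
```

If memory is still tight, reducing n_estimators or max_depth trades a little accuracy for a much smaller model.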
     
    #15
  16. Shruti Singhal_1

    Joined:
    Jul 18, 2019
    Messages:
    6
    Likes Received:
    0
    Hi Vignesh,

    As discussed in class, please check the below error for Lasso

    upload_2019-11-29_12-55-39.png

    Thanks,
    Shruti
     
    #16
  17. Shailendra Singh

    Joined:
    May 27, 2015
    Messages:
    2
    Likes Received:
    0
    Hi Vignesh,

I am getting the below error while running #Checking the magnitude of coefficients.


    File "<ipython-input-112-1a566db53ecf>", line 5
    coef.plot(kind='bar', title='Modal Coefficients’)
    ^
    SyntaxError: EOL while scanning string literal


    Code: -
    predictors = df.columns[:-1]
    coef = pd.Series(model.coef_,predictors).sort_values()
    coef.plot(kind='bar', title='Modal Coefficients’)
    plt.scatter(df.RM,df.HOUSING_VALUE)
    plt.title("Relationship between RM and Target Variable")
    plt.show()
    plt.scatter(df.NOX,df.HOUSING_VALUE)
    plt.title("Relationship between NOX and Target Variable")
    plt.show()
    print('R2 Value/Coefficient of Determination: {}'.format(model.score(xtest, ytest)))
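The character after "Modal Coefficients" is a curly (typographic) quote, so Python never sees a closing quote and reports "EOL while scanning string literal"; retyping the string with straight quotes fixes it. A tiny sketch of the difference:

```python
# Straight quotes open and close the literal correctly
title = 'Modal Coefficients'
print(title)

# The broken line used a curly closing quote, which Python treats
# as part of an unterminated string:
#   coef.plot(kind='bar', title='Modal Coefficients’)
```

Curly quotes often sneak in when code is copied from slides or word processors.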
     
    #17
  18. Shruti Singhal_1

    Joined:
    Jul 18, 2019
    Messages:
    6
    Likes Received:
    0
This is fixed; the problem was with the Lasso variable (lassoreg).
     
    #18
  19. MARIA BOTELLE GABARRO

    Joined:
    Feb 27, 2019
    Messages:
    1
    Likes Received:
    0
    Hi vigneshwar,

I am having trouble importing the tweets dataset. I don't understand it, because in other cases it worked.

I suppose the file was not generated correctly (lines 5 and 8 seem different from the others; please find attached the code error and a
screenshot of the tweets database).
Screenshot of the tweets database:
    upload_2019-12-12_7-10-49.png


    code error:
    FileNotFoundError Traceback (most recent call last)
    <ipython-input-7-9598dde46e70> in <module>
    1 #Read the dataset
    2 import pandas as pd
----> 3 df = pd.read_csv("Desktop/Limpiar/AAAA/99_Boslan/05_Cursos/Simplilearn/Machine_Learning/NEW_Lessons/Assisted Practice/Lesson 10/Tweets.csv")

    ~\Anaconda3\lib\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
    683 )
    684
--> 685 return _read(filepath_or_buffer, kwds)
    686
    687 parser_f.__name__ = name

    ~\Anaconda3\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    455
    456 # Create the parser.
--> 457 parser = TextFileReader(fp_or_buf, **kwds)
    458
    459 if chunksize or iterator:

    ~\Anaconda3\lib\site-packages\pandas\io\parsers.py in __init__(self, f, engine, **kwds)
    893 self.options["has_index_names"] = kwds["has_index_names"]
    894
--> 895 self._make_engine(self.engine)
    896
    897 def close(self):

    ~\Anaconda3\lib\site-packages\pandas\io\parsers.py in _make_engine(self, engine)
    1133 def _make_engine(self, engine="c"):
    1134 if engine == "c":
-> 1135 self._engine = CParserWrapper(self.f, **self.options)
    1136 else:
    1137 if engine == "python":

    ~\Anaconda3\lib\site-packages\pandas\io\parsers.py in __init__(self, src, **kwds)
    1915 kwds["usecols"] = self.usecols
    1916
-> 1917 self._reader = parsers.TextReader(src, **kwds)
    1918 self.unnamed_cols = self._reader.unnamed_cols
    1919

    pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

    pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

    FileNotFoundError: [Errno 2] File b'Desktop/Limpiar/AAAA/99_Boslan/05_Cursos/Simplilearn/Machine_Learning/NEW_Lessons/Assisted Practice/Lesson 10/Tweets.csv' does not exist: b'Desktop/Limpiar/AAAA/99_Boslan/05_Cursos/Simplilearn/Machine_Learning/NEW_Lessons/Assisted Practice/Lesson 10/Tweets.csv'
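The traceback is pandas saying the relative path does not resolve from the notebook's working directory; printing os.getcwd() and checking os.path.exists before reading usually pins it down. A small sketch (the path shown is a placeholder):

```python
import os
import pandas as pd

path = "Tweets.csv"  # placeholder; substitute your real path

# Relative paths are resolved from the current working directory
print(os.getcwd())

if os.path.exists(path):
    df = pd.read_csv(path)
else:
    print("File not found; use an absolute path or fix the working directory")
```

Using a full absolute path (e.g. starting from C:\ on Windows) sidesteps the working-directory question entirely.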
     
    #19
  20. Shruti Singhal_1

    Joined:
    Jul 18, 2019
    Messages:
    6
    Likes Received:
    0
Hi Vignesh, please explain how to do label/one-hot encoding without passing column names.
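One approach, sketched with an invented frame: let pandas detect the object-dtype (categorical) columns itself via select_dtypes, then one-hot encode only those with get_dummies:

```python
import pandas as pd

# Invented data for illustration
df = pd.DataFrame({
    "Age": [22, 35, 58],
    "Gender": ["M", "F", "M"],
    "City": ["Pune", "Delhi", "Pune"],
})

# Detect categorical columns instead of naming them by hand
cat_cols = df.select_dtypes(include="object").columns
encoded = pd.get_dummies(df, columns=list(cat_cols))
print(encoded.columns.tolist())
```

Numeric columns pass through untouched; each categorical column is replaced by one indicator column per category.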
     
    #20
  21. Kunal Saurabh

    Kunal Saurabh Member

    Joined:
    Aug 28, 2019
    Messages:
    3
    Likes Received:
    0
Good morning Vignesh. As I mentioned to you in the last class, I am a late attendee of this batch, but I should be able to finish it on my own quite fast. I need a small favor from your side: I need all the assisted practice datasets. Thanks.
     
    #21
  22. Shruti Singhal_1

    Joined:
    Jul 18, 2019
    Messages:
    6
    Likes Received:
    0
    Hi Vignesh,

I have submitted the project and passed. However, there are no comments/feedback on the submitted code. Please help evaluate the project and share your views on it; it will help learners improve in the future.

    Thanks,
    Shruti
     
    #22
  23. MJ Services

    MJ Services Active Member

    Joined:
    Nov 29, 2019
    Messages:
    31
    Likes Received:
    0
Dear Vignesh, please check.
     

    Attached Files:

    #23
  24. Vigneshwar V

    Vigneshwar V Customer
    Customer

    Joined:
    Jun 22, 2019
    Messages:
    12
    Likes Received:
    3
Hi Mili,
Please check your file name. The file is named 'Salaries.csv.url'; either use that exact name in your code, or rename the file to 'Salaries.csv' and execute the command again.
Thanks
     
    #24
  25. hellosarathy

    hellosarathy Member

    Joined:
    Mar 28, 2016
    Messages:
    3
    Likes Received:
    0
Hi Vignesh,
As discussed, please share the assisted learning code that we practiced in class. It will help us revise faster.
     
    #25
  26. Vikas Kumar_18

    Vikas Kumar_18 Well-Known Member
    Simplilearn Support Alumni

    Joined:
    Dec 17, 2018
    Messages:
    205
    Likes Received:
    37
Hi Learner,

We have requested the trainer to upload all the assisted codes to the respective Google Drive. It may take a few days, but they will be uploaded.
We really appreciate your cooperation.
     
    #26
  27. Adarsh Kumar_7

    Joined:
    Oct 29, 2019
    Messages:
    2
    Likes Received:
    0
    Hi Vighnesh,
    Good day,

I just need your support regarding the logistic regression Titanic dataset model.

Can you please help me understand how I can check the mean Age value for each Pclass?

This is the code:
def age_approx(cols):
    Age = cols[0]
    Pclass = cols[1]

    if pd.isnull(Age):
        if Pclass == 1:
            return 37
        elif Pclass == 2:
            return 29
        else:
            return 24
    else:
        return Age
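For reference, the mean Age per Pclass can be checked with a groupby; a minimal sketch with invented Titanic-style rows (the 'Age' and 'Pclass' column names are assumptions from the Titanic dataset):

```python
import pandas as pd

# Invented rows for illustration
df = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 3, 3],
    "Age":    [40, 34, 30, 28, 25, 23],
})

# Mean Age within each passenger class
mean_age = df.groupby("Pclass")["Age"].mean()
print(mean_age)  # 1 -> 37.0, 2 -> 29.0, 3 -> 24.0
```

Values like these are where the hard-coded 37/29/24 in an age-imputation function typically come from.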

    Regards,
    Adarsh Kumar
     
    #27
  28. Vigneshwar V

    Vigneshwar V Customer
    Customer

    Joined:
    Jun 22, 2019
    Messages:
    12
    Likes Received:
    3
    Hi Sarathy,
The assisted code has been uploaded to the drive after the session.
Kindly check the file named "Big Mart Sales.ipynb" in the drive.
     
    #28
  29. Rajesh Thakur_1

    Rajesh Thakur_1 Active Member

    Joined:
    Oct 1, 2019
    Messages:
    16
    Likes Received:
    0
Hi Sir,
Without removing the outliers, when I check the histogram I get something like the one shown below.
    upload_2020-3-6_0-35-56.png

I removed the outliers with the code below:
    q1=train['Item_Visibility'].quantile(0.25)
    q3=train['Item_Visibility'].quantile(0.75)
    IQR=q3-q1
    filt_train=train.query('(@q1 - 1.5 * @IQR)<= Item_Visibility <= (@q3 + 1.5 * @IQR)')

    train=filt_train

After removing the outliers, when I check the histogram again with
train['Item_Visibility'].hist(bins=20)

it still shows the same graph.

    upload_2020-3-6_0-38-45.png

    can you please help me out
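A quick way to confirm whether the filter actually removed rows is to compare shapes before and after; if the counts match, the query dropped nothing (for example because the plot ran on a different variable than the filtered one). A self-contained sketch with invented data:

```python
import pandas as pd

# Invented data with one obvious outlier (5.0)
train = pd.DataFrame({"Item_Visibility": [0.01, 0.02, 0.03, 0.04, 5.0]})

q1 = train["Item_Visibility"].quantile(0.25)
q3 = train["Item_Visibility"].quantile(0.75)
IQR = q3 - q1
filt_train = train.query(
    "(@q1 - 1.5 * @IQR) <= Item_Visibility <= (@q3 + 1.5 * @IQR)"
)

# If these differ, the outlier really was dropped
print(len(train), len(filt_train))
```

If the row counts do change but the plot looks identical, re-running the plotting cell after the reassignment (and checking which variable it uses) is the next thing to verify.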
     
    #29
  30. MJ Services

    MJ Services Active Member

    Joined:
    Nov 29, 2019
    Messages:
    31
    Likes Received:
    0
    Code: salcal.groupby('Year').sum()[['TotalPay']]

    Output: TotalPay
    Year

    2011 2.594113e+09
    2012 2.724736e+09
    2013 2.918656e+09
    2014 2.876911e+09

Can someone help me get these figures in expanded form, or expressed in convenient units?
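Those are just scientific-notation renderings of large floats; pandas display options can expand them, or the values can be rescaled to a unit. A sketch with invented totals:

```python
import pandas as pd

totals = pd.DataFrame({"TotalPay": [2.594113e9, 2.724736e9]},
                      index=[2011, 2012])

# Render floats with thousands separators instead of 2.59e+09
pd.options.display.float_format = "{:,.0f}".format
print(totals)

# Or convert to another unit, e.g. billions
print((totals / 1e9).rename(columns={"TotalPay": "TotalPay (billions)"}))
```

The display option only changes formatting; the underlying values are untouched.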
     
    #30
  31. MJ Services

    MJ Services Active Member

    Joined:
    Nov 29, 2019
    Messages:
    31
    Likes Received:
    0
Input: salcal[salcal.Year == '2014']

    Out[11]:

    Id EmployeeName JobTitle BasePay OvertimePay OtherPay Benefits TotalPay TotalPayBenefits Year Notes Agency Status

Question: The input above should have returned the entire dataframe, but it only shows column names and no rows. Can someone explain why?
     
    #31
  32. Vikas Kumar_18

    Vikas Kumar_18 Well-Known Member
    Simplilearn Support Alumni

    Joined:
    Dec 17, 2018
    Messages:
    205
    Likes Received:
    37
It could be that there are no values for 2014, which is why only the column names are shown.
You can also check whether the Year column is numeric and compare against an integer:

salcal[salcal.Year == 2014]
     
    #32
  33. MJ Services

    MJ Services Active Member

    Joined:
    Nov 29, 2019
    Messages:
    31
    Likes Received:
    0
No, that is not it. There are values for 2014, but even if I change it to 2011 or 2012, the output is the same.
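If the rows exist for every year, a likely explanation is a dtype mismatch: a Year column stored as integers will never equal the string '2014', and the comparison silently returns no rows. A sketch of the check, with an invented frame:

```python
import pandas as pd

# Invented stand-in for the salaries frame
salcal = pd.DataFrame({"Year": [2011, 2012, 2014],
                       "TotalPay": [100.0, 110.0, 120.0]})

print(salcal["Year"].dtype)                 # int64 here
print(len(salcal[salcal.Year == '2014']))   # 0 rows: string vs int
print(len(salcal[salcal.Year == 2014]))     # 1 row
```

Checking `salcal.dtypes` on the real data tells you which comparison to use (or `salcal["Year"].astype(int)` normalizes it).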
     
    #33
  34. MJ Services

    MJ Services Active Member

    Joined:
    Nov 29, 2019
    Messages:
    31
    Likes Received:
    0
    Code:
    import matplotlib.pyplot as plt
    fig, axs = plt.subplots(1,13, sharey=True )
    boston.plot(kind='scatter', x='CRIM', y='MEDV', ax=axs[0], figsize=(10,6))
    boston.plot(kind='scatter', x='ZN', y='MEDV', ax=axs[1], figsize=(10,6))
    boston.plot(kind='scatter', x='INDUS', y='MEDV', ax=axs[1], figsize=(10,6))
    boston.plot(kind='scatter', x='CHAS', y='MEDV', ax=axs[1], figsize=(10,6))
    boston.plot(kind='scatter', x='NOX', y='MEDV', ax=axs[1], figsize=(10,6))
    boston.plot(kind='scatter', x='RM', y='MEDV', ax=axs[1], figsize=(10,6))
    boston.plot(kind='scatter', x='AGE', y='MEDV', ax=axs[1], figsize=(10,6))
    boston.plot(kind='scatter', x='DIS', y='MEDV', ax=axs[1], figsize=(10,6))
    boston.plot(kind='scatter', x='RAD', y='MEDV', ax=axs[1], figsize=(10,6))
    boston.plot(kind='scatter', x='TAX', y='MEDV', ax=axs[1], figsize=(10,6))
    boston.plot(kind='scatter', x='PTRATIO', y='MEDV', ax=axs[1], figsize=(10,6))
    boston.plot(kind='scatter', x='B', y='MEDV', ax=axs[1], figsize=(10,6))
    boston.plot(kind='scatter', x='LSTAT', y='MEDV', ax=axs[1], figsize=(10,6))

Can someone help me convert this into a function or a loop?
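A loop version, sketched under the assumption that boston is a DataFrame with those columns and MEDV as the target (note the original block reuses ax=axs[1] for every plot after the first, which is probably unintended):

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs anywhere
import matplotlib.pyplot as plt

# Invented stand-in for the Boston frame
rng = np.random.RandomState(0)
cols = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
        'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']
boston = pd.DataFrame(rng.rand(20, 14), columns=cols + ['MEDV'])

fig, axs = plt.subplots(1, len(cols), sharey=True, figsize=(26, 2))
for ax, col in zip(axs, cols):
    # One scatter panel per feature against the target
    boston.plot(kind='scatter', x=col, y='MEDV', ax=ax)
print(len(axs))
```

Iterating over `zip(axs, cols)` gives each feature its own axis, which is what the repeated calls were presumably meant to do.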
     
    #34
  35. Rajesh Thakur_1

    Rajesh Thakur_1 Active Member

    Joined:
    Oct 1, 2019
    Messages:
    16
    Likes Received:
    0
    from sklearn.decomposition import PCA

    Hi Sir,

    Kindly look into this issue.

    pca = PCA(n_components=1)
    x_train = pca.fit_transform(x_train)
    x_test = pca.transform(x_test)

This is working fine for me,


but when I set n_components=2 I get the value error below:
    pca = PCA(n_components=2)
    x_train = pca.fit_transform(x_train)
    x_test = pca.transform(x_test)




    ValueError: n_components=2 must be between 0 and min(n_samples, n_features)=1 with svd_solver='full'
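The error says the input now has only one feature, which suggests the earlier n_components=1 cell overwrote x_train with its own one-column output. Refitting from the original, untransformed features works; a sketch with synthetic data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X_orig = rng.randn(50, 4)  # keep the untransformed features around

pca1 = PCA(n_components=1)
x_train_1 = pca1.fit_transform(X_orig)   # don't overwrite X_orig

pca2 = PCA(n_components=2)
x_train_2 = pca2.fit_transform(X_orig)   # refit from 4 features, not 1

print(x_train_1.shape, x_train_2.shape)  # (50, 1) (50, 2)
```

Re-running the notebook from the cell that creates the original x_train, before either PCA cell, has the same effect.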
     
    #35
  36. Rajesh Thakur_1

    Rajesh Thakur_1 Active Member

    Joined:
    Oct 1, 2019
    Messages:
    16
    Likes Received:
    0
Hi Sir,

In today's class we found an error and you said to post it in the community; please look into this.

    from sklearn.model_selection import cross_val_score
    knn = KNeighborsClassifier(n_neighbors=4)
    print(cross_val_score(knn, x1, y1, cv=10, scoring ='accuracy').mean())




    I am getting the below error

    ---------------------------------------------------------------------------
    ValueError Traceback (most recent call last)
    <ipython-input-44-a7fde292a61f> in <module>
    1 from sklearn.model_selection import cross_val_score
    2 knn = KNeighborsClassifier(n_neighbors=4)
----> 3 print(cross_val_score(knn, x1, y1, cv=10, scoring ='accuracy').mean())

    ~/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_validation.py in cross_val_score(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch, error_score)
    389 fit_params=fit_params,
    390 pre_dispatch=pre_dispatch,
    --> 391 error_score=error_score)
    392 return cv_results['test_score']
    393

    ~/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_validation.py in cross_validate(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch, return_train_score, return_estimator, error_score)
    230 return_times=True, return_estimator=return_estimator,
    231 error_score=error_score)
    --> 232 for train, test in cv.split(X, y, groups))
    233
    234 zipped_scores = list(zip(*scores))

    ~/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in __call__(self, iterable)
    919 # remaining jobs.
    920 self._iterating = False
--> 921 if self.dispatch_one_batch(iterator):
    922 self._iterating = self._original_iterator is not None
    923

    ~/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
    752 tasks = BatchedCalls(itertools.islice(iterator, batch_size),
    753 self._backend.get_nested_backend(),
    --> 754 self._pickle_cache)
    755 if len(tasks) == 0:
    756 # No more tasks available in the iterator: tell caller to stop.

    ~/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in __init__(self, iterator_slice, backend_and_jobs, pickle_cache)
    208
    209 def __init__(self, iterator_slice, backend_and_jobs, pickle_cache=None):
--> 210 self.items = list(iterator_slice)
    211 self._size = len(self.items)
    212 if isinstance(backend_and_jobs, tuple):

    ~/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_validation.py in <genexpr>(.0)
    225 pre_dispatch=pre_dispatch)
    226 scores = parallel(
    --> 227 delayed(_fit_and_score)(
    228 clone(estimator), X, y, scorers, train, test, verbose, None,
    229 fit_params, return_train_score=return_train_score,

    ~/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_split.py in split(self, X, y, groups)
    333 .format(self.n_splits, n_samples))
    334
--> 335 for train, test in super().split(X, y, groups):
    336 yield train, test
    337

    ~/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_split.py in split(self, X, y, groups)
    87 X, y, groups = indexable(X, y, groups)
    88 indices = np.arange(_num_samples(X))
---> 89 for test_index in self._iter_test_masks(X, y, groups):
    90 train_index = indices[np.logical_not(test_index)]
    91 test_index = indices[test_index]

    ~/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_split.py in _iter_test_masks(self, X, y, groups)
    684
    685 def _iter_test_masks(self, X, y=None, groups=None):
--> 686 test_folds = self._make_test_folds(X, y)
    687 for i in range(self.n_splits):
    688 yield test_folds == i

    ~/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_split.py in _make_test_folds(self, X, y)
    649 raise ValueError("n_splits=%d cannot be greater than the"
    650 " number of members in each class."
    --> 651 % (self.n_splits))
    652 if self.n_splits > min_groups:
    653 warnings.warn(("The least populated class in y has only %d"

    ValueError: n_splits=10 cannot be greater than the number of members in each class.
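The message means at least one class in y1 has fewer than 10 members, so stratified 10-fold CV cannot place it in every fold. Checking value_counts and lowering cv (or merging the rare classes) is the usual way out; a sketch with invented labels:

```python
import pandas as pd

# Invented labels: class 2 appears only 3 times
y1 = pd.Series([0] * 20 + [1] * 15 + [2] * 3)

counts = y1.value_counts()
min_count = counts.min()
print(counts)
print("max usable stratified folds:", min_count)  # 3 here
```

With the real y1, `cv` must be at most the smallest class count shown by value_counts.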
     
    #36
  37. Rohit Mathur

    Rohit Mathur Member
    Alumni

    Joined:
    May 11, 2015
    Messages:
    6
    Likes Received:
    0
My view on bias and variance for models. Please correct/discuss.

- The rationale is that bias is a measurement used for the training dataset, and variance for the predictions/test dataset.

- When training the model:
  a) aim for medium or high bias (not low bias, because that may mean overfitting)
- When the model is ready for prediction, then for the test data:
  a) the predictions should have low variance
     
    #37
    Last edited: Mar 19, 2020
  38. MJ Services

    MJ Services Active Member

    Joined:
    Nov 29, 2019
    Messages:
    31
    Likes Received:
    0
Hi Adarsh,

Calculating those means is covered in the video recording; check the recording for that day. It will help, I did that too.

The code you mentioned assigns an approximate Age based on Pclass. If you are still not convinced, reply to me and I will go the extra mile to give you a solution.

    Fellow member
    -Mili


     
    #38
  39. MJ Services

    MJ Services Active Member

    Joined:
    Nov 29, 2019
    Messages:
    31
    Likes Received:
    0
    Dear Vignesh & members here,

Please check whether the train and test files of Project 1 have these columns. Do you also have the same columns in the CSV?

If so, why does the test file not have the column "y"?

    Regards,
    Mili

For details, check the attachment.
     

    Attached Files:

    #39
  40. Murukai Gumbo-Mberi

    Joined:
    Sep 30, 2019
    Messages:
    7
    Likes Received:
    0
    Hi Vignesh and members

I am trying to use xgboost but failing to do so; I get the error "no such module xgboost". Please help, I am really stuck.

    ModuleNotFoundError: No module named 'xgboost'

    If anyone knows what I am missing I would appreciate the help.

    Regards,

    Murukai
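ModuleNotFoundError means xgboost is not bundled with Anaconda and has to be installed into the environment the notebook runs in; one common way (assuming an Anaconda setup) is either of these commands in the Anaconda Prompt / terminal:

```shell
# Install xgboost into the active conda environment
conda install -c conda-forge xgboost
# or, with pip into the same environment
pip install xgboost
```

After installing, restart the notebook kernel so the new package is picked up.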
     
    #40
  41. Rajesh Thakur_1

    Rajesh Thakur_1 Active Member

    Joined:
    Oct 1, 2019
    Messages:
    16
    Likes Received:
    0
    Hi Sir,

Please find the attachment: when I apply the XGBoost model to the test dataset, I get an error.

Kindly go through it once.
     

    Attached Files:

    #41
  42. Rajesh Thakur_1

    Rajesh Thakur_1 Active Member

    Joined:
    Oct 1, 2019
    Messages:
    16
    Likes Received:
    0
    Hi Shruti,

As far as I know, you should check the type of your target variable.
     
    #42
  43. hellosarathy

    hellosarathy Member

    Joined:
    Mar 28, 2016
    Messages:
    3
    Likes Received:
    0

    Right, 'y' is missing in 'Test' data set. I thought we should use that for prediction.
     
    #43
  44. hellosarathy

    hellosarathy Member

    Joined:
    Mar 28, 2016
    Messages:
    3
    Likes Received:
    0
    hi team,
I thought we should use LDA for dimensionality reduction, since with PCA no component captures much variance; it is spread almost uniformly across all the features.
However, when I tried LDA I got this error; any help is appreciated:

    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
    lda = LDA(n_components=1)
    X = lda.fit_transform(X, y_train)

    --------------------------------------------------------------------------
    ValueError Traceback (most recent call last)
    <ipython-input-149-9ca3625ff137> in <module>
    2
    3 lda = LDA(n_components=1)
----> 4 X = lda.fit_transform(X, y_train)

    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\base.py in fit_transform(self, X, y, **fit_params)
    463 else:
    464 # fit method of arity 2 (supervised transformation)
--> 465 return self.fit(X, y, **fit_params).transform(X)
    466
    467

    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\discriminant_analysis.py in fit(self, X, y)
    428 """
    429 X, y = check_X_y(X, y, ensure_min_samples=2, estimator=self)
--> 430 self.classes_ = unique_labels(y)
    431 n_samples, _ = X.shape
    432 n_classes = len(self.classes_)

    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\utils\multiclass.py in unique_labels(*ys)
    94 _unique_labels = _FN_UNIQUE_LABELS.get(label_type, None)
    95 if not _unique_labels:
---> 96 raise ValueError("Unknown label type: %s" % repr(ys))
    97
    98 ys_labels = set(chain.from_iterable(_unique_labels(y) for y in ys))

    ValueError: Unknown label type: (array([130.81, 88.53, 76.26, ..., 109.22, 87.48, 110.85]),)
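The "Unknown label type" hints that y_train is continuous (those look like regression targets). LDA is a supervised, classifier-style transform and needs discrete class labels, so it will not work on a regression target without binning it first. A sketch showing it succeed with discrete labels:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

rng = np.random.RandomState(0)
X_train = rng.randn(60, 4)
y_class = rng.randint(0, 2, size=60)  # discrete labels: OK for LDA

lda = LDA(n_components=1)  # at most (number of classes - 1) components
X_new = lda.fit_transform(X_train, y_class)
print(X_new.shape)  # (60, 1)
```

For a continuous target, PCA (unsupervised) or binning y into classes before LDA are the usual alternatives.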
     
    #44
  45. MJ Services

    MJ Services Active Member

    Joined:
    Nov 29, 2019
    Messages:
    31
    Likes Received:
    0
    Try
    X = lda.fit_transform(X_train, y_train)



     
    #45
  46. MJ Services

    MJ Services Active Member

    Joined:
    Nov 29, 2019
    Messages:
    31
    Likes Received:
    0
Hi,
Can someone tell me how we installed the XGBoost package in Anaconda? I have searched all the recordings and didn't find it.
Looking for a quick reply, as the query is simple.
     
    #46
  47. Daniel Akakabota

    Daniel Akakabota Customer
    Customer

    Joined:
    Sep 27, 2019
    Messages:
    2
    Likes Received:
    0
Hi Vignesh, hi all, please kindly look into the following errors I encountered while practicing.
First, with K-means clustering: I tried to plot the clusters but am confused about the parameters for 'X' and 'Y'. I have tried different column names, but it keeps giving the same error. In the class practice there were only two features (columns), so he simply used those two for 'X' and 'Y' and it worked fine. But with the Zoo dataset, which we were asked to use for practice, every column name I tried produced an error. Please see the file named 'Kmeans practice error' below.

Second, I tried to use the xgboost ensemble technique in the practice but got an error that xgboost does not exist. Please see the exact error in the file named 'Ensemble practice error' below. Thanks; I would appreciate a prompt reply.
     

    Attached Files:

    #47
