Data Science with Python | Samridhi

Discussion in 'Big Data and Analytics' started by Nishant_Singh, Jun 3, 2019.

  1. Lokesh Gowda S

    Lokesh Gowda S New Member

    Joined:
    Jul 12, 2019
    Messages:
    1
    Likes Received:
    0
    I have completed the MovieLens project up to
    3) determine the features affecting a particular rating, as shown in the screenshot below.
    I am stuck on question 3), shown in the screenshot below. Please help me with the next step, and also with 4) develop an appropriate model to predict the movie rating. Please help me with the code format.
    movie lens method.png querry.png
     
    #51
    Last edited: Aug 28, 2019
  2. ASHIK S R

    ASHIK S R Member

    Joined:
    Jul 26, 2019
    Messages:
    5
    Likes Received:
    0
    Aug 24 - Sep 28 Batch - Assignment 1
    Please help me arrange this properly.

    i = 4
    while i > 0:
        print("#")
        j = i
        while j > 0:
            print("1")
            j -= 1
        print("\n")
        i -= 1
        print(""*i)

    Output:


    #
    1
    1
    1
    1



    #
    1
    1
    1



    #
    1
    1



    #
    1
     
    #52
  3. Harshit Sharma_2

    Joined:
    Aug 10, 2019
    Messages:
    2
    Likes Received:
    0
    Hello Ma'am,

    Can you please explain how a[-1:-2:-1, :] works on the array a below?

    a:
    array([[ 0.        ,  1.11111111,  2.22222222],
           [ 3.33333333,  4.44444444,  5.55555556],
           [10.        ,  7.77777778,  8.88888889]])
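A minimal sketch of how that slice behaves, assuming a is the 3x3 NumPy array shown above (it is rebuilt by hand here, so the construction is an assumption). The row part -1:-2:-1 starts at the last row, steps backwards, and stops before the second-to-last row, so only the last row is selected. The column part : keeps every column. Because a slice is used rather than a single integer, the result stays two-dimensional.

```python
import numpy as np

# Rebuild the array from the post: 0, 1.11, ..., 8.89 in a 3x3 grid,
# with the first element of the last row overwritten by 10.
a = np.linspace(0, 10, 10)[:9].reshape(3, 3)
a[2, 0] = 10

# Rows: start at -1 (last row), stop before -2, step -1 -> only the last row.
# Columns: ':' keeps all columns.
last_row_2d = a[-1:-2:-1, :]
print(last_row_2d.shape)   # (1, 3): still 2-D

# Indexing with a plain integer drops the row dimension.
last_row_1d = a[-1, :]
print(last_row_1d.shape)   # (3,)
```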
     
    #53
  4. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Alumni Trainer

    Joined:
    Aug 16, 2017
    Messages:
    224
    Likes Received:
    22
    Hi,

    Since this is a multi-class classification problem, you need to use the chi-square test of independence. All the files are updated in the Google Drive. Please open the statistics folder in the Google Drive to look for the chi-square method of feature selection.

    Regards,
    Samridhi
     
    #54
  5. Deepak Shanthaiah

    Joined:
    Jul 25, 2019
    Messages:
    2
    Likes Received:
    0

    Hi Samridhi,

    My question is: I am not able to understand what "features affecting" means. Which features?
    I am also really stuck and confused about how to solve these 2 questions, so please assist me.
     
    #55
  6. Harshit Sharma_2

    Joined:
    Aug 10, 2019
    Messages:
    2
    Likes Received:
    0
    How do I slice rows when loc and isin are also used?
    brics.loc[1:3,brics.columns.isin(['capital','area'])]
     
    #56
  7. Pranaya Kumar Panda

    Joined:
    Aug 6, 2019
    Messages:
    2
    Likes Received:
    0
    Hi Samridhi,

    users.dat has the following data format:

    1::F::1::10::48067
    2::M::56::16::70072
    3::M::25::15::55117
    4::M::45::7::02460
    5::M::25::20::55455
    6::F::50::9::55117

    From the above data set I can understand that the first column is User_id, the next column is gender, and the next is age. I am not able to understand the last two columns. Can you explain them?

    Similarly, ratings.dat has the following format:

    1::1193::5::978300760
    1::661::3::978302109
    1::914::3::978301968
    1::3408::4::978300275
    1::2355::5::978824291

    From the above data set I can't understand any of the fields. Can you explain them?

    Regards,
    Pranaya
     
    #57
  8. PRASHANT NAMDEO SHELARE

    Joined:
    Apr 5, 2019
    Messages:
    2
    Likes Received:
    0
    Hello Ma'am,

    I was doing the MovieLens project and I am not getting the expected result after calling
    pd.concat(). The output shape is
    (19266, 28)
    I took 10000 records for analysis, i.e. (10000, 10), and the one-hot encoding of unique_genres is also 10000 rows, i.e.
    (10000, 18). So my output should be (10000, 28). Please help me.
    I am sending my full code in PDF format. Download it and rename the extension to .ipynb.
     

    Attached Files:

    #58
  9. Guru mahesh

    Guru mahesh Member

    Joined:
    Feb 18, 2019
    Messages:
    12
    Likes Received:
    0
    hello

    I have some doubts related to building the user-based recommendation model for the Amazon project.

    There are many NaN values in each movie column. How do I replace the NaN values?

    I tried it this way:

    Reduced_df=AMTV_Ratind_df.loc[AMTV_Ratind_df.columns.notnull(),AMTV_Ratind_df.columns]
    for i in range(0,206):
        Reduced_df.loc[Reduced_df[Reduced_df.columns].isnull(),Reduced_df.columns]=0

    But if I replace the NaN values with 0, the mean and median change. How can I solve this?
     
    #59
  10. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Alumni Trainer

    Joined:
    Aug 16, 2017
    Messages:
    224
    Likes Received:
    22
    Hi Prashant,

    pd.concat is not giving the expected result here because the indices of the two DataFrames are not the same. You need to reset the index so that both DataFrames share common indices, and then do the column-wise concatenation.
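A minimal sketch of the index problem, using two small hypothetical DataFrames (ratings and genres are illustrative names, not the frames from the attached notebook). Column-wise pd.concat aligns rows on index labels, so frames with the same length but different labels produce extra rows filled with NaN.

```python
import pandas as pd

# Hypothetical stand-ins for the two frames in the question: same number of
# rows, but different index labels (for example after filtering or sampling).
ratings = pd.DataFrame({"Rating": [5, 3, 4]}, index=[10, 20, 30])
genres = pd.DataFrame({"Action": [1, 0, 1], "Comedy": [0, 1, 0]}, index=[0, 1, 2])

# Column-wise concat aligns on index labels, so mismatched labels give extra
# rows padded with NaN.
misaligned = pd.concat([ratings, genres], axis=1)
print(misaligned.shape)   # (6, 3)

# Resetting both indices to 0..n-1 makes the rows line up by position.
aligned = pd.concat(
    [ratings.reset_index(drop=True), genres.reset_index(drop=True)],
    axis=1,
)
print(aligned.shape)      # (3, 3)
```

Passing drop=True keeps the old index from being added back as a column, which would otherwise change the column count.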

    Regards,
    Samridhi
     
    #60
  11. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Alumni Trainer

    Joined:
    Aug 16, 2017
    Messages:
    224
    Likes Received:
    22
    Hi Guru,

    For this, exclude the NaN ratings when computing the average rating.
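A minimal sketch of the difference, on a hypothetical user-by-movie table (the column names and values are made up). pandas skips NaN by default when computing a mean, so no replacement is needed for the average; filling with 0 first treats "not rated" as a rating of 0.

```python
import numpy as np
import pandas as pd

# Hypothetical user-by-movie ratings; NaN means "not rated".
ratings = pd.DataFrame(
    {"Movie1": [5.0, np.nan, 3.0], "Movie2": [np.nan, np.nan, 4.0]}
)

# mean() skips NaN by default (skipna=True), so only real ratings count.
mean_ignoring_nan = ratings.mean()
print(mean_ignoring_nan["Movie1"])   # 4.0

# Filling with 0 first drags the mean down.
mean_after_fill = ratings.fillna(0).mean()
print(round(mean_after_fill["Movie1"], 3))   # 2.667
```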

    Regards,
    Samridhi
     
    #61
  12. Shyamanth

    Shyamanth Member

    Joined:
    Oct 7, 2019
    Messages:
    4
    Likes Received:
    0
    Hi Samridhi,

    Can you please let me know the following?

    a) How do I set up a working directory? I downloaded and installed Anaconda, and launched Jupyter Notebook from Anaconda Navigator. How do I set up the working directory?

    b) What do Files, Running, and Clusters mean inside Jupyter Notebook? How do I create a new directory and set it as the default working directory?

    c) What are Terminals and Notebooks? How do I open Terminals and Notebooks? How many terminals and notebooks can be open at a time?

    d) How many Python windows or instances can be open at a time? What is the technical name of a Python window?

    Thanks
    Shyam
     
    #62
  13. Shyamanth

    Shyamanth Member

    Joined:
    Oct 7, 2019
    Messages:
    4
    Likes Received:
    0
    Hi Samridhi,

    Webex is not allowing me to join today's (11th Oct) class. Later it showed that the session has not started. Is there a class today? I have sent an email to the support team to provide access. Can you please share today's class recording in the Google Drive?

    Thanks
    Shyam
     
    #63
  14. Soumyabrata Roy

    Joined:
    Aug 21, 2019
    Messages:
    8
    Likes Received:
    0
    Hi Samridhi
    A couple of questions on the MovieLens project.
    1. User Age Distribution
    a: Once we get the age distribution with the value_counts method, how do we plot it as a histogram/bar graph?
    b: How do I convert the output of value_counts into a pandas DataFrame?

    Features affecting the ratings: is this an interpretation from the data, or do we need to code something here?

    Model building: what needs to be done here?

    Is there a cheat sheet on brackets, i.e. which brackets to use and when? It's one of the parts I find quite confusing.
     
    #64
    Last edited: Oct 21, 2019
  15. Aakarsh Nair

    Aakarsh Nair New Member

    Joined:
    Oct 4, 2019
    Messages:
    1
    Likes Received:
    0
    Hello ma'am
    How can I access internet related problems/network issues from the column 'Customer Complaint'
    which has a string of words and please tell me how can i access records based monthly from column 'Date_month_year' and daily from date column
     

    Attached Files:

    #65
  16. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Alumni Trainer

    Joined:
    Aug 16, 2017
    Messages:
    224
    Likes Received:
    22
    1. User Age Distribution
    a: Once we get the age distribution with the value_counts method, how do we plot it as a histogram/bar graph?
    - Use the matplotlib library, as discussed in the data visualization class.
    b: How do I convert the output of value_counts into a pandas DataFrame?
    - Use the pd.DataFrame function.

    Features affecting the ratings: is this an interpretation from the data, or do we need to code something here?
    - We need to use feature selection techniques like ANOVA / chi-square / linear regression / logistic regression.

    Model building: what needs to be done here?
    - Create prediction models like linear regression / logistic regression / KNN.

    Is there a cheat sheet on brackets, i.e. which brackets to use and when? It's one of the parts I find quite confusing.


     
    #66
  17. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Alumni Trainer

    Joined:
    Aug 16, 2017
    Messages:
    224
    Likes Received:
    22
    Regarding brackets:
    [] - lists
    () - tuples
    {} - dictionaries / sets

    [] - also used for subsetting / slicing data structures
    () - also used to enclose function arguments
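The rules above can be sketched as follows. All variable names and values are hypothetical and chosen only for illustration.

```python
# Literals: each bracket type builds a different container.
scores = [90, 85, 70]              # [] -> list (mutable, ordered)
point = (3, 4)                     # () -> tuple (immutable, ordered)
ages = {"asha": 31, "ravi": 28}    # {} with key: value pairs -> dictionary
tags = {"drama", "comedy"}         # {} with plain values -> set
empty = {}                         # note: empty {} is a dictionary, not a set

# [] after a name subsets or slices a data structure.
first_score = scores[0]            # single element
top_two = scores[:2]               # slice
asha_age = ages["asha"]            # dictionary lookup by key

# () after a name calls a function and encloses its arguments.
highest = max(scores)
print(first_score, top_two, asha_age, highest)
```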

     
    #67
  18. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Alumni Trainer

    Joined:
    Aug 16, 2017
    Messages:
    224
    Likes Received:
    22
    Hi Aakarsh,

    Please use regular expressions to search for the search string. Please refer to the pandas DataFrame session, where we discussed how to extract a search string using regular expressions.
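A minimal sketch of a regular-expression search on a text column, assuming the column is named 'Customer Complaint' as in the question. The sample rows and the search pattern are made up for illustration.

```python
import pandas as pd

# Hypothetical rows; only the column name comes from the question.
df = pd.DataFrame(
    {
        "Customer Complaint": [
            "Internet speed is slow",
            "Billing dispute",
            "Frequent network outage",
        ]
    }
)

# Case-insensitive regular expression: matches "internet" or "network".
is_network_issue = df["Customer Complaint"].str.contains(
    r"internet|network", case=False, regex=True
)

# Boolean indexing keeps only the matching rows.
network_issues = df[is_network_issue]
print(len(network_issues))   # 2
```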
     
    #68
  19. Soumyabrata Roy

    Joined:
    Aug 21, 2019
    Messages:
    8
    Likes Received:
    0
    Hi Samridhi
    I have submitted the project. I have zipped the ipynb file and attached it in the submit section. Is there anything else I need to do? Please let me know. I have attached the file here as well for reference.
     

    Attached Files:

    #69
  20. Shyamanth

    Shyamanth Member

    Joined:
    Oct 7, 2019
    Messages:
    4
    Likes Received:
    0
    In the MovieLens project, for the feature engineering step 'Find out all the unique genres (Hint: split the data in column genre making a
    list and then process the data to find out only the unique categories of genres)', I am using the script below and I get an error.
    Can you please let me know what I am doing wrong?

    MovieLensData.Genres = MovieLensData.Genres.str.split("|") # where MovieLensData is my master data




    Error
    ---------------------------------------------------------------------------
    AttributeError Traceback (most recent call last)
    <ipython-input-119-c3c47e3a006b> in <module>
    ----> 1 MovieLensData.Genres = MovieLensData.Genres.str.split("|")

    ~\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
    5061 if (name in self._internal_names_set or name in self._metadata or
    5062 name in self._accessors):
    -> 5063 return object.__getattribute__(self, name)
    5064 else:
    5065 if self._info_axis._can_hold_identifiers_and_holds_name(name):

    ~\Anaconda3\lib\site-packages\pandas\core\accessor.py in __get__(self, obj, cls)
    169 # we're accessing the attribute of the class, i.e., Dataset.geo
    170 return self._accessor
    --> 171 accessor_obj = self._accessor(obj)
    172 # Replace the property with the accessor object. Inspired by:
    173 # http://www.pydanny.com/cached-property.html

    ~\Anaconda3\lib\site-packages\pandas\core\strings.py in __init__(self, data)
    1794
    1795 def __init__(self, data):
    -> 1796 self._validate(data)
    1797 self._is_categorical = is_categorical_dtype(data)
    1798

    ~\Anaconda3\lib\site-packages\pandas\core\strings.py in _validate(data)
    1816 # (instead of test for object dtype), but that isn't practical for
    1817 # performance reasons until we have a str dtype (GH 9343)
    -> 1818 raise AttributeError("Can only use .str accessor with string "
    1819 "values, which use np.object_ dtype in "
    1820 "pandas")

    AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
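The error message says the Genres column does not hold string values at the point where .str is used. The data is not shown, so the cause is an assumption: one possibility is that the column has a non-object dtype, for example all-NaN floats after a merge. A minimal sketch, on a hypothetical three-row frame, that inspects the dtype first and then collects the unique genres:

```python
import pandas as pd

# Hypothetical sample; only the column name Genres comes from the post.
movies = pd.DataFrame(
    {"Genres": ["Animation|Comedy", "Comedy|Romance", "Action"]}
)

# .str only works on string columns, so inspect the dtype first.
print(movies["Genres"].dtype)

# astype(str) is a blunt fix: it makes .str usable, but it also turns
# missing values into the text "nan", so check for NaN before relying on it.
genre_lists = movies["Genres"].astype(str).str.split("|")

# Flatten the per-movie lists into one sorted list of distinct genres.
unique_genres = sorted({genre for genres in genre_lists for genre in genres})
print(unique_genres)   # ['Action', 'Animation', 'Comedy', 'Romance']
```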
     
    #70
  21. Venkatarao Kuna

    Venkatarao Kuna New Member

    Joined:
    Apr 25, 2019
    Messages:
    1
    Likes Received:
    0
    1. As requested by trainer Samridhi, here is the problem description for printing the series 1, 11, 21, 1211, 111221, 312211, ...
    1 ------------- Take 1 as input [count the 1s from the left -> one one -> so print 11]
    11 ----------- Take 11 as input [count the 1s from the left -> two ones -> so print 21]
    21 ----------- Take 21 as input [count the 2s in the first place from the left -> one two, then the ones after it -> one one -> so print 1211]
    1211 ------- Take 1211 as input and repeat as above
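The steps above can be sketched in Python as follows. This is not the attached Java code; the function names next_term and look_and_say are illustrative. Each term is built by reading the previous one aloud: every run of identical digits becomes the run length followed by the digit.

```python
from itertools import groupby


def next_term(term: str) -> str:
    """Describe `term`: each run of equal digits becomes '<count><digit>'."""
    return "".join(f"{len(list(run))}{digit}" for digit, run in groupby(term))


def look_and_say(count: int, start: str = "1") -> list:
    """Return the first `count` terms of the series, beginning with `start`."""
    terms = []
    term = start
    for _ in range(count):
        terms.append(term)
        term = next_term(term)
    return terms


print(look_and_say(6))   # ['1', '11', '21', '1211', '111221', '312211']
```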

    PFA: the Java code for the above is uploaded as a file.
     

    Attached Files:

    #71
  22. Anubrata Das

    Anubrata Das Member

    Joined:
    Sep 25, 2019
    Messages:
    3
    Likes Received:
    0
    #72
  23. Anubrata Das

    Anubrata Das Member

    Joined:
    Sep 25, 2019
    Messages:
    3
    Likes Received:
    0
    2nd Nov-7th Dec batch
    this is the code for printing 1s-
    for idx in range(4):
        n=4
        while n>=0:
            lst=[]
            for num in range(n):
                lst.append(str(1))
            print("".join(lst).rjust(4))
            n=n-1
     
    #73
  24. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Alumni Trainer

    Joined:
    Aug 16, 2017
    Messages:
    224
    Likes Received:
    22
    Thanks Venkatarao! :)
     
    #74
  25. ASHIK S R

    ASHIK S R Member

    Joined:
    Jul 26, 2019
    Messages:
    5
    Likes Received:
    0
    Hi Samridhi,

    I got an MSE of about 1.8214 when predicting with linear regression for the assignment 'Evaluate the Ad Budget Dataset of XYZ Firm'.
    Is this figure in an acceptable range?
     
    #75
  26. Anubrata Das

    Anubrata Das Member

    Joined:
    Sep 25, 2019
    Messages:
    3
    Likes Received:
    0
    For the MovieLens project, since there are so many categorical variables, should we not build the model using logistic regression?
     
    #76
  27. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Alumni Trainer

    Joined:
    Aug 16, 2017
    Messages:
    224
    Likes Received:
    22
    Hi Ashik,

    Please calculate the R2 value, and check accuracy in percentage.
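MSE is expressed in squared units of the target, so on its own it does not say whether a fit is good. R2 compares the model's squared error with the error of always predicting the mean, which gives a figure that can be read as a percentage. A minimal sketch of the calculation on made-up numbers (the data and the helper name r2_score_manual are illustrative assumptions; scikit-learn's sklearn.metrics.r2_score computes the same quantity):

```python
def r2_score_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))   # model error
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)           # baseline error
    return 1 - ss_res / ss_tot


# Made-up actual and predicted values.
y_true = [10.0, 12.0, 15.0, 19.0]
y_pred = [11.0, 12.0, 14.0, 19.0]

mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
r2 = r2_score_manual(y_true, y_pred)
print(f"MSE = {mse:.3f}, R^2 = {r2:.3f} ({r2:.1%})")
```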

    Regards,
    Samridhi
     
    #77
  28. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Alumni Trainer

    Joined:
    Aug 16, 2017
    Messages:
    224
    Likes Received:
    22
    Yes, you can use logistic regression or KNN.

    Regards,
    Samridhi
     
    #78
  29. _6781

    _6781 Member
    Alumni

    Joined:
    Apr 26, 2017
    Messages:
    9
    Likes Received:
    0
    Hi, I am pursuing a machine learning course and want to know which machine learning applications are helpful in market research or data mining.
    Could anyone please help me with this?
     
    #79
  30. Y Nandigam

    Y Nandigam New Member

    Joined:
    Sep 30, 2019
    Messages:
    1
    Likes Received:
    0
    Hi,

    For the Titanic data set, I assigned df.sex=male by mistake. How can I correct this so that I get the original data back?
     
    #80
  31. T SAI MAHESH

    T SAI MAHESH Member

    Joined:
    Sep 19, 2019
    Messages:
    2
    Likes Received:
    0
    #81
  32. T SAI MAHESH

    T SAI MAHESH Member

    Joined:
    Sep 19, 2019
    Messages:
    2
    Likes Received:
    0

    Attached Files:

    #82
  33. Akhil_93

    Akhil_93 Customer
    Customer

    Joined:
    May 9, 2020
    Messages:
    3
    Likes Received:
    0
    Statistics for data science
    Hi Akhil,
    Greetings from Simplilearn. We have gone through your course and would like to inform you that the Statistics for Data Science course is not included in your subscription. Kindly contact the SPOC in your organization for Simplilearn courses. Hope this helps. Please reach out to us for any assistance. We are here to help.
    --
    Regards,
    Jibin Dev
    Team Simplilearn
    www.simplilearn.com
    US: +1 844 532 7688 | India: 1800-212-7688
     
    #83
  34. Vaibhav Bhardwaj

    Joined:
    Mar 18, 2020
    Messages:
    4
    Likes Received:
    0
    Screen Shot 2020-05-15 at 12.10.07 PM.png

    Can someone please help me understand why the count of True is coming out to be 3?
     
    #84
  35. Vaibhav Bhardwaj

    Joined:
    Mar 18, 2020
    Messages:
    4
    Likes Received:
    0
    Assignment #1
    1 1 1 1 1
    _ 1 1 1 1
    _ _ 1 1 1
    _ _ _ 1 1
    _ _ _ _ 1


    CODE:
    number = int(input('Enter the number of rows:'))

    for i in range(number):
        for j in range(number):
            if j >= i:
                print(1, end = '')
            else:
                print(' ', end = '')
        print('\n')
     
    #85
  36. _77573

    _77573 New Member

    Joined:
    Apr 30, 2020
    Messages:
    1
    Likes Received:
    0
    I am unable to see any live classes today (5/16/2020).
     
    #86
  37. Akhil_93

    Akhil_93 Customer
    Customer

    Joined:
    May 9, 2020
    Messages:
    3
    Likes Received:
    0
    Not
    Ticket number: 00570870
     
    #87
  38. Shraddha Khatavkar

    Joined:
    May 15, 2020
    Messages:
    8
    Likes Received:
    0
    Hi Samridhi Ma'am,

    Kindly note that I am not able to access any of the files you shared via Google Drive.

    Please find my email-ID below:

    ID: shraddhakulkarni2312@gmail.com

    Regards,
    Shraddha Kulkarni (Khatavkar)
     
    #88
  39. Shraddha Khatavkar

    Joined:
    May 15, 2020
    Messages:
    8
    Likes Received:
    0
    Hi Samridhi ma'am,

    The setting " %config IPCompleter.greedy = True " for NumPy arrays in Python is not working.
    Please help with this.

    Thank You.
    Shraddha Khatavkar.
     
    #89
  40. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Alumni Trainer

    Joined:
    Aug 16, 2017
    Messages:
    224
    Likes Received:
    22
    Hi Shraddha,

    Please sign in to Google using your Gmail ID. Then you'll be able to access the Google Drive at the shared link.

    Regards,
    Samridhi
     
    #90
  41. Vaibhav Bhardwaj

    Joined:
    Mar 18, 2020
    Messages:
    4
    Likes Received:
    0
    Screen Shot 2020-05-22 at 6.06.22 PM.png

    In line 40, we gave the input as a raw string, while in line 4 we didn't. Both produce the same result. Can someone please explain the difference between the two?
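The screenshot is not available, so the exact strings are unknown; the sketch below assumes a regular-expression pattern such as \d+ was involved. A raw string (r prefix) keeps every backslash as written, while a normal string interprets escape sequences such as \n. Python leaves unrecognized escapes like \d unchanged in a normal string, which is why both forms can give the same result, although recent Python versions warn about it.

```python
import re

# "\n" is a recognized escape sequence, so the two spellings differ:
normal_newline = "\n"    # one character: a line break
raw_newline = r"\n"      # two characters: a backslash and the letter n
print(len(normal_newline), len(raw_newline))   # 1 2

# A literal backslash must be doubled in a normal string; a raw string
# needs no doubling. Both spellings below are the same three characters.
normal_pattern = "\\d+"
raw_pattern = r"\d+"
print(normal_pattern == raw_pattern)           # True

# Same pattern, same matches.
text = "line 40 and line 4"
print(re.findall(normal_pattern, text))        # ['40', '4']
print(re.findall(raw_pattern, text))           # ['40', '4']
```

Using the raw form for regular expressions avoids having to remember which escapes Python recognizes.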
     
    #91
  42. Shraddha Khatavkar

    Joined:
    May 15, 2020
    Messages:
    8
    Likes Received:
    0
    Hi Samridhi Ma'am,

    Unable to open lab. The screen shows error
    Debug error: Expired timestamp, yours 1590155715, ours 1590154975
     
    #92
  43. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Alumni Trainer

    Joined:
    Aug 16, 2017
    Messages:
    224
    Likes Received:
    22
    Hi,

    You are expected to download the contents of the Google Drive within the same week / month. After that we delete them to free up space. Anyway, you can access the contents from my other batches at the following location:
    https://drive.google.com/drive/folders/1KVpakwhPR0zuH-LGlTIHhmte-bSKEwou

    Regards,
    Samridhi
     
    #93
  44. _76794

    _76794 Member

    Joined:
    Apr 24, 2020
    Messages:
    2
    Likes Received:
    0

    for clarification 1:
    for x in range(len(test)):
        print(type(test[x]))
     
    #94
  45. _76794

    _76794 Member

    Joined:
    Apr 24, 2020
    Messages:
    2
    Likes Received:
    0
    for clarifications 1 & 3:

    for x in range(len(test)):
        print(type(test[x]))
        print("length of " + str(type(test[x]))+ " is "+str(len(test[x])))
     
    #95
  46. Shraddha Khatavkar

    Joined:
    May 15, 2020
    Messages:
    8
    Likes Received:
    0
    Hello Ma'am,

    I'm unable to download Anaconda Navigator on my laptop. Please help with this.
    Also, the following statements are not working:
    import pandas as pd
    import numpy as np
    Please help with this as well.

    Regards,
    Shraddha
     
    #96
  47. _51111

    _51111 New Member

    Joined:
    Dec 7, 2018
    Messages:
    1
    Likes Received:
    0
    Hi Samridhi, I have a few questions regarding the MovieLens project.
    1. I want to build a pivot table to understand the age distribution. I can use UserID as the values for the aggregate function, but I am not getting the correct code for it. With groupby, the following code works:
    Master_Data_2.groupby('Age Group')[['UserID']].nunique()
    Master_Data_2.groupby('Gender')[['UserID']].nunique()
    Master_Data_2.groupby('Occupation')[['UserID']].nunique()

    whereas with a pivot table, nunique throws an error.
    2. In the session I asked how to split the Movie Name column into two columns: Movie and Year. I tried it using str.extract and a regular expression. The output only solved half of the problem. As you can see in the figure, the Year column is as expected but the Movie column is unsatisfactory.
    Please help with the above two doubts. Str Extract Error.JPG
    Pivot Table Error.JPG
    Movilens DF.JPG
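A minimal sketch covering both doubts, on a small hypothetical frame (the rows, the name master, and the regular expression are assumptions; the original code and error are only in the screenshots). It assumes titles look like "Toy Story (1995)", with the year in parentheses at the end.

```python
import pandas as pd

# Hypothetical sample; only the column names come from the post.
master = pd.DataFrame(
    {
        "UserID": [1, 1, 2, 3],
        "Age Group": ["18-24", "18-24", "25-34", "25-34"],
        "Movie Name": [
            "Toy Story (1995)",
            "Heat (1995)",
            "Toy Story (1995)",
            "Sabrina (1954)",
        ],
    }
)

# 1. Pivot table counting distinct users per age group.
users_per_age_group = pd.pivot_table(
    master, index="Age Group", values="UserID", aggfunc="nunique"
)
print(users_per_age_group)

# 2. Two capture groups: everything before the trailing "(YYYY)", then the year.
title_and_year = master["Movie Name"].str.extract(r"^(.*)\s+\((\d{4})\)$")
master["Movie"] = title_and_year[0]
master["Year"] = title_and_year[1].astype(int)
print(master[["Movie", "Year"]])
```

Anchoring the pattern with ^ and $ keeps the first group from swallowing the year, which is one common reason the title column comes out wrong.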
     
    #97
  48. Shraddha Khatavkar

    Joined:
    May 15, 2020
    Messages:
    8
    Likes Received:
    0
    Hello Ma'am,
    I am doing the Walmart project. For the last question, how do I build the linear regression model, and how do I know which linear regression model has been fitted?
    Please help with this.

    Regards,
    Shraddha
     
    #98
  49. Darsh Chetan Thakker

    Alumni

    Joined:
    Nov 22, 2019
    Messages:
    1
    Likes Received:
    0
    Hi,
    Ma'am, how do I find the percentage of complaints resolved till date that were received through the Internet and customer care calls in the Comcast project?
     
    #99
    Last edited: May 29, 2020 at 9:29 AM
  50. Gaurav kumar_51

    Joined:
    Mar 21, 2020
    Messages:
    7
    Likes Received:
    0
    Hi Samridhi,

    While working on the Walmart project:

    Linear Regression – Utilize variables like date and restructure dates as 1 for 5 Feb 2010 (starting from the earliest date in order).

    Y-Intercept:
    139639575.33440715

    Root Mean Square Error (RMSE):
    552459.8953123686

    R^2 Value:
    0.03433024544837493

    x=walmart.drop(['Weekly_Sales','Store','Date','Semester'],axis=1)

    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state= 0)

    Please help me with this: the R-squared value is not close to 1, so the model is not a good fit.
     
    #100
