Data Science with Python | Samridhi

Discussion in 'Big Data and Analytics' started by Nishant_Singh, Jun 3, 2019.

  1. Lokesh Gowda S

    Lokesh Gowda S New Member

    Joined:
    Jul 12, 2019
    Messages:
    1
    Likes Received:
    0
    I have completed the MovieLens project up to
    3) determine the features affecting a particular rating, as shown in the screenshot below.
    I am stuck on question 3 (see the screenshot below). Please help me with the next step, and also with 4) develop an appropriate model to predict the movie rating. Please help me with the code format. Attachments: movie lens method.png, querry.png
     
    #51
    Last edited: Aug 28, 2019
  2. ASHIK S R

    ASHIK S R Member

    Joined:
    Jul 26, 2019
    Messages:
    3
    Likes Received:
    0
    Aug 24 - Sep 28 Batch - Assignment 1
    Please help me arrange this properly.

    i = 4
    while i > 0:
        print("#")
        j = i
        while j > 0:
            print("1")
            j -= 1
        print("\n")
        i -= 1
        print(""*i)

    Output:


    #
    1
    1
    1
    1



    #
    1
    1
    1



    #
    1
    1



    #
    1
     
    #52
  3. Harshit Sharma_2

    Joined:
    Aug 10, 2019
    Messages:
    2
    Likes Received:
    0
    Hello Ma'am,

    Can you please explain how a[-1:-2:-1, :] works on the array below?

    a =
    array([[ 0.        ,  1.11111111,  2.22222222],
           [ 3.33333333,  4.44444444,  5.55555556],
           [10.        ,  7.77777778,  8.88888889]])
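
A minimal sketch of how this slice behaves, assuming a is the 3x3 NumPy array shown above (rebuilt here by hand). The row slice -1:-2:-1 starts at the last row, steps backwards, and stops before the second-to-last row, so it selects only the last row. The : keeps every column. The names last_row and last_two_reversed are illustrative.

```python
import numpy as np

# Rebuild the array from the post (values as printed there).
a = np.array([
    [0.0, 1.11111111, 2.22222222],
    [3.33333333, 4.44444444, 5.55555556],
    [10.0, 7.77777778, 8.88888889],
])

# start=-1 (last row), stop=-2 (excluded), step=-1 (walk backwards).
last_row = a[-1:-2:-1, :]

print(last_row.shape)  # still 2-D, because a slice (not an integer) was used
print(last_row)

# Moving the stop one step further selects rows 2 and 1, in that order.
last_two_reversed = a[-1:-3:-1, :]
print(last_two_reversed)
```

Because the row selector is a slice, the result keeps two dimensions. Indexing with a[-1, :] would return the same values as a 1-D array.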
     
    #53
  4. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Alumni Trainer

    Joined:
    Aug 16, 2017
    Messages:
    193
    Likes Received:
    21
    Hi,

    Since this is a multi-class classification problem, you need to use the chi-square test of independence. All the files are updated in the Google Drive folder. Please open the statistics folder there to find the chi-square method of feature selection.
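
As a rough illustration of the chi-square test of independence mentioned here, the sketch below uses scipy.stats.chi2_contingency on a contingency table built with pd.crosstab. The toy data and the column names Gender and Rating are assumptions for illustration, not the course files from the drive.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical toy data: one categorical feature and the rating class.
df = pd.DataFrame({
    "Gender": ["F", "M", "M", "F", "M", "F", "M", "M", "F", "M"],
    "Rating": [5, 3, 3, 4, 5, 5, 3, 4, 4, 3],
})

# Contingency table of observed counts for each (Gender, Rating) pair.
observed = pd.crosstab(df["Gender"], df["Rating"])

# Null hypothesis: Gender and Rating are independent.
chi2, p_value, dof, expected = chi2_contingency(observed)

print(observed)
print(f"chi2={chi2:.3f}, p={p_value:.3f}, dof={dof}")
```

A small p-value (commonly below 0.05) suggests the feature and the rating are not independent, which makes the feature a candidate for the model. Repeating the test for each categorical feature gives a simple ranking.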

    Regards,
    Samridhi
     
    #54
  5. Deepak Shanthaiah

    Joined:
    Jul 25, 2019
    Messages:
    2
    Likes Received:
    0

    Hi Samridhi,

    My question is: what does "features affecting the ratings" mean, and which features does it refer to?
    I am also stuck and confused about how to solve these two questions, so please assist me.
     
    #55
  6. Harshit Sharma_2

    Joined:
    Aug 10, 2019
    Messages:
    2
    Likes Received:
    0
    How do I slice rows when loc and isin are also used?
    brics.loc[1:3,brics.columns.isin(['capital','area'])]
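
A minimal sketch of what this expression does, assuming brics has a default integer index. The data below is a hypothetical stand-in for the real DataFrame. The point to note is that .loc slices rows by label and includes the end label, while .iloc slices by position and excludes the end.

```python
import pandas as pd

# Hypothetical stand-in for the brics DataFrame from the post.
brics = pd.DataFrame({
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    "area": [8.516, 17.10, 3.286, 9.597, 1.221],
})

# Boolean mask over the columns: True only for 'capital' and 'area'.
col_mask = brics.columns.isin(["capital", "area"])

# .loc slices rows by label and includes the end label: rows 1, 2, 3.
by_label = brics.loc[1:3, col_mask]

# .iloc slices rows by position and excludes the end: rows 1, 2.
by_position = brics.iloc[1:3, col_mask]

print(by_label)
print(by_position)
```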
     
    #56
  7. Pranaya Kumar Panda

    Joined:
    Aug 6, 2019
    Messages:
    2
    Likes Received:
    0
    Hi Samridhi,

    users.dat has the following data format:

    1::F::1::10::48067
    2::M::56::16::70072
    3::M::25::15::55117
    4::M::45::7::02460
    5::M::25::20::55455
    6::F::50::9::55117

    From the above data set I can see that the first column is User_id, the next is gender, and the next is age. I am not able to understand the last two columns. Can you explain these fields?

    Similarly, ratings.dat has the following format:

    1::1193::5::978300760
    1::661::3::978302109
    1::914::3::978301968
    1::3408::4::978300275
    1::2355::5::978824291

    I can't understand any of the fields in the above data set. Can you explain them?
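
For reference, the README shipped with the MovieLens 1M dataset documents users.dat as UserID::Gender::Age::Occupation::Zip-code and ratings.dat as UserID::MovieID::Rating::Timestamp. Age and Occupation are coded categories (for example, Age 25 stands for the 25-34 group), and Timestamp is in seconds since the Unix epoch. Below is a minimal sketch of loading a few of the sample rows above with pandas. The short column names and the RatedAt column are my own labels, and io.StringIO stands in for the real files.

```python
import io
import pandas as pd

users_raw = """1::F::1::10::48067
2::M::56::16::70072
3::M::25::15::55117
4::M::45::7::02460
"""
ratings_raw = """1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
"""

# '::' is a multi-character separator, so the python parsing engine is needed.
users = pd.read_csv(
    io.StringIO(users_raw), sep="::", engine="python", header=None,
    names=["UserID", "Gender", "Age", "Occupation", "Zip"],
    dtype={"Zip": str},  # keep leading zeros such as 02460
)
ratings = pd.read_csv(
    io.StringIO(ratings_raw), sep="::", engine="python", header=None,
    names=["UserID", "MovieID", "Rating", "Timestamp"],
)

# Convert epoch seconds to a readable datetime.
ratings["RatedAt"] = pd.to_datetime(ratings["Timestamp"], unit="s")

print(users)
print(ratings)
```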

    Regards,
    Pranaya
     
    #57
  8. PRASHANT NAMDEO SHELARE

    Joined:
    Apr 5, 2019
    Messages:
    2
    Likes Received:
    0
    Hello Ma'am,

    I am working on the MovieLens project and I am not getting the expected result from
    pd.concat(). The output shape is
    (19266, 28)
    I took 10000 records for analysis, i.e. a shape of (10000, 10), and the one-hot encoding of unique_genres also has 10000 rows:
    (10000, 18). So the output should be (10000, 28). Please help me.
    I am sending my full code in PDF format. Download it and rename the extension to .ipynb.
     

    Attached Files:

    #58
  9. Guru mahesh

    Guru mahesh Member

    Joined:
    Feb 18, 2019
    Messages:
    12
    Likes Received:
    0
    Hello,

    I have some doubts about building the user-based recommendation model for the Amazon project.

    There are many NaN values in each movie column. How do I replace the NaN values?

    I tried it this way:

    Reduced_df=AMTV_Ratind_df.loc[AMTV_Ratind_df.columns.notnull(),AMTV_Ratind_df.columns]
    for i in range(0,206):
        Reduced_df.loc[Reduced_df[Reduced_df.columns].isnull(),Reduced_df.columns]=0

    But if I replace the NaN values with 0, the mean and median change. How can I solve this?
     
    #59
  10. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Alumni Trainer

    Joined:
    Aug 16, 2017
    Messages:
    193
    Likes Received:
    21
    Hi Prashant,

    pd.concat is not giving the expected result here because the indices of the two DataFrames are not the same. You need to reset the index of both DataFrames so that they share common indices, and then do the column-wise concatenation.
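
A minimal sketch of the effect, using two tiny hypothetical frames (left and right are made-up names and data). Column-wise concat aligns rows on index labels, so labels that appear in only one frame add extra rows filled with NaN.

```python
import pandas as pd

# Hypothetical frames: same number of rows but different index labels,
# as can happen after filtering or sampling one of them.
left = pd.DataFrame({"rating": [5, 3, 4]}, index=[0, 1, 2])
right = pd.DataFrame({"Action": [1, 0, 1]}, index=[2, 3, 4])

# Column-wise concat aligns on index labels, so unmatched labels add rows.
misaligned = pd.concat([left, right], axis=1)
print(misaligned.shape)

# Resetting both indices to 0..n-1 makes the rows line up again.
aligned = pd.concat(
    [left.reset_index(drop=True), right.reset_index(drop=True)], axis=1
)
print(aligned.shape)
```

Resetting the index pairs rows purely by position, so it is only appropriate when both frames hold the same records in the same order.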

    Regards,
    Samridhi
     
    #60
  11. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Alumni Trainer

    Joined:
    Aug 16, 2017
    Messages:
    193
    Likes Received:
    21
    Hi Guru,

    For this, exclude the movies with NaN ratings when computing the average rating.
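
A minimal sketch of the difference, on a small hypothetical ratings table (the movie names and values are made up). pandas skips NaN by default when computing the mean, so no replacement is needed for the average itself.

```python
import numpy as np
import pandas as pd

# Hypothetical user-by-movie ratings; NaN means "not rated".
ratings = pd.DataFrame({
    "Movie1": [5.0, np.nan, 4.0, np.nan],
    "Movie2": [np.nan, np.nan, 2.0, 3.0],
})

# pandas skips NaN by default (skipna=True), so only real ratings count.
mean_ignoring_nan = ratings.mean()

# Filling with 0 treats "not rated" as a rating of 0 and drags the mean down.
mean_after_fill = ratings.fillna(0).mean()

# count() gives the number of actual ratings per movie.
n_ratings = ratings.count()

print(mean_ignoring_nan)
print(mean_after_fill)
print(n_ratings)
```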

    Regards,
    Samridhi
     
    #61
  12. Shyamanth

    Shyamanth Member

    Joined:
    Oct 7, 2019
    Messages:
    4
    Likes Received:
    0
    Hi Samridhi,

    Can you please let me know the following?

    a) How do I set up a working directory? I.e., I downloaded and installed Anaconda and launched Jupyter Notebook from Anaconda Navigator. How do I set up the working directory?

    b) What do Files, Running, and Clusters mean inside Jupyter Notebook? How do I create a new directory and set it as the default working directory?

    c) What are Terminals and Notebooks? How do I open them? How many terminals and notebooks can be open at a time?

    d) How many Python windows or instances can be open at a time? What is the technical name of a Python window?
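
On the working directory part of these questions, one option is to check and change it from inside a notebook cell with the standard os module. The sketch below is only an illustration: the folder name ds_with_python is hypothetical, and a temporary location is used so that no real files are touched.

```python
import os
import tempfile

# Where is the notebook (or script) currently running?
start_dir = os.getcwd()
print(start_dir)

# Hypothetical project folder inside a temporary location.
# Replace this with your own path, for example a folder under Documents.
base = tempfile.mkdtemp()
project_dir = os.path.join(base, "ds_with_python")

os.makedirs(project_dir, exist_ok=True)  # create the folder if it is missing
os.chdir(project_dir)                    # make it the working directory
print(os.getcwd())

os.chdir(start_dir)                      # switch back when done
```

The change made by os.chdir lasts only for the running Python session, so it has to be repeated in each new notebook.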

    Thanks
    Shyam
     
    #62
  13. Shyamanth

    Shyamanth Member

    Joined:
    Oct 7, 2019
    Messages:
    4
    Likes Received:
    0
    Hi Samridhi,

    Webex did not allow me to join today's class (11th Oct). Later it showed that the session had not started. Is there a class today? I have sent an email to the support team asking for access. Can you please share the recording of today's class in Google Drive?

    Thanks
    Shyam
     
    #63
  14. Soumyabrata Roy

    Joined:
    Aug 21, 2019
    Messages:
    8
    Likes Received:
    0
    Hi Samridhi
    I have a couple of questions on the MovieLens project.
    1. User Age Distribution
    a: Once we get the age distribution with the value_counts method, how do we plot it as a histogram/bar graph?
    b: How do I convert the result of value_counts into a pandas DataFrame?

    Features affecting the ratings: is this an interpretation of the data, or do we need to code something here?

    Model building: what needs to be done here?

    Is there a cheat sheet on brackets, i.e. which brackets to use and when? It's one of the parts I find quite confusing.
     
    #64
    Last edited: Oct 21, 2019
  15. Aakarsh Nair

    Aakarsh Nair New Member

    Joined:
    Oct 4, 2019
    Messages:
    1
    Likes Received:
    0
    Hello Ma'am,
    How can I find internet-related problems/network issues in the column 'Customer Complaint',
    which holds free-text strings? Also, please tell me how I can access records by month from the column 'Date_month_year' and by day from the date column.
     

    Attached Files:

    #65
  16. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Alumni Trainer

    Joined:
    Aug 16, 2017
    Messages:
    193
    Likes Received:
    21
    1. User Age Distribution
    a: Once we get the age distribution with the value_counts method, how do we plot it as a histogram/bar graph?
    - Use the matplotlib library, as discussed in the data visualization class.
    b: How do I convert the result of value_counts into a pandas DataFrame?
    - Use the pd.DataFrame function.

    Features affecting the ratings: is this an interpretation of the data, or do we need to code something here?
    - We need to use feature selection techniques such as ANOVA, chi-square, linear regression, or logistic regression.

    Model building: what needs to be done here?
    - Create prediction models such as linear regression, logistic regression, or KNN.

    Is there a cheat sheet on brackets, i.e. which brackets to use and when? It's one of the parts I find quite confusing.
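
For question 1 above, here is a minimal sketch of turning value_counts output into a DataFrame and plotting it. The users frame and its ages are hypothetical, and plot_age_counts is an illustrative helper name. The matplotlib import sits inside the helper so the conversion part runs even where matplotlib is not installed.

```python
import pandas as pd

# Hypothetical ages for a handful of users.
users = pd.DataFrame({"Age": [25, 25, 35, 18, 25, 35, 1]})

# value_counts() returns a Series: index = age, value = number of users.
age_counts = users["Age"].value_counts().sort_index()

# Turn the Series into a two-column DataFrame.
age_df = age_counts.rename_axis("Age").reset_index(name="Count")
print(age_df)


def plot_age_counts(counts):
    """Draw a bar chart of the counts (needs matplotlib installed)."""
    import matplotlib.pyplot as plt

    ax = counts.plot(kind="bar")
    ax.set_xlabel("Age")
    ax.set_ylabel("Number of users")
    plt.show()
```

Calling plot_age_counts(age_counts) in a notebook cell draws the bar chart.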


     
    #66
  17. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Alumni Trainer

    Joined:
    Aug 16, 2017
    Messages:
    193
    Likes Received:
    21
    Regarding brackets:
    [] - represents lists
    ()- tuples
    {} - dictionaries / sets

    [] - also used for subsetting / slicing data structures
    () - also used to enclose function arguments
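
A short sketch of each bracket in use. All names and values are made up for illustration.

```python
# [] builds a list; () builds a tuple; {} builds a dict or a set.
genres = ["Action", "Comedy", "Drama"]           # list: ordered, mutable
point = (3, 4)                                   # tuple: ordered, immutable
rating_by_user = {"u1": 5, "u2": 3}              # dict: key -> value
unique_genres = {"Action", "Comedy", "Action"}   # set: duplicates dropped

# Note: an empty {} is a dict; use set() for an empty set.
empty = {}

# [] also indexes and slices; () also calls functions.
first = genres[0]                  # indexing
first_two = genres[0:2]            # slicing
u1_rating = rating_by_user["u1"]   # dict lookup by key
n = len(genres)                    # function call

print(first, first_two, u1_rating, n, unique_genres)
```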

     
    #67
  18. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Alumni Trainer

    Joined:
    Aug 16, 2017
    Messages:
    193
    Likes Received:
    21
    Hi Aakarsh,

    Please use regular expressions to search for the string. Please refer to the pandas DataFrame session, where we discussed how to extract a search string using regular expressions.
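
A minimal sketch of the regular expression approach, plus grouping by month and by day. The sample rows are hypothetical, the column names follow the question, and the date format (day-month-year with two-digit year) is an assumption that should be adjusted to the real file.

```python
import pandas as pd

# Hypothetical sample; the date format is assumed.
df = pd.DataFrame({
    "Customer Complaint": [
        "Internet speed too slow",
        "Billing error",
        "Network outage for two days",
        "Poor customer service",
    ],
    "Date_month_year": ["22-04-15", "04-08-15", "18-04-15", "05-07-15"],
})

# Case-insensitive regex: rows mentioning "internet" or "network".
mask = df["Customer Complaint"].str.contains(
    r"internet|network", case=False, regex=True, na=False
)
internet_issues = df[mask]

# Parse the dates, then count complaints per month and per day.
df["Date"] = pd.to_datetime(df["Date_month_year"], format="%d-%m-%y")
per_month = df.groupby(df["Date"].dt.month).size()
per_day = df.groupby(df["Date"].dt.date).size()

print(internet_issues)
print(per_month)
print(per_day)
```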
     
    #68
  19. Soumyabrata Roy

    Joined:
    Aug 21, 2019
    Messages:
    8
    Likes Received:
    0
    Hi Samridhi
    I have submitted the project. I zipped the ipynb file and attached it in the submit section. Is there anything else I need to do? Please let me know. I have attached the file here as well for reference.
     

    Attached Files:

    #69
  20. Shyamanth

    Shyamanth Member

    Joined:
    Oct 7, 2019
    Messages:
    4
    Likes Received:
    0
    In the MovieLens project, for the feature engineering step 'Find out all the unique genres (Hint: split the data in column genre making a
    list and then process the data to find out only the unique categories of genres)', I am using the script below and get an error.
    Can you please let me know what I am doing wrong?

    MovieLensData.Genres = MovieLensData.Genres.str.split("|") # where MovieLensData is my master data




    Error
    ---------------------------------------------------------------------------
    AttributeError Traceback (most recent call last)
    <ipython-input-119-c3c47e3a006b> in <module>
    ----> 1 MovieLensData.Genres = MovieLensData.Genres.str.split("|")

    ~\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
    5061 if (name in self._internal_names_set or name in self._metadata or
    5062 name in self._accessors):
    -> 5063 return object.__getattribute__(self, name)
    5064 else:
    5065 if self._info_axis._can_hold_identifiers_and_holds_name(name):

    ~\Anaconda3\lib\site-packages\pandas\core\accessor.py in __get__(self, obj, cls)
    169 # we're accessing the attribute of the class, i.e., Dataset.geo
    170 return self._accessor
    --> 171 accessor_obj = self._accessor(obj)
    172 # Replace the property with the accessor object. Inspired by:
    173 # http://www.pydanny.com/cached-property.html

    ~\Anaconda3\lib\site-packages\pandas\core\strings.py in __init__(self, data)
    1794
    1795 def __init__(self, data):
    -> 1796 self._validate(data)
    1797 self._is_categorical = is_categorical_dtype(data)
    1798

    ~\Anaconda3\lib\site-packages\pandas\core\strings.py in _validate(data)
    1816 # (instead of test for object dtype), but that isn't practical for
    1817 # performance reasons until we have a str dtype (GH 9343)
    -> 1818 raise AttributeError("Can only use .str accessor with string "
    1819 "values, which use np.object_ dtype in "
    1820 "pandas")

    AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
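
The message says that the Genres column does not currently hold string values, so checking MovieLensData.Genres.dtype is a reasonable first step. The actual data is not shown, so the cause is not certain. The sketch below uses a hypothetical stand-in for the master data and writes the split result to a new column (GenreList is a made-up name) so that the original Genres strings stay intact when the cell is run again.

```python
import pandas as pd

# Hypothetical stand-in for the master data from the post.
MovieLensData = pd.DataFrame({
    "Title": ["Toy Story (1995)", "Heat (1995)", "Jumanji (1995)"],
    "Genres": [
        "Animation|Children's|Comedy",
        "Action|Crime|Thriller",
        "Adventure|Children's|Fantasy",
    ],
})

# First check what the column holds; .str needs string values.
print(MovieLensData["Genres"].dtype)

# Split into lists in a new column, leaving the original strings untouched.
MovieLensData["GenreList"] = MovieLensData["Genres"].str.split("|")

# Flatten the lists and keep each genre once.
unique_genres = sorted(
    {genre for genres in MovieLensData["GenreList"] for genre in genres}
)
print(unique_genres)
```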
     
    #70
  21. Venkatarao Kuna

    Venkatarao Kuna New Member

    Joined:
    Apr 25, 2019
    Messages:
    1
    Likes Received:
    0
    1. As requested by trainer Samridhi, here is the problem description for printing the series 1, 11, 21, 1211, 111221, 312211, ...
    1 ------------- Take 1 as input [count the digits from the left: one 1, so print 11]
    11 ----------- Take 11 as input [count from the left: two 1s, so print 21]
    21 ----------- Take 21 as input [count from the left: one 2, then one 1, so print 1211]
    1211 ------- Take 1211 as input and repeat as above

    PFA: the Java code for the above is uploaded as a file.
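
The rule described above can also be sketched in Python. This is not the attached Java code, only one possible version that uses itertools.groupby to find runs of equal digits. The function names next_term and series are illustrative.

```python
from itertools import groupby


def next_term(term: str) -> str:
    """Read `term` aloud: for each run of equal digits emit count + digit."""
    return "".join(f"{len(list(run))}{digit}" for digit, run in groupby(term))


def series(first: str, n: int) -> list:
    """Return the first n terms (n >= 1), starting from `first`."""
    terms = [first]
    while len(terms) < n:
        terms.append(next_term(terms[-1]))
    return terms


print(", ".join(series("1", 6)))
```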
     

    Attached Files:

    #71
  22. Anubrata Das

    Anubrata Das Member

    Joined:
    Sep 25, 2019
    Messages:
    3
    Likes Received:
    0
    #72
  23. Anubrata Das

    Anubrata Das Member

    Joined:
    Sep 25, 2019
    Messages:
    3
    Likes Received:
    0
    2nd Nov-7th Dec batch
    This is the code for printing 1s:
    for idx in range(4):
        n=4
        while n>=0:
            lst=[]
            for num in range(n):
                lst.append(str(1))
            print("".join(lst).rjust(4))
            n=n-1
     
    #73
  24. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Alumni Trainer

    Joined:
    Aug 16, 2017
    Messages:
    193
    Likes Received:
    21
    Thanks Venkatarao! :)
     
    #74
  25. ASHIK S R

    ASHIK S R Member

    Joined:
    Jul 26, 2019
    Messages:
    3
    Likes Received:
    0
    Hi Samridhi,

    I got an MSE of about 1.8214 when I made predictions using Linear Regression for the assignment 'Evaluate the Ad Budget Dataset of XYZ Firm'.
    Is this figure in an acceptable range?
     
    #75
  26. Anubrata Das

    Anubrata Das Member

    Joined:
    Sep 25, 2019
    Messages:
    3
    Likes Received:
    0
    For the MovieLens project, since there are so many categorical variables, should we not build the model with logistic regression?
     
    #76
  27. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Alumni Trainer

    Joined:
    Aug 16, 2017
    Messages:
    193
    Likes Received:
    21
    Hi Ashik,

    Please calculate the R2 value and check the accuracy as a percentage.
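
A minimal sketch of both metrics computed by hand with NumPy, on hypothetical actual and predicted values. MSE is an error measure in squared units of the target, so its size depends on the scale of the data. R2 is unitless and can be read as a percentage. scikit-learn provides the same metrics as mean_squared_error and r2_score in sklearn.metrics.

```python
import numpy as np

# Hypothetical actual and predicted sales values.
y_true = np.array([22.1, 10.4, 9.3, 18.5, 12.9])
y_pred = np.array([20.5, 12.3, 12.3, 17.6, 13.2])

# MSE: mean of the squared residuals (lower is better).
mse = np.mean((y_true - y_pred) ** 2)

# R2 = 1 - SS_res / SS_tot: share of variance explained by the model.
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"MSE={mse:.4f}, R2={r2:.4f} ({r2 * 100:.1f}%)")
```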

    Regards,
    Samridhi
     
    #77
  28. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Alumni Trainer

    Joined:
    Aug 16, 2017
    Messages:
    193
    Likes Received:
    21
    Yes, you can use logistic regression or KNN.

    Regards,
    Samridhi
     
    #78
  29. _6781

    _6781 Member
    Alumni

    Joined:
    Apr 26, 2017
    Messages:
    9
    Likes Received:
    0
    Hi, I am pursuing a machine learning course and want to know which machine learning applications are helpful in market research or data mining.
    Could anyone please help me with this?
     
    #79
  30. Y Nandigam

    Y Nandigam New Member

    Joined:
    Sep 30, 2019
    Messages:
    1
    Likes Received:
    0
    Hi,

    For the Titanic data set, I assigned df.sex=male by mistake. How can I correct it so that I have the original data back?
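
Once a column is overwritten in memory, the old values are gone from that DataFrame, so the usual fix is to read the source file again. Keeping an untouched copy avoids the reload. The sketch below is an illustration only: csv_text is a hypothetical stand-in for the Titanic CSV file, and work and reloaded are made-up names.

```python
import io
import pandas as pd

# Hypothetical stand-in for the Titanic CSV file on disk.
csv_text = "name,sex,age\nAllen,female,29\nBraund,male,22\nCumings,female,38\n"

df = pd.read_csv(io.StringIO(csv_text))

# Work on a copy so the loaded data stays available in memory.
work = df.copy()
work["sex"] = "male"          # the accidental overwrite

# Option 1: restore the column from the untouched frame.
work["sex"] = df["sex"]

# Option 2: if no copy was kept, read the source file again.
reloaded = pd.read_csv(io.StringIO(csv_text))

print(work)
```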
     
    #80
  31. T SAI MAHESH

    T SAI MAHESH Member

    Joined:
    Sep 19, 2019
    Messages:
    2
    Likes Received:
    0
    #81
  32. T SAI MAHESH

    T SAI MAHESH Member

    Joined:
    Sep 19, 2019
    Messages:
    2
    Likes Received:
    0

    Attached Files:

    #82
