DS Python mentoring Session | Sunny

Discussion in 'Big Data and Analytics' started by SUNNY BHAVEEN CHANDRA, Oct 9, 2019.

  1. SUNNY BHAVEEN CHANDRA

    SUNNY BHAVEEN CHANDRA Well-Known Member

    Joined:
    Feb 4, 2019
    Messages:
    82
    Likes Received:
    15
    #1
    Last edited: Oct 19, 2019
    _65977 and _66029 like this.
  2. ved prakash_15

    ved prakash_15 New Member

    Joined:
    Sep 13, 2019
    Messages:
    1
    Likes Received:
    0
    Sir, please share the file.
     
    #2
  3. SUNNY BHAVEEN CHANDRA

    SUNNY BHAVEEN CHANDRA Well-Known Member

    Joined:
    Feb 4, 2019
    Messages:
    82
    Likes Received:
    15
    Hi Ved,

    Kindly download the files from the drive link shared above.

    Regards,
    Sunny
    Teaching Assistant
     
    #3
  4. SUNNY BHAVEEN CHANDRA

    SUNNY BHAVEEN CHANDRA Well-Known Member

    Joined:
    Feb 4, 2019
    Messages:
    82
    Likes Received:
    15
  5. SUNNY BHAVEEN CHANDRA

    SUNNY BHAVEEN CHANDRA Well-Known Member

    Joined:
    Feb 4, 2019
    Messages:
    82
    Likes Received:
    15
  6. Shuvankar Goutam

    Joined:
    May 6, 2019
    Messages:
    11
    Likes Received:
    0
    Hi

    I have gone through the Comcast project, but there are a few things I was not able to understand. How can I get my doubts cleared?
     
    #6
  7. SUNNY BHAVEEN CHANDRA

    SUNNY BHAVEEN CHANDRA Well-Known Member

    Joined:
    Feb 4, 2019
    Messages:
    82
    Likes Received:
    15
    Hi Shuvankar,

    You can ask your doubts here. Please let me know and I'll help you out.

    Regards,
    Sunny
    Teaching Assistant
     
    #7
  8. Ashish kumar Gupta_6

    Joined:
    Sep 14, 2019
    Messages:
    2
    Likes Received:
    0
    Hi,

    I am having a problem with the first assignment. I am not able to understand this requirement:

    'Provide the trend chart for the number of complaints at monthly and daily granularity levels.'
     
    #8
  9. Shuvankar Goutam

    Joined:
    May 6, 2019
    Messages:
    11
    Likes Received:
    0
    Why has the "!pip install word count" part been included? In the liveDemo.py file, I am not able to understand anything from !pip install until the end. I could not find this question in the project either. What is its relevance?
     
    #9
  10. L Guruprathapa Reddy

    Joined:
    Sep 16, 2019
    Messages:
    1
    Likes Received:
    0
    How do I resolve the "ImportError: DLL load failed: The specified module could not be found" error? It occurs when I try to import seaborn, gensim, sklearn, etc.
     
    #10
  11. SUNNY BHAVEEN CHANDRA

    SUNNY BHAVEEN CHANDRA Well-Known Member

    Joined:
    Feb 4, 2019
    Messages:
    82
    Likes Received:
    15
    Hi L Guruprathapa Reddy,

    Please re-install these packages one by one using the following commands, either in the Command Prompt or in the Anaconda Prompt -

    Code:
    conda install -c anaconda scikit-learn
    conda install -c anaconda seaborn
    conda install -c anaconda gensim
    
    After that, restart the kernel. If you still face this issue, please raise a ticket so that we can have a one-on-one session to resolve your case.

    Regards,
    Sunny,
    Ex.Teaching Assistant
     
    #11
  12. Shuvankar Goutam

    Joined:
    May 6, 2019
    Messages:
    11
    Likes Received:
    0

    Reply is awaited
     
    #12
  13. Shuvankar Goutam

    Joined:
    May 6, 2019
    Messages:
    11
    Likes Received:
    0
    Pls answer my question
     
    #13
  14. SUNNY BHAVEEN CHANDRA

    SUNNY BHAVEEN CHANDRA Well-Known Member

    Joined:
    Feb 4, 2019
    Messages:
    82
    Likes Received:
    15
    Hi Ashish,

    The query quoted by you means that you need to make a chart or graph which can show no. of cases registered on daily and monthly basis which will help us to understand the trend of the complaints registered. You'll be able to observe which month or day was crucial in terms of the no. of complaints registered.
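
    As a rough illustration of what "daily and monthly granularity" means, here is a minimal sketch that counts complaints per day and per month with pandas. The tiny data set and the column names "Date" and "Customer Complaint" are made up for the example; the project file may use different names and date formats.

```python
import pandas as pd

# Hypothetical sample data; the real project file and its column names may differ.
complaints = pd.DataFrame({
    "Date": ["2015-06-24", "2015-06-24", "2015-06-25", "2015-04-22", "2015-04-23"],
    "Customer Complaint": ["slow internet", "billing", "outage", "billing", "data cap"],
})
complaints["Date"] = pd.to_datetime(complaints["Date"])

# Daily granularity: number of complaints per calendar day.
daily_counts = complaints.groupby(complaints["Date"].dt.date).size()

# Monthly granularity: number of complaints per month.
monthly_counts = complaints.groupby(complaints["Date"].dt.to_period("M")).size()

print(daily_counts)
print(monthly_counts)
```

    Each result is a Series indexed by day or by month, so calling .plot() on it (with matplotlib installed) would give a trend chart.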

    I hope this helps

    Regards,
    Sunny,
    Ex.Teaching Assistant
     
    #14
  15. SUNNY BHAVEEN CHANDRA

    SUNNY BHAVEEN CHANDRA Well-Known Member

    Joined:
    Feb 4, 2019
    Messages:
    82
    Likes Received:
    15
    Hi Shuvankar,

    Code:
     !pip install wordcloud 
    is used to install the wordcloud library, which helps in plotting the most frequently occurring words in the text data so that we can roughly judge which kinds of complaints were registered most often. As you can observe from the following screenshot, which was generated by wordcloud, issues related to "Internet", "Payment", "cable" and "speeds" were the major concerns, as these were the most frequently used words in the registered complaints. NOTE: a larger font means a higher frequency of the word.

    #Screenshot
    upload_2019-11-7_17-52-41.png


    I hope this helps. If you want to know more about wordcloud, please read the following blog post; it's a very good read -
    https://peekaboo-vision.blogspot.com/2012/11/a-wordcloud-in-python.html

    Apart from the above blog, it's always a good choice to refer to the docs for more experiments -
    http://amueller.github.io/word_cloud/index.html

    This was an alternative answer to the analysis task "Provide a table with the frequency of complaint types." Since it's better than making a table, I used wordcloud.
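
    If a plain frequency table is still needed, a rough sketch using only the standard library is shown below. It counts individual words, which is also roughly what a word cloud sizes its words by. The sample complaints and the stop-word list are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical complaint texts; replace with the complaint column of the data set.
complaint_texts = [
    "Comcast internet speeds are slow",
    "Payment charged twice",
    "Internet outage and cable problems",
    "Slow internet",
]

# Words that carry no information for this analysis (illustrative list only).
stop_words = {"are", "and", "comcast"}

words = []
for text in complaint_texts:
    for word in re.findall(r"[a-z]+", text.lower()):
        if word not in stop_words:
            words.append(word)

word_counts = Counter(words)
top_words = word_counts.most_common(2)

# Print the frequency table, most frequent word first.
for word, count in word_counts.most_common():
    print(f"{word:10s} {count}")
```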

    I hope this clarifies some of your doubts.
    Let me know if you have some other doubts.

    Regards,
    Sunny,
    Ex.Teaching Assistant
     
    #15
  16. Snigdha sagarika ray

    Joined:
    Jul 18, 2019
    Messages:
    8
    Likes Received:
    0
    Hi Sunny, the surprise library is not getting installed on my system. It takes a lot of time to run and I am not able to work on that project. It is not even getting installed in a Kaggle kernel. Please help me with that.
     
    #16
  17. SUNNY BHAVEEN CHANDRA

    SUNNY BHAVEEN CHANDRA Well-Known Member

    Joined:
    Feb 4, 2019
    Messages:
    82
    Likes Received:
    15
    Hi Snigdha,

    Please try the following command in your cmd prompt instead of in the Jupyter notebook -
    Code:
    conda install -c conda-forge scikit-surprise
    
    And it will work.
    #Screenshot for your reference -
    upload_2019-11-10_1-37-48.png

    There is some issue with installing the surprise library using the pip command, so try the conda command mentioned above.

    In the case of Kaggle, you need to enable the internet option as shown in the screenshot below, and to enable it you need to verify your phone number. After that you can install the surprise library, or any other library, in Kaggle as well. However, the Surprise library is pre-installed in Kaggle, so there is no need to worry about installing it; you can directly start working on the project -

    #Screenshot
    upload_2019-11-10_1-44-48.png

    I hope this helps.

    Regards,
    Sunny,
    Ex.Teaching Assistant
     

    Attached Files:

    #17
    Last edited: Nov 9, 2019
  18. Snigdha sagarika ray

    Joined:
    Jul 18, 2019
    Messages:
    8
    Likes Received:
    0
    Hi Sunny, I tried both, and again I am facing some issues. I am not able to post the screenshots here. In the command prompt, after selecting y to proceed, I get an error: The current user does not have write permissions to the target environment.
    Second, in Kaggle I get the same message as in your screenshot, but when I try to use the reader and accuracy packages, I get the error below:

    "No module named 'surpirse'"
     
    #18
  19. Snigdha sagarika ray

    Joined:
    Jul 18, 2019
    Messages:
    8
    Likes Received:
    0
    I am getting the following error in Kaggle, and the above error in the command prompt.
     

    Attached Files:

    #19
  20. SUNNY BHAVEEN CHANDRA

    SUNNY BHAVEEN CHANDRA Well-Known Member

    Joined:
    Feb 4, 2019
    Messages:
    82
    Likes Received:
    15
    Hi Snigdha,

    Please check the spelling: it's surprise, not surpirse. After that, try again and it'll work in Kaggle. Also, to avoid such issues in future you can always use <tab> completion, which will auto-fill the correct library name or give you suggestions.
    #Referring to your screenshot -

    upload_2019-11-11_23-38-25.png
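
    A quick way to check a module name before importing it is sketched below, using only the standard library. Note that the package is installed as scikit-surprise but imported as surprise. The helper name is_importable is made up for this example.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


# The misspelt name is reported as not found; the correct name is found
# only if the library is actually installed in the current environment.
for name in ("surprise", "surpirse"):
    print(name, "->", "found" if is_importable(name) else "not found")
```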

    I hope this helps.

    Regards,
    Sunny,
    Ex.Teaching Assistant
     
    #20
  21. Santanu Ghosh_1

    Joined:
    Sep 13, 2019
    Messages:
    2
    Likes Received:
    0
    Hi Sunny,
    I am going to submit the project, but I have some doubts about what I need to submit in the Write-up and Screenshot sections. Please help me with this.
     
    #21
  22. SUNNY BHAVEEN CHANDRA

    SUNNY BHAVEEN CHANDRA Well-Known Member

    Joined:
    Feb 4, 2019
    Messages:
    82
    Likes Received:
    15
    Hi Santanu,

    Please follow the procedure below -
    1. Write-up: Prepare a doc file in which you describe your thought process while solving the project. You can also mention which steps worked for you and which didn't, plus your final conclusion.
    2. Screenshots: Here you can just submit a screenshot of the final outcome.

    I hope this helps.

    Regards,
    Sunny,
    Ex.Teaching Assistant
     
    #22
    Santanu Ghosh_1 likes this.
  23. Jude Rodriguez

    Jude Rodriguez Active Member

    Joined:
    Jul 18, 2019
    Messages:
    17
    Likes Received:
    1
    Hi Sunny,

    I am having a problem attempting project 1 on the LMS for DS with Python. How do I "Provide the trend chart for the number of complaints at monthly and daily granularity levels"? I am not sure how to proceed. Kindly advise or assist with hints. I went through your response to the queries above:

    "you need to make a chart or graph which can show no. of cases registered on daily and monthly basis which will help us to understand the trend of the complaints registered. You'll be able to observe which month or day was crucial in terms of the no. of complaints registered."

    Would a bar plot of customer complaints vs. date yield the result?
     
    #23
  24. Jude Rodriguez

    Jude Rodriguez Active Member

    Joined:
    Jul 18, 2019
    Messages:
    17
    Likes Received:
    1
    Also, for "Provide a table with the frequency of complaint types", how do I get a column with the frequency? What changes would need to be made to the 'Customer Complaints' column? Is there a keyword to limit it to, as in positive/negative, and then total both instances in a table?
     
    #24
  25. Santanu Ghosh_1

    Joined:
    Sep 13, 2019
    Messages:
    2
    Likes Received:
    0
    Hi Sunny,
    Do we get a project completion certificate along with the course completion certificate?
     
    #25
  26. Saumyajit

    Saumyajit Customer
    Customer

    Joined:
    Oct 18, 2019
    Messages:
    2
    Likes Received:
    0
    Hi Sunny,
    I had 2 questions, please:
    1) Why am I unable to convert a CSV file into DataFrame format?
    2) To use SciPy, why do we need to type the import statement every time? Can't we import it just once at the beginning, like we do for pandas and NumPy?
     
    #26
  27. SUNNY BHAVEEN CHANDRA

    SUNNY BHAVEEN CHANDRA Well-Known Member

    Joined:
    Feb 4, 2019
    Messages:
    82
    Likes Received:
    15
    Hi Jude,

    Your outcome should look like this -
    #For day-wise trend chart - Here you can see that I have plotted a line chart of day vs. number of complaints. There is a spike near the end of June, which shows that a lot of complaints were registered between 25th and 30th June.
    upload_2019-11-19_23-20-38.png

    #For monthly trend chart - Here I have plotted a horizontal bar graph of month vs. number of complaints, and it also shows that a lot of complaints were registered in June.
    upload_2019-11-19_23-21-42.png

    So you need to get such an outcome. You can use seaborn, matplotlib or any other visualisation library of your choice. I haven't provided the code because I'm sure you'll be able to do it on your own after seeing the outcome.

    I hope this helps.

    Regards,
    Sunny,
    Ex.Teaching Assistant
     
    #27
  28. SUNNY BHAVEEN CHANDRA

    SUNNY BHAVEEN CHANDRA Well-Known Member

    Joined:
    Feb 4, 2019
    Messages:
    82
    Likes Received:
    15
    Hi Jude,

    I have already discussed this query in the lecture, where we tackled it using wordcloud and topic modelling. Please refer to the lecture for this. In this question the main objective is to find the most frequent kinds of complaints; they cannot be split into positive/negative, as complaints are always negative.

    I hope this helps.

    Regards,
    Sunny,
    Ex.Teaching Assistant
     
    #28
  29. SUNNY BHAVEEN CHANDRA

    SUNNY BHAVEEN CHANDRA Well-Known Member

    Joined:
    Feb 4, 2019
    Messages:
    82
    Likes Received:
    15
    Hi Santanu,

    For this query, please connect with the SL team by raising a ticket, or dial the toll-free number, which is available at the bottom of SL's home page. They will definitely guide you.

    Regards,
    Sunny,
    Ex.Teaching Assistant
     
    #29
  30. SUNNY BHAVEEN CHANDRA

    SUNNY BHAVEEN CHANDRA Well-Known Member

    Joined:
    Feb 4, 2019
    Messages:
    82
    Likes Received:
    15

    Hi Saumyajit and All,


    Regarding your 1st query - Please share the screenshot or the code file here so that I can look into your case.
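
    Until the code is shared, here is a minimal sketch of how a CSV is normally read into a DataFrame, assuming pandas is installed. An in-memory string stands in for the file so that the example runs on its own; with a real file you would pass its path instead.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk (made-up content).
csv_text = "name,score\nAsha,82\nRavi,91\n"

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(type(df))
print(df.shape)
```

    If read_csv fails on a real file, common causes are a wrong path, a different separator (sep=...) or a different encoding (encoding=...).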

    And for the 2nd query - If you don't want to import the different libraries one by one, you can install the "Pyforest" library, which automatically imports the commonly used ML libraries as you need them.
    Official link for installing Pyforest, with a detailed demo - https://pypi.org/project/pyforest/

    # Demo in Jupyter notebook - [IMG]

    # Demo in Python shell - [IMG]

    I hope this helps.

    Regards,
    Sunny,
    Ex.Teaching Assistant
     
    #30
  31. Tarif Chowdhury

    Tarif Chowdhury New Member

    Joined:
    Jun 12, 2019
    Messages:
    1
    Likes Received:
    0
    Hi Sunny,
    If we have n columns, how do we find the largest number in each column?
     
    #31
  32. SUNNY BHAVEEN CHANDRA

    SUNNY BHAVEEN CHANDRA Well-Known Member

    Joined:
    Feb 4, 2019
    Messages:
    82
    Likes Received:
    15
    Hi Tarif,

    In such a case you can use the following code -
    Code:
    df.describe().T["max"]
    
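    In the above code I'm taking the transpose of df.describe() and then selecting the "max" column. By default describe() summarises only the numeric columns, so this works for any data-set that has at least one numeric column.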

    OR

    You can simply use -
    Code:
    df.max()
    
    But there's one problem with this code: it also returns the max values for columns containing string values. So you need to be careful with this one.
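
    One way around the string-column issue is sketched below, assuming a reasonably recent pandas. The DataFrame is invented for the example; numeric_only and select_dtypes are standard pandas options.

```python
import pandas as pd

# Made-up data with two numeric columns and one string column.
df = pd.DataFrame({
    "age": [25, 32, 47],
    "salary": [50000, 64000, 58000],
    "city": ["Pune", "Delhi", "Agra"],
})

# Option 1: ask max() to skip non-numeric columns.
numeric_max = df.max(numeric_only=True)

# Option 2: keep only the numeric columns first, then take the max.
numeric_max_alt = df.select_dtypes(include="number").max()

print(numeric_max)
```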

    I hope this helps.

    Regards,
    Sunny,
    Ex.Teaching Assistant
     
    #32
    Tarif Chowdhury likes this.
  33. Ved Prakash_7

    Ved Prakash_7 Member

    Joined:
    Jun 26, 2019
    Messages:
    10
    Likes Received:
    4
    In the Machine Learning project on Boston House Price Prediction,
    I am getting a very high RMSE value even though I have applied standardisation.

    from math import sqrt
    from sklearn.metrics import mean_squared_error
    rmse = sqrt(mean_squared_error(y_test, y_pred))

    print(rmse)


    output : 68949.62451074278

    I don't know where I am making a mistake. I request you to please go through my PDF file. At the end I calculate the RMSE and get a very high value.

    Please help me find out where I am going wrong.
     

    Attached Files:

    #33
    SUNNY BHAVEEN CHANDRA likes this.
  34. SUNNY BHAVEEN CHANDRA

    SUNNY BHAVEEN CHANDRA Well-Known Member

    Joined:
    Feb 4, 2019
    Messages:
    82
    Likes Received:
    15
    Hi Ved,

    First of all, let me tell you that you have done everything correctly until now. So, no worries.
    Now the important part is the formula of RMSE -
    RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}
    You'll observe that it's nothing but the square root of a mean (average) of squared errors.
    It represents neither a percentage nor a probability.
    Thus it's not necessary that it'll stay between 0 and 1.
    It simply tells you how far, on average, the prediction is from the actual value, in the same units as the target.

    So in your case, if the RMSE value is 68949.62451074278, it simply means your predicted values typically deviate by about + or - 68949.62 from the actual values. This suggests your model is underfitting or is not a strong model.
    So in order to reduce that you can try other strong models like -

    1. Stochastic Gradient Descent Regressor
    2. Decision Tree Regressor
    3. Random Forest Regressor etc

    Now select the best model among the above and do hyperparameter tuning. If needed, try regularization techniques like Lasso, Ridge or Elastic Net.
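
    To make the point about units concrete, here is a small sketch that computes RMSE by hand with only the standard library. The prices are invented, and the helper rmse is written for this example; it is not the sklearn function.

```python
import math


def rmse(y_true, y_pred):
    """Root of the mean of the squared differences between actual and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# The same hypothetical house prices, once in full currency units...
actual_full = [200000, 150000, 300000]
predicted_full = [210000, 140000, 320000]

# ...and once in thousands.
actual_thousands = [200, 150, 300]
predicted_thousands = [210, 140, 320]

rmse_full = rmse(actual_full, predicted_full)
rmse_thousands = rmse(actual_thousands, predicted_thousands)

print(rmse_full)
print(rmse_thousands)
```

    The same predictions give an RMSE of about 14142 in full units and about 14.14 in thousands, so whether a value looks "high" depends on the scale of the target column.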

    I hope this helps.

    Regards,
    Sunny,
    Ex.Teaching Assistant
     
    #34
  35. Prasoon Dhoundiyal

    Joined:
    Feb 28, 2019
    Messages:
    7
    Likes Received:
    1
    Hello,

    I am trying to apply a non-parametric statistical method, as follows, to compare multiple distributions:

    # Performing Kruskal-Wallis H Test
    import random
    from scipy.stats import kruskal

    data_random_samples = []
    data = data_nyc_new.groupby('City')
    for each_city in data.describe().reset_index()['City']:
        data_1 = data.get_group(each_city)['Request Closing Time']
        # Rebuild the list for every city so that values from different cities do not mix
        sample = []
        for y in data_1:
            sample.append(y)
        # Draw 30 random values for this city
        random_sample = random.sample(sample, 30)
        print('sample_' + each_city, ':', '\n', random_sample, '\n')
        data_random_samples.append(random_sample)

    # Compare the first two cities
    stat, p = kruskal(data_random_samples[0], data_random_samples[1])
    print('Statistics ', ': ', (stat, p))
    if p > 0.05:
        print('Probably the same distribution')
    else:
        print('Probably different distributions')

    There are 48 lists within the list data_random_samples. Instead of manually writing all 48 lists within the kruskal function, can anyone help me with a function or a better method?
    Looking forward to a quick reply.
     
    #35
    SUNNY BHAVEEN CHANDRA likes this.
  36. SUNNY BHAVEEN CHANDRA

    SUNNY BHAVEEN CHANDRA Well-Known Member

    Joined:
    Feb 4, 2019
    Messages:
    82
    Likes Received:
    15
    Hi Prasoon,

    You can use Python's "combinations" function from the "itertools" library to generate all the possible combinations. See the following screenshot for an example:-

    upload_2020-1-20_11-2-7.png

    So, in your case, you can create a function like the one below instead of manually writing all 48 lists within the kruskal function -

    upload_2020-1-20_11-12-41.png

    Adapt it as per your needs.
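
    Since the screenshots may not be visible, here is a minimal sketch of the same idea using only the standard library. The city names and numbers are made up; in the real case each pair of lists could be passed to kruskal inside the loop.

```python
from itertools import combinations

# Hypothetical samples, one list per city.
city_samples = {
    "BRONX": [1.2, 3.4, 2.2],
    "BROOKLYN": [2.1, 2.9, 3.3],
    "QUEENS": [4.0, 1.1, 2.7],
}

# Every unordered pair of cities, without repeats.
city_pairs = list(combinations(city_samples, 2))

for first_city, second_city in city_pairs:
    first_sample = city_samples[first_city]
    second_sample = city_samples[second_city]
    # A pairwise test on first_sample and second_sample would go here.
    print(first_city, "vs", second_city)
```

    With 48 cities this produces 48 * 47 / 2 = 1128 pairs, so a loop is far more practical than writing each comparison by hand.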

    I hope this helps.

    Regards,
    Sunny,
    Ex.Teaching Assistant
     

    Attached Files:

    #36
    Prasoon Dhoundiyal likes this.
  37. Prasoon Dhoundiyal

    Joined:
    Feb 28, 2019
    Messages:
    7
    Likes Received:
    1
    Thank you. I will check out itertools. I wanted to do the following:

    stat, p = kruskal(data_random_samples[0],data_random_samples[1],data_random_samples[3],data_random_samples[4],\
    data_random_samples[5],data_random_samples[6],data_random_samples[7],data_random_samples[8],\
    data_random_samples[9],data_random_samples[10],data_random_samples[11],data_random_samples[12],\
    data_random_samples[13],data_random_samples[14],data_random_samples[15],data_random_samples[16],\
    data_random_samples[17],data_random_samples[18],data_random_samples[19],data_random_samples[20])

    Instead of writing these 20 lists within the Kruskal test separately, I was looking to shorten it, maybe using a function, such that the statement reads:
    stat, p = kruskal(all 20 lists, maybe using a function)
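
    For reference, Python's argument unpacking (*) passes every list in data_random_samples to kruskal in one call. A minimal sketch, assuming SciPy is installed and using made-up numbers in place of the real samples:

```python
from scipy.stats import kruskal

# Made-up stand-in for the list of per-city samples.
data_random_samples = [
    [2.9, 3.0, 2.5, 2.6, 3.2],
    [3.8, 2.7, 4.0, 2.4, 3.1],
    [2.8, 3.4, 3.7, 2.2, 2.0],
]

# The * operator unpacks the outer list into separate positional arguments,
# so this is the same as kruskal(samples[0], samples[1], samples[2]).
stat, p = kruskal(*data_random_samples)

# A slice can be unpacked the same way to test only some of the lists.
stat_first_two, p_first_two = kruskal(*data_random_samples[:2])

print('Statistics ', ': ', (stat, p))
```

    The same call works unchanged for 20 or 48 lists, because the number of arguments follows the length of data_random_samples.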
     
    #37
    Last edited: Jan 28, 2020
  38. Prasoon Dhoundiyal

    Joined:
    Feb 28, 2019
    Messages:
    7
    Likes Received:
    1
     
    #38
  39. Prasoon Dhoundiyal

    Joined:
    Feb 28, 2019
    Messages:
    7
    Likes Received:
    1
    A few questions on concepts:

    1 -- Scaling: MinMaxScaler vs StandardScaler vs Normalizer vs PowerTransformer vs RobustScaler vs Binarizer. How do we decide which one to use?
    2 -- How do we decide the size of the jumps towards the minimum in gradient descent? Also, which library/function is used for gradient descent?
    3 -- Fitting a curve in Python - kindly share the library/function.
    4 -- Kindly speak a bit about the interpretation of the F1 score.
    5 -- Train and test data sets are fine. What about a validation data set? How do we implement a validation data set in Python?
    6 -- How do we decide on the alpha value in ridge regression? Kindly talk about the alpha value.
     
    #39
  40. Aditya Vyas_1

    Aditya Vyas_1 Member

    Joined:
    Jan 9, 2020
    Messages:
    8
    Likes Received:
    0
    Hello Sunny

    I am currently in the DS with Python batch.
    I need your help in solving a problem I am facing while installing Anaconda Python on my system. The CMD prompt says conda is not a recognised command, and there is no Anaconda Navigator app icon anywhere on my system. I have done the installation thrice but get the same problem.

    Please help.
     
    #40
  41. Leena Vig

    Leena Vig New Member

    Joined:
    Dec 17, 2019
    Messages:
    1
    Likes Received:
    0
    Hello,
    I had a doubt regarding the movie ratings project (project 2).
    There are two datasets given: one has the movie names and movieIDs, the other has the ratings of the movies and the movieIDs. However, I checked .shape and found that the entries in the ratings file are far fewer than those in the movies file. Since we are asked to find the movies that have the max rating (which is 5), there will be some movies in the ratings file to which we cannot assign a movie name. Can you help here? I was wondering whether this is a mistake in the question or whether I am missing something.

    Also, what do they mean by average rating? The ratings for each movie are given as a single-digit number: 1, 2, 3, 4 or 5.

    Thank you
    -Leena
     
    #41
  42. KIran_1411

    KIran_1411 New Member

    Joined:
    Jun 26, 2019
    Messages:
    1
    Likes Received:
    0
    Hi, I'm trying to use the line/bar chart options, but the Line option is missing from the SAS University URL I'm using.
    Please suggest.
     
    #42
  43. Kaustubh Bailoor

    Joined:
    Dec 5, 2019
    Messages:
    2
    Likes Received:
    0
    Hello,
    I am currently in the DS with Python batch and working on the Movielens project. The final activity is to develop a model to predict the movie ratings. I am getting an accuracy of 35%, and am not sure how to go about rectifying this. Can you please help?

    PS: I used a Logistic Regression model. I hope this is the correct one to use.

    Appreciate all your help with this.

    Thanks
     
    #43
  44. Sanket kumar Samal

    Sanket kumar Samal New Member

    Joined:
    Dec 10, 2019
    Messages:
    1
    Likes Received:
    0
    #44
