Data Science with Python | Sumeet

Discussion in 'Big Data and Analytics' started by Sumeet Vyas, Sep 1, 2019.

  1. Sumeet Vyas

    Sumeet Vyas Member

    Joined:
    Aug 21, 2019
    #1
    Last edited: Sep 1, 2019
  2. Sumeet Vyas

    Sumeet Vyas Member

    Joined:
    Aug 21, 2019
    I have added the assignments from the coursework for the NumPy, SciPy and Pandas lessons to the Google Drive folder, along with template Jupyter notebooks / Python code for those who couldn't find the notebooks in the Simplilearn LMS portal. I request all of you to complete the assignments so that we can discuss them in the coming sessions.

    Google Drive Link:
    https://drive.google.com/drive/folders/1RZPC86ZRVCmSbIdyxVjCLAdnhUuia1r0?usp=sharing
     
    #2
  3. Afzal_10

    Afzal_10 Member

    Joined:
    Jun 28, 2019
    Good Afternoon Sir,
    I am unable to get the dataset to solve the assignment.
     
    #3
  4. Afzal_10

    Afzal_10 Member

    Joined:
    Jun 28, 2019
    Hello Sir,

    I have completed NumPy Assignments 1 & 2. How can I submit them?
     
    #4
  5. Sumeet Vyas

    Sumeet Vyas Member

    Joined:
    Aug 21, 2019
    Hello Afzal,

    Glad to hear that! The assignments are a few exercises that will help us complete the final projects, so they need not be submitted for now. You can also refer to the Self Learning tab of the LMS portal, where the videos walk you through the solutions to the assignments. If you hit any bottlenecks or have doubts about the assignments, we can discuss them in the live lectures.

    Happy Learning!
     
    #5
  6. Vikas Kanodia_1

    Joined:
    May 23, 2019
    Hi,
    I have data on the tickets logged in our software-support portal. Now I have to find the common/repeated queries. How can Python or another technology help me sort this out? Please guide.

    Regards,
    Vikas.
     
    #6
  7. Avi Garg

    Avi Garg Member

    Joined:
    Aug 21, 2019
    Hi All,

    Can someone suggest a good book on mathematics for data science?
     
    #7
  8. Sumeet Vyas

    Sumeet Vyas Member

    Joined:
    Aug 21, 2019
    Hi Vikas,
    I would like to know more about the portal that receives the tickets. Is it HP-SM9 or something else?
     
    #8
  9. Sumeet Vyas

    Sumeet Vyas Member

    Joined:
    Aug 21, 2019
    Hello Avi,

    One of the books I have referred to for the mathematics is "The Elements of Statistical Learning"; however, the book is mathematically very intensive. There is a course on the Simplilearn platform named "Machine Learning" which I believe covers the required level of mathematics.
     
    #9
  10. Vikas Kanodia_1

    Joined:
    May 23, 2019
    Hi,

    We are using the Sapphire ticketing system, and I can download all the tickets in Excel format.
     
    #10
  11. Sumeet Vyas

    Sumeet Vyas Member

    Joined:
    Aug 21, 2019
    Hi Vikas,

    It's good that you can download the data in Excel: the data is readily available and can be loaded into a DataFrame for preprocessing.
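A minimal sketch of that preprocessing step, assuming the export has a free-text column called "Description" (the column name, ticket texts and file name below are hypothetical). It normalizes each description (lower-case, punctuation removed, whitespace collapsed) and counts identical keys, so near-identical wordings of the same query are grouped together.

```python
import re

import pandas as pd

# Hypothetical stand-in for the Excel export. With the real file you might use
# tickets = pd.read_excel("tickets_export.xlsx")  # file/column names are assumptions;
# reading .xlsx needs an Excel engine such as openpyxl installed.
tickets = pd.DataFrame({
    "TicketID": [101, 102, 103, 104, 105],
    "Description": [
        "Unable to login to portal",
        "unable to LOGIN to portal!!",
        "Password reset required",
        "Printer not working",
        "Password reset required.",
    ],
})


def normalize(text) -> str:
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", str(text).lower())
    return re.sub(r"\s+", " ", text).strip()


# Preprocessing: one comparable key per ticket.
tickets["query_key"] = tickets["Description"].map(normalize)

# Count identical keys; anything seen more than once is a repeated query.
counts = tickets["query_key"].value_counts()
repeated = counts[counts > 1]
print(repeated)
```

Exact matching after normalization only catches queries worded almost identically; paraphrased tickets would need a fuzzier approach such as text clustering, which is a larger step beyond this sketch.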
     
    #11
  12. _42445

    _42445 Member

    Joined:
    Oct 1, 2018
    Hi Sumeet,

    I could not raise this question during the session, as I wasn't sure it was relevant to others.

    I am close to completing my Master's course in Data Science with Simplilearn. I am a little skeptical whether, with my qualification (Commerce) and my work experience (Customer Support - Quality Performance Analysis), I can get anywhere in the data science field in the near future.

    Any help or redirection is greatly appreciated.
     
    #12
  13. deolgs

    deolgs Member
    Alumni

    Joined:
    Jul 9, 2019
    Hi Sumeet,

    I am stuck somewhere in the MovieLens project and need your help.
    Topic: Top 25 movies by viewership rating

    df = master_data[["Title", "Rating"]].copy()  # Copy the Title (movie title) and Rating columns into a separate DataFrame

    sum_of_rating = df.groupby("Title")["Rating"].sum()  # Group by movie title and sum the ratings

    After this, I need to sort the movies by the sum of ratings, so that the movie with the highest total comes first.


    Kind regards,
    Gurpreet Deol
     
    #13
  14. ANUPAM GHOSH_1

    Joined:
    Jun 27, 2019

    Hi Gurpreet,

    To sort the DataFrame in descending order, you can use the following:

    df_mv_Rt = df_mv_Rt.sort_values(['Rating'], ascending=False)
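Note that groupby("Title")["Rating"].sum() returns a Series, not a DataFrame, and a Series is sorted by its values without naming a column. A minimal sketch on made-up data (master_data here is a hypothetical stand-in for the merged MovieLens table) showing both the Series route and the DataFrame route used in the line above:

```python
import pandas as pd

# Hypothetical stand-in for master_data; titles and ratings are made up.
master_data = pd.DataFrame({
    "Title": ["A", "B", "A", "C", "B", "A"],
    "Rating": [5, 4, 4, 3, 5, 2],
})

# groupby(...)["Rating"].sum() returns a Series indexed by Title.
sum_of_rating = master_data.groupby("Title")["Rating"].sum()

# A Series is sorted by its values, so no column name is passed.
top_25 = sum_of_rating.sort_values(ascending=False).head(25)

# Shortcut with the same result: the 25 largest values, in descending order.
top_25_alt = sum_of_rating.nlargest(25)

# For a DataFrame, move Title back into a column first;
# then the column-based sort_values call works.
df_mv_Rt = sum_of_rating.reset_index()
df_mv_Rt = df_mv_Rt.sort_values(["Rating"], ascending=False)
print(top_25)
```

If "viewership" in the project brief means how many users rated each movie rather than the total of their ratings, replacing .sum() with .count() would rank by number of ratings instead; which reading the project intends is an assumption worth checking against its description.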

    Hope this helps.

    Thanks,
    Anupam Ghosh
     
    #14
  15. ANUPAM GHOSH_1

    Joined:
    Jun 27, 2019
    Hi Sumeet,

    I have a query regarding the MovieLens project. Should we use Age and Occupation each as a single feature in our model, or create a separate binary feature for each group, as we have been asked to do for the Genres attribute?
    What is the benefit of creating several binary attributes from a single attribute?

    Thanks in advance,
    Anupam Ghosh
     
    #15
  16. Sumeet Vyas

    Sumeet Vyas Member

    Joined:
    Aug 21, 2019

    Hi Anupam,

    Binarisation of variables is usually used when a column holds many distinct categorical values. It can help the model generalize a bit better, although a lot depends on the kind of data we are working with. You can raise this doubt in the upcoming session and I will cover it.
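One concrete reason binarisation helps here: Age and Occupation in MovieLens are codes for groups, not measured quantities. Fed in as a single numeric column, a model could read occupation 16 as "larger than" occupation 10, which is meaningless. One binary (one-hot) column per group removes that spurious ordering, at the cost of extra columns. A minimal sketch with pandas on made-up rows (the values are hypothetical); Genres needs splitting on "|" first because one movie can carry several genres:

```python
import pandas as pd

# Hypothetical slice of the merged MovieLens data; values are made up.
users = pd.DataFrame({
    "UserID": [1, 2, 3],
    "Age": [1, 25, 25],          # age-group code, not an exact age
    "Occupation": [10, 16, 15],  # category code, not a quantity
    "Genres": ["Animation|Comedy", "Drama", "Comedy|Drama"],
})

# One 0/1 column per Age group and per Occupation code.
encoded = pd.get_dummies(users, columns=["Age", "Occupation"], dtype=int)

# Genres can hold several labels per row, so split on "|" into 0/1 columns.
genre_flags = users["Genres"].str.get_dummies(sep="|")

result = pd.concat([encoded.drop(columns="Genres"), genre_flags], axis=1)
print(result)
```

For linear models, pd.get_dummies(..., drop_first=True) drops one redundant column per feature; tree-based models are usually fine with the full set.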
     
    #16
  17. Sumeet Vyas

    Sumeet Vyas Member

    Joined:
    Aug 21, 2019
     
    #17
  18. Sumeet Vyas

    Sumeet Vyas Member

    Joined:
    Aug 21, 2019
    I got this question in yesterday's session:

    "
    I want to identify repeat callers for a call centre. I have a dataset with two columns: Column A is the phone number, Column B is the date & time of the call.
    Excel lets me sort by phone number and then by date & time (sorting is not my problem) and then use a formula that refers to the cell one row below. Example: if(and(PhoneNumberInA1=PhoneNumberInA2,(CallDateTimeInB1-CallDateTimeInB2)>1),"Repeat","First time Call"). This works for 10M rows, but now the data is growing and Excel cannot cope. Is there a solution in Python or another statistical tool for this problem?
    "

    In my experience, storing your data in a simple CSV file works if the data is up to about 10M rows. Beyond that, a database (for example, a SQL database) would be a better choice, since your data is structured. You can search for "structured storage options and databases" on Google to find many solutions. I hope this answers your question.
     
    #18
  19. _42445

    _42445 Member

    Joined:
    Oct 1, 2018

    Hi Sumeet,

    Thank you for the response!

    I am also a SQL developer and a site admin. Storage is not my problem; my concern is how to compare a field in the current row with the one in the row below before arriving at a result.
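In pandas, the Excel pattern of "sort, then compare each row with its neighbour" maps onto groupby plus diff() (or shift()), which compares rows only within the same phone number. A minimal sketch on made-up calls. The column names are hypothetical, and the one-day window is an assumed business rule, not the formula from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: column A = phone number, column B = call date & time.
calls = pd.DataFrame({
    "phone": ["111", "222", "111", "111", "222"],
    "call_time": pd.to_datetime([
        "2019-09-01 09:00", "2019-09-01 10:00", "2019-09-01 15:00",
        "2019-09-05 11:00", "2019-09-10 12:00",
    ]),
})

# Same ordering as in Excel: phone number first, then call time.
calls = calls.sort_values(["phone", "call_time"]).reset_index(drop=True)

# Row-to-row comparison within each phone number: diff() subtracts the
# previous row's time; the first call of each number gets NaT.
calls["gap"] = calls.groupby("phone")["call_time"].diff()

# Every call after a number's first call is labelled "Repeat".
calls["status"] = np.where(calls["gap"].isna(), "First time Call", "Repeat")

# Assumed threshold: flag repeats that came within one day of the previous call.
calls["repeat_within_1_day"] = calls["gap"] <= pd.Timedelta(days=1)
print(calls)
```

To look at the row below instead, as the Excel formula does, calls.groupby("phone")["call_time"].shift(-1) fetches the next call time for the same number. This runs vectorised over the whole column, so it scales well as long as the data fits in memory. In a database, the equivalent is the SQL window functions LAG()/LEAD(), partitioned by phone number and ordered by call time.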
     
    #19
  20. Tapaswini Puhan

    Joined:
    Jul 20, 2019
    Hi Sumeet,

    I have started working on the MovieLens case study. I am getting a warning on the first line while reading the movies.dat file.

    /opt/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
    """Entry point for launching an IPython kernel.

    The output is not coming out in table format.

    This is the code I have used:

    import pandas as pd
    df = pd.read_csv("movies.dat",sep="::")
     
    #20
  21. deolgs

    deolgs Member
    Alumni

    Joined:
    Jul 9, 2019
    Hi Tapaswini,

    I used the code below to read the movies.dat file; maybe it helps you.

    pd.read_csv(r'movies.dat', sep='::', names=['MovieID', 'Title', 'Genres'], engine='python')
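Why this call works: "::" is longer than one character, so pandas treats it as a regular expression, which only the python engine supports. That is exactly what the ParserWarning says, and engine='python' silences it. movies.dat also has no header row, so names=... supplies the column names; without it, the first movie row is consumed as the header. A minimal sketch on two inline sample lines (io.StringIO stands in for the file path):

```python
import io

import pandas as pd

# Two lines in the "::"-separated movies.dat layout; normally you would pass "movies.dat".
sample = io.StringIO(
    "1::Toy Story (1995)::Animation|Children's|Comedy\n"
    "2::Jumanji (1995)::Adventure|Children's|Fantasy\n"
)

movies = pd.read_csv(
    sample,
    sep="::",          # multi-character separator, interpreted as a regex
    engine="python",   # the C engine does not support regex separators
    header=None,       # the file has no header row
    names=["MovieID", "Title", "Genres"],
)
print(movies)
```

With the full file, adding encoding="latin-1" may be needed if pandas raises a UnicodeDecodeError on titles with accented characters.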
     
    #21
  22. Tapaswini Puhan

    Joined:
    Jul 20, 2019
    Thanks! It works
     
    #22
