# Data Science with Python | Samridhi

Discussion in 'Big Data and Analytics' started by Nishant_Singh, Jun 3, 2019.

1. ### Lokesh Gowda S New Member

Joined:
Jul 12, 2019
Messages:
1
0
I have completed the MovieLens project up to
3) Determine the features affecting a particular rating, as shown in the screenshot below.
I am stuck on question 3. Please help me with the next step, and also with 4) Develop an appropriate model to predict the movie rating. Sample code would help.

#51
Last edited: Aug 28, 2019
2. ### ASHIK S R Member

Joined:
Jul 26, 2019
Messages:
5
0
Aug 24 - Sep 28 Batch - Assignment 1
Please help me arrange this properly.

i = 4
while i > 0:
    print("#")
    j = i
    while j > 0:
        print("1")
        j -= 1
    print("\n")
    i -= 1
    print(""*i)

o/p:

#
1
1
1
1

#
1
1
1

#
1
1

#
1

#52
3. ### Harshit Sharma_2 Member

Joined:
Aug 10, 2019
Messages:
2
0
Hello Ma'am,

Can you please help me understand how a[-1:-2:-1, :] works?

a =
array([[ 0.        ,  1.11111111,  2.22222222],
       [ 3.33333333,  4.44444444,  5.55555556],
       [10.        ,  7.77777778,  8.88888889]])
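
A minimal sketch of what that slice does, with the array rebuilt from the values shown above. The variable names `last_row` and `reversed_rows` are only illustrative.

```python
import numpy as np

# The array from the post, typed in by hand.
a = np.array([[0.0, 1.11111111, 2.22222222],
              [3.33333333, 4.44444444, 5.55555556],
              [10.0, 7.77777778, 8.88888889]])

# Row part "-1:-2:-1": start at the last row (-1), walk backwards (step -1),
# and stop before reaching the second-to-last row (-2).
# Only the last row qualifies, so one row comes back.
# Column part ":" keeps every column.
last_row = a[-1:-2:-1, :]
print(last_row.shape)
print(last_row)

# For comparison, leaving start and stop empty reverses all rows.
reversed_rows = a[::-1, :]
print(reversed_rows)
```

Because a slice (not a single index) is used for the rows, the result stays two-dimensional with shape (1, 3) rather than collapsing to a 1-D array.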

#53
4. ### Samridhi Dutta Well-Known Member AlumniTrainer

Joined:
Aug 16, 2017
Messages:
224
22
Hi,

Since this is a multi-class classification problem, you need to use the chi-square test of independence. All the files have been updated in the Google Drive. Please open the statistics folder there to find the chi-square method of feature selection.

Regards,
Samridhi

#54
5. ### Deepak Shanthaiah Member

Joined:
Jul 25, 2019
Messages:
2
0

Hi Samridhi,

My question is: what does "features affecting the ratings" mean? Which features are meant?
I am also really stuck and confused about how to solve these 2 questions, so please assist me.

#55
6. ### Harshit Sharma_2 Member

Joined:
Aug 10, 2019
Messages:
2
0
How do I slice rows when loc and isin are also used?
brics.loc[1:3,brics.columns.isin(['capital','area'])]
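
A small sketch of how that expression behaves. The `brics` frame below is a stand-in built for illustration (its values are assumptions, not the course data), and it uses the default integer index.

```python
import pandas as pd

# Illustrative stand-in for the brics DataFrame; the values are made up for the example.
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.5, 17.1, 3.3, 9.6, 1.2],
    }
)

# isin() returns one True/False per column name; loc accepts that as a column mask.
col_mask = brics.columns.isin(["capital", "area"])

# loc slices rows by LABEL, and label slices include both ends: rows 1, 2 and 3.
subset = brics.loc[1:3, col_mask]
print(subset)

# iloc slices rows by POSITION and excludes the stop: rows 1 and 2 only.
by_position = brics.iloc[1:3]
print(by_position)
```

If the frame has a non-default index (for example country codes), the row part of `loc` has to use those labels instead of integers.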

#56
7. ### Pranaya Kumar Panda Member

Joined:
Aug 6, 2019
Messages:
2
0
Hi Samridhi,

users.dat has the following data format:

1::F::1::10::48067
2::M::56::16::70072
3::M::25::15::55117
4::M::45::7::02460
5::M::25::20::55455
6::F::50::9::55117

From the above data set I understand that the first column is User_id, the next is gender, and the next is age. I am not able to understand the last two columns. Can you explain the other fields?

Similarly, ratings.dat has the following format:

1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
1::3408::4::978300275
1::2355::5::978824291

From the above data set I can't understand any of the fields. Can you explain them?
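
For reference, the README of the MovieLens 1M dataset describes users.dat as UserID::Gender::Age::Occupation::Zip-code and ratings.dat as UserID::MovieID::Rating::Timestamp, where Age and Occupation are codes and Timestamp is seconds since the Unix epoch. A minimal sketch of loading both, using the sample rows above in place of the real files (an assumption made so the example is self-contained):

```python
import io
import pandas as pd

# Sample rows copied from the post; with the real data, pass the file path instead.
users_text = """1::F::1::10::48067
2::M::56::16::70072
3::M::25::15::55117
4::M::45::7::02460
5::M::25::20::55455
6::F::50::9::55117
"""
ratings_text = """1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
1::3408::4::978300275
1::2355::5::978824291
"""

# A multi-character separator needs the slower "python" parsing engine.
users = pd.read_csv(
    io.StringIO(users_text),
    sep="::",
    engine="python",
    names=["UserID", "Gender", "Age", "Occupation", "Zip-code"],
    dtype={"Zip-code": str},  # keep leading zeros such as 02460
)
ratings = pd.read_csv(
    io.StringIO(ratings_text),
    sep="::",
    engine="python",
    names=["UserID", "MovieID", "Rating", "Timestamp"],
)

# Convert the epoch seconds into readable datetimes.
ratings["RatedAt"] = pd.to_datetime(ratings["Timestamp"], unit="s")
print(users.head())
print(ratings.head())
```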

Regards,
Pranaya

#57
8. ### PRASHANT NAMDEO SHELARE Member

Joined:
Apr 5, 2019
Messages:
2
0
Hello Ma'am,

I was doing the MovieLens project and did not get the expected result after calling
pd.concat(). The output shape is
(19266, 28)
I had actually taken 10000 records for analysis, i.e. (10000, 10), and applied one-hot encoding on unique_genres which also of
I am sending my full code in PDF format. Download it and rename the extension to .ipynb

File size:
391.7 KB
Views:
31
#58
9. ### Guru mahesh Member

Joined:
Feb 18, 2019
Messages:
12
0
Hello,

I have some doubts related to building the user-based recommendation model for the Amazon project.

There are many NaN values in each movie column. How do I replace the NaN values?

I followed this approach:

Reduced_df=AMTV_Ratind_df.loc[AMTV_Ratind_df.columns.notnull(),AMTV_Ratind_df.columns]
for i in range(0,206):
    Reduced_df.loc[Reduced_df[Reduced_df.columns].isnull(),Reduced_df.columns]=0

But if I replace the NaN values with 0, the mean and median change. How can I solve this?

#59
10. ### Samridhi Dutta Well-Known Member AlumniTrainer

Joined:
Aug 16, 2017
Messages:
224
22
Hi Prashant,

pd.concat is not working here because the indices of the two dataframes are not the same. You need to reset the index of both dataframes so that they share common indices, and then do the column-wise concatenation.
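
A minimal sketch of the effect, using two small made-up frames (`movies` and `genres` are hypothetical names, not the ones in the attached notebook):

```python
import pandas as pd

# Two frames with the same number of rows but different index labels.
movies = pd.DataFrame({"Title": ["A", "B", "C"]}, index=[10, 20, 30])
genres = pd.DataFrame({"Action": [1, 0, 1], "Comedy": [0, 1, 0]})  # index 0, 1, 2

# Column-wise concat aligns on index labels, so mismatched labels
# produce the union of both indices, padded with NaN.
misaligned = pd.concat([movies, genres], axis=1)
print(misaligned.shape)

# Resetting both indices first makes the rows line up by position.
aligned = pd.concat(
    [movies.reset_index(drop=True), genres.reset_index(drop=True)],
    axis=1,
)
print(aligned.shape)
```

A row count larger than either input, as in the (19266, 28) shape reported above, is the typical symptom of this kind of index mismatch.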

Regards,
Samridhi

#60
11. ### Samridhi Dutta Well-Known Member AlumniTrainer

Joined:
Aug 16, 2017
Messages:
224
22
Hi Guru,

For this, exclude the movies rated NaN when computing the average rating.
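
As a small illustration with made-up ratings (the column names are hypothetical): pandas already skips NaN when computing `mean()` and `median()`, so filling with 0 is not needed and would drag the averages down.

```python
import numpy as np
import pandas as pd

# Made-up user-by-movie ratings; NaN means "this user did not rate the movie".
ratings = pd.DataFrame(
    {
        "Movie1": [5.0, np.nan, 3.0, np.nan],
        "Movie2": [np.nan, 4.0, np.nan, np.nan],
    }
)

# Default behaviour (skipna=True): unrated entries are ignored.
mean_skipping_nan = ratings.mean()

# Filling with 0 treats "not rated" as a rating of 0 and lowers the mean.
mean_after_fill = ratings.fillna(0).mean()

# count() gives the number of actual ratings per movie.
rating_counts = ratings.count()

print(mean_skipping_nan)
print(mean_after_fill)
print(rating_counts)
```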

Regards,
Samridhi

#61
12. ### Shyamanth Member

Joined:
Oct 7, 2019
Messages:
4
0
Hi Samridhi,

Can you please let me know the following?

a) How do I set up a working directory? I downloaded and installed Anaconda, then launched Jupyter Notebook from Anaconda Navigator. How do I set the working directory?

b) What do Files, Running, and Clusters mean inside Jupyter Notebook? How do I create a new directory and set it as the default working directory?

c) What are Terminals and Notebooks? How do I open them? How many terminals and notebooks can be open at a time?

d) How many Python windows or instances can be open at a time? What is the technical name of a Python window?

Thanks
Shyam

#62
13. ### Shyamanth Member

Joined:
Oct 7, 2019
Messages:
4
0
Hi Samridhi,

Webex is not allowing me to join today's (11th Oct) class. Later it showed that the session has not started. Is there a class today? I have sent an email to the support team to provide access. Can you please share the recording of today's class in Google Drive?

Thanks
Shyam

#63
14. ### Soumyabrata Roy Member

Joined:
Aug 21, 2019
Messages:
8
0
Hi Samridhi,
A couple of questions on the MovieLens project.
1. User Age Distribution
a: Once we get the age distribution with the value_counts method, how do we plot it as a histogram/bar graph?
b: How do I convert the output of value_counts into a pandas DataFrame?

Features affecting the ratings: is this an interpretation of the data, or do we need to code something here?

Model building: what needs to be done here?

Is there a cheat sheet on brackets, i.e. which brackets to use and when? It's one of the parts I find quite confusing.

#64
Last edited: Oct 21, 2019
15. ### Aakarsh Nair New Member

Joined:
Oct 4, 2019
Messages:
1
0
Hello ma'am,
How can I find internet-related problems/network issues in the column 'Customer Complaint', which holds a string of words? Also, please tell me how I can access records by month from the column 'Date_month_year' and by day from the date column.

File size:
53.4 KB
Views:
7
File size:
108 KB
Views:
6
#65
16. ### Samridhi Dutta Well-Known Member AlumniTrainer

Joined:
Aug 16, 2017
Messages:
224
22
1. User Age Distribution
a: Once we get the age distribution with the value_counts method, how do we plot it as a histogram/bar graph?
- Use the matplotlib library, as discussed in the data visualization class.
b: How do I convert the output of value_counts into a pandas DataFrame?
- Use the pd.DataFrame function.

Features affecting the ratings: is this an interpretation of the data, or do we need to code something here?
- We need to use feature selection techniques such as ANOVA, chi-square, linear regression, or logistic regression.

Model building: what needs to be done here?
- Create prediction models such as linear regression, logistic regression, or KNN.

Is there a cheat sheet on brackets, i.e. which brackets to use and when? It's one of the parts I find quite confusing.
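
For point 1b, a minimal sketch of turning a value_counts result into a DataFrame. The `ages` values are made up, and `rename_axis` plus `reset_index` is used here as one possible route alongside `pd.DataFrame`.

```python
import pandas as pd

# Made-up ages standing in for the Age column of the users data.
ages = pd.Series([25, 25, 35, 18, 25, 35, 56], name="Age")

# value_counts() returns a Series: index = distinct ages, values = frequencies.
age_counts = ages.value_counts().sort_index()

# Name the index, then move it into a regular column next to the counts.
age_df = age_counts.rename_axis("Age").reset_index(name="Count")
print(age_df)
```

For point 1a, `age_counts.plot(kind="bar")` draws the same counts as a bar chart through matplotlib, and `ages.plot(kind="hist")` bins the raw ages into a histogram.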

#66
17. ### Samridhi Dutta Well-Known Member AlumniTrainer

Joined:
Aug 16, 2017
Messages:
224
22
Regarding brackets:
[] - represents lists
() - tuples
{} - dictionaries / sets

[] - also used for subsetting / slicing data structures
() - also used to enclose function arguments
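
A few lines that show each bracket in use. All names and values are illustrative only.

```python
# [] builds a list (ordered, changeable).
genres = ["Drama", "Comedy", "Action"]

# () builds a tuple (ordered, not changeable).
point = (3, 4)

# {} with key: value pairs builds a dictionary.
age_codes = {"Under 18": 1, "56+": 56}

# {} with plain values builds a set (duplicates are dropped).
unique_genres = {"Drama", "Drama", "Comedy"}

# An empty {} is a dictionary, not a set; use set() for an empty set.
empty = {}

# [] after a name subsets or slices it.
first_genre = genres[0]
first_two = genres[0:2]
code = age_codes["56+"]

# () after a name calls a function with the enclosed arguments.
total = sum([1, 2, 3])

print(first_genre, first_two, code, total, type(empty).__name__, len(unique_genres))
```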


#67
18. ### Samridhi Dutta Well-Known Member AlumniTrainer

Joined:
Aug 16, 2017
Messages:
224
22
Hi Aakarsh,

Please use regular expressions to search for the search string. Please refer to the pandas DataFrame session, where we discussed how to extract a search string using regular expressions.
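
A rough sketch of that approach. The rows below are invented, the search words `internet` and `network` are an assumption about what counts as an internet-related complaint, and the day-month-year date format is also assumed.

```python
import pandas as pd

# Invented rows using the column names mentioned in the question.
complaints = pd.DataFrame(
    {
        "Customer Complaint": [
            "Comcast Internet outage",
            "Billing dispute",
            "Slow network speed",
            "internet keeps dropping",
        ],
        "Date_month_year": ["22-04-2015", "04-08-2015", "18-04-2015", "05-07-2015"],
    }
)

# Regular expression: match either word, ignoring upper/lower case.
pattern = r"internet|network"
is_internet = complaints["Customer Complaint"].str.contains(pattern, case=False, regex=True)
internet_complaints = complaints[is_internet]
print(internet_complaints)

# Parse the text dates, then group by calendar month (or by the full date for daily counts).
complaints["Date_month_year"] = pd.to_datetime(complaints["Date_month_year"], format="%d-%m-%Y")
monthly_counts = complaints.groupby(complaints["Date_month_year"].dt.month).size()
daily_counts = complaints.groupby(complaints["Date_month_year"].dt.date).size()
print(monthly_counts)
```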

#68
19. ### Soumyabrata Roy Member

Joined:
Aug 21, 2019
Messages:
8
0
Hi Samridhi
I have submitted the project. I zipped the ipynb file and attached it in the submit section. Is there anything else I need to do? Please let me know. I have attached the file here as well for reference.

File size:
508.3 KB
Views:
47
#69
20. ### Shyamanth Member

Joined:
Oct 7, 2019
Messages:
4
0
In the MovieLens project, for the feature engineering step 'Find out all the unique genres (Hint: split the data in column genre making a list and then process the data to find out only the unique categories of genres)', I am scripting as below and get an error.
Can you please let me know what I am doing wrong?

MovieLensData.Genres = MovieLensData.Genres.str.split("|") # where MovieLensData is my master data

Error
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-119-c3c47e3a006b> in <module>
----> 1 MovieLensData.Genres = MovieLensData.Genres.str.split("|")

~\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
5061 if (name in self._internal_names_set or name in self._metadata or
5062 name in self._accessors):
-> 5063 return object.__getattribute__(self, name)
5064 else:
5065 if self._info_axis._can_hold_identifiers_and_holds_name(name):

~\Anaconda3\lib\site-packages\pandas\core\accessor.py in __get__(self, obj, cls)
169 # we're accessing the attribute of the class, i.e., Dataset.geo
170 return self._accessor
--> 171 accessor_obj = self._accessor(obj)
172 # Replace the property with the accessor object. Inspired by:
173 # http://www.pydanny.com/cached-property.html

~\Anaconda3\lib\site-packages\pandas\core\strings.py in __init__(self, data)
1794
1795 def __init__(self, data):
-> 1796 self._validate(data)
1797 self._is_categorical = is_categorical_dtype(data)
1798

~\Anaconda3\lib\site-packages\pandas\core\strings.py in _validate(data)
1816 # (instead of test for object dtype), but that isn't practical for
1817 # performance reasons until we have a str dtype (GH 9343)
-> 1818 raise AttributeError("Can only use .str accessor with string "
1819 "values, which use np.object_ dtype in "
1820 "pandas")

AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
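
The message means the Genres column no longer holds plain strings. One possible cause (an assumption, since the notebook state is not shown) is that the cell was run twice: the first run replaces each string with a list, and the second run then fails. A minimal sketch that avoids this by writing the split result to a separate variable, using a small made-up frame:

```python
import pandas as pd

# Made-up sample in the same "A|B|C" layout as the MovieLens genre column.
movie_lens = pd.DataFrame(
    {
        "Genres": [
            "Animation|Children's|Comedy",
            "Adventure|Children's|Fantasy",
            "Comedy|Romance",
        ]
    }
)

# Checking the first value shows whether the column still holds strings or already lists.
print(type(movie_lens["Genres"].iloc[0]))

# Keep the original column untouched so the cell can be re-run safely.
genre_lists = movie_lens["Genres"].str.split("|")

# Flatten the lists and keep each genre once.
unique_genres = sorted({genre for genres in genre_lists for genre in genres})
print(unique_genres)
```

If the first value turns out to be a list already, reloading the master data before splitting restores the original strings.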

#70
21. ### Venkatarao Kuna New Member

Joined:
Apr 25, 2019
Messages:
1
0
1. As requested by trainer Samridhi, I am posting the problem description for printing the series 1, 11, 21, 1211, 111221, 312211, ...
1 ------------- Take 1 as input [count the 1s from the left -> one 1 -> so print 11]
11 ----------- Take 11 as input [count the 1s from the left -> two 1s -> so print 21]
21 ----------- Take 21 as input [count the 2s in first place from the left -> one 2, then the 1s after it -> one 1 -> so print 1211]
1211 ------- Take 1211 as input and repeat as above

Please find attached the Java code for the above.
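
For those following along in Python, here is a sketch of the same rule (this is not the attached Java code; the function names are made up for the example). Each term is read as runs of equal digits, and every run is written as its length followed by the digit.

```python
from itertools import groupby


def next_term(term: str) -> str:
    """Describe `term` run by run: <run length><digit> for each run of equal digits."""
    return "".join(f"{len(list(run))}{digit}" for digit, run in groupby(term))


def look_and_say(count: int, start: str = "1") -> list:
    """Return the first `count` terms of the series, beginning with `start`."""
    terms = [start]
    while len(terms) < count:
        terms.append(next_term(terms[-1]))
    return terms[:count]


print(look_and_say(6))  # ['1', '11', '21', '1211', '111221', '312211']
```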

File size:
750 bytes
Views:
6
#71

Joined:
Sep 25, 2019
Messages:
3
0
#72
23. ### Anubrata Das Member

Joined:
Sep 25, 2019
Messages:
3
0
2nd Nov-7th Dec batch
This is the code for printing 1s:
for idx in range(4):
    n=4
    while n>=0:
        lst=[]
        for num in range(n):
            lst.append(str(1))
        print("".join(lst).rjust(4))
        n=n-1

#73
24. ### Samridhi Dutta Well-Known Member AlumniTrainer

Joined:
Aug 16, 2017
Messages:
224
22
Thanks Venkatarao!

#74
25. ### ASHIK S R Member

Joined:
Jul 26, 2019
Messages:
5
0
Hi Samridhi,

I got an MSE of about 1.8214 when I generated predictions using Linear Regression for the assignment 'Evaluate the Ad Budget Dataset of XYZ Firm'.
Is this an acceptable level of accuracy?

#75
26. ### Anubrata Das Member

Joined:
Sep 25, 2019
Messages:
3
0
For the MovieLens project, since there are so many categorical variables, should we not build the model using logistic regression?

#76
27. ### Samridhi Dutta Well-Known Member AlumniTrainer

Joined:
Aug 16, 2017
Messages:
224
22
Hi Ashik,

Please calculate the R² value and check the accuracy as a percentage.
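
To show the difference between the two measures, here is a small sketch in plain Python. The numbers are made up and are not from the Ad Budget dataset. MSE is in squared units of the target, so it has no fixed "good" range, while R² is the share of the target's variance that the predictions explain.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up actual and predicted values, for illustration only.
sales_actual = [22.1, 10.4, 9.3, 18.5, 12.9]
sales_predicted = [20.5, 12.3, 12.3, 17.6, 13.2]

mse = mean_squared_error(sales_actual, sales_predicted)
r2 = r_squared(sales_actual, sales_predicted)
print(f"MSE = {mse:.3f}")
print(f"R^2 = {r2:.3f} ({r2 * 100:.1f}% of variance explained)")
```

With scikit-learn, `sklearn.metrics.r2_score(y_test, y_pred)` computes the same quantity.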

Regards,
Samridhi

#77
28. ### Samridhi Dutta Well-Known Member AlumniTrainer

Joined:
Aug 16, 2017
Messages:
224
22
Yes, you can use logistic regression or KNN.

Regards,
Samridhi

#78
29. ### _6781 Member Alumni

Joined:
Apr 26, 2017
Messages:
9
0
Hi, I am pursuing a machine learning course and want to know which machine learning applications are helpful in market research or data mining.
Can anyone please help me with this?

#79
30. ### Y Nandigam New Member

Joined:
Sep 30, 2019
Messages:
1
0
Hi,

For the Titanic data set, I assigned df.sex=male by mistake. How can I correct it so that I have the original data back?

#80

Joined:
Sep 19, 2019
Messages:
2
0
#81

Joined:
Sep 19, 2019
Messages:
2
0

File size:
86.4 KB
Views:
5
#82
33. ### Akhil_93 Customer Customer

Joined:
May 9, 2020
Messages:
3
0
Statistics for data science
Hi Akhil,
Greetings from Simplilearn.
We have gone through your course and would like to inform you that the Statistics for Data Science course is not included in your subscription. Kindly contact the SPOC in your organization for Simplilearn courses. Hope this helps. Please reach out to us for any assistance. We are here to help.
--
Regards,
Jibin Dev Team Simplilearn
www.simplilearn.com
US: +1 844 532 7688 | India: 1800-212-7688
ref:_00D28sMrr._5002x3aoOH:ref

#83
34. ### Vaibhav Bhardwaj Member

Joined:
Mar 18, 2020
Messages:
4
0

Can someone please help me understand why the count of True is coming out to be 3?

#84
35. ### Vaibhav Bhardwaj Member

Joined:
Mar 18, 2020
Messages:
4
0
Assignment #1
1 1 1 1 1
_ 1 1 1 1
_ _ 1 1 1
_ _ _ 1 1
_ _ _ _ 1

CODE:
number = int(input('Enter the number of rows:'))

for i in range(number):
    for j in range(number):
        if j >= i:
            print(1, end = '')
        else:
            print(' ', end = '')
    print()  # end the row; print('\n') would add an extra blank line

#85
36. ### _77573 New Member

Joined:
Apr 30, 2020
Messages:
1
0
Unable to see any live classes today 5/16/2020

#86
37. ### Akhil_93 Customer Customer

Joined:
May 9, 2020
Messages:
3
0
Ticket number: 00570870

#87

Joined:
May 15, 2020
Messages:
8
0
Hi Samridhi Ma'am,

Kindly note that I have not received any of the files you shared via Google Drive.

Regards,

#88

Joined:
May 15, 2020
Messages:
8
0
Hi Samridhi ma'am,

The command " %config IPCompleter.greedy = True " is not working for numpy arrays in Python.

Thank You.

#89

Joined:
Aug 16, 2017
Messages:
224
22

Regards,
Samridhi

#90
41. ### Vaibhav Bhardwaj Member

Joined:
Mar 18, 2020
Messages:
4
0

In line 40 we passed the input as a raw string, while in line 4 we did not. Both produce the same result. Can someone please explain the difference between the two?
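
The screenshot is not reproduced here, so the following is a general illustration rather than the code in question. A raw string (prefix `r`) leaves backslashes as they are, while a normal string interprets escape sequences such as `\t`. When the text contains no backslash, the two forms are identical, which would explain seeing the same result.

```python
import re

# No backslash in the text: raw and normal strings are exactly the same.
same_without_backslash = "data/users.dat" == r"data/users.dat"

# With a backslash they differ: "\t" is one tab character, r"\t" is two characters.
tab_plain = "col1\tcol2"
tab_raw = r"col1\tcol2"

# Raw strings are the usual choice for regular expressions,
# because the pattern can be written without doubling the backslashes.
pattern_raw = r"\d+"
pattern_escaped = "\\d+"
years = re.findall(pattern_raw, "Toy Story (1995)")

print(same_without_backslash)
print(len(tab_plain), len(tab_raw))
print(pattern_raw == pattern_escaped)
print(years)
```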

#91

Joined:
May 15, 2020
Messages:
8
0
Hi Samridhi Ma'am,

Unable to open lab. The screen shows error
Debug error: Expired timestamp, yours 1590155715, ours 1590154975

#92
43. ### Samridhi Dutta Well-Known Member AlumniTrainer

Joined:
Aug 16, 2017
Messages:
224
22
Hi,

You are expected to download the contents of the Google Drive within the same week / month. After that we delete them to free up space. Anyway, you can access the contents from my other batches in the following location:

Regards,
Samridhi

#93
44. ### _76794 Member

Joined:
Apr 24, 2020
Messages:
2
0

For clarification 1:
for x in range(len(test)):
    print(type(test[x]))

#94
45. ### _76794 Member

Joined:
Apr 24, 2020
Messages:
2
0
For clarifications 1 & 3:

for x in range(len(test)):
    print(type(test[x]))
    print("length of " + str(type(test[x]))+ " is "+str(len(test[x])))

#95

Joined:
May 15, 2020
Messages:
8
0
Hello Ma'am,

The following statements are also not working:
import pandas as pd
import numpy as np

Regards,

#96
47. ### _51111 New Member

Joined:
Dec 7, 2018
Messages:
1
0
Hi Samridhi, I have a few questions regarding the MovieLens project.
1. I want to build a pivot table to understand the age distribution. I can use User ID as the values for the aggregate function, but I am not getting the correct code for it. With groupby, the following code works:
Master_Data_2.groupby('Age Group')[['UserID']].nunique()
Master_Data_2.groupby('Gender')[['UserID']].nunique()
Master_Data_2.groupby('Occupation')[['UserID']].nunique()

whereas with a pivot table, nunique throws an error.
2. In the session I asked how to split the Movie Name column into two columns: Movie and Year. I tried it using str.extract and a regular expression. The output solved only half of the problem. As you can see in the figure, the year column is as expected but the movie column is unsatisfactory.
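
A sketch covering both points, on a small made-up frame. The column name `Title` and the "Name (YYYY)" layout are assumptions about the data, and the failing code itself is not shown, so this only illustrates one form that works.

```python
import pandas as pd

# Made-up rows; "Title" stands in for the Movie Name column.
master = pd.DataFrame(
    {
        "UserID": [1, 1, 2, 3, 3],
        "Age Group": ["18-24", "18-24", "25-34", "25-34", "25-34"],
        "Title": [
            "Toy Story (1995)",
            "Jumanji (1995)",
            "Toy Story (1995)",
            "City of Lost Children, The (1995)",
            "Heat (1995)",
        ],
    }
)

# 1. Pivot table equivalent of groupby('Age Group')[['UserID']].nunique():
#    pass the aggregation by name through aggfunc.
users_per_age = pd.pivot_table(
    master, index="Age Group", values="UserID", aggfunc="nunique"
)
print(users_per_age)

# 2. One regular expression with two named groups:
#    everything before the final " (YYYY)" becomes Movie, the four digits become Year.
parts = master["Title"].str.extract(r"^(?P<Movie>.*)\s\((?P<Year>\d{4})\)$")
print(parts)
```

The greedy `.*` together with the `$` anchor is what keeps titles that contain commas or extra words intact, since only the last parenthesised four-digit group is treated as the year.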

#97

Joined:
May 15, 2020
Messages:
8
0
Hello Ma'am,
I am doing the Walmart project. For the last question, how do I build the linear regression model? I also want to know which linear regression model should be fitted.

Regards,

#98
49. ### Darsh Chetan Thakker New Member Alumni

Joined:
Nov 22, 2019
Messages:
1
0
Hi,
Ma'am, how do I find the percentage of complaints resolved to date that were received through the Internet and customer care calls in the Comcast project?
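
One possible way to compute it, as a sketch. The column names `Received Via` and `Status`, and the choice to count `Closed` and `Solved` as resolved, are assumptions about the Comcast data, and the rows are invented.

```python
import pandas as pd

# Invented rows; column names and status labels are assumed.
complaints = pd.DataFrame(
    {
        "Received Via": [
            "Internet",
            "Customer Care Call",
            "Internet",
            "Customer Care Call",
            "Internet",
        ],
        "Status": ["Closed", "Open", "Solved", "Solved", "Pending"],
    }
)

# Flag each complaint as resolved or not.
complaints["Resolved"] = complaints["Status"].isin(["Closed", "Solved"])

# The mean of a True/False column is the fraction of True values.
resolved_pct = complaints.groupby("Received Via")["Resolved"].mean() * 100
print(resolved_pct.round(2))
```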

#99
Last edited: May 29, 2020 at 9:29 AM
50. ### Gaurav kumar_51 Member

Joined:
Mar 21, 2020
Messages:
7
0
Hi Samridhi,

While working on the Walmart project:

Linear Regression – Utilize variables like date and restructure dates as 1 for 5 Feb 2010 (starting from the earliest date in order).

Y-Intercept:
139639575.33440715

Root Mean Square Error (RMSE):
552459.8953123686

R^2 Value:
0.03433024544837493

x=walmart.drop(['Weekly_Sales','Store','Date','Semester'],axis=1)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state= 0)

Please help me: the R-squared value is not close to 1, which means the model is not a good fit.
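
On the date step in the task statement, here is a sketch of turning dates into 1, 2, 3, ... in chronological order. The rows are made up, the day-month-year format is an assumption, and `Date_Index` is a hypothetical column name.

```python
import pandas as pd

# Made-up rows; the same date can appear once per store.
walmart = pd.DataFrame(
    {
        "Store": [1, 1, 1, 2],
        "Date": ["05-02-2010", "12-02-2010", "19-02-2010", "05-02-2010"],
        "Weekly_Sales": [1500.0, 1480.0, 1450.0, 2100.0],
    }
)

# Parse the text dates so that sorting is chronological rather than alphabetical.
walmart["Date"] = pd.to_datetime(walmart["Date"], format="%d-%m-%Y")

# factorize with sort=True numbers the distinct dates from 0 in sorted order;
# adding 1 makes the earliest date (5 Feb 2010) equal to 1.
codes, _ = pd.factorize(walmart["Date"], sort=True)
walmart["Date_Index"] = codes + 1
print(walmart)
```

Note that the snippet in the post drops `Date` from `x` and does not show how `y` is defined, so the restructured date would need to be kept as a feature for it to have any effect on the fit.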

#100