
DS-Python | Feb 17 - Mar 24 | Prashant Nair

Riya Pal

Member
Alumni
Hi,
How do lists and tuples differ in the way they are declared?
Suppose we declare:
l=(1,2,3)
l[0]='new'
This gives an error.
But if we declare it like this:
my_l=[1,2,3]
my_l[0]='new'
it is allowed, and the element at index 0 is changed to 'new'.

Please explain.
 

Madhu K_M

Member
Alumni
It is a property of tuples that their members cannot be changed once the tuple is created. In exchange, a tuple can be slightly faster and lighter than a list in some operations.
List members, on the other hand, can be modified, so use a tuple or a list depending on the usage scenario.
For more details, see
https://stackoverflow.com/questions/626759/whats-the-difference-between-lists-and-tuples
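
As a small illustration of the point above, the sketch below tries the same item assignment on a list and on a tuple. The variable names (`point`, `items`, `error_message`) are made up for this example and are not from the original question.

```python
point = (1, 2, 3)   # tuple: immutable
items = [1, 2, 3]   # list: mutable

# Lists support item assignment, so this changes index 0 in place.
items[0] = "new"

# Tuples do not support item assignment, so Python raises TypeError.
try:
    point[0] = "new"
except TypeError as exc:
    error_message = str(exc)
    print("tuple error:", error_message)

print("list after assignment:", items)
print("tuple is unchanged:", point)
```

Catching the `TypeError` here is only for demonstration; in normal code the error is usually a sign that a list was the better choice.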
 

Riya Pal

Member
Alumni

Sujith PCP

Member
Alumni
I just want to know how to take input in Python. For example, I want to enter the name, age, etc. at runtime rather than hard-coding them in the program.
 

Sudhir Sharma_1

Member
Alumni
I tried the code below:
def domainGet(e_mail):
    domain=e_mail.split('@')[-1]
    print(domain)
    return domain


domainGet('user@domain.com')

The output is:
domain.com
'domain.com'

Query: why are there no quotes when I use print(domain), but there are quotes around the returned value?
 

Madhu K_M

Member
Alumni

Here, print(domain) writes the contents of the string (its str() form) to standard output, so no quotes appear.

The second line does not come from "return" itself. The interactive shell (or notebook) echoes the repr() of the value that the call evaluates to, and the repr() of a string includes the quotes.

Another simple example would be, try

x = 'hello'
print(x)
x
 

Prashant_Nair

Well-Known Member
Simplilearn Support
Alumni
Trainer
print displays the text of the string, without quotes. However, if you evaluate an expression directly, the interactive Python interpreter displays its representation, which indicates the data type. In this case the single quotes are the interpreter's way of showing that the value is a string. Hope that clarifies.
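
To make the difference concrete, here is a minimal sketch that reproduces both forms outside the interactive shell. The function is renamed `domain_get` for this example, and `result` and `print_result` are illustrative names, not part of the original code.

```python
def domain_get(e_mail):
    """Return the part of an e-mail address after the last '@'."""
    domain = e_mail.split("@")[-1]
    print(domain)          # print shows the str() form: no quotes
    return domain


result = domain_get("user@domain.com")

# str() is what print uses; repr() is what the interactive shell echoes.
print(str(result))
print(repr(result))

# print itself does not hand anything back: its return value is None.
print_result = print("hello")
```

In a plain script only the `print` calls produce output; the quoted echo appears only in an interactive shell or notebook cell, where the value of the last expression is shown using `repr()`.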
 

Prashant_Nair

Well-Known Member
Simplilearn Support
Alumni
Trainer
a = int(input("Enter first number:"))
b = int(input('Enter second number:'))
print("Addition of ",a," and ",b," is ",a+b)

Hope that answers your question ;)
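
For the name-and-age case from the question, a slightly more defensive sketch might look like the one below. The helper names `ask_int` and `collect_profile`, the `reader` parameter, and the sample replies are all assumptions made for this illustration. The `reader` parameter defaults to the built-in `input`, and is swapped for scripted replies here only so the example can run without a keyboard.

```python
def ask_int(prompt, reader=input):
    """Keep asking until the reply can be converted to an integer."""
    while True:
        reply = reader(prompt)
        try:
            return int(reply)
        except ValueError:
            print("Please enter a whole number.")


def collect_profile(reader=input):
    """Ask for a name (kept as text) and an age (converted to int)."""
    name = reader("Enter your name: ").strip()
    age = ask_int("Enter your age: ", reader)
    return {"name": name, "age": age}


# Scripted replies stand in for typing; "abc" triggers one retry for the age.
replies = iter(["Asha", "abc", "29"])
profile = collect_profile(reader=lambda prompt: next(replies))
print(profile)
```

In a real program, calling `collect_profile()` with no arguments reads the values from the keyboard at runtime.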
 

Prashant_Nair

Well-Known Member
Simplilearn Support
Alumni
Trainer
Simple!
() is a tuple, which is immutable, so editing is not allowed.
[] is a list, which is mutable, so editing is allowed.

Please refer to the PPT for Python Basics. Hope that answers your question.
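
If a tuple's contents do need to change, one common approach is to build a new tuple rather than edit the existing one. The sketch below shows two ways of doing that; the names `original`, `updated`, and `via_list` are chosen for this example only.

```python
original = (1, 2, 3)

# Option 1: build a new tuple from a one-element tuple plus a slice.
updated = ("new",) + original[1:]

# Option 2: convert to a list, edit the list, convert back to a tuple.
as_list = list(original)
as_list[0] = "new"
via_list = tuple(as_list)

print(original)   # the original tuple is untouched
print(updated)
print(via_list)
```

Both options leave `original` as it was, which is the practical meaning of immutability.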
 

Sudhir Sharma_1

Member
Alumni
"thank you"
 

Sudhir Sharma_1

Member
Alumni
I was able to remove the warnings:
/usr/local/lib/python3.4/dist-packages/ipykernel_launcher.py:2: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.

by using engine='python' in the load call, as below:
# load data sets
movies_data=pd.read_csv('movies.dat',sep='::',names=['MoviesID','Titles','Generes'],engine='python')
users_data=pd.read_csv('users.dat',sep='::',names=['UserID','Gender','Age','Occupation','Zip-code'],engine='python')
ratings_data=pd.read_csv('ratings.dat',sep='::',names=['UserID','MovieID','Rating','Timestamp'],engine='python')
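
The same call can be tried without the course files. The sketch below feeds two made-up rows in the '::' format through `io.StringIO`; the sample rows and the column names `MovieID`, `Title`, and `Genres` are assumptions for this illustration. Because the separator is longer than one character, pandas treats it as a regular expression, which only the Python engine supports.

```python
import io

import pandas as pd

# Two invented rows in the same '::'-separated layout (not real course data).
sample = io.StringIO(
    "1::Toy Story (1995)::Animation|Comedy\n"
    "2::Jumanji (1995)::Adventure|Fantasy\n"
)

# Naming the engine explicitly avoids the ParserWarning about falling back.
movies = pd.read_csv(
    sample,
    sep="::",
    names=["MovieID", "Title", "Genres"],
    engine="python",
)

print(movies)
```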
 

_13345

Member
Alumni
Yes, it is working.
 

Miitesh devle

Member
Alumni
Hello,
While creating a model, I am getting an error saying:
ValueError: could not convert string to float: 'd'
I am not able to move further from this step.
 

Madhu K_M

Member
Alumni
I had a similar issue with the imputer and am now proceeding with a workaround:

dataset.replace({'?': 'b'}, inplace=True)  # replace '?' with the mode until a better imputer is identified
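
A slightly more general version of this workaround is sketched below: it computes each column's mode instead of hard-coding 'b', then one-hot encodes the letters so that a model receives numbers rather than strings. The tiny DataFrame and its column names (`cap_shape`, `odor`) are invented for the example, and `pd.get_dummies` is only one of several possible encodings.

```python
import pandas as pd

# Invented categorical data where '?' marks a missing value.
df = pd.DataFrame(
    {
        "cap_shape": ["x", "b", "?", "x"],
        "odor": ["a", "?", "a", "n"],
    }
)

# Turn every '?' into a missing value (NaN).
cleaned = df.mask(df == "?")

# Fill each column's gaps with that column's most frequent value (its mode).
for col in cleaned.columns:
    cleaned[col] = cleaned[col].fillna(cleaned[col].mode()[0])

# One-hot encode the letters so no string has to be converted to float.
encoded = pd.get_dummies(cleaned)

print(cleaned)
print(encoded.columns.tolist())
```

The "could not convert string to float" error typically means string categories reached a step that expects numbers, so the encoding step matters as much as the imputation.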
 

Madhu K_M

Member
Alumni
I have a couple of questions.
1. In a given data set, how do I identify which features are relevant and which are distractions that can be dropped?
2. What metrics measure the "best prediction" for a given dataset? I am getting model.score (test and train) and r2_score of 1.00 for many values of n.
If everything else is the same, N=1 should be best since it has the minimum workload.

2.1. In the results below, assuming N=12 to 17 is over-fitting, how come N=18 onwards gets a valid score (the test score is at least as good as the train score)?
2.2. In summary, what is a good set of metrics for deciding which model is best for the various classes/types of ML problems?
Code snippet:
============================
model = KNeighborsClassifier(n_neighbors=n)
model.fit(X_train, y_train)
y_predictions = model.predict(X_test)
print( 'N:{} train_score:{:.4f} test_score: {:.4f} is Valid -{}'.\
format(n,round(model.score(X_train,y_train),4),round(model.score(X_test,y_test),4),\
(model.score(X_test,y_test) >= model.score(X_train,y_train)) ))

print("r2_score for N-{} is {:.3f}".format(n,r2_score(y_test,y_predictions)))
=======================

N:1 train_score:1.0000 test_score: 1.0000 is Valid -True
r2_score for N-1 is 1.000
N:2 train_score:1.0000 test_score: 1.0000 is Valid -True
r2_score for N-2 is 1.000
N:3 train_score:1.0000 test_score: 1.0000 is Valid -True
r2_score for N-3 is 1.000
N:4 train_score:1.0000 test_score: 1.0000 is Valid -True
r2_score for N-4 is 1.000
N:5 train_score:1.0000 test_score: 1.0000 is Valid -True
r2_score for N-5 is 1.000
N:6 train_score:1.0000 test_score: 1.0000 is Valid -True
r2_score for N-6 is 1.000
N:7 train_score:1.0000 test_score: 1.0000 is Valid -True
r2_score for N-7 is 1.000
N:8 train_score:1.0000 test_score: 1.0000 is Valid -True
r2_score for N-8 is 1.000
N:9 train_score:1.0000 test_score: 1.0000 is Valid -True
r2_score for N-9 is 1.000
N:10 train_score:1.0000 test_score: 1.0000 is Valid -True
r2_score for N-10 is 1.000
N:11 train_score:1.0000 test_score: 1.0000 is Valid -True
r2_score for N-11 is 1.000
N:12 train_score:0.9998 test_score: 0.9995 is Valid -False
r2_score for N-12 is 0.998
N:13 train_score:0.9998 test_score: 0.9995 is Valid -False
r2_score for N-13 is 0.998
N:14 train_score:0.9997 test_score: 0.9995 is Valid -False
r2_score for N-14 is 0.998
N:15 train_score:0.9997 test_score: 0.9995 is Valid -False
r2_score for N-15 is 0.998
N:16 train_score:0.9995 test_score: 0.9995 is Valid -True
r2_score for N-16 is 0.998
N:17 train_score:0.9997 test_score: 0.9995 is Valid -False
r2_score for N-17 is 0.998
N:18 train_score:0.9995 test_score: 0.9995 is Valid -True
r2_score for N-18 is 0.998
N:19 train_score:0.9994 test_score: 0.9995 is Valid -True
r2_score for N-19 is 0.998
N:20 train_score:0.9995 test_score: 0.9995 is Valid -True
r2_score for N-20 is 0.998
N:21 train_score:0.9995 test_score: 0.9995 is Valid -True
r2_score for N-21 is 0.998
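
One possible way to compare values of N more robustly than a single train/test split is k-fold cross-validation, sketched below. Note that r2_score is a regression metric; for a classifier, accuracy (or precision, recall, and F1) is the more conventional measure. The synthetic dataset from `make_classification` is a stand-in, since the data used in this post is not shown, and the candidate values of N are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (an assumption; not the dataset from the post).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

mean_accuracy = {}
for n in (1, 3, 5, 7):
    model = KNeighborsClassifier(n_neighbors=n)
    # 5-fold cross-validation: each fold is held out once for scoring.
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    mean_accuracy[n] = scores.mean()
    print("N:{} mean accuracy: {:.4f} (std {:.4f})".format(
        n, scores.mean(), scores.std()))

# Pick the N with the highest mean cross-validated accuracy.
best_n = max(mean_accuracy, key=mean_accuracy.get)
print("best N by mean cross-validated accuracy:", best_n)
```

Averaging over folds makes tiny gaps such as 0.9997 versus 0.9995 easier to judge, because the standard deviation shows how much of the gap is just noise from the split.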

 

Casava 52

Member
Alumni
Hi Prashant Sir,
When I try to work on my NYC 311 project for class, I get an error saying the file is too big and cannot be opened. What should I do?
This message can appear due to one of the following:
The file contains more than 1,048,576 rows or 16,384 columns. To fix this problem, open the source file in a text editor such as Microsoft Word. Save the source file as several smaller files that conform to this row and column limit, and then open the smaller files in Microsoft Excel. If the source data cannot be opened in a text editor, try importing the data into Microsoft Access, and then exporting subsets of the data from Access to Excel.
The area that you are trying to paste the tab-delineated data into is too small. To fix this problem, select an area in the worksheet large enough to accommodate every delimited item.
Notes
Excel cannot exceed the limit of 1,048,576 rows and 16,384 columns.
By default, Excel places three worksheets in a workbook file. Each worksheet can contain 1,048,576 rows and 16,384 columns of data, and workbooks can contain more than three worksheets if your computer has enough memory to support the additional data.
 

K Manoj

Moderator
Staff member
Simplilearn Support
Do not try to open the data in Excel, as the file size is 900 MB.
Instead, upload the same zip file to JupyterLab, then load it into a pandas DataFrame.

Code:
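import zipfile

import pandas as pd

# Read the CSV directly from the archive without extracting it to disk.
with zipfile.ZipFile("Nyc.zip") as z:
    with z.open("311_Service_Requests_from_2010_to_Present.csv") as f:
        nyc311 = pd.read_csv(f)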
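
With a file this large, it can help to preview a few rows before loading everything. The sketch below builds a tiny zip archive in memory so it runs without the real data; the member name `requests.csv`, its columns, and its rows are invented for this illustration. The `nrows` parameter of `pd.read_csv` limits how many data rows are read.

```python
import io
import zipfile

import pandas as pd

# Build a small stand-in archive in memory (not the real NYC 311 file).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as z:
    z.writestr(
        "requests.csv",
        "Unique Key,Complaint Type\n1,Noise\n2,Heating\n3,Noise\n",
    )

# Re-open the archive for reading and load only the first two data rows.
buffer.seek(0)
with zipfile.ZipFile(buffer) as z:
    with z.open("requests.csv") as f:
        preview = pd.read_csv(f, nrows=2)

print(preview)
```

Once the preview looks right, dropping `nrows` loads the full file in the same way.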
 

_13345

Member
Alumni
Hi Prashant,

What should the accuracy be for the MovieLens model? For n_neighbors from 1 to 25 (odd iterations), I got test-set accuracy between 22% and 37%. Is the model correct? What accuracy should I expect?
 

K Manoj

Moderator
Staff member
Simplilearn Support
Take the first 500 rows (records)

Here is the accuracy value for the same:
[Attached image: Val.PNG]
 

_17834

Active Member
Alumni
Hello and Welcome to Data Science (Python) Course!
I am equally excited to deliver this course.
I am having problems with my final project. I did not meet the deadline, and I don't know how to fix my project. My name is Barbara DiLucchio; I was in Batch 2 and was supposed to submit my project by May 20, but I did not make that date. I hope my attachment explains my problem clearly enough. My email is bdilucchio@gmail.com. I would really appreciate communicating with someone about this. Until I can reach someone who can either help me or tell me whom to contact, I will keep trying things that might get me past this error. Thanks very much, Barbara DiLucchio
 