import pandas as pd

def age_18_to_30(a):
    return 18 <= a <= 30
def level_a(s):
    return 85 <= s <= 100
students = pd.read_excel('C:/temp/students.xlsx', index_col='ID')
students.loc[students['Age'].apply(age_18_to_30)].loc[students['Score'].apply(level_a)]
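Task 8: Filtering data
Method 1: Define the filter conditions as functions
Filter conditions are usually expressed as functions, so we first define one function for each condition we want to apply to the target Excel file.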
def age_18_to_30(a):
    # return a >= 18 and a < 30  # conventional form
    return 18 <= a < 30  # Python's chained comparison
def level_A(s):
    return 85 <= s <= 100
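To check where the boundaries of these two conditions fall, here is a minimal sketch that calls the functions on a few hand-picked values. The sample ages and scores are made up for illustration and do not come from the Excel file.

```python
def age_18_to_30(a):
    # Chained comparison: same result as (a >= 18 and a < 30)
    return 18 <= a < 30

def level_A(s):
    return 85 <= s <= 100

# Hypothetical boundary values, chosen only to show which ends are included
ages = [17, 18, 29, 30]
scores = [84, 85, 100, 101]

age_flags = [age_18_to_30(a) for a in ages]
score_flags = [level_A(s) for s in scores]

print(age_flags)    # [False, True, True, False]
print(score_flags)  # [False, True, True, False]
```

Note that 30 is excluded by the age condition, while 100 is included by the score condition.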
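Next, look at the DataFrame's .loc[] attribute. Note that the attribute is followed by square brackets [], not parentheses. "loc" is short for "location": given a Series of True/False values, it locates the rows marked True and keeps only those.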
students = students.loc[students['Age'].apply(age_18_to_30)].loc[students['Score'].apply(level_A)]
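Notes:
- Writing two .loc[] calls back to back, as in students.loc[].loc[], applies several conditions at once; the filters run in the order they are written.
- Review Series.apply() from the previous lesson, and note that the function name passed inside () must not itself be followed by ().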
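The chained .loc[] pattern can be tried without the Excel file. The sketch below uses a small hand-made DataFrame as a stand-in (its rows are hypothetical), and also shows one alternative that is not part of the lesson: combining the two masks with & in a single .loc[].

```python
import pandas as pd

def age_18_to_30(a):
    return 18 <= a < 30

def level_A(s):
    return 85 <= s <= 100

# Hypothetical stand-in for the data read from the Excel file
students = pd.DataFrame(
    {
        "Name": ["Student_001", "Student_002", "Student_003", "Student_004"],
        "Age": [35, 26, 20, 18],
        "Score": [90, 92, 60, 85],
    },
    index=pd.Index([1, 2, 3, 4], name="ID"),
)

# Pass the function object itself to apply(), without ()
age_mask = students["Age"].apply(age_18_to_30)
score_mask = students["Score"].apply(level_A)

# Chained form from the text: filter by age first, then by score
chained = students.loc[age_mask].loc[score_mask]

# Alternative (an assumption, not from the text): one combined mask
combined = students.loc[age_mask & score_mask]

print(chained)
print(chained.equals(combined))
```

In the chained form the second mask is still computed on the full table; pandas matches it to the already-filtered rows by index label, which works here because the ID values are unique.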
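Method 2: Use lambda expressions (for reference only)
Lambda expressions let you skip the separate function definitions, which makes the code more concise.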
students = students.loc[students['Age'].apply(lambda a: 18 <= a < 30)].loc[
students['Score'].apply(lambda s: 85 <= s <= 100)]
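As a rough illustration, the sketch below builds the age mask with a lambda on a small made-up DataFrame and compares it with a column-wide comparison. The column-wide form is an alternative not covered in this lesson, and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample data standing in for the Excel sheet
students = pd.DataFrame(
    {"Age": [17, 18, 29, 30], "Score": [95, 85, 100, 90]},
    index=pd.Index([1, 2, 3, 4], name="ID"),
)

# Mask built with a lambda, as in the text
lambda_mask = students["Age"].apply(lambda a: 18 <= a < 30)

# Alternative (not from the text): compare the whole column at once
vector_mask = (students["Age"] >= 18) & (students["Age"] < 30)

masks_match = lambda_mask.equals(vector_mask)
filtered = students.loc[lambda_mask]

print(masks_match)
print(filtered)
```

Both masks select the same rows; the lambda version mirrors the named function exactly, just without giving it a name.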
Another way to refer to a column of a DataFrame:
Form 1: students['Age']
Form 2: students.Age
students = students.loc[students.Age.apply(age_18_to_30)].loc[students.Score.apply(level_A)]
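The two forms return the same column when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute. The sketch below illustrates this with a made-up table; the "count" column is a hypothetical example chosen because DataFrame already has a method of that name.

```python
import pandas as pd

# Hypothetical table; "count" deliberately collides with DataFrame.count
df = pd.DataFrame({"Age": [26, 20], "Score": [92, 93], "count": [1, 2]})

# For an ordinary column name, both forms give the same Series
same_column = df.Age.equals(df["Age"])

# For a clashing name, attribute access returns the method, not the column
attribute_is_method = callable(df.count)
bracket_is_column = isinstance(df["count"], pd.Series)

print(same_column, attribute_is_method, bracket_is_column)
```

Bracket access always refers to the column, so it is the safer choice when column names come from a file you do not control.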
Code for this section:
import pandas as pd

# def age_18_to_30(a):
#     # return a >= 18 and a < 30  # conventional form
#     return 18 <= a < 30  # Python's chained comparison
# def level_A(s):
#     return 85 <= s <= 100
students = pd.read_excel('C:/Temp/Students.xlsx', index_col='ID')
# students = students.loc[students['Age'].apply(age_18_to_30)].loc[students['Score'].apply(level_A)]
students = students.loc[students['Age'].apply(lambda a: 18 <= a < 30)].loc[
students['Score'].apply(lambda s: 85 <= s <= 100)]
# students = students.loc[students.Age.apply(age_18_to_30)].loc[students.Score.apply(level_A)]
print(students)
Printed output:
           Name  Age  Score
ID
2   Student_002   26     92
6   Student_006   20     93
9   Student_009   18     85
19  Student_019   19     86
20  Student_020   20     94