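Let's use the Twitter sentiment analysis dataset to count the number of words in each tweet. We will try several approaches: the DataFrame iterrows method, NumPy arrays, and the apply method. You can download the dataset here (https://datahack.analyticsvidhya.com/contest/practice-problem-twitter-sentiment-analysis/?utm_source=blog&utm_medium=4-methods-optimize-python-code-data-science).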
    '''
    Optimization method: the apply method
    '''
    # Import libraries
    import time

    import pandas as pd

    data = pd.read_csv('train_E6oV3lV.csv')

    # Print the first few rows
    print(data.head())

    # Count the words in each tweet using DataFrame iterrows
    print('\n\nUsing Iterrows\n\n')
    start_time = time.time()
    data_1 = data.copy()
    n_words = []
    for i, row in data_1.iterrows():
        n_words.append(len(row['tweet'].split()))
    data_1['n_words'] = n_words
    print(data_1[['id', 'n_words']].head())
    end_time = time.time()
    print('\nTime taken to calculate No. of Words by iterrows :',
          (end_time - start_time), 'seconds')

    # Count the words in each tweet using the underlying NumPy array
    print('\n\nUsing Numpy Arrays\n\n')
    start_time = time.time()
    data_2 = data.copy()
    n_words_2 = []
    for row in data_2.values:
        # row[2] is the tweet text (the third column of the file)
        n_words_2.append(len(row[2].split()))
    data_2['n_words'] = n_words_2
    print(data_2[['id', 'n_words']].head())
    end_time = time.time()
    print('\nTime taken to calculate No. of Words by numpy array : ',
          (end_time - start_time), 'seconds')

    # Count the words in each tweet using the apply method
    print('\n\nUsing Apply Method\n\n')
    start_time = time.time()
    data_3 = data.copy()
    data_3['n_words'] = data_3['tweet'].apply(lambda x: len(x.split()))
    print(data_3[['id', 'n_words']].head())
    end_time = time.time()
    print('\nTime taken to calculate No. of Words by Apply Method : ',
          (end_time - start_time), 'seconds')
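If you want to try the comparison without downloading the dataset, the following is a minimal sketch under stated assumptions. The sample tweets, the `id` / `label` / `tweet` column layout of the stand-in DataFrame, and the helper names (`count_with_iterrows`, `count_with_values`, `count_with_apply`, `timed`) are all made up for illustration and are not part of the original article. It runs the same three approaches on a small synthetic DataFrame and prints how long each one takes; the timings will vary by machine.

```python
import time

import pandas as pd

# Hypothetical stand-in data: three made-up tweets repeated many times,
# arranged in an assumed id / label / tweet column layout.
tweets = ["the quick brown fox", "hello world", "pandas apply is handy here"]
data = pd.DataFrame({
    "id": range(1, 3001),
    "label": 0,
    "tweet": tweets * 1000,
})


def count_with_iterrows(df):
    # Iterate row by row; each row is materialised as a Series.
    return [len(row["tweet"].split()) for _, row in df.iterrows()]


def count_with_values(df):
    # Iterate over the underlying NumPy array, looking the column up by
    # name instead of hard-coding its position.
    tweet_pos = df.columns.get_loc("tweet")
    return [len(row[tweet_pos].split()) for row in df.values]


def count_with_apply(df):
    # Let pandas call the function on every element of the column.
    return df["tweet"].apply(lambda text: len(text.split())).tolist()


def timed(func, df):
    # Return the function's result together with the elapsed seconds.
    start = time.perf_counter()
    result = func(df)
    return result, time.perf_counter() - start


results = {}
for func in (count_with_iterrows, count_with_values, count_with_apply):
    counts, seconds = timed(func, data)
    results[func.__name__] = counts
    print(f"{func.__name__}: {seconds:.4f} seconds")
```

All three functions should return identical word counts, so any difference between them is purely in running time. `time.perf_counter()` is used here instead of `time.time()` because it is intended for measuring short intervals.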