Python or Java? What Big Data Says About Which Language Pays Best

Published: 2018-08-29 11:23:33  Category: Tutorials  Source: 徐涛


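This article uses Python to scrape job postings from Lagou (拉勾网) for five programming languages: Python, Java, C++, PHP, and C#. It then uses R to analyze the factors that affect salary. Lagou displays only 30 pages of results per query, with 15 postings per page (at most 450 postings), so scraping a single city yields only a few pages of matching postings, which is too little data for a convincing analysis. This article therefore scrapes postings nationwide. It has three parts: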

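Scraping job postings for the five languages from Lagou, using Python positions as the example
Analyzing the factors that affect salary, using Python postings as the example
Comparing how those factors affect salary across the five languages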

1. Scraping job postings for the five languages from Lagou, using Python positions as the example

For each Python posting we collect the job title, company name, salary, work experience, education, company size, and company benefits.


## Using Python positions as the example: scrape postings with selenium + Chrome()
# coding=UTF-8
import csv
import time

from lxml import etree
from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
browser.get('https://www.lagou.com/jobs/list_PYTHON?px=default&city=%E5%85%A8%E5%9B%BD#filterBox')
browser.implicitly_wait(10)

FIELDNAMES = ['Name', 'Company', 'Salary', 'Education', 'Size', 'Welfare']

def get_dates(selector):
    # Each <li> under #s_position_list is one job posting.
    items = selector.xpath('//*[@id="s_position_list"]/ul/li')
    for item in items:
        yield {
            'Name': item.xpath('div[1]/div[1]/div[1]/a/h3/text()')[0],
            'Company': item.xpath('div[1]/div[2]/div[1]/a/text()')[0],
            'Salary': item.xpath('div[1]/div[1]/div[2]/div/span/text()')[0],
            # Text node at index 3 of the salary block, stored under 'Education'.
            'Education': item.xpath('div[1]/div[1]/div[2]/div//text()')[3].strip(),
            'Size': item.xpath('div[1]/div[2]/div[2]/text()')[0].strip(),
            'Welfare': item.xpath('div[2]/div[2]/text()')[0]
        }

def main():
    for i in range(30):  # Lagou shows at most 30 result pages
        # Parse the current page once, before moving to the next one.
        selector = etree.HTML(browser.page_source)
        rows = list(get_dates(selector))
        for row in rows:
            print(row)
        # Py.csv is the output path; by default it is saved in the working directory.
        with open('Py.csv', 'a', newline='') as csvfile:
            writer = csv.DictWriter(csvfile, fieldnames=FIELDNAMES)
            if i == 0:
                writer.writeheader()  # write the header once, not on every page
            writer.writerows(rows)
        print('Page {} scraped'.format(i + 1))
        if i < 29:
            # Move to the next results page and give it time to load.
            browser.find_element(By.XPATH, '//*[@id="order"]/li/div[4]/div[2]').click()
            time.sleep(5)
    browser.close()

if __name__ == '__main__':
    main()
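
The scraper above stores the salary as a raw string and puts work experience and education together in the single 'Education' field, so both need to be split before any analysis. The following is a minimal sketch of that cleanup step, not the author's method. It assumes salary strings look like '10k-20k' and that the combined field looks like '经验3-5年 / 本科'; both formats and both helper names (parse_salary, split_experience_education) are assumptions that do not appear in the original.

```python
import re


def parse_salary(text):
    """Parse a salary range such as '10k-20k' (assumed format).

    Returns (low, high, midpoint) in thousands of yuan per month,
    or None when the text does not match the assumed format.
    """
    match = re.fullmatch(r'\s*(\d+)\s*[kK]\s*-\s*(\d+)\s*[kK]\s*', text)
    if match is None:
        return None
    low, high = int(match.group(1)), int(match.group(2))
    return low, high, (low + high) / 2


def split_experience_education(text):
    """Split an 'experience / education' string (assumed format) on the first slash.

    Returns (experience, education); education is '' when no slash is present.
    """
    experience, sep, education = text.partition('/')
    if not sep:
        return text.strip(), ''
    return experience.strip(), education.strip()
```

For example, parse_salary('10k-20k') gives (10, 20, 15.0), and split_experience_education('经验3-5年 / 本科') gives ('经验3-5年', '本科'). Returning None for unmatched salary text, rather than raising, lets the caller decide whether to drop or inspect rows that do not fit the assumed format.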

(Editor: 湖南网)
