Using Data to Show How Frightening Cyberbullying Is

Published: 2019-04-02 10:04:26 | Category: Tutorials | Source: 小F



This article has been a long time coming.


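The story began when 潘长江 (Pan Changjiang) failed to recognize 蔡徐坤 (Cai Xukun) on a variety show, after which the comment section of Pan Changjiang's Weibo blew up.

In the end, both of them were affected by cyberbullying to some degree.

Even today, new comments are still being posted under that Weibo post.

It has to be said that Weibo anti-fans who deliberately stir up trouble are truly frightening.

Another example is League of Legends, which I have always followed.

Last week 王校长 (Principal Wang) was also dragged into one of these pile-ons, over a Weibo post about 姿态 retiring and then coming back.

It started as an ordinary joking reply: 「离辣个传奇adc的回归，还远吗?[二哈]」 (roughly, "Is the return of that legendary ADC far off? [husky]").

Then some people started stirring things up against 王校长, which flat-out enraged him.

As a bystander, I don't have much to say about the events above.

I only hope that in the future fewer bored people will stir up trouble and pile pressure on others.

This time I look at the situation by collecting information on the users who commented on that Weibo post of Pan Changjiang's.

In total I collected 3 days of comments, 140,000 in all.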

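1. Preparation

I won't go into how the Weibo comments are fetched, since I have covered that before.

What I will mention is fetching the user information, which again goes through the mobile site.

The main fields are each user's nickname, gender, region, number of posts, number of accounts followed, and number of followers.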
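To make these fields concrete, here is a minimal sketch of how they might be pulled out of the raw profile text. It assumes the mobile profile page yields one snippet shaped like "nickname gender/region" and another shaped like "微博[n] 关注[n] 粉丝[n]"; both sample strings and the helper names `parse_user_message` and `parse_weibo_message` are made up for illustration and are not part of the original code.

```python
import re

# Assumed shape of the two text snippets on a weibo.cn profile page.
# These samples are invented for illustration, not taken from the dataset.
SAMPLE_USER = "some_nickname 男/北京"
SAMPLE_STATS = "微博[120] 关注[45] 粉丝[678]"


def parse_user_message(text):
    """Split 'nickname gender/region' into its parts (hypothetical helper)."""
    match = re.search(r"(\S+)\s+(男|女)/(\S+)", text)
    if match is None:
        # Covers placeholders such as '未知' or any unexpected layout
        return {"nickname": None, "gender": None, "region": None}
    nickname, gender, region = match.groups()
    return {"nickname": nickname, "gender": gender, "region": region}


def parse_weibo_message(text):
    """Extract the bracketed post, following and follower counts as integers."""
    labels = {"微博": "posts", "关注": "following", "粉丝": "followers"}
    counts = {name: None for name in labels.values()}
    for label, number in re.findall(r"(微博|关注|粉丝)\[(\d+)\]", text):
        counts[labels[label]] = int(number)
    return counts
```

Note that the region pattern captures only the first whitespace-free token, so a value such as "广东 深圳" would be reduced to the province. Whether that is acceptable depends on how fine-grained the later analysis needs to be.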

In addition, the data is stored in a MySQL database this time.

Create the database.

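import pymysql

# Connect to the local MySQL server without selecting a database
db = pymysql.connect(host='127.0.0.1', user='root', password='774110919', port=3306)
cursor = db.cursor()
# utf8mb4 is needed so that emoji in comments can be stored
cursor.execute("CREATE DATABASE weibo DEFAULT CHARACTER SET utf8mb4")
db.close()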

Create the table and define its columns.

import pymysql

# Connect directly to the weibo database created above
db = pymysql.connect(host='127.0.0.1', user='root', password='774110919', port=3306, db='weibo')
cursor = db.cursor()
# Every column is stored as text; (comment, date) is the primary key, so an identical comment at the same timestamp is rejected
sql = 'CREATE TABLE IF NOT EXISTS comments (user_id VARCHAR(255) NOT NULL, user_message VARCHAR(255) NOT NULL, weibo_message VARCHAR(255) NOT NULL, comment VARCHAR(255) NOT NULL, praise VARCHAR(255) NOT NULL, date VARCHAR(255) NOT NULL, PRIMARY KEY (comment, date))'
cursor.execute(sql)
db.close()
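As written, the first script raises an error on a second run because `CREATE DATABASE` has no `IF NOT EXISTS` clause. The following is one possible way to make the setup repeatable, offered as a sketch rather than the author's method. The helpers `schema_statements` and `create_schema` are hypothetical names, and `create_schema` only assumes a DB-API style connection such as the one returned by `pymysql.connect()`.

```python
def schema_statements(database="weibo", table="comments"):
    """Return the SQL needed to create the database and table if they are missing."""
    columns = ["user_id", "user_message", "weibo_message", "comment", "praise", "date"]
    column_sql = ", ".join("{} VARCHAR(255) NOT NULL".format(c) for c in columns)
    return [
        "CREATE DATABASE IF NOT EXISTS {} DEFAULT CHARACTER SET utf8mb4".format(database),
        "USE {}".format(database),
        "CREATE TABLE IF NOT EXISTS {} ({}, PRIMARY KEY (comment, date))".format(table, column_sql),
    ]


def create_schema(connection):
    """Run the statements on a DB-API connection and always release the cursor."""
    cursor = connection.cursor()
    try:
        for statement in schema_statements():
            cursor.execute(statement)
        connection.commit()
    finally:
        cursor.close()
```

With a real server this would be called as `create_schema(pymysql.connect(host='127.0.0.1', user='root', password='...', port=3306))`, where the password is whatever your own installation uses.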

2. Data Collection

The full code is as follows.

from copyheaders import headers_raw_to_dict
from bs4 import BeautifulSoup
import requests
import pymysql
import re

# Raw request headers copied from the browser. Replace the cookie placeholder with your own value.
# Keep this text ASCII-only: Python rejects non-ASCII characters inside a bytes literal.
headers = b"""
accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
accept-encoding:gzip, deflate, br
accept-language:zh-CN,zh;q=0.9
cache-control:max-age=0
cookie:YOUR_COOKIE_HERE
upgrade-insecure-requests:1
user-agent:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36
"""

# Convert the raw header string into a dict
headers = headers_raw_to_dict(headers)
 
 
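def to_mysql(data):
    """
    Write one record into MySQL
    """
    table = 'comments'
    keys = ', '.join(data.keys())
    values = ', '.join(['%s'] * len(data))
    db = pymysql.connect(host='localhost', user='root', password='774110919', port=3306, db='weibo')
    cursor = db.cursor()
    # Column names come from the dict keys; the values are passed separately as query parameters
    sql = 'INSERT INTO {table}({keys}) VALUES ({values})'.format(table=table, keys=keys, values=values)
    try:
        if cursor.execute(sql, tuple(data.values())):
            print("Successful")
            db.commit()
    except Exception:
        # For example, a duplicate (comment, date) primary key
        print('Failed')
        db.rollback()
    finally:
        # Close the connection whether or not the insert succeeded
        db.close()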
 
 
def get_user(user_id):
    """
    Fetch a user's profile information
    """
    try:
        # user_id is the relative profile link taken from the comment page
        url_user = 'https://weibo.cn' + str(user_id)
        response_user = requests.get(url=url_user, headers=headers)
        soup_user = BeautifulSoup(response_user.text, 'html.parser')
        # User info
        re_1 = soup_user.find_all(class_='ut')
        user_message = re_1[0].find(class_='ctt').get_text()
        # Weibo info
        re_2 = soup_user.find_all(class_='tip2')
        weibo_message = re_2[0].get_text()
        return (user_message, weibo_message)
    except Exception:
        # '未知' means 'unknown'; it is stored when the profile cannot be fetched or parsed
        return ('未知', '未知')
 
 
def get_message():
    # Page 1 also contains the hot comments, which are more awkward to parse, so start from page 2
    for i in range(2, 20000):
        data = {}
        print('------------ page ' + str(i) + ' ------------')
        # Request URL
        url = 'https://weibo.cn/comment/Hl2O21Xw1?uid=1732460543&rl=0&page=' + str(i)
        response = requests.get(url=url, headers=headers)
        html = response.text
        soup = BeautifulSoup(html, 'html.parser')
        # Comment text
        comments = soup.find_all(class_='ctt')
        # Like counts
        praises = soup.find_all(class_='cc')
        # Comment timestamps
        date = soup.find_all(class_='ct')
        # Usernames
        name = re.findall('id="C_.*?href="/.*?">(.*?)</a>', html)
        # User IDs (the relative profile links)
        user_ids = re.findall('id="C_.*?href="(.*?)">(.*?)</a>', html)

        for j in range(len(name)):
            # User ID
            user_id = user_ids[j][0]
            (user_message, weibo_message) = get_user(user_id)
            # " ".join(x.split()) collapses runs of whitespace into single spaces
            data['user_id'] = " ".join(user_id.split())
            data['user_message'] = " ".join(user_message.split())
            data['weibo_message'] = " ".join(weibo_message.split())
            data['comment'] = " ".join(comments[j].get_text().split())
            # The code takes every second 'cc' element as the like count
            data['praise'] = " ".join(praises[j * 2].get_text().split())
            data['date'] = " ".join(date[j].get_text().split())
            print(data)
            # Write to the database
            to_mysql(data)


if __name__ == '__main__':
    get_message()
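The loop in `get_message` always walks up to page 19999 and sends requests back to back. A minimal sketch of an alternative driver is shown below, assuming it is acceptable to stop after a few consecutive empty pages and to pause between requests. `crawl_pages`, `fetch_page` and `handle_row` are hypothetical names: `fetch_page` would wrap the per-page parsing done inside `get_message`, and `handle_row` would be `to_mysql`.

```python
import time


def crawl_pages(fetch_page, handle_row, first_page=2, max_pages=20000,
                delay_seconds=1.0, max_empty_pages=3):
    """Walk comment pages until several consecutive pages come back empty.

    fetch_page(page) must return a list of row dicts for that page;
    handle_row(row) stores one row. Both are supplied by the caller.
    Returns the number of rows passed to handle_row.
    """
    empty_streak = 0
    stored = 0
    for page in range(first_page, max_pages):
        rows = fetch_page(page)
        if not rows:
            empty_streak += 1
            if empty_streak >= max_empty_pages:
                break  # assume the comment list has ended
        else:
            empty_streak = 0
            for row in rows:
                handle_row(row)
                stored += 1
        time.sleep(delay_seconds)  # pause between requests
    return stored
```

The threshold of three empty pages and the one-second delay are arbitrary defaults chosen for the sketch. A single empty page can simply be a failed request, which is why the loop tolerates a short streak before giving up.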

(Editor: 湖南网)
