Machine Learning Resources

Project Experience

Project experience during graduate school

1. Baidu - 机器知我心 (The Machine Knows My Heart)

A "you pull, I push" search and recommendation application:

2. Baidu - Movie Recommendation

Predict users' ratings from their existing movie ratings and the movies' tag information.

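3. 浙大网新 (Insigma) - Innovation Program

News aggregator

It aggregates short news from Weibo with long-form news articles. The application continuously crawls news and Weibo posts, then classifies, deduplicates, and matches them, and finally presents a timeline for each news category.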
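The deduplication and timeline steps could look roughly like the sketch below. It is only an illustration under assumptions: the item schema (`category`, `title`, `published`) and the function name `build_timelines` are hypothetical, and exact matching on a normalized title stands in for whatever deduplication the real application used.

```python
from collections import defaultdict


def build_timelines(items):
    """Group already-classified items by category, drop duplicates, sort by time.

    `items` is an iterable of dicts with the hypothetical keys
    'category', 'title', and 'published' (any sortable timestamp).
    """
    seen = set()
    timelines = defaultdict(list)
    for item in items:
        # Normalize case and whitespace so trivially different titles collide.
        normalized_title = " ".join(item["title"].lower().split())
        key = (item["category"], normalized_title)
        if key in seen:
            continue  # duplicate within the same category: keep the first one seen
        seen.add(key)
        timelines[item["category"]].append(item)
    for entries in timelines.values():
        entries.sort(key=lambda entry: entry["published"])
    return dict(timelines)
```

Keeping the first occurrence of a duplicate is an arbitrary choice here; a real aggregator would more likely keep the earliest or the most complete version.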

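4. NetEase - 有道难题 (Youdao Nanti)

Learn English by following American TV series: an application that uses data mining to help users learn English while watching American TV series and movies.

It collects vocabulary lists for CET-4/CET-6, TOEFL, and IELTS, analyzes every word in the subtitles, and annotates words for the target user based on features such as word length and which vocabulary list each word belongs to.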
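The annotation step can be sketched as follows. This is a minimal illustration, not the original implementation: the tiny word lists in `VOCAB_SETS`, the `min_length` threshold, and the function name `annotate_subtitle` are all assumptions made up for the example.

```python
import re

# Hypothetical, tiny stand-ins for the real exam vocabulary lists.
VOCAB_SETS = {
    "CET-4": {"weather", "improve"},
    "TOEFL": {"ambiguous", "improve"},
}


def annotate_subtitle(line, target_sets, min_length=6):
    """Return (word, labels) pairs for the words worth annotating.

    A word is annotated when it is at least `min_length` characters long
    and appears in at least one of the vocabulary lists named in
    `target_sets` (the lists the target user is studying).
    """
    annotations = []
    for word in re.findall(r"[A-Za-z']+", line):
        key = word.lower()
        if len(key) < min_length:
            continue  # short words are assumed to be known already
        labels = sorted(
            name for name in target_sets if key in VOCAB_SETS.get(name, set())
        )
        if labels:
            annotations.append((word, labels))
    return annotations
```

Different users would pass different `target_sets`, so the same subtitle line can be annotated differently for a CET-4 learner and a TOEFL learner.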

5. Eagle - Accessibility Testing and Remediation

Website mirroring and sampling crawler

English Introduction

Snapshot Crawler

Recently I developed a snapshot crawler that can mirror a whole website locally. It fetches all the resources needed (for example web pages, images, and static files), so you can then start a static server and browse the downloaded website locally.
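One piece of such a mirror is deciding where each fetched URL is stored on disk so that a static server can serve it back. The sketch below shows one possible mapping. It is an assumption for illustration, not the crawler's actual layout: the `local_path` name and the `mirror` root directory are hypothetical, and query strings are simply ignored.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def local_path(url, root="mirror"):
    """Map a URL to a relative file path under `root` (hypothetical layout).

    Directory-style URLs (empty path or a trailing slash) are stored as
    index.html, which is the file most static servers serve for a directory.
    The query string and fragment are dropped in this sketch.
    """
    parsed = urlparse(url)
    path = parsed.path or "/"
    if path.endswith("/"):
        path += "index.html"
    return str(PurePosixPath(root) / parsed.netloc / path.lstrip("/"))
```

A directory laid out this way can be browsed with any static file server, for example Python's built-in `python -m http.server` started inside the host's folder.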

We need to check whether the pages are designed well enough for visually impaired and elderly people to use.

It is hard to do all of the checking automatically (by programs), and many items need to be checked by people, so I have to keep the manual checking workload as small as possible.

The most straightforward solution, and our first one, is to wait until all the pages have been downloaded and then sample a subset of them for checking. We call this sampling.
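A minimal sketch of this first approach is shown below, assuming uniform random sampling over the list of downloaded pages. The function name `sample_pages`, the `fraction` parameter, and its default value are hypothetical choices for illustration.

```python
import random


def sample_pages(pages, fraction=0.1, seed=None):
    """Pick a random subset of downloaded pages for manual checking.

    `pages` is a list of page identifiers (for example local file paths).
    At least one page is returned whenever `pages` is non-empty, and a
    fixed `seed` makes the selection reproducible.
    """
    if not pages:
        return []
    rng = random.Random(seed)
    count = min(len(pages), max(1, round(len(pages) * fraction)))
    return rng.sample(pages, count)
```

Note that sampling can only start once the whole site has been downloaded, since the full page list is needed as input.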

But it is time-consuming.

hash signature

over ten thousand
