Machine learning for dialogue generation has been widely studied, but most current approaches focus purely on the context of the words. Studies by Prendinger et al. show that addressing emotion in a dialogue system can enhance user satisfaction, contribute to a more positive perception of the interaction, and lead to fewer dialogue breakdowns. To improve conversational agent performance, our goal is therefore to use and improve the Emotional Chatting Machine (ECM), which generates responses that are appropriate not only in content but also in emotion.
We used a BiLSTM model to build a sentiment classifier that labels the data with emotion categories. We then used a variant of the seq2seq model that incorporates emotion embedding, internal memory, and external memory: given any post X = (x_1, x_2, …, x_n) and an emotion category e, it generates a response Y = (y_1, y_2, …, y_m) that is coherent with e.
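As a rough illustration of the emotion embedding idea only (internal and external memory are not covered), the sketch below builds one decoder input step by concatenating the previous token's word embedding, the embedding of the target emotion category e, and an attention context vector. The emotion label set, the dimensions, and the helper names `make_embedding_table` and `decoder_input` are assumptions made for this example, not the project's actual code.

```python
import random

# Hypothetical emotion label set, used only for illustration.
EMOTIONS = ["angry", "disgust", "happy", "like", "sad", "other"]


def make_embedding_table(keys, dim, seed):
    """Create a randomly initialised embedding vector of length `dim` per key."""
    rng = random.Random(seed)
    return {k: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for k in keys}


def decoder_input(prev_token, emotion, word_emb, emo_emb, context):
    """Concatenate [word embedding; emotion embedding; context vector].

    The same emotion vector is fed at every decoding step, so the decoder
    is conditioned on the emotion category throughout the response.
    """
    return word_emb[prev_token] + emo_emb[emotion] + context


# Toy vocabulary and dimensions (assumed values).
word_emb = make_embedding_table(["<go>", "i", "am", "fine"], dim=8, seed=0)
emo_emb = make_embedding_table(EMOTIONS, dim=4, seed=1)
context = [0.0] * 6  # stand-in for an attention context vector

step_input = decoder_input("<go>", "happy", word_emb, emo_emb, context)
print(len(step_input))
```

In a real model the embeddings would be trained jointly with the encoder and decoder rather than fixed at random values.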
This was a coursework project that followed the full data science pipeline:
Business Understanding: Forecast restaurant inspection outcomes to offer insight to the NYC health department, i.e. which restaurants they should prioritize for inspection.
Data Preparation: Crawled, selected, and feature-engineered geographic, time, and syntax data from various sources.
Modeling and Evaluation: Explored multiple models and delivered a report listing the results of the best-performing models, including Random Forest, LightGBM, and an ensemble method. The best AUC score we reached was 0.74, which is around the same level as the highest scores we have seen on the web.
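For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. The sketch below computes it directly from that definition. The function name `roc_auc` and the toy labels and scores are made up for illustration and are unrelated to the project's data.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    This pairwise form is O(P * N) and is meant for explanation, not speed.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which puts the 0.74 above in context.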
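A neural-network-based image recognition project implemented with the Python Theano/Lasagne libraries
Applying Machine Learning Techniques to Bird Species Classification
The best result came from sx3_ffc_b32.py: it has a relatively short running time (considering both the CPU and GPU cases) and ultimately reaches over 90% accuracy.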
Layer Structure | Specifics |
---|---|
Input | 3x128x128 |
conv3-32 | Pad=1 |
pool2 | Stride=2 |
conv3-64 | Pad=1 |
pool2 | Stride=2 |
conv3-128 | Pad=1 |
pool2 | Stride=2 |
FC:512 | Dropout 50% |
FC:512 | Dropout 50% |
Softmax | 9-way |
After running with stratified random data splits for ~100 runs, mean validation accuracy was found to be 92.9%.
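To make the layer table easier to check, the sketch below traces the tensor shape through the network with plain arithmetic. It assumes a convolution stride of 1 and unpadded pooling, neither of which is stated in the table, and the helper names are illustrative rather than taken from the project code.

```python
def conv_out(size, kernel, pad, stride=1):
    """Spatial output size of a convolution (stride 1 is an assumption)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, window, stride):
    """Spatial output size of an unpadded pooling layer."""
    return (size - window) // stride + 1


def trace_shapes(channels=3, size=128):
    """Return (layer name, output shape) pairs for the architecture in the table."""
    shapes = [("Input", (channels, size, size))]
    for filters in (32, 64, 128):
        size = conv_out(size, kernel=3, pad=1)   # 3x3 conv with pad 1 keeps the size
        channels = filters
        shapes.append((f"conv3-{filters}", (channels, size, size)))
        size = pool_out(size, window=2, stride=2)  # 2x2 pool with stride 2 halves it
        shapes.append(("pool2", (channels, size, size)))
    shapes.append(("flatten", (channels * size * size,)))
    shapes.append(("FC:512", (512,)))
    shapes.append(("FC:512", (512,)))
    shapes.append(("Softmax", (9,)))
    return shapes


for name, shape in trace_shapes():
    print(f"{name:10s} {shape}")
```

Under these assumptions the last pooling layer outputs 128x16x16, so the first fully connected layer sees 32,768 inputs.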
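On the LinkedIn ads team, managed the ads database and supported internal and third-party use of ads. Used Couchbase, ElasticSearch, Restli filters, Kibana, and related technologies to build visualized logs, supporting encrypted storage, search, visualization, and analysis across data centers and helping improve the efficiency of team members.
Cooperated with multiple teams to design an internal logging system that supports secure logging, indexing, searching, storage, visualization, and analysis of billions of confidential log records stored across multiple data centers.
(Due to company policy, the GitHub page is not available.)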
First Place winner of 2019 Enigma Datathon.
Performed an NYC Subway Performance Rating Factor Analysis using multiple large-scale datasets provided by Enigma, and built an R Shiny app to visualize the end results.
Identified the relationships between subway performance and elevator outage frequency, customer feedback data, the permanent art catalogue, and other factors.
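Used Algolia's cloud search service to import Bay Area restaurant data, build an index through its real-time system, and deliver highly responsive instant queries.
Used Algolia's search-as-a-service to build a small prototype that provides instantaneous, multi-platform, and typo-tolerant search on local restaurants.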
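This is a 24-hour hackathon project: a smart gift recommendation website built with node.js + express + the bootstrap framework. The user submits the Twitter, LinkedIn, or Pinterest page of the gift recipient; the backend then scrapes the text from that page and uses word frequency to find relevant Amazon gift pages.
Named a Top Finalist at LinkedIn Intern Hackday with an online gift recommendation service that offers gift suggestions from Amazon based on people's social media information.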
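The word-frequency step can be sketched roughly as follows. This is a minimal Python illustration, not the project's node.js code: the stopword list, the tokenisation rule, and the function name `top_keywords` are assumptions, and the scraping and Amazon lookup steps are left out.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "and", "for", "with", "that", "this", "was", "are", "you"}


def top_keywords(text, k=3):
    """Return the k most frequent non-stopword tokens in the scraped text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(k)]


# Made-up profile text standing in for scraped social media content.
profile_text = "Hiking, hiking, hiking! Coffee and more coffee. New camera for the hiking trip."
print(top_keywords(profile_text, k=2))  # ['hiking', 'coffee']
```

The resulting keywords would then serve as search terms for finding candidate gift pages.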