Exercise 8: TF/IDF ranking

DIS 2006/2007
In this exercise we'll look at how TF/IDF ranking works.
There are 5 different documents in the collection:
D1 = "If it walks like a duck and quacks like a duck, it must be a duck."
D2 = "Beijing Duck is mostly prized for the thin, crispy duck skin with authentic versions of the dish serving mostly the skin."
D3 = "Bugs' ascension to stardom also prompted the Warner animators to recast Daffy Duck as the rabbit's rival, intensely jealous and determined to steal back the spotlight while Bugs remained indifferent to the duck's jealousy, or used it to his advantage. This turned out to be the recipe for the success of the duo."
D4 = "6:25 PM 1/7/2007 blog entry: I found this great recipe for Rabbit Braised in Wine on cookingforengineers.com."
D5 = "Last week Li has shown you how to make the Sechuan duck. Today we'll be making Chinese dumplings (Jiaozi), a popular dish that I had a chance to try last summer in Beijing. There are many recipies for Jiaozi."
Task 1. For the query Q = "Beijing duck recipe", find the two top-ranked documents according to the TF/IDF rank. Assume the cosine similarity measure and the culinary term set T = {beijing, dish, duck, rabbit, recipe, roast}. Are the top-ranked documents relevant to the query?
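One possible way to check a hand calculation for Task 1 is sketched below. It is not the course's reference solution, and several choices are assumptions because the exercise does not fix them: raw term counts are used as TF, IDF is taken as ln(N/df) (set to 0 for a term that occurs in no document), the query is weighted the same way as the documents, and tokens are lowercased, split on non-letters and matched exactly against T (so "duck's" counts as "duck", but "recipies" in D5 does not count as "recipe"). The names term_counts, idf_weights, cosine and rank are illustrative only.

```python
import math
import re

# Documents and query copied from the exercise text.
DOCS = {
    "D1": "If it walks like a duck and quacks like a duck, it must be a duck.",
    "D2": "Beijing Duck is mostly prized for the thin, crispy duck skin with "
          "authentic versions of the dish serving mostly the skin.",
    "D3": "Bugs' ascension to stardom also prompted the Warner animators to "
          "recast Daffy Duck as the rabbit's rival, intensely jealous and "
          "determined to steal back the spotlight while Bugs remained "
          "indifferent to the duck's jealousy, or used it to his advantage. "
          "This turned out to be the recipe for the success of the duo.",
    "D4": "6:25 PM 1/7/2007 blog entry: I found this great recipe for Rabbit "
          "Braised in Wine on cookingforengineers.com.",
    "D5": "Last week Li has shown you how to make the Sechuan duck. Today "
          "we'll be making Chinese dumplings (Jiaozi), a popular dish that I "
          "had a chance to try last summer in Beijing. There are many "
          "recipies for Jiaozi.",
}
TERMS = ["beijing", "dish", "duck", "rabbit", "recipe", "roast"]
QUERY = "Beijing duck recipe"


def term_counts(text):
    """Raw counts of each term in TERMS (assumed: exact match, no stemming)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tokens.count(term) for term in TERMS]


def idf_weights(count_vectors):
    """Assumed IDF variant: ln(N / df), or 0 when the term occurs nowhere."""
    n = len(count_vectors)
    weights = []
    for i in range(len(TERMS)):
        df = sum(1 for vec in count_vectors if vec[i] > 0)
        weights.append(math.log(n / df) if df else 0.0)
    return weights


def cosine(u, v):
    """Cosine of the angle between two vectors (0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(docs, query):
    """Return (name, score) pairs sorted by descending cosine similarity."""
    counts = {name: term_counts(text) for name, text in docs.items()}
    idf = idf_weights(list(counts.values()))
    q_vec = [tf * w for tf, w in zip(term_counts(query), idf)]
    scores = {
        name: cosine(q_vec, [tf * w for tf, w in zip(vec, idf)])
        for name, vec in counts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank(DOCS, QUERY):
        print(f"{name}: {score:.4f}")
```

Under these assumptions D2 and D3 come out essentially tied at the top (about 0.521 each), with D5 close behind (about 0.514). A different TF or IDF variant, or a stemmer that maps "recipies" to "recipe", could change the scores and possibly the order.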
Task 2. Assume that the author of the document D5 goes on to tell more about her summer trip to China before doing the cooking and uses the word Beijing 3 times, instead of just once. What happens to the rank of D5? How can this be interpreted in the vector retrieval model (vectors and angles between them)? Is this change in the ranking of D5 a desirable property of TF/IDF? Why?
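The effect asked about in Task 2 can be explored with a small sketch that compares the angle between the query vector and each document vector before and after the change. It is an illustration under assumptions, not the official solution: the term counts below were counted by hand with exact, case-insensitive matching against T = (beijing, dish, duck, rabbit, recipe, roast), TF is the raw count, and IDF is ln(N/df) with 0 for a term that occurs in no document. The helper name angles_to_query is hypothetical.

```python
import math

# Hand-counted term frequencies (assumption: exact match, no stemming),
# in the order beijing, dish, duck, rabbit, recipe, roast.
COUNTS = {
    "D1": [0, 0, 3, 0, 0, 0],
    "D2": [1, 1, 2, 0, 0, 0],
    "D3": [0, 0, 2, 1, 1, 0],
    "D4": [0, 0, 0, 1, 1, 0],
    "D5": [1, 1, 1, 0, 0, 0],
}
QUERY_COUNTS = [1, 0, 1, 0, 1, 0]  # Q = "Beijing duck recipe"


def angles_to_query(counts):
    """Angle in degrees between the query and each document in TF/IDF space."""
    n = len(counts)
    n_terms = len(QUERY_COUNTS)
    idf = []
    for i in range(n_terms):
        df = sum(1 for vec in counts.values() if vec[i] > 0)
        idf.append(math.log(n / df) if df else 0.0)  # assumed IDF variant

    q = [tf * w for tf, w in zip(QUERY_COUNTS, idf)]
    q_norm = math.sqrt(sum(x * x for x in q))

    result = {}
    for name, vec in counts.items():
        d = [tf * w for tf, w in zip(vec, idf)]
        d_norm = math.sqrt(sum(x * x for x in d))
        cos = sum(a * b for a, b in zip(q, d)) / (q_norm * d_norm)
        # Clamp to guard against rounding slightly above 1.0.
        result[name] = math.degrees(math.acos(min(1.0, cos)))
    return result


before = angles_to_query(COUNTS)

# Task 2: "Beijing" now occurs 3 times in D5; everything else is unchanged.
modified = dict(COUNTS, D5=[3, 1, 1, 0, 0, 0])
after = angles_to_query(modified)

if __name__ == "__main__":
    for name in COUNTS:
        print(f"{name}: {before[name]:.2f} deg -> {after[name]:.2f} deg")
```

With these assumptions the document frequency of "beijing" does not change, so only D5's vector moves: it is stretched along the beijing axis, its angle to the query shrinks from roughly 59 degrees to roughly 48 degrees, and it overtakes D2 and D3, even though the text is still about dumplings rather than Beijing duck.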
Solution

Excel sheet with calculations
