DeepSeek-affiliated Hangzhou DeepSeek AI Fundamental Technology Research Co., Ltd. today filed a patent for a new web data collection system designed to improve efficiency and data quality. The patent outlines a method for discovering more webpage links while minimizing the traffic load placed on websites. It assesses downloaded content to predict the quality of links that have not yet been discovered, prioritizing high-value data and reducing redundant downloads. Efficient web data collection is crucial for training large language models (LLMs), which power AI systems like ChatGPT. Existing techniques struggle with incomplete link retrieval, excessive downloads that can crash websites, and poor filtering of low-quality data. DeepSeek’s proposed system aims to solve these issues by optimizing data allocation and maintaining metadata accuracy. [iThome, in Chinese]
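The brief gives no implementation details, so the following is only a rough illustration of the general idea of quality-prioritized crawling, not the method claimed in the patent. It assumes a hypothetical heuristic in which an unvisited link inherits the quality score of the page that pointed to it, plus a hypothetical per-host download cap to limit traffic on any one site. The `Frontier` class, its method names, and the `per_host_limit` parameter are all invented for this sketch.

```python
import heapq
from collections import defaultdict
from urllib.parse import urlparse


class Frontier:
    """Crawl queue that hands out the link with the highest predicted quality first."""

    def __init__(self, per_host_limit=2):
        self._heap = []                    # entries: (-score, insertion order, url)
        self._seen = set()                 # URLs already queued, to avoid redundant downloads
        self._fetched = defaultdict(int)   # downloads handed out so far, per host
        self._counter = 0                  # tie-breaker so equal scores pop in insertion order
        self.per_host_limit = per_host_limit

    def add(self, url, parent_quality):
        # Assumption: a link's predicted quality is the score of the page it was found on.
        if url in self._seen:
            return False
        self._seen.add(url)
        heapq.heappush(self._heap, (-parent_quality, self._counter, url))
        self._counter += 1
        return True

    def pop(self):
        # Return the best-scoring URL whose host is still under its budget.
        # In this sketch, URLs from hosts that are over budget are simply discarded.
        while self._heap:
            _, _, url = heapq.heappop(self._heap)
            host = urlparse(url).netloc
            if self._fetched[host] < self.per_host_limit:
                self._fetched[host] += 1
                return url
        return None


frontier = Frontier(per_host_limit=1)
frontier.add("https://a.example/x", 0.2)
frontier.add("https://b.example/y", 0.9)
frontier.add("https://b.example/y", 0.9)  # duplicate, ignored
print(frontier.pop())  # https://b.example/y
```

A real system would replace the inherited score with a learned quality predictor and would reschedule, rather than drop, links from hosts that have reached their limit.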