ZDNet has an article up on a new process for cleaning spam from file sharing networks via collaborative filtering.
The project aims at the heart of peer-to-peer networks' biggest weakness today. Allowing people to search each other's hard drives has made hundreds of millions of files potentially available at a mouse-click, but search results remain spotty and badly organized, much like the early days of Web search. What would ordinarily be a straightforward computer science question has been complicated by the fact that so many of the files on peer-to-peer networks are songs or videos under copyright. In this case, improving search results could also contribute to making copyright infringement more efficient.
Peer-to-peer networks have been polluted with junk files and spam almost since their inception. It took spammers only a few months to realize that the popular networks presented a new opportunity for unsolicited advertising, and to adapt their technologies accordingly.
I recall seeing that someone wrote a paper showing how social networks can be destroyed (in terms of their usefulness) by adding bad data to the system. Say you have a network of shared music files. There is a threshold after which the system is no longer useful, and that threshold depends on honest naming of the files. Something as easy as me uploading audio of me belching the alphabet and naming it the same as some new hot song disturbs the system. Doing this on a larger scale can actually defeat the system.
This paper was very good news for the music industry, which actually pays people to do this on purpose to muck up the P2P systems that share music files.
Additionally, spammers add images, movies, and text files as well as music, all named after items that would be popular on the network. When opened, they turn out to be just ads or, in the worst case, viruses, trojans, or other malware.
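To put a rough number on the pollution threshold mentioned above, here is a minimal sketch. It is my own simplification, not something from the paper or the ZDNet article: it assumes every download is an independent, uniformly random pick among the copies sharing a name, so the number of tries until a genuine copy follows a geometric distribution. The function name `expected_downloads` is made up for illustration.

```python
def expected_downloads(polluted_fraction):
    """Expected number of downloads until the first genuine copy.

    Assumption (mine): each download independently hits a junk copy
    with probability `polluted_fraction`, so the count of tries is
    geometric with success probability 1 - polluted_fraction.
    """
    if not 0 <= polluted_fraction < 1:
        raise ValueError("polluted_fraction must be in [0, 1)")
    return 1 / (1 - polluted_fraction)


if __name__ == "__main__":
    # Show how quickly the cost grows as decoys take over a file name.
    for fraction in (0.0, 0.5, 0.9, 0.99):
        print(fraction, expected_downloads(fraction))
```

Under that assumption the cost stays low for a long time and then climbs steeply as junk copies approach 100 percent, which is one way to picture a threshold past which the network stops being worth the effort.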
This new approach hopes to address this problem. It does seem to still have the flaw of relying on users being honest in their ratings, so to get around it, spammers just need to rate their own items as legit and get other spammers to do the same.
That throws off the signal-to-noise ratio of the system, and we are back where we started.
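Here is a small sketch of that worry. It is not the scheme from the article, whose details I have not seen; it assumes the simplest possible rating system, where a file's score is the plain average of "legit" (1) and "junk" (0) votes and every vote counts equally. The function names are hypothetical.

```python
from statistics import mean


def average_rating(votes):
    """Plain mean of 0/1 votes (1 = legit, 0 = junk); 0.0 if no votes."""
    return mean(votes) if votes else 0.0


def rating_after_collusion(honest_votes, n_colluders):
    """Score after n_colluders each add one 'legit' vote for the file."""
    return average_rating(list(honest_votes) + [1] * n_colluders)


if __name__ == "__main__":
    # Ten honest users all flag a spam file as junk.
    honest_votes = [0] * 10
    for n in (0, 10, 40):
        print(n, "colluders ->", rating_after_collusion(honest_votes, n))
```

With unweighted averaging, the colluders only need to outnumber the honest raters to make a junk file look legit. A real system would presumably weight votes by how much it trusts each rater, but that just moves the question to how trust gets earned.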
Posted by Eric at March 18, 2005 10:46 AM