Someone Made a Dataset of One Million Bluesky Posts for 'Machine Learning Research'

Stopthatgirl7@lemmy.world · 1 day ago

taladar@sh.itjust.works · 8 hours ago

Plenty of things are more difficult in decentralized systems.

You either have to store all kinds of data in multiple copies or caches, or accept long delays on certain operations such as search, or even just displaying aggregated data (such as a post and its comments from different instances on Lemmy).
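To make that trade-off concrete, here is a toy sketch in Python. The `Instance` class and `render_thread` function are made up for illustration and are not how Lemmy actually works; a counter stands in for slow network round trips.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical stand-in for one federated server."""
    name: str
    comments: dict[str, list[str]] = field(default_factory=dict)  # post_id -> comments
    requests_served: int = 0

    def fetch_comments(self, post_id: str) -> list[str]:
        # Each call stands in for one slow network round trip.
        self.requests_served += 1
        return list(self.comments.get(post_id, []))


def render_thread(post_id, instances, cache=None):
    """Collect a post's comments from every instance, optionally via a local cache."""
    if cache is not None and post_id in cache:
        # Fast path: no remote calls, but the copy may be stale.
        return cache[post_id]
    merged = []
    for inst in instances:
        merged.extend(inst.fetch_comments(post_id))
    if cache is not None:
        cache[post_id] = merged  # a second copy of data owned by other servers
    return merged
```

Without a cache, every page view hits every instance. With a cache, repeat views are cheap, but the local copy goes stale as soon as a remote instance gets a new comment.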

You have the problem of different jurisdictions and moderation policies for different instances.

You will have a hard time exporting or deleting all data related to a specific user when required by law (e.g. GDPR).
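A minimal Python sketch of why deletion is hard, assuming each server keeps its own copy of a user's posts. `Server` and `erase_everywhere` are hypothetical names, not any real federation API.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical federated server holding copies of users' posts."""
    name: str
    reachable: bool = True
    posts: dict[str, list[str]] = field(default_factory=dict)  # author -> posts

    def delete_user(self, author: str) -> None:
        if not self.reachable:
            raise ConnectionError(f"{self.name} is unreachable")
        self.posts.pop(author, None)


def erase_everywhere(author, servers):
    """Ask every server to delete a user's data.

    Returns the names of servers where deletion could not be confirmed.
    """
    unconfirmed = []
    for server in servers:
        try:
            server.delete_user(author)
        except ConnectionError:
            unconfirmed.append(server.name)
    return unconfirmed
```

Any server that is offline, or that simply ignores the request, still holds the data, and the originating instance cannot force the deletion.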