U.S. Navy to Build 350 Billion Record Social Media Archive
The U.S. Navy proposes to build a global social media archive of unprecedented scale.
The archive that U.S. Navy researchers plan to create would comprise 350 billion digital records and would support ongoing research in the Department of Defense Analysis at the Naval Postgraduate School in Monterey, CA.
The research project’s synopsis states, “This project is part of ongoing research efforts conducted through the Department of Defense Analysis at the Naval Postgraduate School. Our research aims to provide enhanced understanding of fundamental social dynamics, to model the evolution of linguistic communities, and emerging modes of collective expression, over time and across countries.”
The synopsis further explains, “As a central requirement for this research, we seek to acquire a large-scale global historical archive of social media data, providing the full text of all public social media posts, across all countries and languages covered by the social media platform.”
The synopsis also clarifies that the aim is to advance knowledge through scientific publications and to use the data for teaching in the classroom. Students would thus get new opportunities for thesis research and for developing analytical skills.
The archive is meant to cover social media records spanning two and a half years, from 7/1/2014 to 12/31/2016. The data would be collected from a single social media platform and would comprise “all publically available messages, comments, or posts transmitted on the platform over the specified time period.”
The synopsis also explains that the data “…includes messages from at least 200 million unique users in at least 100 countries, with no single country accounting for more than 30% of users”, and also that the data collected “…must include messages written in at least 60 languages, with at least 50% of the messages written in non-English languages.”
The synopsis specifies that the collected data must include only “publically available information” and that no private communications or private user data should be included.
Detailing the minimum requirements for the 350 billion records, the synopsis document states, “Each record in the archive must provide the full text of a social media post, unaltered from its original content and formatting, with all publicly available meta-data, including country, language, hashtags, location, handle, timestamp, and URLs, that were associated with the original posting.”
It also states that every record should include the date and time each message was sent, plus the public user handle associated with it. In addition, at least 20 percent of the records should include “approximate location information, providing self-reported user hometowns, or other publically available geo-location information”.
According to reports, the research would use the data to study topics such as communication, changing patterns of discourse, and the evolution of slang.
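To make the stated minimums concrete, here is a rough sketch of how a sample of such records could be checked against the thresholds quoted in the synopsis. It is illustrative only: the `Post` fields, the `check_archive` function, the use of `"en"` as the English language code, and the rule of assigning each user the country of their first post are all assumptions made for this example and do not come from the Navy's document.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Post:
    """Hypothetical record layout; field names are assumptions."""
    handle: str                     # public user handle
    country: str                    # country code attached to the post
    language: str                   # language code, "en" assumed for English
    timestamp: str                  # date and time the message was sent
    text: str                       # full, unaltered text of the post
    location: Optional[str] = None  # approximate location, if available


def check_archive(
    posts: Iterable[Post],
    min_users: int = 200_000_000,
    min_countries: int = 100,
    max_country_share: float = 0.30,
    min_languages: int = 60,
    min_non_english: float = 0.50,
    min_located: float = 0.20,
) -> Dict[str, bool]:
    """Check the dataset-level minimums quoted in the synopsis."""
    user_country: Dict[str, str] = {}
    languages = set()
    total = non_english = located = 0

    # Single pass, so the posts can be streamed rather than held in memory.
    for post in posts:
        total += 1
        # Assumption: a user counts toward the country of their first post.
        user_country.setdefault(post.handle, post.country)
        languages.add(post.language)
        if post.language != "en":
            non_english += 1
        if post.location:
            located += 1

    users = len(user_country)
    users_per_country = Counter(user_country.values())
    top_share = max(users_per_country.values()) / users if users else 0.0

    return {
        "unique_users": users >= min_users,
        "countries": len(users_per_country) >= min_countries,
        "country_share": users > 0 and top_share <= max_country_share,
        "languages": len(languages) >= min_languages,
        "non_english": total > 0 and non_english / total >= min_non_english,
        "located": total > 0 and located / total >= min_located,
    }
```

The defaults mirror the figures in the synopsis (200 million users, 100 countries, a 30% cap per country, 60 languages, 50% non-English messages, 20% of records with location). On a small test sample the thresholds can be lowered, for example `check_archive(sample, min_users=4, min_countries=4, min_languages=4)`.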
Julia Sowells
Julia Sowells is a technology and security professional. With a decade of experience in technology, she has worked on dozens of large-scale enterprise security projects, written technical articles, and worked as a technical editor for Rural Press Magazine. She now lives and works in New York, where she runs her own consulting firm as a security consultant while continuing to write for Hacker Combat in her limited spare time.