Building and Managing Social Media Collections
- 1. Building and Managing
Social Media Collections
Laura Wrubel @liblaura
Jason Casden @cazzerson
Slides: http://j.mp/DLF_Social_Media
DLF Forum
October 27, 2015
- 2. Outline
1. Introductions
2. Tour of social media archives
3. Ethical and legal discussion
4. Questions for cultural heritage organizations
5. Technical tools review
6. Collecting workflows demo
7. Wrap-up
- 3. Introductions
● Have you done any work related to social
media archives?
● What are you hoping to get out of this
workshop?
- 4. Social Media in Collections
• 50% Social media data in collections, but not in
significant amounts
• 39% No social media in their collections
NCSU Social Media Archives Toolkit North Carolina cultural heritage organization (C.H.O.) survey
- 5. “I strongly believe in the relevance of this
information because it is the "front lines"
of movement development--this is where
the important ideas and debates are
happening. Traditional academic spaces
are usually behind (it takes 1-3 years for
articles and books to be published) and,
again, they tend to bias in favor of whites,
men, and long-standing leaders. Ignoring
social media means ignoring
marginalized voices and it thus provides
an incomplete picture of the movement.”
NCSU Social Media Archives Toolkit Researcher Survey
- 6. Future value
• 71% of surveyed researchers saw future value
in using social media as a source for research
• Only 51% of surveyed cultural heritage
organizations thought it was likely that their
institution would archive social media in the
future
NCSU Social Media Archives Toolkit NCSU researcher survey
- 11. “Twitter has been a public and open communications platform
since its beginning. Twitter is donating an archive of what it
determines to be public. Private account information and
deleted tweets will not be part of the archive. Linked
information such as pictures and websites is not part of
the archive, and the Library has no plans to collect the
linked sites. There will be at least a six-month window
between the original date of a tweet and its date of availability
for research use.”
The Library and Twitter: An FAQ, April 28, 2010
- 18. GWU Researchers
● Media and public affairs faculty and graduate students
researching how media outlets and journalists use Twitter,
how members of Congress tweet
● International relations graduate students studying how
ISIS tweets
● Freshman writing seminar student analyzing tweets with
hashtags #YesAllWomen and #BringBackOurGirls
● Business graduate students studying social media use by
Korean companies
- 20. Funding
● Institute of Museum and Library Services
○ National Leadership Grants [ODU/Archive-It]
○ Library Services and Technology Act (LSTA)
Grants [NCSU]
○ Sparks! Ignition Grants
● NEH / ODH - Digital Humanities Start-Up Grants
[Univ. of Florida]
● National Historical Publications and Records
Commission (NHPRC) [GWU]
● Council on East Asian Libraries (from Mellon) [JHU,
GWU, Georgetown]
- 22. Ethical and legal discussion scenarios
Form a group of 2-3 people who have selected the
same scenario as you.
1. What legal and ethical issues do you see arising in
these scenarios?
2. What are some ways you might address and
manage these issues?
- 23. Scenario #1
A researcher writing a book visits your library to use the
university archives and study student activism related to the
environment. The university archives has collected tweets by
several university-sponsored environmental clubs and has
around 5,000 tweets from eight clubs over two years. The
researcher would like to use the social media collection as
part of her research.
- 24. Scenario #2
A local person of prominence has donated their personal
papers to your library’s archives. They also have exported
their Facebook account data and would like to include that in
their donation. This data includes their posts, messages,
photos, and videos as well as all other information in the
Facebook-supported account download feature.
- 25. Scenario #3
A faculty member is using Twitter as a discussion medium in
her class on public policy. Students are asked to tweet as
part of their class participation. The professor knows that your
library is able to collect tweets and asks if you can help her in
collecting tweets by her students for the purpose of class
evaluation.
- 26. Scenario #4
Your university has a well-regarded political science
department. To support faculty and students exploring the
role of social media in elections, your library has been
proactively collecting tweets by presidential candidates and
tweets using particular hashtags during the presidential
debates. The collection currently contains close to a million
tweets over two years. A faculty member is researching
differences in communication patterns by party and requests
your dataset.
- 28. “If we are to begin actively
archiving and using social
media content, plans need
to be developed as to what
we are saving and who
social media portrays and
how it portrays individuals
and large communities.”
NCSU Social Media Archives Toolkit Researcher Survey
- 32. Role of the institution
● How do we handle consent?
● These items are ephemeral, but not unique, right?
● How do we determine what to collect?
● Are there special preservation considerations?
- 35. Text?
“(・_・ヾ "Study the feasibility of a public space to
house a permanent collection of UNC-Chapel
Hill’s history"
http://www.unc.edu/campus-updates/message-from-chancellor-folt-update-on-the-task-force-on-unc-chapel-hill-history/”
- @cazzerson
- 39. {
contributors: null,
truncated: false,
text: "We love it when artists like @cyndilauper speak up for our youth!
#EndYouthHomelessness – On C-SPAN http://t.co/Gw17OHyTiO #edchat",
in_reply_to_status_id: null,
id: 524985632775741440,
favorite_count: 7,
source: "<a href='http://twitter.com' rel='nofollow'>Twitter Web Client</a>",
retweeted: false,
coordinates: null,
entities:
{
symbols: [ ],
user_mentions:
[
{
id: 74501824,
indices:
[
29,
41
],
id_str: "74501824",
screen_name: "cyndilauper",
name: "Cyndi Lauper"
}
Twitter
- 40. ],
hashtags:
[
{
indices:
[
66,
87
],
text: "EndYouthHomelessness"
},
{}
],
urls:
[
{
url: "http://t.co/Gw17OHyTiO",
indices:
[
101,
123
],
expanded_url: "http://cs.pn/1FCx6KY",
display_url: "cs.pn/1FCx6KY"
}
]
},
Twitter
- 41. in_reply_to_screen_name: null,
in_reply_to_user_id: null,
retweet_count: 7,
id_str: "524985632775741440",
favorited: false,
geo: null,
in_reply_to_user_id_str: null,
possibly_sensitive: false,
lang: "en",
created_at: "Wed Oct 22 18:08:23 +0000 2014",
in_reply_to_status_id_str: null,
place: null,
user:
{
follow_request_sent: false,
profile_use_background_image: false,
profile_text_color: "333333",
default_profile_image: false,
id: 22789766,
profile_background_image_url_https: "https://pbs.twimg.com/profile_background_images/70908209/NYLono_MercerCo_LarchmontElem_182jpg_twitter.jpg",
verified: true,
profile_location: null,
profile_image_url_https: "https://pbs.twimg.com/profile_images/502152204040425472/eVCt0lz8_normal.jpeg",
profile_sidebar_fill_color: "DDEEF6",
Twitter
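The entities object in the tweet above records where each mention and hashtag sits in the text through its indices pair. A minimal Python sketch, assuming an abbreviated copy of that tweet (only the fields used here) and hypothetical helper names entity_texts and parse_created_at, shows how those offsets and the created_at string might be read:

```python
import json
from datetime import datetime

# Abbreviated copy of the tweet shown above; fields not used here are omitted.
TWEET_JSON = r'''
{
  "id_str": "524985632775741440",
  "created_at": "Wed Oct 22 18:08:23 +0000 2014",
  "text": "We love it when artists like @cyndilauper speak up for our youth! #EndYouthHomelessness \u2013 On C-SPAN http://t.co/Gw17OHyTiO #edchat",
  "entities": {
    "user_mentions": [{"screen_name": "cyndilauper", "indices": [29, 41]}],
    "hashtags": [{"text": "EndYouthHomelessness", "indices": [66, 87]}]
  }
}
'''


def entity_texts(tweet):
    """Return (kind, substring) pairs sliced from the text by entity indices."""
    text = tweet["text"]
    found = []
    for kind in ("user_mentions", "hashtags"):
        for entity in tweet["entities"].get(kind, []):
            start, end = entity["indices"]
            found.append((kind, text[start:end]))
    return found


def parse_created_at(tweet):
    """Parse the created_at string into a timezone-aware datetime."""
    return datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")


tweet = json.loads(TWEET_JSON)
spans = entity_texts(tweet)
created = parse_created_at(tweet)
print(spans)
print(created.isoformat())
```

The sketch keys on id_str rather than id, since the string form survives any JSON parser unchanged.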
- 42. {
"data": {
"type": "image",
"users_in_photo": [{
"user": {
"username": "kevin",
"full_name": "Kevin S",
"id": "3",
"profile_picture": "..."
},
"position": {
"x": 0.315,
"y": 0.9111
}
}],
"filter": "Walden",
"tags": [],
"comments": {
"data": [{
"created_time": "1279332030",
"text": "Love the sign here",
"from": {
"username": "mikeyk",
Instagram
- 43. {
"created_time": "1279341004",
"text": "Chilako taco",
"from": {
"username": "kevin",
"full_name": "Kevin S",
"id": "3",
"profile_picture": "..."
},
"id": "3"
}],
"count": 2
},
"caption": null,
"likes": {
"count": 1,
"data": [{
"username": "mikeyk",
"full_name": "Mikeyk",
"id": "4",
"profile_picture": "..."
}]
},
"link": "http://instagr.am/p/D/",
Instagram
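Unlike the tweet, the Instagram response nests comments inside the media item and stores times as Unix epoch strings. A minimal Python sketch, assuming an abbreviated copy of the response above and a hypothetical comment_rows helper, shows one way such comments might be flattened into rows with ISO 8601 timestamps:

```python
from datetime import datetime, timezone

# Abbreviated from the Instagram response above; only the fields used here.
media = {
    "data": {
        "type": "image",
        "link": "http://instagr.am/p/D/",
        "comments": {
            "data": [
                {"created_time": "1279332030", "text": "Love the sign here",
                 "from": {"username": "mikeyk"}},
                {"created_time": "1279341004", "text": "Chilako taco",
                 "from": {"username": "kevin"}},
            ],
            "count": 2,
        },
    }
}


def comment_rows(media):
    """Flatten nested comments into simple rows with UTC ISO 8601 timestamps."""
    item = media["data"]
    rows = []
    for comment in item["comments"]["data"]:
        when = datetime.fromtimestamp(int(comment["created_time"]), tz=timezone.utc)
        rows.append({
            "media_link": item["link"],
            "username": comment["from"]["username"],
            "text": comment["text"],
            "created": when.isoformat(),
        })
    return rows


rows = comment_rows(media)
for row in rows:
    print(row["created"], row["username"], row["text"])
```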
- 45. What is the container?
● Should we mix content from multiple
platforms?
● How do we define container boundaries?
● How do we describe containers?
- 46. What is the collection?
● To what extent are these artificial collections?
● Should these materials be integrated into existing
collections?
- 47. Access policies
● Can we balance privacy and research value?
● Can we provide research access while adhering
to the Terms of Service?
○ “Hydration?”
● How do researchers browse materials?
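"Hydration" refers to sharing only tweet IDs and letting each researcher rebuild the full tweets from the platform. A minimal Python sketch of the idea follows. The names dehydrate, hydrate, and fetch_batch are hypothetical, fake_fetch stands in for a real API client, and the default batch size of 100 is an assumption based on the limit of Twitter's lookup endpoint at the time:

```python
def dehydrate(tweets):
    """Reduce full tweets to the ID strings that can be shared."""
    return [t["id_str"] for t in tweets]


def batches(ids, size=100):
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def hydrate(ids, fetch_batch, size=100):
    """Rebuild tweets from IDs.

    fetch_batch is a hypothetical callable that takes a list of IDs and
    returns only the tweets that are still available.
    """
    tweets = []
    for batch in batches(ids, size):
        tweets.extend(fetch_batch(batch))
    return tweets


# Stand-in for a real API client: tweet "2" has been deleted.
live = {
    "1": {"id_str": "1", "text": "still public"},
    "3": {"id_str": "3", "text": "also public"},
}


def fake_fetch(batch):
    return [live[i] for i in batch if i in live]


shared_ids = ["1", "2", "3"]
hydrated = hydrate(shared_ids, fake_fetch, size=2)
print([t["id_str"] for t in hydrated])
```

Deleted or protected tweets simply do not come back, so a hydrated dataset can be smaller than the one originally collected.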
- 48. Building research datasets
● Dataset stability and decay
○ Snapshots
○ Deletion
■ All Tweets will eventually be deleted
● Reproducibility
● Data sharing
● Research area restrictions
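Snapshots and decay can be made concrete with two small measures: a fingerprint that identifies exactly which IDs a snapshot contained, and the share of those IDs that can no longer be retrieved. A minimal Python sketch, assuming hypothetical helper names and string tweet IDs:

```python
import hashlib


def snapshot_digest(ids):
    """SHA-256 over the sorted, de-duplicated IDs, one per line."""
    canonical = "\n".join(sorted(set(ids))) + "\n"
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def decay_rate(original_ids, available_ids):
    """Fraction of the original IDs that are no longer retrievable."""
    original = set(original_ids)
    if not original:
        return 0.0
    missing = original - set(available_ids)
    return len(missing) / len(original)


collected = ["1", "2", "3", "4"]
still_available = ["1", "3", "4"]
print(snapshot_digest(collected))
print(decay_rate(collected, still_available))
```

Publishing the digest alongside a shared ID list lets a later researcher confirm they started from the same snapshot, even if fewer tweets hydrate.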
- 50. “Along with email, social media
will probably provide the main
source of information for
researchers studying our current
time. However, our institution just
does not have the resources
right now to collect and store the
social media of other people or
organizations.”
NCSU Social Media Archives Toolkit C.H.O. survey
- 51. What are your goals?
● create archival / special collections
● support current faculty research
● support students with class projects
- 52. What data do you need?
● current and going forward; recent or far past
● metadata
● images and other media referenced
● comments, responses, conversation
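If referenced media and web pages are part of what you need, the links have to be pulled out of the tweet metadata and captured separately. A minimal Python sketch, assuming tweets shaped like the JSON shown earlier and a hypothetical linked_urls helper, gathers the unique expanded URLs for a later web capture:

```python
def linked_urls(tweets):
    """Collect unique expanded URLs in first-seen order."""
    seen = set()
    ordered = []
    for tweet in tweets:
        for link in tweet.get("entities", {}).get("urls", []):
            # Prefer the expanded form; fall back to the shortened t.co link.
            target = link.get("expanded_url") or link.get("url")
            if target and target not in seen:
                seen.add(target)
                ordered.append(target)
    return ordered


sample = [
    {"entities": {"urls": [{"url": "http://t.co/Gw17OHyTiO",
                            "expanded_url": "http://cs.pn/1FCx6KY"}]}},
    {"entities": {"urls": [{"url": "http://t.co/Gw17OHyTiO",
                            "expanded_url": "http://cs.pn/1FCx6KY"}]}},
    {"entities": {"urls": []}},
]

print(linked_urls(sample))
```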
- 53. What do you want to do?
● analyze, visualize
● archive, locally accession
● play back
● hydrate
- 56. Some of the many options
Commercial
● Gnip
● Texifter
● Crimson Hexagon
● Sysomos
● Archive-It
● ArchiveSocial
● Radian6, Sprout Social,
HootSuite
Free / open source
● TAGS
● NodeXL
● IFTTT
● R (twitteR)
● twarc
● Social Feed Manager
[Twitter, Tumblr*, Flickr*]
● lentil [Instagram]
● youtube-dl [YouTube]
● MassMine* [Twitter, Tumblr]
*pre-release
- 68. Bibliography
“National Archives and Records Administration White Paper on Best Practices for the Capture of Social
Media Records,” May 2013. http://www.archives.gov/records-mgmt/resources/socialmediacapture.pdf.
boyd, danah, and Kate Crawford. “Critical Questions for Big Data.” Information, Communication & Society
15, no. 5 (June 1, 2012): 662–79. doi:10.1080/1369118X.2012.678878.
Bruns, Axel, and Tim Highfield. “POLITICAL NETWORKS ON TWITTER: Tweeting the Queensland State
Election.” Information, Communication & Society 16, no. 5 (June 2013): 667–91. doi:10.1080/1369118X.2013.782328.
Casden, Jason and Brian Dietz (co-PI). Social Media Archives Toolkit.
http://www.lib.ncsu.edu/social-media-archives-toolkit
Cohen, Dan. “Digital Ephemera and the Calculus of Importance.” Dan Cohen, May 17, 2010.
http://www.dancohen.org/2010/05/17/digital-ephemera-and-the-calculus-of-importance/.
- 69. Collins, Samuel, Matthew Durington, Glenn Daniels, Natalie Demyan, David Rico, Julian Beckles, and
Cara Heasley. “Tagging Culture: Building a Public Anthropology through Social Media.” Human
Organization 72, no. 4 (November 13, 2013): 358–68. doi:10.17730/humo.72.4.v5x0205248427516.
Dash, Anil. “What Is Public? — The Message.” Medium. Accessed August 12, 2014.
https://medium.com/message/what-is-public-f33b16d780f9.
Dixon, Kitsy. “Feminist Online Identity: Analyzing the Presence of Hashtag Feminism.” Journal of Arts &
Humanities 3, no. 7 (2014): 34–40.
“Ethical Decision-Making and Internet Research Recommendations from the AoIR Ethics Working
Committee (Version 2.0),” 2012. http://aoir.org/reports/ethics2.pdf
Halegoua, Germaine R., and Raz Schwartz. “The Spatial Self: Location-Based Identity Performance on
Social Media.” New Media & Society, April 9, 2014, 1–18. doi:10.1177/1461444814531364.
Jules, Bergis. “Documenting the Now: #Ferguson in the Archives — On Archivy.” Medium, April 8, 2015.
https://medium.com/on-archivy/documenting-the-now-ferguson-in-the-archives-adcdbe1d5788.
Lomborg, Stine. “Personal Internet Archives and Ethics.” Research Ethics 9, no. 20 (2013).
doi:10.1177/1747016112459450.
Marshall, Catherine C. “Rethinking Personal Digital Archiving, Part 1: Four Challenges from the Field.”
D-Lib Magazine, April 2008. http://www.dlib.org/dlib/march08/marshall/03marshall-pt1.html#Top.
- 70. Nathan, Lisa P., and Elizabeth Shaffer. “Preserving Social Media: Opening a Multi-Disciplinary Dialogue.”
UNESCO, n.d. http://www.unesco.org/new/fileadmin/MULTIMEDIA/HQ/CI/CI/pdf/mow/VC_Nathan_Shaffer_27_B_1140.pdf.
Rivero, Enrique. “Twitter ‘Big Data’ Can Be Used to Monitor HIV and Drug-Related Behavior, UCLA Study
Shows.” UCLA Newsroom, February 26, 2014.
http://newsroom.ucla.edu/portal/ucla/twitter-big-data-can-be-used-to-250162.aspx.
Storrar, Tom. “Archiving Social Media.” The National Archives, May 8, 2014.
http://blog.nationalarchives.gov.uk/blog/archiving-social-media/.
Summers, Ed. “An Invitation to Study Ferguson — On Archivy.” Medium, December 3, 2014.
https://medium.com/on-archivy/an-invitation-to-study-ferguson-367b423cff29.
Tufekci, Zeynep. “Big Questions for Social Media Big Data: Representativeness, Validity and Other
Methodological Pitfalls.” arXiv:1403.7400 [physics], March 28, 2014. http://arxiv.org/abs/1403.7400.
Zimmer, Michael, and Nicholas John Proferes. “A Topology of Twitter Research: Disciplines, Methods,
and Ethics.” Aslib Journal of Information Management 66, no. 3 (2014): 250–61.
Zimmer, Michael. “The Twitter Archive at the Library of Congress: Challenges for Information Practice and
Information Policy.” First Monday 20, no. 7 (June 21, 2015).
http://firstmonday.org/ojs/index.php/fm/article/view/5619.
- 71. Social Feed Manager is
supported by the National
Historical Publications &
Records Commission
Grant NAR14-DI-50017-14
(2014-2017)