Understanding the Big Data
Enterprise
Philip E. Bourne, PhD, FACMI
Associate Director for Data Science
https://datascience.nih.gov/
philip.bourne@nih.gov
My Bias
• University professor - 30+ years
• Associate Vice Chancellor for Innovation – 2
years
• Maintainer of public data resources (PDB etc.) – 15 years
• Open science advocate – 10+ years
• Fed – 2 years and counting
None of what I am about to tell you
negates what you have heard thus far
today…
Much of what you have heard is
prerequisite to my 30,000 foot view
My Definition of Big Data
• More than the 4+ “V’s”
• A signal of the coming digital economy
• An economy characterized by using data to
gain a business advantage (and yes
universities are a business)
What is the Worst that Can Happen?
The six D's (figure; axes: Time; Volume, Velocity, Variety):
Digitization, Deception, Disruption, Demonetization, Dematerialization, Democratization
The Kodak example:
• Digital camera invented by Kodak but shelved
• Megapixels & quality improve slowly; Kodak slow to react
• Film market collapses; Kodak goes bankrupt
• Phones replace cameras
• Instagram, Flickr become the value proposition
• Digital media becomes bona fide form of communication
[Steven Kotler]
http://bigthink.com/think-tank/steven-kotlers-six-ds-of-exponential-entrepreneurship
Enterprises that are not born digital
are at a disadvantage in this new
economy…
Fortunately no university has yet been
born digital …
The “Google university” could change
that
The Writing is on the Wall
(Personal Experiences)
• The story of Meredith
• Increasing number of undergraduates as first
authors on my papers
• Talking head lectures
• Growing frustration at lack of entrepreneurial
support
• The Google bus
The Writing is on the Wall
(Institutional)
• Changing access models
• Changing funding models
– Less federal and state funds
– More sponsored research
– Increased tuition
– More reliance on philanthropy
• Changing pedagogy
– MOOCs, SPOCs, DOCCs, flips
• Changing student expectations
– Expect to be taught in a different way
• Changing faculty expectations
– Expect more from the institution
• Changing staff expectations
– Better recognition
• Changing employer expectations
http://collegeparents.org/2011/01/26/when-your-college-student-unhappy/
Yet demand for a quality higher education has never been higher
Leads to the Notion of the University
as a Digital Enterprise
• The university is defined by its digital assets:
– On-line course materials
– All of the research life cycle on-line: grants, data,
computational methods, results, conclusions,
publications
– Faculty, staff and student profiles on-line
– All administrative data on-line e.g. grants, policies
and procedures, disclosures, contracts, patents,
agreements, payroll, academic files
The Most Successful Universities of the
Future Will be Those That Can Best
Leverage Their Digital Assets – How?
“Life Wasn’t Meant to be Easy”
Malcolm Fraser
Former Prime Minister of Australia
How? - Break Down the Silos
Silos (figure): Research (Basic, Clinical), Education, Administration
How? - An Appropriate Organizational
Structure
Organization chart (figure):
• Chancellor
• CIO / CDO
• Research Services, Education Services, Admin Services, Medical Services, Library
Use Cases from the University as a
Digital Enterprise
Research Data
• Prof x drags and drops her research data to the
institutional dropbox. She is asked for a small
amount of metadata describing the dataset. Part
of that request gives permission for the data to
be indexed and the index analyzed by the
University. That analysis reveals that two other
researchers have worked on the same gene in the
past two months and they are all alerted as to
their common interest and begin collaborating.
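The matching step in this use case can be sketched as follows. This is only an illustration under stated assumptions: the `DatasetRecord` fields, the `find_common_interest` helper, the 60-day window standing in for "the past two months", and all sample records are hypothetical and do not describe any actual institutional system.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DatasetRecord:
    """Minimal metadata a depositor might supply (hypothetical fields)."""
    researcher: str
    gene: str
    deposited: date


def find_common_interest(new, index, window_days=60):
    """Return researchers who deposited data on the same gene recently.

    `index` is any iterable of earlier DatasetRecord entries. The 60-day
    default is an assumption approximating "the past two months".
    """
    cutoff = new.deposited - timedelta(days=window_days)
    return sorted({
        r.researcher
        for r in index
        if r.gene.upper() == new.gene.upper()      # case-insensitive gene match
        and r.researcher != new.researcher         # do not alert the depositor
        and cutoff <= r.deposited <= new.deposited
    })


# Made-up sample index, for illustration only.
index = [
    DatasetRecord("Researcher Y", "TP53", date(2016, 3, 1)),
    DatasetRecord("Researcher Z", "tp53", date(2016, 3, 20)),
    DatasetRecord("Researcher W", "BRCA1", date(2016, 3, 25)),
    DatasetRecord("Researcher V", "TP53", date(2015, 6, 1)),  # outside window
]
new_deposit = DatasetRecord("Prof X", "TP53", date(2016, 4, 10))
alerts = find_common_interest(new_deposit, index)
print(alerts)
```

A real index would more likely match on curated gene identifiers than on free-text symbols; the string comparison here only keeps the sketch short.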
Faculty Productivity
• From a single profile a faculty member can, at
the push of a button, generate a world-facing
current web presence, provide biosketches to
the major funding agencies and submit their
academic file for review, saving countless
hours of reformatting that now go into
productive research.
The Education – Research Interface
• The UCSD on-line drug commercialization
course, which previously had 40 local students,
now has 12,000, several of whom apply to Dr.
Bourne’s lab as PhD students based on the
material he presented. The course also
highlights UCSD’s leadership role and by
navigating the on-line curriculum several
students apply to UCSD as undergraduates.
One high school student applies to Dr.
Bourne’s lab as a summer intern.
The Research-Administration Interface
• Researcher x receives a new grant; researchers y
and z are notified since it is very close to areas in
which they work and points of collaboration may
be possible.
• Researcher x needs to have an assay performed
and can immediately locate who on campus and
off-campus can perform the work and at what
cost.
• Experts on and off campus can immediately be
identified for the review of a potential patent
filing based on a researcher’s technology.
Talk is cheap – What is NIH doing to
address a similar situation?
NIH By Comparison
• 27 silos
• Clinical and basic research
• Intramural + extramural
• Administration
• Education role different
https://en.wikipedia.org/wiki/Victory_Soya_Mills_Silos
Established a Commons
• Supports a digital biomedical ecosystem
• Treats products of research – data, software, methods, papers
etc. as digital research objects
• Digital research objects exist in a shared virtual space
• Digital objects need to conform to FAIR principles:
– Findable
– Accessible (and usable)
– Interoperable
– Reusable
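One way to picture FAIR conformance is as a check on the metadata attached to a digital research object. The sketch below is an assumption-laden illustration, not the Commons' actual compliance test: the mapping from each principle to metadata fields (`FAIR_FIELDS`), the `fair_gaps` helper, and the sample record are all hypothetical.

```python
# Hypothetical mapping from each FAIR principle to the metadata fields
# that would be taken as minimal evidence for it (an assumption).
FAIR_FIELDS = {
    "Findable": ("identifier", "title"),
    "Accessible": ("access_url",),
    "Interoperable": ("format",),
    "Reusable": ("license",),
}


def fair_gaps(metadata):
    """Return {principle: [missing fields]} for principles not yet met."""
    gaps = {}
    for principle, fields in FAIR_FIELDS.items():
        missing = [f for f in fields if not metadata.get(f)]
        if missing:
            gaps[principle] = missing
    return gaps


# Made-up digital research object; every value is a placeholder.
digital_object = {
    "identifier": "doi:10.0000/example",
    "title": "Example expression dataset",
    "access_url": "https://example.org/data/1",
    "format": "text/csv",
}
gaps = fair_gaps(digital_object)
print(gaps)
```

Here the object lacks a license, so only the Reusable principle is reported as unmet. In practice each principle covers far more than the presence of a field.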
Commons Framework Pilots (CFPs)
• Exploring feasibility of the Commons framework
• Facilitating connectivity, interoperability and
access to digital objects
• Providing digital research objects to populate the
Commons
• Enabling biomedical science to happen more easily
and robustly
Mapping Commons PILOTS to the Commons Framework
Framework layers (figure):
• App store/User Interface
• Software: Services & Tools (scientific analysis tools/workflows)
• Services: APIs, Containers, Indexing
• Data: “Reference” Data Sets, User defined data
• Compute Platform: Cloud or HPC
• Digital Object Compliance
Service-model labels: SaaS, PaaS, IaaS
Pilots shown:
• BD2K Centers, MODS and HMP
• BD2K Indexing: BioCADDIE, Other, schema.org
[Vivien Bonazzi]
Mapping Commons PILOTS to the Commons Framework
Framework layers (figure):
• App store/User Interface
• Software: Services & Tools (scientific analysis tools/workflows)
• Services: APIs, Containers, Indexing
• Data: “Reference” Data Sets, User defined data
• Compute Platform: Cloud or HPC
• Digital Object Compliance
Service-model labels: SaaS, PaaS, IaaS
Pilot shown:
• Cloud credits model (CCM)
Commons Credits Model
Figure labels:
• Actors: NIH, Investigator, bioCADDIE
• The Commons (infrastructure): Cloud Provider A, Cloud Provider B, Cloud Provider C
• Flows: Provides credits; Uses credits in the Commons; Indexes; Enables Search
• Option: Direct Funding
[George Komatsoulis]
Culture Change
http://mitchjackson.com/white-elephants/
How to Change the Culture?
• Intramural and extramural training programs
• Fostering open science
– e.g. policies, challenges
• Fostering changes to the research life cycle
– e.g. preprints, data citation, open final reports
• Strategic planning with buy-in from major
stakeholders
• Use cases as exemplars
What is the desired endpoint?
Uber!
Some Thoughts as to Why I am Not
Crazy
• A platform to exchange goods – researchers
produce and consume reagents, data,
knowledge etc.
• A platform built on trust – trust is a key part of
the academic enterprise
• A platform provides a sustainable business
model
Sangeet Paul Choudary
http://www.wired.com/insights/2013/10/why-business-models-fail-pipes-vs-platforms/
Summary
It was the best of times, it was the worst of
times, it was the age of wisdom, it was the
age of foolishness, it was the epoch of
belief, it was the epoch of incredulity, it
was the season of Light, it was the season
of Darkness, it was the spring of hope, it
was the winter of despair…
Charles Dickens

Editor's Notes

  1. SPOC – small private on-line course; DOCC – distributed on-line collaborative course