Originally posted January 9, 2019

I hosted a meetup.com meeting today to talk about Ocean Protocol and other data sources for machine learning, and to lead a group discussion of startup business ideas that involve curating and selling data. The following is from a handout I created from material on the Ocean Protocol web site and other sources:

Data Trumps Software

Machine learning libraries like TensorFlow, Keras, and PyTorch, and the people trained to use them, have become a commodity. What is not yet a commodity is high-quality, application-specific data.

Effective machine learning requires quality data

Ocean Protocol https://oceanprotocol.com - is a blockchain-based ecosystem for sharing data. It serves both data producers who want to monetize their data assets and data consumers who need specific data at an affordable price. The ecosystem is still under development, but parts of the infrastructure (all of which will be open source) are already available. If you have Docker installed, you can quickly run their data marketplace demonstration system: https://docs.oceanprotocol.com/setup/quickstart/.
Common Crawl http://commoncrawl.org - is a free source of web crawl data that was previously available only to large search engine companies. There are many open source libraries for accessing and processing crawl data. The easiest way to get started is to download a few WARC data segment files to your laptop. My open source Java and Clojure libraries for processing WARC files are at https://github.com/commoncrawl/example-warc-java
Amazon Public Dataset Program https://aws.amazon.com/opendata/public-datasets/ - is a free service for hosting public datasets. If you have data to share, you can apply to contribute it; AWS evaluates applications quarterly. To find useful datasets, search using the form at https://registry.opendata.aws and then use the S3 bucket URIs (or ARNs) to access the data. Most datasets have documentation pages, and many include example client libraries and code.
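
If you want to poke at a downloaded Common Crawl segment without installing anything, here is a minimal Python sketch of a WARC record reader. It assumes the WARC 1.0 record layout (a version line, "Name: value" headers, a blank line, then Content-Length bytes of content) and that segments are gzip-compressed record by record, as Common Crawl's are. The function names are my own; for serious work a maintained library such as warcio is a better choice.

```python
import gzip
import io
from typing import Dict, Iterator, Tuple


def iter_warc_records(stream: io.BufferedIOBase) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (headers, payload) for each record in an uncompressed WARC byte stream.

    Header names are lower-cased so lookups do not depend on capitalization.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank lines separate records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers: Dict[str, str] = {}
        while True:  # the header block ends at the first empty line
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip().lower()] = value.strip()
        payload = stream.read(int(headers["content-length"]))
        yield headers, payload


def warc_target_uris(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (WARC-Type, WARC-Target-URI) for every record in a .warc or .warc.gz file."""
    # gzip.open transparently reads a file made of many concatenated gzip
    # members, which is how per-record compressed WARC files are laid out.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        for headers, _payload in iter_warc_records(f):
            yield headers.get("warc-type", ""), headers.get("warc-target-uri", "")
```

For example, looping over warc_target_uris on a downloaded segment and printing the URI of every "response" record gives a quick list of the pages that segment contains.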

Overview of Ocean Protocol

Ocean Protocol is a decentralized data exchange protocol that lets people share and monetize data while providing control, auditing, transparency, and compliance to both data providers and data consumers. The initial Ocean Protocol digital token sale ended in March 2018 and raised $22 million. Ocean Protocol tokens will be obtainable by trading Ethereum Ether, and data consumers can use them to purchase access to data. Data providers will be able to trade tokens back for Ether.
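
The auditing and transparency properties come from recording transactions in an append-only ledger. The following is a generic Python sketch of that idea, not Ocean Protocol code: each entry stores the SHA-256 hash of the previous entry, so quietly editing any past record breaks verification. The event fields (consumer, asset, tokens) are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, event: dict) -> str:
    # Canonical JSON (sorted keys) so the same event always hashes the same way.
    body = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


@dataclass
class AuditLog:
    """Append-only log in which each entry commits to the hash of the previous one."""
    entries: List[dict] = field(default_factory=list)

    def append(self, event: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        digest = _entry_hash(prev_hash, event)
        self.entries.append({"prev": prev_hash, "event": event, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the whole chain; any altered event or link makes this False."""
        prev_hash = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev_hash:
                return False
            if _entry_hash(prev_hash, entry["event"]) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True
```

A blockchain adds distributed consensus on top of this chaining, so no single party can rewrite the log and recompute all the hashes by itself.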

Terminology

Publisher: a service that provides access to data from data producers. Data producers will often also act as publishers of their own data.
Consumer: any person or organization who needs access to data. Access is via client libraries or web interfaces.
Marketplace: a service that lists assets and facilitates access to free datasets and datasets available for purchase.
Verifier: a software service that checks and validates steps in transactions for selling and buying data. A verifier is paid for this service.
Service Execution Agreement (SEA): a smart contract used by providers, consumers, and verifiers.
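
To make the roles concrete, here is a toy Python model of a Service Execution Agreement as an escrow state machine. This is a hypothetical illustration, not Ocean Protocol's actual contract logic: the states, the verifier_fee parameter, and the refund rule are all my assumptions.

```python
from enum import Enum, auto
from typing import Dict


class State(Enum):
    CREATED = auto()
    PAID = auto()            # consumer's tokens are held in escrow
    ACCESS_GRANTED = auto()  # publisher says access was delivered
    FULFILLED = auto()       # verifier confirmed delivery
    REFUNDED = auto()        # verifier rejected delivery


class ServiceAgreement:
    """Toy escrow agreement between a publisher, a consumer and a verifier."""

    def __init__(self, publisher: str, consumer: str, verifier: str,
                 price: int, verifier_fee: int) -> None:
        if not 0 <= verifier_fee <= price:
            raise ValueError("verifier fee must be between 0 and the price")
        self.publisher, self.consumer, self.verifier = publisher, consumer, verifier
        self.price, self.verifier_fee = price, verifier_fee
        self.state = State.CREATED
        self.escrow = 0
        self.payouts: Dict[str, int] = {}

    def _check(self, caller: str, role: str, expected: State) -> None:
        # Only the right party may act, and only at the right step.
        if caller != role:
            raise PermissionError(f"{caller} may not perform this step")
        if self.state is not expected:
            raise RuntimeError(f"expected {expected.name}, agreement is {self.state.name}")

    def lock_payment(self, caller: str) -> None:
        self._check(caller, self.consumer, State.CREATED)
        self.escrow = self.price
        self.state = State.PAID

    def grant_access(self, caller: str) -> None:
        self._check(caller, self.publisher, State.PAID)
        self.state = State.ACCESS_GRANTED

    def verify(self, caller: str, access_ok: bool) -> None:
        self._check(caller, self.verifier, State.ACCESS_GRANTED)
        # The verifier is paid either way; the remainder goes to the publisher
        # on success or back to the consumer on failure.
        remainder = self.escrow - self.verifier_fee
        recipient = self.publisher if access_ok else self.consumer
        self.payouts = {self.verifier: self.verifier_fee, recipient: remainder}
        self.escrow = 0
        self.state = State.FULFILLED if access_ok else State.REFUNDED
```

A real agreement would also need timeouts so a consumer can recover escrowed tokens if a publisher never grants access; the sketch leaves that out.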

Software Components

Aquarius: a service for storing and managing metadata for data assets; it uses the off-chain database OceanDB.
Brizo: used by publishers to manage interactions with marketplaces and data consumers.
Keeper: a service that runs a blockchain client and uses Ocean Protocol to process smart contracts.
Pleuston: an example/demo marketplace that you can run locally with Docker on your laptop.
Squid Libraries: client libraries to locate and access data (currently Python and JavaScript are supported).
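
Aquarius's role as a metadata store is easy to picture with an in-memory stand-in. The Python sketch below is hypothetical: the AssetMetadata fields and the MetadataCatalog methods are illustrative names of my own, not the Aquarius or Squid APIs, and the real Aquarius persists metadata in OceanDB rather than in a dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AssetMetadata:
    asset_id: str
    name: str
    description: str
    price: int  # in tokens; 0 means the dataset is free
    tags: List[str] = field(default_factory=list)


class MetadataCatalog:
    """In-memory stand-in for an off-chain metadata store such as Aquarius."""

    def __init__(self) -> None:
        self._assets: Dict[str, AssetMetadata] = {}

    def publish(self, meta: AssetMetadata) -> None:
        if meta.asset_id in self._assets:
            raise ValueError(f"asset {meta.asset_id} is already registered")
        self._assets[meta.asset_id] = meta

    def get(self, asset_id: str) -> Optional[AssetMetadata]:
        return self._assets.get(asset_id)

    def search(self, text: str, max_price: Optional[int] = None) -> List[AssetMetadata]:
        """Case-insensitive substring search over name, description and tags."""
        needle = text.lower()
        hits = []
        for meta in self._assets.values():
            haystack = " ".join([meta.name, meta.description, *meta.tags]).lower()
            if needle in haystack and (max_price is None or meta.price <= max_price):
                hits.append(meta)
        return hits
```

Keeping searchable metadata off-chain, as Aquarius does with OceanDB, means a query like this never has to touch the blockchain.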

Also of interest: SingularityNET

https://singularitynet.io is a decentralized service that supports creating, sharing, and monetizing AI services and hopes to be the world’s decentralized AI network. SingularityNET was started by my friend Ben Goertzel to create a marketplace for AI service APIs.
