ProvenDB proofs are hashed representations of the state of a particular version, which are pinned to a public Blockchain (the Bitcoin Blockchain in this version of ProvenDB), thus providing cryptographic proof of the integrity and provenance of the version. The hash value posted to the Blockchain provides cryptographic proof that the version concerned existed at the date of hashing and that it has *not been altered since that time*.

### Proofs

All documents in a database version are included within each ProvenDB proof, although any document can be independently validated using that proof.

# Hashing concepts

A hash is a mathematical *signature* of a document - a digital fingerprint. The chances of two documents having the same hash are infinitesimally small. Even if every computer in the world today generated hashes as fast as it could, it would still take many times longer than the lifetime of the universe to find a duplicate. Hashes are therefore far more precise proofs of document identity than DNA or fingerprints.
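
As a minimal sketch of the fingerprint idea, the snippet below hashes two nearly identical documents. It assumes SHA-256 (the hash function Bitcoin uses); this page does not state which hash function ProvenDB applies internally, and the `fingerprint` helper and sample documents are purely illustrative.

```python
import hashlib


def fingerprint(document: bytes) -> str:
    """Return the SHA-256 digest of a document as a hex string."""
    return hashlib.sha256(document).hexdigest()


# Two illustrative documents that differ by a single character.
original = b'{"name": "Alice", "balance": 100}'
tampered = b'{"name": "Alice", "balance": 101}'

# The same bytes always produce the same digest...
assert fingerprint(original) == fingerprint(original)
# ...while a one-character change produces a different digest.
assert fingerprint(original) != fingerprint(tampered)

print(fingerprint(original))
print(fingerprint(tampered))
```

Because the digest depends on every byte of the input, any later change to a document can be detected by comparing a freshly computed hash with the recorded one.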

Figure: Hash functions (source: Wikipedia)

# Blockchain hashing and immutability

The Bitcoin Blockchain uses hashes to link blocks in the chain and as part of the *Proof of work* algorithm. When a new block needs to be added to the Bitcoin Blockchain, *miners* compete to be the one to add that block by finding a *nonce*. The nonce is a number that is hashed together with the contents (transactions) of the current block and the hash of the previous block. However, not just any resulting hash will suffice: it must fall below a network-defined target, which in practice means it must begin with a certain number of leading zeros. The only known way to find a valid nonce is trial and error, which is computationally expensive, and so the Bitcoin network must expend a significant amount of computing power for each block added.
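
The search for a nonce can be illustrated with a toy miner. This is a simplified sketch, not Bitcoin's real algorithm: Bitcoin hashes a binary block header twice with SHA-256 and compares the result against a numeric target, whereas here a counter is appended to a text payload and the hex digest is checked for leading zeros. The names `block_hash`, `mine`, and `difficulty` are hypothetical.

```python
import hashlib
from typing import Tuple


def block_hash(previous_hash: str, transactions: str, nonce: int) -> str:
    """Hash the previous block's hash, this block's contents, and a nonce."""
    payload = f"{previous_hash}|{transactions}|{nonce}".encode()
    return hashlib.sha256(payload).hexdigest()


def mine(previous_hash: str, transactions: str, difficulty: int) -> Tuple[int, str]:
    """Try nonces 0, 1, 2, ... until the hash starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = block_hash(previous_hash, transactions, nonce)
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


# Illustrative inputs; each extra leading zero multiplies the expected work by 16.
nonce, digest = mine("0" * 64, "alice pays bob 1 BTC", difficulty=4)
print(nonce, digest)
```

Finding the nonce takes many attempts, but checking it takes a single hash, which is what makes the work easy to verify and expensive to redo.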

If an attacker wishes to overwrite a block within the Blockchain, they would have to recalculate the nonce of the block concerned and of all subsequent blocks, because each block is linked by hash to the one before it. They would also have to do this faster than the existing Bitcoin network adds new blocks. In practice, the attacker would have to assemble more computing power than the rest of the Bitcoin network combined.

Such an attack, known as a "51% attack", is highly unlikely but perhaps not impossible. However, such an attack would only allow an attacker to double-spend their own coins; they could not amend the transactions of others. Furthermore, the existing network will not accept rewritten blocks beyond a certain "pruning" limit, so historical blocks are protected from such an attack (though a denial of service could possibly be achieved). Therefore, even a 51% attack cannot alter historical Blockchain records. Accordingly, for all practical purposes, data placed on the Bitcoin Blockchain is immutable: it cannot be altered or redacted. See the Bitcoin wiki for more details.

# Using Blockchain hashes as proofs

It has long been possible to store the hash of a document in a public Blockchain, such as the Bitcoin Blockchain, and use it as a proof of existence for that document. However, this simplistic approach has at least two critical limitations:

- Each document proof must be stored within its own Bitcoin transaction. This is expensive in both dollar and environmental terms.
- The solution does not manage the document content. Should one modify the document concerned, the hash becomes meaningless; it is up to the user to manage the relationships between Blockchain hashes and document contents.

ProvenDB addresses each of these concerns:

- Thousands of documents are proven with a single blockchain hash using *Merkle Trees*.
- ProvenDB stores all versions of a document within its datastore, and manages the relationships between documents and Blockchain proofs.

# Merkle trees and aggregated proofs

Merkle trees allow us to prove multiple documents within a single hash. You might think of a Merkle tree as a "tree of hashes". Each document is hashed, then pairs of document hashes are hashed together, then pairs of those hashes are hashed, and so on until only a single hash remains. This "root hash" can be used to prove any or all of the documents concerned.
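
The pairwise hashing described above can be sketched in a few lines. This is an illustrative construction, not ProvenDB's or Chainpoint's exact one: it assumes SHA-256, and it assumes that when a level has an odd number of hashes the last one is duplicated (as Bitcoin does); other Merkle tree variants handle the odd case differently.

```python
import hashlib
from typing import List


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(documents: List[bytes]) -> bytes:
    """Hash every document, then hash pairs repeatedly until one hash remains."""
    if not documents:
        raise ValueError("at least one document is required")
    level = [sha256(doc) for doc in documents]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption: duplicate the last hash so every hash has a partner.
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


docs = [b"doc-a", b"doc-b", b"doc-c"]
print(merkle_root(docs).hex())
```

Changing any single document changes its leaf hash, and therefore every hash above it up to the root, so one root hash commits to the whole set.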

Figure: Using Merkle trees to pin many document proofs on the blockchain with a single hash

It would be unwieldy if the entire Merkle tree were required to prove an individual document. However, a single document can be proved by supplying just the *Merkle path*: the sequence of hash pairs that connect the document hash with the root hash.
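
A Merkle path can be built and checked as in the sketch below. It makes the same assumptions as any simple textbook tree (SHA-256, duplicating the last hash on odd levels) and does not reproduce the Chainpoint proof format; `merkle_path`, `verify`, and the `Step` type are hypothetical names for illustration.

```python
import hashlib
from typing import List, Tuple

Step = Tuple[str, bytes]  # (side of the sibling: "left" or "right", sibling hash)


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_path(documents: List[bytes], index: int) -> Tuple[bytes, List[Step]]:
    """Return the root hash and the sibling hashes linking one document to it."""
    if not 0 <= index < len(documents):
        raise IndexError("document index out of range")
    level = [sha256(doc) for doc in documents]
    path: List[Step] = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate the unpaired hash
        sibling = index ^ 1  # partner position within the pair
        side = "left" if sibling < index else "right"
        path.append((side, level[sibling]))
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], path


def verify(document: bytes, path: List[Step], root: bytes) -> bool:
    """Recompute the root from one document and its path, then compare."""
    current = sha256(document)
    for side, sibling in path:
        current = sha256(sibling + current) if side == "left" else sha256(current + sibling)
    return current == root


docs = [b"doc-a", b"doc-b", b"doc-c", b"doc-d", b"doc-e"]
root, path = merkle_path(docs, 2)
print(verify(b"doc-c", path, root))  # True
print(verify(b"doc-x", path, root))  # False
```

The verifier needs only the document, the path (about log2 of the number of documents in length), and the root hash; it never sees the other documents.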

ProvenDB uses Merkle trees to create document proofs. Each version proof is a Merkle tree consisting of all the documents in the version. A document proof is a Merkle path connecting an individual document with the version proof.

Chainpoint is an open protocol that uses Merkle trees to connect multiple proofs to the Bitcoin Blockchain. ProvenDB proofs are Chainpoint compliant.

# Managing proofs in ProvenDB

ProvenDB adds a set of commands to the MongoDB API for creating and managing proofs:

- submitProof requests that a proof for a specific database version be created
- verifyProof verifies a proof by recalculating the Merkle tree and comparing the root hash to that stored on the Blockchain. You could use verifyProof to determine that data had not been tampered with.
- getProof returns a Chainpoint compliant proof for a database version
- getDocumentProof returns a Chainpoint compliant proof for a specific document or a set of documents within a single collection.

# Lifecycle of a proof

ProvenDB proofs start in 'Pending' status. At this point the proof has been submitted to the Chainpoint network but has not yet been stored on the Chainpoint Blockchain.

Once confirmed on the Chainpoint network, proofs move to 'Submitted' status. This transition usually occurs within a few minutes.

When the Chainpoint proof is anchored on the Bitcoin Blockchain, the proof moves to 'Valid' status. A delay of about an hour can be expected before the final 'Valid' status is achieved.
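
Client code that tracks a proof may find it convenient to model these three statuses explicitly. The sketch below is a hypothetical client-side model only: `ProofStatus`, `advance`, and `is_anchored` are not part of the ProvenDB API, and only the status names come from this page.

```python
from enum import Enum


class ProofStatus(Enum):
    PENDING = "Pending"      # sent to Chainpoint, not yet confirmed there
    SUBMITTED = "Submitted"  # confirmed on the Chainpoint network
    VALID = "Valid"          # anchored on the Bitcoin Blockchain


# Each status has at most one successor; 'Valid' is final.
NEXT_STATUS = {
    ProofStatus.PENDING: ProofStatus.SUBMITTED,
    ProofStatus.SUBMITTED: ProofStatus.VALID,
}


def advance(status: ProofStatus) -> ProofStatus:
    """Return the next status in the lifecycle, or the same status if final."""
    return NEXT_STATUS.get(status, status)


def is_anchored(status: ProofStatus) -> bool:
    """Only a 'Valid' proof is backed by a Bitcoin transaction."""
    return status is ProofStatus.VALID


status = ProofStatus.PENDING
while not is_anchored(status):
    print(status.value)
    status = advance(status)
print(status.value)
```

Treating only 'Valid' as anchored reflects the text above: until the final transition, the proof is not yet backed by the Bitcoin Blockchain.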

# Version proofs and document proofs

A ProvenDB proof can be used to prove the state of a specific version of the database, or of any document that is current in that version.

For instance, let's say you created a document on the 4th of July at database version 10001. Initially, that document is not proven. The next day, on the 5th of July, you create a proof for database version 10009. This new proof includes all documents created in earlier versions that have not been deleted as of the 5th of July. However, the blockchain proof only establishes that the document existed on the 5th of July; it does not prove that the document existed on the 4th.
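
The rule for which documents a version proof covers can be sketched as a simple filter. The `DocumentVersion` record and its `created_at` and `deleted_at` fields are hypothetical; this page does not describe how ProvenDB stores version metadata, so treat this as an illustration of the rule rather than of the implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DocumentVersion:
    doc_id: str
    created_at: int            # database version in which the document was written
    deleted_at: Optional[int]  # database version in which it was deleted, if any


def current_in(version: int, docs: List[DocumentVersion]) -> List[str]:
    """Return ids of documents that exist (are 'current') at the given version."""
    return [
        d.doc_id
        for d in docs
        if d.created_at <= version and (d.deleted_at is None or d.deleted_at > version)
    ]


docs = [
    DocumentVersion("invoice-1", created_at=10001, deleted_at=None),
    DocumentVersion("invoice-2", created_at=10003, deleted_at=10007),
    DocumentVersion("invoice-3", created_at=10012, deleted_at=None),
]

# A proof of version 10009 covers only documents current at that version.
print(current_in(10009, docs))  # ['invoice-1']
```

Here the document created at version 10001 is covered by the proof of version 10009, while one deleted at version 10007 and one created at version 10012 are not.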
