Finally! Bitcasa CEO Explains How The Encryption Works

TechCrunch Disrupt finalist Bitcasa, a new cloud storage provider, was met with a healthy dose of skepticism last week when it claimed to be able to provide “infinite storage.” How does it do that? It can’t do what it promises! That’s not how encryption works! And so on. VC firm Andreessen Horowitz, along with First Round Capital, Pelion Venture Partners, and TechCrunch founder Michael Arrington’s CrunchFund, have invested $1.3 million in the technology, which seems to suggest there’s valuable IP behind the startup’s overly broad promises of cheap, infinite and secure storage.

My initial review of the startup was generally positive because, by all descriptions, it’s doing something innovative and new. While we raised a few general questions (does it slow you down? will it scale?), it’s hard to review something without going hands-on. For that matter, my description of how the technology works was perhaps overly simplified. For those of you with an interest in deeper technical details, here they are (well, it’s a start, at least…).

As a finalist, Bitcasa got drilled by knowledgeable judges including Ron Conway, Hadi Partovi, Marissa Mayer, Roelof Botha, Matthew Cohler and Arrington.

Hadi Partovi was on the founding teams of Tellme and iLike. As an angel investor and startup advisor, Hadi’s portfolio includes Facebook, Zappos, Dropbox, OPOWER, Flixster, Bluekai, and many others.

Like you, he wanted to know how Bitcasa’s encryption worked.

To see how Bitcasa CEO Tony Gauda answered those questions, skip to 11:48 in the video at the bottom of the post. His main explanation is that Bitcasa takes advantage of a technique called convergent encryption, which he explains towards the end before being cut off.

Here’s the transcript of the relevant portion of the conversation.

HP: What do you do in terms of encryption or security?

TG: We encrypt everything on the client side. We use AES-256 hash, SHA-256 hashing for all the data.

HP: So it’s encrypted all on the client side and you can’t look at it on the server side?

TG: Exactly.

HP: So if I upload a file and Marissa uploads the same file, do you store two different copies of that or one?

TG: No, we do de-duplication on the server side. So we actually determine on the server side if it’s there, and if it’s already there, we don’t have to upload it again.

HP: But how do you do that…if it’s encrypted and you don’t have the key?

TG: There’s an academic paper called Convergent Encryption. This is actually something that’s been known for many years in the encryption community. But what we actually do is…we don’t encrypt it in the way that you think we’re doing it….There’s other ways to do it.

HP, giving a weird look: OK.

(Audience giggles.)

Paul Carr: I think the audience would like to know a little more about that…what does that mean?

TG: OK, so convergent encryption…what happens is when you encrypt data, I have a key and you have a key. And let’s say that these are completely different. Let’s say that we both have the exact same file. I encrypt it with my key and you encrypt it with your key. Now the data looks completely different because the encryption keys are different. Well, what happens if you actually derive the key from the data itself? Now we both have the exact same encryption key and we can de-dupe on the server side.

(Microphone gets really loud at the end.)

TG: See how powerful that was? That was powerful!

(Audience laughs.)

TG: This is convergent encryption. We didn’t invent this. We invented all types of other things to make this thing awesome. There’s been a lot of talk about “hey, how do you do this?” And the trolls are coming out on Slashdot…

Paul Carr: OK, stop pitching.
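
For readers who want to see the idea in concrete terms, here is a minimal sketch of convergent encryption as Gauda describes it: the key is derived from the data itself, so identical files produce identical ciphertext and the server can de-duplicate without ever holding a key. This is an illustration only, not Bitcasa's implementation, which has not been published. The function and class names (`convergent_encrypt`, `toy_cipher`, `DedupStore`) are hypothetical, and because Python's standard library has no AES, a toy SHA-256 keystream stands in for the AES-256 cipher Gauda mentions. It should not be used to protect real data.

```python
import hashlib


def derive_key(plaintext):
    """Derive the key from the data itself (here: SHA-256 of the plaintext)."""
    return hashlib.sha256(plaintext).digest()


def _keystream(key, length):
    """Toy keystream: SHA-256 over key + counter. A stand-in for AES, NOT secure."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_cipher(key, data):
    """XOR the data with the keystream; applying it twice restores the input."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


def convergent_encrypt(plaintext):
    """Client side: same plaintext -> same key -> same ciphertext."""
    key = derive_key(plaintext)
    return key, toy_cipher(key, plaintext)


class DedupStore:
    """Hypothetical server: sees only ciphertext, indexed by its hash."""

    def __init__(self):
        self.blobs = {}

    def put(self, ciphertext):
        """Store a blob; return (block_id, is_new)."""
        block_id = hashlib.sha256(ciphertext).hexdigest()
        is_new = block_id not in self.blobs
        if is_new:
            self.blobs[block_id] = ciphertext
        return block_id, is_new

    def get(self, block_id):
        return self.blobs[block_id]


if __name__ == "__main__":
    store = DedupStore()
    # Two users encrypt the same file independently on their own machines.
    key_a, blob_a = convergent_encrypt(b"the exact same file")
    key_b, blob_b = convergent_encrypt(b"the exact same file")
    print(store.put(blob_a))  # first upload is stored
    print(store.put(blob_b))  # second upload is recognized as a duplicate
    # Either user can decrypt with the key they kept on the client.
    print(toy_cipher(key_b, store.get(store.put(blob_b)[0])))
```

The property that makes de-duplication possible also has a cost: anyone who already holds a given file can compute its ciphertext and check whether the server has a copy.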