
Moving Apache CouchDB Data to Cloudant
The path to scalable, always-on, and managed CouchDB as a service

February 2013

Moving CouchDB Data to the Cloudant DBaaS

Cloudant Overview
Cloudant provides a managed, cloud database as a service (DBaaS) that is based on Apache CouchDB. Cloudant is a
fast, always-on, scalable database that the big data experts at Cloudant operate and grow for you so you can stay
focused on new development and not on database administration.
Samsung, Microsoft, Salesforce.com, DHL, Hothead Games, Flurry, and thousands of other developers of large-scale or
fast-growing Web and mobile applications use Cloudant. The Cloudant DBaaS features:
• A schema-less (NoSQL) JSON document store
  o For operational data optimized for concurrent reads & writes, high availability, and data durability
  o Monitored, scaled, and managed by big-data experts at Cloudant
  o Accessed via an Apache CouchDB-compatible, RESTful API
• APIs for specialized data management features:
  o Data replication & sync with mobile devices or local data centers
  o Full-text indexing & search (Apache Lucene powered)
  o Geo-spatial indexing and analytics
  o Incremental MapReduce for real-time analytics
• Global data distribution
  o Across a network of data centers in North America, Europe, and Asia
  o Fault-tolerance via cross-data center data distribution
  o Multiple hosting options: AWS, Azure, Joyent, Rackspace, SoftLayer
  o Geo-load balancing connects users to the closest data source for lower data access latency

Why Transition to Cloudant?


Developers choose CouchDB for a variety of reasons: schema freedom, ease of development, and replication & sync to
name a few. But CouchDB can be difficult to scale out to handle larger workloads. The Cloudant DBaaS is based on
Apache CouchDB, and has been enhanced with a horizontal scaling framework, Lucene full-text search, geo-spatial
indexing, fault tolerance, and other features not found in CouchDB so that you can:

Build More
Cloudant enhances the CouchDB development experience with built-in scaling, fault tolerance, Lucene-powered
full-text indexing and search, and geo-spatial indexing. Having these built into your data layer makes it easier to
enrich your apps with advanced data management features.

Grow More
Growing a CouchDB database to hold more data or support many more users is hard to do. Cloudant includes a
horizontal scaling and fault-tolerance framework that makes this easy; it was initially developed to manage the
petabytes of data that the Large Hadron Collider generates every second so that it could be accessed by physics
researchers around the world.

Sleep More
Keeping CouchDB running smoothly is a 24x7 operation, and we do that for you. We monitor, grow (reconfigure,
repartition/rebalance clusters), protect, and administer your data layer around the clock so you can get a good
night's sleep.


Moving Your Data to Cloudant


Migrating data from CouchDB to Cloudant is conceptually straightforward. It involves:
1. Replicating data from your CouchDB database to Cloudant
2. Optionally, adjusting your CouchDB design docs

The process generally takes a day or two depending on the scope of your application.
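
The replication step can be sketched as a single POST to the `_replicate` endpoint of the source CouchDB server. What follows is a minimal illustration rather than Cloudant's documented procedure. The database name, account name, credentials, and the `https://<account>.cloudant.com/<db>` URL form are placeholders assumed for the example, and the target database is assumed to exist already.

```python
import base64
import json
import urllib.request


def build_replication_request(source_db, target_url, continuous=False):
    """Return the JSON body for a POST to CouchDB's /_replicate endpoint."""
    body = {"source": source_db, "target": target_url}
    if continuous:
        # Keep replicating new changes instead of stopping after one pass.
        body["continuous"] = True
    return body


def start_replication(couch_url, body, username=None, password=None):
    """POST the replication body to a CouchDB server and return the parsed reply."""
    request = urllib.request.Request(
        couch_url.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    if username is not None:
        # HTTP Basic authentication against the local CouchDB server.
        token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
        request.add_header("Authorization", "Basic " + token.decode("ascii"))
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical names: "mydb", "myaccount", and the credentials are placeholders.
replication_body = build_replication_request(
    "mydb",
    "https://apikey:secret@myaccount.cloudant.com/mydb",
)

# To run against a live server (not executed here):
# start_replication("http://localhost:5984", replication_body, "admin", "password")
```

Building the request body separately from sending it makes it easy to inspect or log what will be replicated before touching a production server.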

Taking a Phased Approach


Migrating your current data layer to Cloudant can be done in phases; it does not have to be an all-or-nothing process.
You can start by migrating a single database to Cloudant while other data continues to reside on other servers.
Good candidates for migration include databases that need to be scaled out. Rather than configuring your own
CouchDB cluster and partitioning your CouchDB data across it, consider moving your data to Cloudant. Your data will
be scaled out by Cloudant as part of the process.

Importing Your Data into Cloudant


If Cloudant will hold your data in the same database and JSON structure as your existing CouchDB database does, you
can simply replicate data from your CouchDB database to Cloudant.
Otherwise, you'll need to output your CouchDB data to a file containing an array of JSON objects and then perform one
or more HTTP POST requests to bulk load the docs from your export file into Cloudant.
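
The bulk-load path might look like the sketch below, which posts batches of documents to the `_bulk_docs` endpoint. Several details are assumptions made for illustration: the sample documents are invented, the batch size of 500 is arbitrary, the database URL is a placeholder, and dropping `_rev` assumes you want the target to assign fresh revisions rather than preserve revision history (replication is the better fit when history matters).

```python
import json
import urllib.request


def prepare_docs(exported_docs):
    """Copy exported documents, dropping `_rev` so the target assigns new revisions."""
    return [{k: v for k, v in doc.items() if k != "_rev"} for doc in exported_docs]


def batches(docs, size):
    """Yield successive lists of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def bulk_payload(batch):
    """Wrap one batch in the {"docs": [...]} envelope that _bulk_docs expects."""
    return json.dumps({"docs": batch}).encode("utf-8")


def bulk_load(db_url, docs, size=500, auth_header=None):
    """POST each batch to <db_url>/_bulk_docs and collect the per-document replies."""
    results = []
    for batch in batches(docs, size):
        request = urllib.request.Request(
            db_url.rstrip("/") + "/_bulk_docs",
            data=bulk_payload(batch),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        if auth_header:
            request.add_header("Authorization", auth_header)
        with urllib.request.urlopen(request) as response:
            results.extend(json.load(response))
    return results


# Made-up sample data standing in for the contents of an export file.
exported = [
    {"_id": "a", "_rev": "1-abc", "symbol": "IBM"},
    {"_id": "b", "_rev": "2-def", "symbol": "MSFT"},
    {"_id": "c", "symbol": "DHL"},
]
docs = prepare_docs(exported)
chunks = list(batches(docs, 2))

# To load into a live database (not executed here; the URL is a placeholder):
# bulk_load("https://myaccount.cloudant.com/mydb", docs)
```

Each reply entry reports success or an error for one document, so it is worth checking the collected results for conflicts after a load.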

API and Data Design Doc Changes


Cloudant is based on CouchDB, and its API is largely compatible with CouchDB. Cloudant has had to make a few
changes to the CouchDB API in order to make it faster, richer, and possible to use CouchDB as a hosted and managed,
cluster-based service. These differences might require that you change your design documents or application code:
• View docs must be written in JavaScript. CouchDB, by contrast, permits these to be written in other languages.
• Temp views are disabled. They are not a best practice for production systems in CouchDB because of their
  performance cost.
• The changes feed might be unordered. In Cloudant, items in the changes feed are collected from nodes in the
  cluster independently, so they might not be reported in order as they are in CouchDB. The since parameter still
  works as expected.
• Sequence numbers are opaque. In CouchDB they are ordered integers; in Cloudant they are opaque JSON tokens that
  include cluster state information.
• Server configuration commands are disabled. We have disabled server configuration commands and server
  shutdown; they aren't applicable within a hosted service.
• Authorization & authentication differences:
  o Cloudant permits you to share database access across Cloudant user accounts.
  o Cloudant supports the same authentication methods as CouchDB, except for OAuth. Full support for OAuth is
    currently under development.
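
Two of the differences above lend themselves to a short sketch: a design document whose view is written in JavaScript, and a changes-feed checkpoint that is treated as an opaque value. The helper names, the `stocks` design doc, the `symbol` field, and the sample token are all hypothetical, and the way a non-string token is encoded for the `since` query parameter is an assumption rather than documented Cloudant behavior.

```python
import json
from urllib.parse import urlencode


def javascript_view_design_doc(name, view_name, map_source):
    """Build a design document whose map function is a JavaScript source string."""
    return {
        "_id": "_design/" + name,
        "language": "javascript",
        "views": {view_name: {"map": map_source}},
    }


def next_since(changes_response, previous_since=None):
    """Return the checkpoint to pass as `since` on the next _changes request.

    The value is stored and returned verbatim. It is never parsed, compared,
    or incremented, because on a cluster it is an opaque token.
    """
    return changes_response.get("last_seq", previous_since)


def changes_path(db, since=None):
    """Build the request path for the changes feed, resuming from `since` if given."""
    if since is None:
        return f"/{db}/_changes"
    # Assumption: string tokens are sent as-is, other JSON values are serialized.
    token = since if isinstance(since, str) else json.dumps(since)
    return f"/{db}/_changes?" + urlencode({"since": token})


# Hypothetical view: index documents by a made-up `symbol` field.
design_doc = javascript_view_design_doc(
    "stocks",
    "by_symbol",
    "function (doc) { if (doc.symbol) { emit(doc.symbol, 1); } }",
)

# Made-up response body standing in for a reply from the changes feed.
checkpoint = next_since({"results": [], "last_seq": "12-g1AAAA"})
resume_path = changes_path("mydb", checkpoint)
```

Application code that sorts or subtracts CouchDB's integer sequence numbers would need to change; code that only saves the last value and hands it back should carry over.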


Interview with Stockr.com


Stockr.com is a real-time social networking site that connects financial investors and traders to track stocks and
discuss public companies, without the spam that dominates most other stock sites and message boards. We spoke
with Eugene Kashpureff Jr. of Stockr.com about his experience getting started with Cloudant.

Cloudant: Why did you move Stockr.com to Cloudant?


Eugene: I'm a volunteer firefighter at home, I'm a professional firefighter at Stockr. I have better things to do than
to sit and mess with databases all day long. We use CouchDB to store our site's big data: stock info, user posts,
statistical data and the like. We have a few tables of relational data (mostly user login info) that we keep in MySQL,
but 95% of our data volume is in CouchDB. Offloading the management and worry of all that information is the core
reason behind our move.

Cloudant: How much data, access, storage?


Eugene: Between our various development, test and production environments? About half a terabyte, and growing
daily. But I don't watch that number; that's why I let you guys have our data.

Cloudant: How has the performance been on Cloudant?


Eugene: Since we moved to Cloudant we've had zero user complaints about our site's speed, when it used to be a
constant nag. There are so many fewer problems that we have to deal with every single day. I can't say we've had NO
problems since moving to Cloudant, but it's far fewer than we used to have.

Cloudant: What type of issues do you no longer have to deal with?


Eugene: Database sharding, disk compaction, running out of storage space, managing hardware & daemons, all the
stuff I blindly wandered through the CouchDB documentation for -- now that's gone from my life.

Cloudant: What did you migrate from?


Eugene: We originally started with the Ubuntu CouchDB package running on one node. Then expanded to three nodes
of BigCouch (http://bigcouch.cloudant.com). Then we tried to add two nodes to that, and we couldn't get it to work, so
we decided to just move to Cloudant. In addition, each of our developers had a separate environment, and we had
numerous unpleasant surprises when a configuration or version difference was found.


Cloudant: How did the migration go?


Eugene: It took us a while to get replication set up, data imported over the weekend, and then we flipped the switch
and it just worked. Only thing was we had to set up a proxy server to deal with SSL endpointing. Replication took a few
days to get right because Cloudant was having an issue with the hardware SoftLayer provisioned; a new switch was
installed and configured wrong. Just one of those set-up-new-hardware problems you always have. We moved over
40GB of raw data then re-generated indexes.

Cloudant: Did you have to make any code changes?


Eugene: Our CouchDB views were written in Python, but Cloudant requires those be written in JavaScript. Mike Miller
did a lot of the work converting those view documents for us. I think he said it took him about an hour. The only thing
we changed in our actual application was the address of the CouchDB server.

Cloudant: Is there anything we could have done to make moving to Cloudant easier?
Eugene: I'm not sure if there's much more you folks could have delivered. I was surprised at how quick and painless
it was. Outside of Stockr, I use other services like AWS. It's painful. With Cloudant it was a couple of config problems,
that was it. For an easier migration process, it would have been nice to have a tool to run against our
BigCouch/CouchDB server to automatically load it all up into Cloudant, rather than having to log in as admin and
assign all those relationships by hand. New customers would appreciate that.

Cloudant: Any closing thoughts?


Eugene: With Cloudant now, I can work on making the system faster rather than trying to keep the system up.

Getting More Help


If you need help getting started with Cloudant, visit the Cloudant Developer Resources Site (https://cloudant.com/fordevelopers/) or contact us for assistance:
#cloudant on IRC
@cloudant on Twitter
support@cloudant.com

129 South Street, Boston, MA 02111


(857) 400-9900 | cloudant.com


About Cloudant
Cloudant provides developers of large-scale and fast-growing web and mobile applications with the
world's first globally distributed database as a service (DBaaS) for loading, storing, analyzing, and
distributing operational application data. As a managed service, Cloudant helps developers eliminate
the delays, costs, and distractions inherent in working with databases and their administrators, while
providing unmatched scalability, availability, and performance. The Cloudant service is available hosted
on AWS, Joyent, Rackspace, SoftLayer, and Windows Azure. Cloudant customers include Samsung,
Hothead Games, Microsoft Big Park Studios, Flurry, Salesforce.com, DHL and thousands of other
developers worldwide.
