This document provides an overview of cloud computing and Google Cloud Platform. It defines cloud computing and describes its benefits like pay per use, availability, security, and scalability. It explains the different cloud service models (SaaS, PaaS, IaaS) and Google's implementation including App Engine and Cloud Platform products. It also provides documentation links and tutorials for developing applications on App Engine using Java.
ITS - TECNICO SUPERIORE PER LE APPLICAZIONI DI DATA INTEGRATION IN AMBIENTE CLOUD
v 1.0 - Lorenzo Zimolo
Contact: lorenzo.zimolo@sinesy.it, google.com/+LorenzoZimoloSinesy

Cloud computing
"Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction."
NIST, the National Institute of Standards and Technology
http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf

Cloud computing: why?
- pay per use (many forms)
- availability (distributed and replicated data centers)
- security (physical and logical)
- scalability (from zero to infinite, with caps)
- flexibility
- computation power
- costs
- reduced in-house infrastructure
So...
- think about the services you need
- rethink the role of IT in your company
- where is my data?

Current provider examples
- Google, Amazon, Microsoft, VMware, Force.com, IBM, ...
- In Italy: Telecom Italia, Aruba, ...
Every provider brings its own ideas and implementation of the cloud.

Service models
Software as a Service (SaaS). The capability provided to the consumer is to use the provider's applications running on a cloud infrastructure. The applications are accessible from various client devices through either a thin client interface, such as a web browser (e.g., web-based email), or a program interface. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.
Platform as a Service (PaaS). The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages, libraries, services, and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage, but has control over the deployed applications and possibly configuration settings for the application-hosting environment.
Infrastructure as a Service (IaaS). The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications; and possibly limited control of select networking components (e.g., host firewalls).

Service models stack (figure)

Cloud computing dimensions (figure)

Google Cloud Implementation
- All three service models: SaaS, PaaS, IaaS.
- All main cloud aspects are addressed, with particular emphasis on:
  - less infrastructure management
  - pay only for what you use (pay as you go)
  - high scalability
- Focused on public cloud, multitenant SaaS or reserved execution runtime instances.
Chrome
- Download from: https://www.google.com/intl/en/chrome/browser/
- Incognito mode: https://support.google.com/chrome/answer/95464?hl=it
- App store: https://chrome.google.com/webstore/category/apps

Google SaaS
- Gmail (and more!)
- Google Apps (for Business, for Education): http://www.google.com/enterprise/apps/business/
- Google Maps (for Business): http://www.google.com/enterprise/mapsearth/
- Google Analytics: http://www.google.com/analytics/
- Google Ads (AdSense, AdWords): http://www.google.com/ads/

Google Apps for Business
- Google Apps site: https://www.google.com/intx/it/enterprise/apps/business/
- Included products: https://www.google.com/intx/it/enterprise/apps/business/products.html
- Google Apps Marketplace: http://www.google.com/enterprise/marketplace/
- Documentation: https://www.google.com/intx/it/enterprise/apps/business/resources/library.html
- Webinars: https://www.google.com/intx/it/enterprise/apps/business/resources/recorded-webinars.html
- Gmail support and technical documentation: https://support.google.com/mail/?hl=en#topic=3394144
- Drive documentation: https://support.google.com/drive/?hl=en#topic=14940
- Documentation for all products: https://support.google.com/
- Status dashboard: http://www.google.com/appsstatus#hl=en&v=status

E-mail protocols
RFC: Request for Comments, the official documents describing Internet protocols.
POP3
- RFC: http://tools.ietf.org/html/rfc1939
- Wikipedia: http://it.wikipedia.org/wiki/Post_Office_Protocol and http://en.wikipedia.org/wiki/Post_Office_Protocol
IMAP
- RFC: http://tools.ietf.org/html/rfc3501
- Wikipedia: http://en.wikipedia.org/wiki/Internet_Message_Access_Protocol
SMTP
- RFC: https://james.apache.org/server/rfclist/smtp/rfc0821.txt
- Wikipedia: http://en.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol
- Explanation: http://computer.howstuffworks.com/e-mail-messaging/email3.htm

Revision control - Versioning
- CVS: http://en.wikipedia.org/wiki/Concurrent_Versions_System
- Revision control: http://en.wikipedia.org/wiki/Revision_control
- How it works: http://betterexplained.com/articles/a-visual-guide-to-version-control/
- Open source products: SVN, Git
- List: http://en.wikipedia.org/wiki/List_of_revision_control_software

Google Cloud Platform
IaaS and PaaS by Google
- Official site: https://cloud.google.com/
- Products: https://cloud.google.com/products/
- Documentation: https://cloud.google.com/developers/
Google Cloud Platform (figure)

Load balancer
- Doc: http://en.wikipedia.org/wiki/Load_balancing_(computing)

HTTP/HTTPS
- HTTP: http://en.wikipedia.org/wiki/HTTP
- HTTPS: http://en.wikipedia.org/wiki/HTTP_Secure

Google App Engine (GAE)
- Fully Managed Platform
- Easy Development & Deployment
- Focus On Your Code Not Your Server
- Automatic Scaling
- Popular Programming Language Support
- Flexible and Scalable Application Storage
- Services (Cron, Queue, Memcache, etc.)
- Datastore
- Versioning and Traffic Splitting
- Local Developer Tools
- Third-party Frameworks and Extensions

Google Cloud Platform Developer Docs
- Home page: https://developers.google.com/cloud/
- App Engine: https://developers.google.com/appengine/

GAE Java development environment
- Java SDK 7: http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html
- Java 7 API: http://docs.oracle.com/javase/7/docs/api/
- Eclipse IDE for Java EE Developers 4.3: https://www.eclipse.org/downloads/

App Engine development
- Google SDK: https://developers.google.com/appengine/downloads
- Google Plugin for Eclipse: https://developers.google.com/appengine/docs/java/tools/eclipse and https://developers.google.com/eclipse/docs/install-eclipse-4.3

Java Web Technologies: Servlets and JSPs
- Tutorials: http://courses.coreservlets.com/Course-Materials/csajsp2.html and http://docs.oracle.com/javaee/5/tutorial/doc/bnafd.html

App Engine
What it is: https://developers.google.com/appengine/docs/whatisgoogleappengine
- Scalability: the app gets a spike of traffic when an earthquake occurs
- Reliability: the app is useless if it does not work at the time of a disaster
- Cost efficiency: it is too expensive to provision enough hardware to handle peak traffic, since it would sit idle most of the time

Traditional solution
What if you have:
- Hardware failures
- Traffic spikes
- Growing big data
- No initial funds
- No one to build/operate

Hosting challenges (figure)

The Google Way!
App Engine encourages Google's best practices for scalability and reliability.
- Non-relational data model with Datastore/Bigtable
  - sharding, denormalization...
- Portable and fine-grained app design
  - fast request handling to optimize server resource utilization
  - independent of any individual physical server
It's not just a hosting service: App Engine empowers you to design your app in the Googley way!

Results...
- Significantly lower Total Cost of Ownership
  - economy of scale
- Easy to develop and deploy
- Free to start: no initial cost
- Lower operational cost
  - no security patches, upgrades, etc.
  - 24x7 operation by Google SREs

Request Queue
- App Engine watches the Pending Request Queue of each version
- Instances are dynamically added/removed based on queue size
Let's see what would happen if your app gets a traffic spike.
[Figure labels: Pending Request Queue, Idle Instances, Pending Latency]

GAE Console and status check
- Console: https://appengine.google.com/
  - Log view (quota errors!)
  - Versions
- Downtime notify group: google-appengine-downtime-notify
- Status page: https://code.google.com/status/appengine

Features, prices, quotas
- Features: https://developers.google.com/appengine/features/
- Prices: https://developers.google.com/appengine/pricing
- Quotas: https://developers.google.com/appengine/docs/quotas
  - Quotas limit resource usage to protect the App Engine system
  - Quota errors! To avoid quota errors, enable billing and set a budget on resources.
Billing
To enable billing on your account: Admin Console > Billing Status > Enable Billing
Your app will then run under a billing-enabled account.

Application versions
- The version is stored in:
  - Java: WEB-INF/appengine-web.xml
  - Python: app.yaml
- Can use any text for the version
- Only one version can be the default
- Min/Max idle instances apply to the default version
- It takes a few moments to switch to a new default version. This depends on the complexity/start-up time of the application and on its current load.
- Traffic splitting can route a percentage of traffic to non-default versions

Logging
- Logging API: https://developers.google.com/appengine/docs/java/logs/
- Logging in Java: https://developers.google.com/appengine/docs/java/?csw=1#Java_Logging

Apache JMeter
Load test and performance measurement suite.
- Web site: https://jmeter.apache.org/

App Stats for GAE
Application efficiency (and cost!)
- App Stats: https://developers.google.com/appengine/docs/java/tools/appstats
- Exercise: enable appstats in your Java application

Enable app stats

    <filter>
      <filter-name>appstats</filter-name>
      <filter-class>com.google.appengine.tools.appstats.AppstatsFilter</filter-class>
    </filter>
    <filter-mapping>
      <filter-name>appstats</filter-name>
      <url-pattern>/*</url-pattern>
    </filter-mapping>
    <servlet>
      <servlet-name>appstats</servlet-name>
      <servlet-class>com.google.appengine.tools.appstats.AppstatsServlet</servlet-class>
    </servlet>
    <servlet-mapping>
      <servlet-name>appstats</servlet-name>
      <url-pattern>/appstats/*</url-pattern>
    </servlet-mapping>

Authentication & Authorization
- Authentication: who are you?
- Authorization: what can you do in the app?

Authentication
Otherwise: custom code

Custom Authentication
- Develop any custom authentication mechanism within the application
  - Enterprise SSO systems (only if they use ports 80/443)
  - Username/password
- How?
  - Set up the application as "Open to All Google Account users"
  - Do not restrict access

Restricting access
- https://developers.google.com/appengine/docs/java/users/
- https://developers.google.com/appengine/docs/java/config/webxml#Security_and_Authentication
- https://developers.google.com/appengine/docs/java/javadoc/com/google/appengine/api/users/package-summary
web.xml:

    <security-constraint>
      <web-resource-collection>
        <url-pattern>/profile/*</url-pattern>
      </web-resource-collection>
      <auth-constraint>
        <role-name>*</role-name>
      </auth-constraint>
    </security-constraint>

Exercise: add a protected URL in your app
ADMIN:

    <security-constraint>
      <web-resource-collection>
        <url-pattern>/developer</url-pattern>
      </web-resource-collection>
      <auth-constraint>
        <role-name>admin</role-name>
      </auth-constraint>
    </security-constraint>

User API
- Identify basic details about the logged-in user
  - Nickname
  - Email
  - User ID (empty for a federated user)
  - Federated identity (OpenID identifier)
  - Federated provider (URL of the federation provider)
- Functions
  - create login URL
  - get current user
  - is current user admin
- Environment variable
  - Domain (AUTH_DOMAIN)
Java:

    // req is the current HttpServletRequest
    UserService userService = UserServiceFactory.getUserService();
    if (req.getUserPrincipal() != null) {
        // logged-in user
        // isUserAdmin() may only be called while a user is logged in
        if (userService.isUserAdmin()) {
            // logged-in user is an admin
        }
    } else {
        // logged out
    }

Configuration files
- web.xml: https://developers.google.com/appengine/docs/java/config/webxml
- appengine-web.xml: https://developers.google.com/appengine/docs/java/config/appconfig

Data storage options
- Cloud Datastore
- Cloud SQL
- Google Cloud Storage
  - Used for large flat files
  - File I/O interface
- Blobstore
  - Key/Value storage
- Google Apps
  - Docs, Spreadsheets, Drawings, etc.
  - Not appropriate for application back-end data

Storage options
- Overview: https://developers.google.com/appengine/docs/java/storage
- Cloud Datastore: https://developers.google.com/appengine/docs/java/datastore/

Cloud Datastore: motivation
- Single instance
  - Performance limited by machine resources
  - Single point of failure
- Replication (copy)
  - Consistency among instances
- Sharding (split among machines)
  - Lock control (transaction)

Consistency
- Strong consistency
  - Data is always consistent among all database instances
    - just after a write operation
    - after a crash in the middle of a write operation
- Eventual consistency
  - It takes time until all data becomes consistent after a write (think of DNS as an example)

Datastore secrets
[Figure: many concurrent writes spread across several tables]

Terminology

                                      Datastore   RDBMS
    Category of object                Kind        Table
    One entry/object                  Entity      Row
    Unique identifier of data entry   Key         Primary Key
    Individual data                   Property    Field

Entities
- Each object in the datastore is an entity
- Each entity has a unique key
- Each entity has one or more named properties
  - Can be multi-valued (== tests if any value matches)
  - Variety of data types (int, float, boolean, String, Date, etc.)
- Each entity is of a particular kind

    BlogEntry
        Key: ID=1234
        name: joe@ex.com
        message: xxxxx
        date: 1/1/2012 12:32

    Following
        Key: joe@ex.com
        email: joe@ex.com
        following: [usr2@ex.com, usr3@ex.com]
        followers: []

        Key: usr2@ex.com
        email: usr2@ex.com
        following: []
        followers: [joe@ex.com]

[Figure labels: Entities, Entity Kinds, Properties, Key]

Create an entity

    import com.google.appengine.api.datastore.DatastoreService;
    import com.google.appengine.api.datastore.DatastoreServiceFactory;
    import com.google.appengine.api.datastore.Entity;
    import java.util.Date;

    DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
    // "Employee" is the kind; the key ID is assigned automatically on put()
    Entity employee = new Entity("Employee");
    employee.setProperty("name", "Antonio Salieri");
    employee.setProperty("hireDate", new Date());
    employee.setProperty("attendedHrTraining", true);
    datastore.put(employee);

Available APIs
- Java
  - Low-level API: the best performance, but more coding
  - JDO/JPA: more portability through Java standard APIs
  - Third-party frameworks (Objectify, Twig, Slim3...): sophisticated features with better performance
- Python
  - DB API: the traditional Datastore API for Python
  - NDB API (New DB): automatic caching, sophisticated queries, atomic transactions

Querying the Datastore

    import com.google.appengine.api.datastore.PreparedQuery;
    import com.google.appengine.api.datastore.Query;
    import com.google.appengine.api.datastore.Query.FilterOperator;
    import com.google.appengine.api.datastore.Query.FilterPredicate;

    Query query = new Query("Person");
    Query.Filter nameFilter =
        new FilterPredicate("name", FilterOperator.EQUAL, "John");
    query.setFilter(nameFilter);
    // datastore is the DatastoreService obtained as in "Create an entity"
    PreparedQuery results = datastore.prepare(query);

Filters
- Filter on:
  - property values
  - keys
  - ancestors
- Filter on property values
  - Equality filter (equal to)
  - IN: member of a list
  - Inequality filters
    - Not equal to
    - Less than
    - Less than or equal to
    - Greater than
    - Greater than or equal to

Adding more filters

    Query query = new Query("Person");
    Query.Filter filter1 = new FilterPredicate(...);
    Query.Filter filter2 = new FilterPredicate(...);
    // both filters must match
    Query.Filter comboFilter = CompositeFilterOperator.and(filter1, filter2);
    query.setFilter(comboFilter);

Sort by properties
- Sort by ascending or descending value of a property
- Some restrictions on sorting (discussed later)

    query.addSort("name", SortDirection.ASCENDING);

Query for descendants
- An entity can have a parent
- Specify the parent when you create the entity
- You can query for descendants of an entity
[Figure: entity hierarchy with Conference1, Workshop1, Workshop2, Ticket1, Ticket2]

Ancestor query
To add an ancestor filter to a query:

    Query query = new Query("Kind");
    query.setAncestor(parentKey);

Index and queries
Datastore query:

    SELECT * FROM Person WHERE height < 72 ORDER BY height DESC

Index table for height (range scan on Bigtable): 76, 75, 73, 71, 70, 68, 67, 64
Entities in the query result:

    first_name: John   height: 71
    first_name: Bob    height: 70
    first_name: Kate   height: 68

- Datastore requires an index for every query; otherwise the query fails
- This is unlike an index in an RDBMS, which is used only to improve performance
- The index scan makes it possible for query performance to scale with the size of the result set, not the data set.
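The index range scan described above can be imitated in plain Java. The following is only a minimal sketch, not the Datastore implementation: a `TreeMap` stands in for the sorted index rows in Bigtable, the class and method names (`IndexScanSketch`, `put`, `heightBelow`) are hypothetical, and it assumes every height is unique (a real index row also carries the entity key, so duplicates are not a problem there).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a single-property index on "height".
class IndexScanSketch {
    // Sorted index: height -> entity name (assumes unique heights).
    private final NavigableMap<Integer, String> heightIndex = new TreeMap<>();

    void put(String name, int height) {
        heightIndex.put(height, name);
    }

    // Rough equivalent of:
    //   SELECT * FROM Person WHERE height < maxHeight ORDER BY height DESC
    List<String> heightBelow(int maxHeight) {
        // headMap(..., false) selects only keys strictly below maxHeight;
        // descendingMap() walks that range from the largest height down.
        // Work done is proportional to the number of matches, not to the
        // total number of index rows.
        return new ArrayList<>(
            heightIndex.headMap(maxHeight, false).descendingMap().values());
    }

    public static void main(String[] args) {
        IndexScanSketch index = new IndexScanSketch();
        index.put("Ann", 76);
        index.put("John", 71);
        index.put("Bob", 70);
        index.put("Kate", 68);
        System.out.println(index.heightBelow(72)); // prints [John, Bob, Kate]
    }
}
```

The sorted map jumps to the start of the matching range and then reads consecutive rows, which is why the cost follows the result set size rather than the data set size.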
Single property index
Query for kind Person: first_name >= "A" and first_name < "C"
Scan range: [Person first_name A, Person first_name B]
Key of index table:

    Person / first_name / Audrey
    Person / first_name / Ben
    Person / first_name / Bridgit
    Person / first_name / Cathy

Two single-property indexes are created automatically: ascending and descending.

Queries supported by single-property indexes
- Equality filters on one or more properties
  - first_name = 'Bob' AND last_name = 'James'
- Inequality filters on one property
  - first_name >= 'B' AND first_name < 'C' AND first_name != 'Bob'
  - first_name != 'Bob' will be executed as first_name < 'Bob' OR first_name > 'Bob'
- One sort order
  - ORDER BY last_name ASC

Complex queries
A composite index must be explicitly configured.
Equality filter + inequality filter
Query for kind Person: last_name = "Smith", first_name > "A" and first_name < "D"
Scan range: [Person Smith B, Person Smith C]
Index rows (Kind / last_name / first_name):

    Person / Raley / Jane
    Person / Smith / Ben
    Person / Smith / Cathy
    Person / Smith / Daniel
    Person / Thomas / Alice

How to create indexes
- App Engine creates single-property indexes for all properties
- You can run queries in the development server to create custom indexes
- You can create or edit the index configuration file
  - Java XML: WEB-INF/datastore-indexes.xml and WEB-INF/appengine-generated/datastore-indexes-auto.xml

Multi-valued fields
Entity of kind Person: name = Brian, lucky_number = {1, 5, 7, 9}
Index rows (Kind / property / value):

    Person / lucky_number / 1
    Person / lucky_number / 5
    Person / lucky_number / 7
    Person / lucky_number / 9

An index entry is created for EVERY value of a property.

Multi-valued properties in queries
Entity of kind Person: name = Brian, lucky_number = {1, 5, 7, 9}
A multi-valued property matches a query if AT LEAST ONE value matches ALL the filters.
- Matches the query for kind Person: lucky_number = 1
- Matches the query for kind Person: lucky_number > 2 and lucky_number < 6
- Does NOT match the query for kind Person: lucky_number > 1 and lucky_number < 5

Missing properties
Entities with no property or an unindexed value are not included in results.
A missing property is not equal to Null/None.
Query for: Kind = Person, last_name != Arthur

    Kind     last_name   first_name
    Person   Anderson    Jane
    Person   Arundel
    Person               Jenny        X (not returned)

Inequality filters
Inequality filters are limited to one property per query.
- OK: first_name = Cathy, last_name > Able, last_name < Mooney
- X: first_name > Cathy, last_name > Able

Inequality filters and sorting
A property with an inequality filter must be sorted first.
- OK: first_name = Cathy, last_name > Able, sort by last_name
- X: last_name > Able, sort by first_name

JOINs are not permitted
A relational query with a join:

    SELECT * FROM PERSON p, ADDRESS a WHERE a.person_id = p.id AND p.age > 25 AND a.country = 'US'

Use denormalization (a known practice for any scalable database design): maintain country in Person.

    SELECT * FROM Person WHERE age > 25 AND country = 'US'

Aggregation queries not supported
Datastore does not support aggregation queries (GROUP BY, HAVING, SUM, AVG, MAX, MIN).
- Use a special entity that maintains aggregated values
  - counter entity
  - be careful not to make the entity a bottleneck (limit of about 1 update/sec)
  - use the Sharding Counter pattern, or Memcache putIfUntouched() + Datastore insert
- Use batch processing to aggregate values asynchronously
  - Backend instance
  - App Engine MapReduce
- Use Datastore Statistics for counting entities
  - updated once per day
- Use sorting for MIN() or MAX()
  - Sort by a property: the first entity will have the min/max value.
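The Sharding Counter pattern named above can be sketched without the Datastore at all. The following is an in-memory illustration under stated assumptions, not App Engine code: each slot of the array stands for one counter-shard entity in its own entity group, and the names `ShardedCounterSketch`, `increment`, and `total` are hypothetical. In a real application each increment would be a small Datastore transaction on the chosen shard entity.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical in-memory model of a sharded counter.
class ShardedCounterSketch {
    // One slot per shard; in the Datastore each would be a separate entity.
    private final AtomicLongArray shards;

    ShardedCounterSketch(int numShards) {
        shards = new AtomicLongArray(numShards);
    }

    // Write path: update ONE randomly chosen shard, so concurrent writers
    // rarely contend for the same entity group.
    void increment() {
        int shard = ThreadLocalRandom.current().nextInt(shards.length());
        shards.incrementAndGet(shard);
    }

    // Read path: the counter value is the sum over all shards.
    long total() {
        long sum = 0;
        for (int i = 0; i < shards.length(); i++) {
            sum += shards.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        ShardedCounterSketch counter = new ShardedCounterSketch(10);
        for (int i = 0; i < 1000; i++) {
            counter.increment();
        }
        System.out.println(counter.total()); // prints 1000
    }
}
```

With N shards the sustainable write rate grows roughly N times over a single counter entity, at the price of reading N entities to get the total.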
Cost of indexes
- An index consumes Datastore space and instance hours
- Take the cost of indexes into account for cost estimation
- Read: "Understanding Write Costs" and "How Entities and Indexes are Stored"
- Building a new index for a large set of entities may take a long time

Datastore statistics
View statistics in the Admin Console: Datastore > Datastore Statistics

Setting a property as unindexed

    conference.setUnindexedProperty("mainContact", "Adam Bolivar");

- Good practice: don't index long strings, such as descriptions, since you won't usually be querying them. You would use the Search API to search them.
- In addition to any unindexed properties you declare explicitly, those typed as long text strings (Text) and long byte strings (Blob) are automatically treated as unindexed.

Deleting an old index
- Existing indexes remain when you change the index config file
- This lets you leave an older version of the app running while new indexes are being built, and revert to the older version if needed
- To delete an unused index, update the index config file, then run:

    appcfg.sh vacuum_indexes myapp

Query and indexes
- Every query to the datastore uses an index: an automatically generated single-property index or a custom index
- When do the indexes get updated?

When are indexes updated?
- Every entity update has multiple writes:
  - commit phase: writes data in the log
  - write phase: writes data to the datastore and updates indexes (the index update might take longer than writing to the datastore)
- If the commit phase succeeds, the write phase is guaranteed to succeed, but might not happen immediately
- What happens if I query an entity before the indexes are updated?

What do you get when you run a query?
- Queries always return results from the INDEX
- What's an ancestor query?
  - It's a query that uses an ancestor filter
  - Results only include descendants of a specific entity
  - Results are strongly consistent: completely up to date
- To get the latest updates, use ancestor queries
  - ancestor queries force the index to update

Entity groups
- When you create an entity, you can specify its parent
- Each entity is its own entity group by default
- Parent-child relationships are forever!
- Terminology tip: entities that descend from a common ancestor are in an entity group
[Figure: entity hierarchy with Conference1, Workshop1, Workshop2, Ticket1, Ticket2]

Eventual vs Strong Consistency
- Queries using an ancestor filter force applicable index updates to complete: strongly consistent
- Queries without an ancestor filter get results from the last index update: eventually consistent

Entity groups: used for...
- Entity groups are useful for ancestor queries to get strongly consistent results
- What else are entity groups used for?
- Entity groups are used in:
  - Ancestor queries
  - Transactions

Transactions!

What is a transaction?
Transaction: a set of operations performed on a data store that preserves ACID characteristics.
  Atomicity: each transaction is "all or nothing"
  Consistency: each transaction brings the datastore from one valid state to another
  Isolation: concurrent execution of transactions does not break consistency
  Durability: committed results of a transaction persist after hardware failures

Snapshot isolation
All reads in a transaction reflect the state of the Datastore at the time the transaction started
If an entity is modified or deleted in the transaction, a query or get returns the original version of the entity, or nothing if the entity did not exist then
https://developers.google.com/appengine/docs/python/datastore/transactions#Isolation_and_Consistency
https://developers.google.com/appengine/articles/transaction_isolation

Optimistic concurrency
What happens if multiple transactions try to update the same entity group at the same time?
A transaction commits its changes only if the values updated by the transaction have not changed since the snapshot was taken
The first transaction to commit its changes succeeds
All others fail. The others can try again to apply their changes to the updated data.

Using transactions
    DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
    Transaction txn = datastore.beginTransaction();
    try {
        Key empKey = KeyFactory.createKey("Employee", "Joe");
        Entity employee = datastore.get(empKey);
        /* ... reading and writing on employee ... */
        datastore.put(employee);
        txn.commit();
    } finally {
        // roll back if commit was never reached or did not succeed
        if (txn.isActive()) {
            txn.rollback();
        }
    }

Entity operations in a transaction
Single entity group transactions: update a single entity group
Cross-entity group transactions: update up to 5 entity groups
Operations on an entity group:
  create entities
  update entities
  delete entities
Queries inside a transaction return results from the state of the datastore before the transaction started

Transaction limits
Limit to the number of entity groups
  1 for single entity group transactions
  5 for cross-entity group transactions
Limit to the number of updates per entity group per second
  Usually between 1 and 5 updates per second
Duration limits
  Max duration of 60 seconds

Best practices
A transaction should happen quickly, to minimize the chance of external changes that conflict with the transaction
Prepare data outside the transaction
Prepare keys outside the transaction
Use the keys to fetch entities inside the transaction

Let's code!
Working with entities:
https://developers.google.com/appengine/docs/java/datastore/entities#Java_Working_with_entities
Queries:
https://developers.google.com/appengine/docs/java/datastore/queries
https://developers.google.com/appengine/docs/java/datastore/projectionqueries

What is memcache?
Memcache is an in-memory key-value data store
  Put a value with a key
  Get a value with a key
Examples (key : value):
  "user001" : "John Doe"
  "user002" : "Larry Page"
A key or value can be anything that is serializable
Memcache is a shared service accessed via App Engine APIs.

Why memcache?
Improve application performance
Reduce application cost

What is memcache for?
Caching in front of Datastore
  Cache entities for low-latency reads
  Integrated into most ORM frameworks (ndb, Objectify, ...)
Caching for read-heavy operations
  User authentication token and session data
  API call or other computation results
Semi-durable shared state across app instances
  Sessions
  Counters / metrics
  Application configurations

How fast is memcache?
[Figure: Datastore query latency compared with Memcache read latency]

Memcache APIs
Java
  JCache APIs
  GAE low-level Memcache APIs
  Objectify for Datastore
Python
  google.appengine.api.memcache module
  ndb for Datastore

General pattern for Datastore
Coordinate data read with Datastore:
  Check if the Memcache value exists
  If it does, display/use the cached value directly; otherwise fetch the value from Datastore and write the value to Memcache
Coordinate data write with Datastore:
  Update the Memcache value
    to handle race conditions, leverage put-if-untouched / compare-and-set to detect them
  Write the value to Datastore
    optionally, leverage the task queue for background writes

Java code example
    import java.util.logging.Level;
    import com.google.appengine.api.memcache.*;
    ...
    MemcacheService syncCache = MemcacheServiceFactory.getMemcacheService();
    syncCache.setErrorHandler(ErrorHandlers.getConsistentLogAndContinue(Level.INFO));
    byte[] value = (byte[]) syncCache.get(key); // read from cache
    if (value == null) {
        value = getDataFromDb(key); // cache miss: fetch value from datastore
        syncCache.put(key, value);  // write to cache (key and value must be serializable)
    }
    ...
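The read and write coordination described in the general pattern can be tried without App Engine. The following is only a minimal sketch: the class CacheAsideStore and its methods are hypothetical names, and two plain HashMap objects stand in for Memcache and Datastore. It does not use the real Memcache or Datastore APIs.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory sketch of the read/write pattern; not the App Engine API. */
final class CacheAsideStore {
    private final Map<String, String> cache = new HashMap<>();     // stands in for Memcache
    private final Map<String, String> datastore = new HashMap<>(); // stands in for Datastore
    private int datastoreReads = 0;

    /** Read path: use the cached value if present, otherwise load it and populate the cache. */
    String read(String key) {
        String value = cache.get(key);
        if (value == null) {
            datastoreReads++;
            value = datastore.get(key);
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    /** Write path, in the order listed on the slide: update the cache, then the datastore. */
    void write(String key, String value) {
        cache.put(key, value);
        datastore.put(key, value);
    }

    /** Simulates an eviction, which a real cache may perform at any time. */
    void evict(String key) {
        cache.remove(key);
    }

    int datastoreReads() {
        return datastoreReads;
    }
}
```

After write("user001", "John Doe"), a read is served from the cache; after evict("user001"), the next read falls back to the datastore stand-in and repopulates the cache, so later reads are cache hits again.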
Batch operations
getAll(), putAll(), deleteAll()
  A single read or write operation for multiple memcache entries
Note:
  Further improves Memcache performance
  Batch size < 32 MB

Atomic operations
increment(key, delta), incrementAll(...)
  Provide atomic increment of numeric value(s)
getIdentifiable(), putIfUntouched()
  A mechanism to update a value consistently from concurrent requests
Note:
  Helps manage memcache data consistency in a multi-instance/concurrent environment

Other
Asynchronous calls
  Provide a mechanism to make non-blocking calls for memcache operations.
Namespace
  Logically separates data layers for different application purposes (such as multi-tenancy) across many GAE services, such as Datastore, Memcache, Task Queue, etc.

Memcache is volatile
Entries can be evicted at any time for various reasons:
  entry reaches expiration
  entry is evicted because memcache memory is full
  memcache server fails
It's important to handle cache-miss gracefully!
Tip: Implement write-through logic by backing memcache with datastore in your application!

Memcache is not transactional
  Balance $100: Instance 1 reads $100
  Balance $100: Instance 2 reads $100
  Balance $80: Instance 2 deducts $20
  Balance $70: Instance 1 deducts $30 (the $20 deduction is lost; the correct balance would be $50)
Tip: Use getIdentifiable() and putIfUntouched(...) for optimistic locking.

Free memcache is limited
My application does NOT have enough Memcache!
Tips:
  Your application should function without memcache.
  Only cache what is useful and necessary.
  Compression
  Improve the cache-hit rate
  Dedicated memcache: cache size in GB (QPS 10K/GB)

Memcache key points
1. Memcache is supported natively in GAE. Take advantage of it to improve your GAE application performance.
2. Memcache supports the open standard JCache API. Many advanced features are available via GAE Memcache APIs to suit your application's needs, e.g. batch, atomic, and asynchronous operations.
3. Seamless integration with GAE Datastore in a few libraries such as Python ndb and Java Objectify.
4. Read-frequently and write-rarely data is most suitable for use with Memcache.
5. Handle Memcache's volatility in your application.
6. Use Memcache wisely; it is not an unlimited resource.
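The lost-update scenario from the "Memcache is not transactional" slide, and the optimistic-locking remedy, can be simulated in plain Java. This is a sketch under stated assumptions: VersionedCache is a hypothetical in-memory class whose getIdentifiable and putIfUntouched methods only imitate the behaviour of the Memcache calls of the same name (a value is stored only if nobody has written the key since it was read). It is not the App Engine API, and its signatures differ from the real ones.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory cache with compare-and-set; NOT the App Engine Memcache API. */
final class VersionedCache {
    /** A value together with the version it had when it was read. */
    static final class Identifiable {
        final long value;
        final long version;

        Identifiable(long value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private final Map<String, Identifiable> entries = new HashMap<>();

    /** Unconditional write; bumps the version of the entry. */
    synchronized void put(String key, long value) {
        Identifiable old = entries.get(key);
        long nextVersion = (old == null) ? 1 : old.version + 1;
        entries.put(key, new Identifiable(value, nextVersion));
    }

    /** Imitates getIdentifiable(): returns value plus version, or null on a cache miss. */
    synchronized Identifiable getIdentifiable(String key) {
        return entries.get(key);
    }

    /** Imitates putIfUntouched(): stores only if the entry is unchanged since 'seen' was read. */
    synchronized boolean putIfUntouched(String key, Identifiable seen, long newValue) {
        Identifiable current = entries.get(key);
        if (current == null || current.version != seen.version) {
            return false; // someone else wrote (or the entry was evicted) in the meantime
        }
        entries.put(key, new Identifiable(newValue, current.version + 1));
        return true;
    }

    /** Deducts 'amount', re-reading and retrying until the compare-and-set succeeds. */
    boolean deduct(String key, long amount) {
        while (true) {
            Identifiable seen = getIdentifiable(key);
            if (seen == null) {
                return false; // cache miss: the caller would fall back to the Datastore
            }
            if (putIfUntouched(key, seen, seen.value - amount)) {
                return true;
            }
        }
    }
}
```

Replaying the slide: both instances read $100, Instance 2 writes $80 successfully, and Instance 1's stale write of $70 is rejected because the version changed. Instance 1 then retries against the fresh value through deduct("balance", 30).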
Task queues and cron
Task Queues
  Push Queues
  Pull Queues
Cron or Scheduled Tasks

Task basic concepts
Task: a unit of work such as 'write object to datastore' or 'send an e-mail'
All versions of an application share queues
  push queues for automatic execution
  pull queues to programmatically consume tasks
Tasks have a unique name
  generated automatically if not assigned
  inserting a new task with the same name will fail
[Diagram: a push queue and a pull queue, each holding tasks consumed by instances. Task fields shown: QueueName, TaskName, URL + Params (e.g. ?id=x), Method (GET, POST, etc.), RetryOptions (MaxBackoff, MaxDoublings, MinBackoff, AgeLimit, RetryLimit), TaskRetryCount, ExecutionCount, TaskETA, ExecutionDelay, Tag, Payload]

Tasks overview
The task queue is a simple way to perform work outside of a user request.
Push Queue features:
  Executed ASAP
  May cause new instances
  Frontend or Backend
    10 minute deadline (Frontend)
    Unlimited deadline (Backend)
  Max 100 KB task size
Pull Queue features:
  Task leased by worker
  REST interface (with ACL)
    Can be outside App Engine
  Max 1 MB task size

Push task creation code
    import com.google.appengine.api.taskqueue.Queue;
    import com.google.appengine.api.taskqueue.QueueFactory;
    import com.google.appengine.api.taskqueue.TaskOptions;

    Queue queue = QueueFactory.getDefaultQueue();
    // calls url "/worker" via POST with param id=123
    queue.add(TaskOptions.Builder.withUrl("/worker").param("id", "123"));

Deleting a task
Admin interface (one task or entire queue)
Code (Java):
  one named task:
    QueueFactory.getQueue("foo").deleteTask("myTask");
  or all tasks:
    QueueFactory.getQueue("foo").purge();

Pull queues
Add task:
    Queue q = QueueFactory.getQueue("pull-queue");
    q.add(TaskOptions.Builder.withMethod(TaskOptions.Method.PULL)
        .payload("hello world"));
Lease then delete:
    // lease up to 100 tasks for one hour
    List<TaskHandle> tasks = q.leaseTasks(3600, TimeUnit.SECONDS, 100);
    // Do work!!!
    q.deleteTask(tasks);

Cron
App Engine allows tasks to be scheduled at defined times or regular intervals via cron.
Cron tasks use a queue named "__cron".
At the predefined time, it executes a GET request to the specified path
Java cron.xml:
    <?xml version="1.0" encoding="UTF-8"?>
    <cronentries>
      <cron>
        <url>/recache</url>
        <description>Repopulate the cache</description>
        <schedule>every 2 minutes</schedule>
        <target>version-2</target>
      </cron>
    </cronentries>

Cron configuration
Parameters:
  url: the url to call (escape &, <, >, ', ")
  schedule: the times/dates to execute the task
  timezone: optional, the standard zoneinfo name (defaults to UTC)
  target: optional, the target version of the application (defaults to the default version)
Schedule format:
  every 12 hours
  every 5 minutes from 10:00 to 14:00
  2nd,third mon,wed,thu of march 17:00
  every monday 09:00
  1st monday of sep,oct,nov 17:00
  every day 00:00
Please note: specify "synchronized" to execute at a regular interval regardless of how long the task takes to execute.
  every 2 hours synchronized

Cloud Storage
Is a fast, scalable, highly available, strongly consistent object store
Objects can have almost arbitrary size (max 5 TB)
Use cases: SongPop, UBISoft
Cost is: storage $0.026/GB/month + egress traffic $0.10/GB
https://developers.google.com/storage/docs/overview

Cloud Storage Structure
1. Projects
   a. All data belongs to a project
2. Buckets
   a. Buckets are the basic data containers
   b. Buckets belong to a project
3. Objects
   a. Objects are the individual pieces of data
   b. Objects belong to a bucket
Note: no hierarchical structure of objects or buckets (i.e., no folders)

Buckets
Buckets are the basic container
Buckets cannot be nested
Bucket names must conform to standard Domain Name System (DNS) naming conventions
Bucket names are global to the entire Google Cloud Storage
  Don't put any confidential information into a bucket name
  Must be unique
Bucket geographical locations
  EU
  US
  experimental: regional buckets in the US
Can specify region at bucket creation time

Objects
Objects are the immutable pieces of data you store in Google Cloud Storage
Object names are unique within a bucket
Object names can be up to 1024 Unicode characters
Directory structure:
  No concept of directories. Everything is a blob of data.
  Slashes ('/') are legal in object names, and you can mimic directory listings by using slash as the delimiter parameter
    myfirstbucket/faucets/grohe_201x.jpg
    myfirstbucket/showers/grohe_202b.jpg

Objects (continued)
Objects are strongly consistent
There is no limit on how many objects you can put into a bucket
Listing is eventually consistent
For speed, you can index the objects using an index service if you plan to store more than a few thousand objects (e.g., use Cloud Datastore as your index)
An object can be up to 5 TB in size

Cloud console
Manage billing
Create and manage projects
Create and manage buckets
Browse buckets
Delete objects

Access control
Allows you to share your objects and buckets with ...
  Google Account User
  Google Apps Domain
  Google Groups
  All Authenticated Users
  All Users
  Anonymous Users

ACLs
An Access Control List entry consists of:
  Grantee: who
    Google Storage ID
    Google account email address
    Google group email address
    Google Apps domain
    Special identifier: AllAuthenticatedUsers / AllUsers
  Permission: what can they do
    READ / WRITE / FULL_CONTROL
    permission is concentric
Access Control List = (Grantees + Permissions)+

Access control summary
Project Team: who can create/delete/list buckets
Bucket ACL: who can create/delete/list objects
Object ACL: who can read the object

Bucket ACL Permissions
READ: list the bucket's content
WRITE: create/overwrite/delete objects in the bucket
FULL_CONTROL: READ/WRITE + READ/WRITE bucket ACL
Default ACL is project-private
  Project Members: READ
  Project Editors: FULL_CONTROL
  Project Owners: FULL_CONTROL
Default ACL can be changed with the gsutil acl sub-command

Objects ACL permissions
READ: can download the object
WRITE: does not apply
FULL_CONTROL: READ + READ/WRITE object ACL
Default ACL is project-private
  Project Members: READ
  Project Editors: FULL_CONTROL
  Project Owners: FULL_CONTROL
Can be changed with the gsutil defacl sub-command
Can specify an ACL during upload
Bucket and Object ACLs are independent!
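The "concentric" permission model and the independence of bucket and object ACLs can be illustrated with a small sketch. Everything here is hypothetical: Permission, Acl, grant and allows are illustrative names, not part of any Cloud Storage client library, and the sketch ignores the detail that WRITE does not apply to objects. It assumes only what the slides state: an ACL is a list of (grantee, permission) entries, and a broader permission includes the narrower ones.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of concentric permissions: each level includes the ones before it. */
enum Permission {
    READ, WRITE, FULL_CONTROL;

    /** True if holding this permission also covers the 'other' permission. */
    boolean includes(Permission other) {
        return ordinal() >= other.ordinal();
    }
}

/** Hypothetical ACL: a set of (grantee, permission) entries. */
final class Acl {
    private final Map<String, Permission> entries = new HashMap<>();

    /** Adds or replaces the entry for a grantee; returns this ACL for chaining. */
    Acl grant(String grantee, Permission permission) {
        entries.put(grantee, permission);
        return this;
    }

    /** A grantee is allowed if it has an entry whose permission covers the needed one. */
    boolean allows(String grantee, Permission needed) {
        Permission granted = entries.get(grantee);
        return granted != null && granted.includes(needed);
    }
}
```

A bucket ACL and an object ACL are modelled as two separate Acl instances: granting WRITE on the bucket ACL implies READ on the bucket, but says nothing about any object ACL, which mirrors the note that the two are independent.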
Command line tool
Access Google Cloud Storage from the command line (gsutil)
Allows for a wide range of bucket and object management tasks such as:
  Create and delete buckets or objects
  Get and set bucket or object ACLs
  Move, copy and rename objects

In short
Google Cloud Storage is an Infrastructure as a Service (IaaS) offering that provides industrial-strength data storage.
Easy to use. Just projects, buckets and objects.
Tools that make mastering the service easy.
Provides a RESTful interface for programmatic access to perform Create, Read, Update, Delete (CRUD) operations.
You can choose the API set that best satisfies your requirements, from the native XML and JSON APIs to the App Engine APIs.
Google Cloud Storage leverages the power, reliability, speed and ubiquity of Google's worldwide network.
Cloud Storage: upload file (BlobStore API)

Code example: upload form
    <%
      BlobstoreService blobstore = BlobstoreServiceFactory.getBlobstoreService();
      String uploadUrl = blobstore.createUploadUrl("/uploadcallback",
          UploadOptions.Builder.withGoogleStorageBucketName(<BUCKET_NAME>));
    %>
    . . .
    <form action="<%= uploadUrl %>" method="post" enctype="multipart/form-data">
      <textarea name="title" placeholder="Your title or comment" maxlength="500" class="titleTextArea" required></textarea>
      <input type="file" name="fileName">
      <input class="active btn" type="submit" value="Upload">
    </form>

Code example: upload callback servlet
    BlobstoreService blobstoreService = BlobstoreServiceFactory.getBlobstoreService();
    // req is the HttpServletRequest received by the callback servlet
    Map<String, List<FileInfo>> blobs = blobstoreService.getFileInfos(req);
    // each form field can carry several uploaded files
    for (List<FileInfo> fileInfos : blobs.values()) {
        for (FileInfo myfileinfo : fileInfos) {
            String gsFileName = myfileinfo.getGsObjectName();
            log.info("gs storage is " + gsFileName);
            // DO WORK
        }
    }

Cloud Storage: object serving (Image service)

Code example: serving URL
    ServingUrlOptions options =
        ServingUrlOptions.Builder.withGoogleStorageFileName(image.getObjectName());
    ImagesService imagesService = ImagesServiceFactory.getImagesService();
    String url = null;
    try {
        // the "=s100" suffix requests the image resized to 100 pixels
        url = imagesService.getServingUrl(options) + "=s100";
    } catch (Exception ex) {
        // we are in development env
    }

API Documentation
Blobstore: https://developers.google.com/appengine/docs/java/blobstore/
Cloud storage and blobstore: https://developers.google.com/appengine/docs/java/blobstore/#Java_Using_the_Blobstore_API_with_Google_Cloud_Storage
Image services: https://developers.google.com/appengine/docs/java/images/

Course Evaluation
1. Multiple choice question test: 60%
2. Final programming exercise: 20%
3. Date example: 10% (remember: version 1)
4. Tutorial example: 10% (remember: version 2)

Versioning System
Team development with SVN or Git

Programming test
PhotoShare App