

    Love, Romance and Quantum Entanglement?




    One of history’s greatest engineers, Nikola Tesla, who invented the means to transfer and distribute electricity over long distances, once said, “If you want to find the secrets of the universe, think in terms of energy, frequency and vibration.” The same holds true for Love, Romance and Quantum Entanglement; for the world in which we live; for our interactions with each other; and indeed for our interactions within.
    Quantum Entanglement & Quantum Coherence
    Much like Love and Romance, Quantum Entanglement occurs when two entities or systems appear to us to be separate but, through Quantum Coherence, act as one system, with states able to be transferred wholesale from one entity to the other. Quantum Entanglement is at the heart of understanding how events at both the macro and micro level operate in a correlated way despite the considerable distances between them.
    The Source
    More than a century ago, in 1910, the American ‘New Thought’ author Wallace D Wattles wrote in “The Science of Getting Rich”: “Everything you see on earth is made from one original substance, out of which all things proceed… there is a thinking stuff from which all things are made, and which, in its original state, permeates, penetrates, and fills the interspaces of the universe… a thought, in this substance, produces the thing that is imaged by thought… man can form things in his thought, and, by impressing his thought upon formless substance, can cause the thing he thinks about to be created.” In other words, thoughtware, software and hardware are interchangeable. To confirm this point, the latest 3D printers can now print almost anything solid from scratch.
    Pure Energy
    Scientific experiments in Quantum Physics and particularly those at the European Organisation for Nuclear Research — CERN — at the Large Hadron Collider (LHC) in Geneva, Switzerland, continue to demonstrate that once we break everything down to its core, pure energy is behind everything. When we go down to the sub-atomic level we do not find matter, but pure energy. Some call this the unified field or the matrix. Others talk about pure potentiality: all being energy. Others feel this potentiality as love and romance.
    Vibrating Field
    This pure energy generates a vibrating field around it, a field which attracts, like a magnet, and attaches to energy of the same vibrating frequency. The more vibrating energy is compressed into this field, the more intense the vibration within it becomes. Eventually the energy field manifests into matter, particle by particle. As Max Planck, the father of Quantum Physics, is often quoted as saying, “All physical matter is composed of vibration.”
    Action at a Distance
    In Quantum Mechanics, non-locality refers to "action at a distance" arising from measurement correlations on Quantum Entangled states. The dividing line between the micro world of Quantum Processes and the macro world of classical physics is fading. Evidence is being increasingly acquired of the relevance in nature of Quantum properties and processes including Quantum Entanglement. Recent science has shown that Quantum Coherence and Entanglement may provide a viable explanation for a series of mysteries in nature: how photosynthesis in plants works, how birds keep orientation while migrating, and more including love and romance.
    Illusion
    A table may look solid and still, but within it are billions and billions of subatomic particles “running around” and “popping” with energy. The table is pure energy and movement. Everything in this universe has its own vibrational frequency; it is the law of vibration in action. However, we cannot see this, so the table appears separate and solid to us. This is actually an illusion! The frequency at which we perceive electro-magnetic waves defines what we can and cannot see within the visible spectrum of light. Viewed at a different frequency, such as that of X-rays, the same solid object would appear as porous as a sieve.
    Unified Field
    Every object in the Universe moves and vibrates — everything is vibrating at one speed or another. Nothing rests. Everything we see around ourselves is vibrating at one frequency or another and so are we. However, our frequency is different from other entities in the universe, hence it seems like we are separated from what we see around ourselves — people, animals, plants, trees and so on. But we are not separated — we are in fact all living in a continuous ocean of energy. We are all connected at the highest level: the unified field.
    Frequency Control
    At the very leading edge of biophysics today, scientists are recognising that the molecules in our bodies are actually controlled by these frequencies. In 1974, Dr Colin W F McClare, an Oxford University Bio-Physicist, discovered that frequencies of vibrating energy are roughly one-hundred times more efficient in relaying information within a biological system than physical signals, such as hormones, neurotransmitters, other growth factors and chemicals.
    Vibration as Sound and Light
    What’s most interesting is that, if a frequency is vibrating fast enough, it is emitted as Sound, and if it is vibrating much faster, it is emitted as a colour of Light. If we wanted to convert Sound to Light, we would simply raise its frequency by forty octaves. This results in a vibration in the trillions of cycles per second. So, if a pianist could press a key far above the eighty-eight keys that exist on a piano, that key would produce Light, and a group of such keys would create a chord of Light in the same way a pianist creates a chord of sound. It would be seen as colours of Light because it would be moving at the speed of Light.
    Resonance
    When two frequencies are brought together, the lower will always rise to meet the higher. This is the principle of resonance. So, when a piano is tuned, a tuning fork is struck, and then brought close to the piano string that carries that same musical tone. The string then raises its vibration automatically and attunes itself to the same rate at which the fork is vibrating. This principle of resonance works for biological systems too.
    Consciousness and Awareness
    Everything has a vibrating energy field and behind the field is pure energy. This pure energy has a source that is found within everything throughout the universe. What is this source? Consciousness. Yet, the source of consciousness behind everything can be measured by the amount of awareness it has within any given energy field. In other words, there are different levels of awareness to every state of consciousness.
    Vibration Frequency
    The lower the vibration frequency, the slower the vibration; the higher the vibration frequency, the faster the vibration. The differences between the manifestations of the physical, mental, emotional and spiritual levels of connectivity result simply from different levels of vibrating energy, or frequencies. So, while feelings of fear, grief and despair vibrate at a very low frequency, feelings of love, joy, romance and gratitude may vibrate at a much higher frequency, and meditation states higher still, as demonstrated by the frequency of the electro-magnetic waves generated by the human brain in a number of scientific experiments around the world.
    Layers Stacked On Top of Each Other All The Way to Full Consciousness
    In short everything can be stripped away in layers in this order:
    1. Physical — 
    The lowest layer is Matter;
    2. Vibrating Field— 
    The layer above Matter is a Vibrating Field of Energy;
    3. Pure Energy— 
    If we peel away the Vibrating Field of Energy we have the layer of Pure Energy;
    4. Consciousness
    Above the layer of Pure Energy is Consciousness; and

    5. The Source— 
    The highest layer is Supra Consciousness, which is the Source of Everything.
    Layers of Awareness Within Consciousness
    Consciousness also has layers: the layers of awareness depend on their frequency of vibration. Those layers are defined in this order:
    1. Action-and-Reaction— 
    This is the lowest layer of awareness. This is, in fact, the layer of physicality and physical objects.
    2. Stimulus Response — 
    This is the next layer above objects. We find this layer of awareness in cells. At this level, the cells are only aware of the group as a whole, a herd, and respond together with no realisation of self or others. For example, nests and hives have this level of consciousness.
    3. Stimulus Individual Response— 
    This layer has even more of an awareness than Stimulus Response. This amount of awareness is the recognition of individuals. In this layer of awareness, families are created through realising individuals inside the group or the herd. Monkeys and mammals similar to them have the awareness to develop families and demonstrate this layer of awareness.
    4. Discernment or Judgment Response— 
    Stimulus response means reacting to things. Seeing similarities and differences between things is a higher order of consciousness: it allows decisions to be made about things, rather than mere reactions to stimuli. This is the power of discernment or judgment. The mind or ego is the tool that notices similarities and differences between things. Anything that has a mind or ego has this layer of awareness or consciousness within it.
    Consciousness Layers Reveal Significance
    Every layer of consciousness above the lesser ones has the awareness of all the lower-level layers too. For example, when we are trapped in a traffic jam, we feel much like part of a herd. We are feeling the layer of Stimulus Response. Yet when we are feeling like we are an individual, there is an extreme difference in our awareness. This understanding of higher levels of consciousness is very important. The importance lies within the fact that Consciousness is the Source of Everything and we have the awareness of every layer of consciousness in existence. This is significant because it allows us the privilege to attract and to attach to any energy field. We have the ability to vibrate at different frequencies and manifest different energy fields into matter depending on our level of consciousness.
    Conclusion
    Using the principle of resonance, we can actually increase the speed at which the energy field of our bodies and minds vibrates, through higher frequency thoughts of love, joy, romance and gratitude and accessing even higher consciousness states via meditation. Modern science demonstrates that when pure energy slows down, lower dimensional matter is created; conversely when the vibration field speeds up, the higher dimensions of consciousness can be accessed. 
    And the higher our consciousness is raised, the closer to the Source we become, perhaps in a similar fashion to the quantum entanglement of distant photons acting as one. “In the beginning was the Word”: the primordial vibration of sound and light which, according to Vedic science thousands of years old, resonates to different phonetic sounds at different levels, including the original “Aum”, and to different quantum-entangled frequencies of light at differing energy vortices, or chakras.
    Are love and romance any different from quantum entanglement? Probably not!



    The Value of Data Platform-as-a-Service (dPaaS)









    Data Platform-as-a-Service (dPaaS) represents a new approach to efficiently blend people, processes and technologies.  A customizable dPaaS with unified integration and data management enables organizations to harness the value of their data assets to improve decision outcomes and operating performance.
    dPaaS provides enterprise-class scalability, enabling users to work with rapidly growing and increasingly complex data sets, including big data.  Users have the flexibility to deploy any analytics tool on top of the platform to facilitate analyses in different environments and scenarios.  The platform gives data stewards full transparency and control over data to ensure adherence to GRC (governance, risk and compliance) programs.
    dPaaS allows enterprises to reduce the burden of maintenance requirements for hardware and software.  Companies can shift IT budgets from capex to more predictable opex, while freeing up IT teams to work on higher-return projects using market-leading technologies in collaboration with business units.
    More Data Exacerbates Bottlenecks
    Integration and analytics are the top two technologies companies are investing in as they seek to integrate big data with traditional data in their business intelligence (BI) and analytics platforms.  Their goal is to make better decisions faster to build customer loyalty, strengthen competitiveness and achieve return on investment (ROI) and risk management objectives.
    Yet Tech-Tonics estimates that 75%-80% of BI project time and spending is consumed by preparing data for analysis.  Data integration projects alone account for approximately 25% of IT budgets.  This is the result of increased cloud and mobile apps, rapid growth of new data sources and formats, fragmentation caused by departmental data silos and ongoing merger and acquisition activity.
    Despite this investment, 83% of data integration projects fail to meet ROI expectations.  Many projects still get bogged down by a high degree of manual coding that is inefficient and often not documented.  IT teams are backlogged with data integration work, including updating and fixing older projects.
    The cost of bad data is high.  Operational inefficiency, transaction losses, fines for non-compliance and lawsuits stemming from bad data that drive erroneous assumptions and models cost U.S. companies $600 billion a year.
    The sheer volume and complexity of big data only exacerbates the workflow bottlenecks caused by a lack of decision-ready data.  Traditional practices for discovering, integrating, managing and governing data have become overburdened or incapable of handling semi-structured or unstructured data.  But despite advances in technologies to collect, store, process and analyze data, most end-users still struggle to locate the data they need when they need it to allow for more accurate, efficient and timely models and decision-making.
    Data Platform-as-a-Service: A New Approach to Better Decision Outcomes
    Companies implementing dPaaS can significantly improve success rates and return on data assets (RDA) by allowing enterprises to expand the scope of integration projects and manage larger data sets more efficiently to better leverage their BI investments.
    dPaaS promotes a data first strategy for BI initiatives.  Data is integrated from multiple sources, harmonized in a consistent state and then managed to end-user requirements.  The ability to quickly and easily connect to applications and data sources is critical in handling big data, as well as rapidly integrating new applications.  The context end-users gain shortens the path to finding patterns and relationships during data analysis, resulting in faster and more actionable insights.
    dPaaS helps streamline the complexity of matching, cleaning and preparing all data for analysis.  Data cleansing tools and a specialized matching engine help find and fix data quality issues.  A registry of all corporate data sources maps data to its location, applications and owners.  This consistent set of master data – or “golden record” – provides a common point of reference.  Versions and hierarchies are maintained to ensure that data remains in sync at all times.
    A single, consistent set of data policies and processes also helps overcome the challenges posed by data silos across the organization.  dPaaS facilitates integrating big data with traditional enterprise sources, such as transactional and operational databases, data warehouses, CRM, SCM and ERP systems.  Interactions between applications that use the data, as well as underlying systems can be monitored to alert for performance issues and user experience.  dPaaS also ensures security best practices with stringent policy, procedure and process controls. 
    A company’s data assets only have value when they can be accessed and used appropriately by employees and customers, and the underlying business processes that support them.  A strong data governance program supported by dPaaS can serve as the foundation for corporate data strategy.  Reducing costs, enhancing IT productivity and enabling faster time-to-value through improved decision-making all make dPaaS a compelling value proposition for enterprises. 


    What is a data lake?






    You’ve probably heard of data warehousing, but now there’s a newer phrase doing the rounds, and it’s one you’re likely to hear more in the future if you’re involved in big data: ‘Data Lakes’.
    So what are they? Well, the best way to describe them is to compare them to data warehouses, because the difference is very much the same as between storing something in a warehouse and storing something in a lake.
    In a warehouse, everything is archived and ordered in a defined way – the products are inside containers, the containers on shelves, the shelves are in rows, and so on. This is the way that data is stored in a traditional data warehouse.
    In a data lake, everything is just poured in, in an unstructured way. A molecule of water in the lake is equal to any other molecule and can be moved to any part of the lake where it will feel equally at home.
    This means that data in a lake has a great deal of agility – another word which is becoming more frequently used these days – in that it can be configured or reconfigured as necessary, depending on the job you want to do with it.
    A data lake contains data in its rawest form – fresh from capture, and unadulterated by processing or analysis.
    It uses what is known as object-based storage, because each individual piece of data is treated as an object, made up of the information itself packaged together with its associated metadata, and a unique identifier.
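    As a rough illustration of that idea, the sketch below models a single stored object in Scala; the class and field names are hypothetical, chosen only to show the three ingredients: the raw payload, its metadata, and a unique identifier.

        // Hedged sketch (hypothetical names): one "object" in an object store is just
        // raw bytes plus metadata plus a unique identifier -- no schema, no hierarchy.
        import java.util.UUID

        case class StoredObject(
          id: String,                    // unique identifier
          payload: Array[Byte],          // the raw, unprocessed data
          metadata: Map[String, String]  // e.g. source system, capture time, content type
        )

        val obj = StoredObject(
          id = UUID.randomUUID().toString,
          payload = "raw sensor reading".getBytes("UTF-8"),
          metadata = Map("source" -> "sensor-feed", "capturedAt" -> "2015-04-27T08:06:00Z")
        )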
    No piece of information is “higher-level” than any other, because it is not a hierarchically archived system, like a warehouse – it is basically a big free-for-all, as water molecules exist in a lake.
    The term is thought to have first been used by Pentaho CTO James Dixon in 2011, who didn’t invent the concept but gave a name to the type of innovative data architecture solutions being put to use by companies such as Google and Facebook.
    It didn’t take long for the name to make it into marketing material. Pivotal refer to their product as a “business data lake” and Hortonworks include it in the name of their service, Hortonworks Datalakes.
    It is a practice which is expected to become more popular in the future, as more organizations become aware of the increased agility afforded by storing data in data lakes rather than strict hierarchical databases.
    For example, the way that data is stored in a database (its “schema”) is often defined in the early days of the design of a data strategy.  The needs and priorities of the organization may well change as time goes on.
    One way of thinking about it is that data stored without structure can be shaped into whatever form is needed more quickly than if you first have to disassemble the previous structure before reassembling it.
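    For a feel of what this “shape it when you need it” approach looks like in practice, here is a hedged sketch using Spark’s DataFrame API (one possible tool among many); the path and field names are made up for illustration.

        // Hedged sketch: schema-on-read with Spark SQL. The raw JSON stays in the lake
        // untouched; structure is applied only when a question is asked of it.
        import org.apache.spark.sql.SparkSession

        val spark = SparkSession.builder().appName("schema-on-read-sketch").getOrCreate()

        // Read the raw events as-is; Spark infers a schema at read time.
        val rawEvents = spark.read.json("hdfs:///lake/raw/clickstream/")

        // Shape the same raw data two different ways, without rewriting what is stored.
        rawEvents.select("userId", "url").show()
        rawEvents.groupBy("userId").count().show()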
    Another advantage is that the data is available to anyone in the organization, and can be analyzed and interrogated via different tools and interfaces as appropriate for each job.
    It also means that all of an organization’s data is kept in one place – rather than having separate data stores for individual departments or applications, as is often the case.
    This brings its own advantages and disadvantages – on the one hand, it makes auditing and compliance simpler, with only one store to manage. On the other, there are obvious security implications if you’re keeping “all your eggs in one basket”.
    Data lakes are usually built within the Hadoop framework, as the datasets they are comprised of are “big” and need the volume of storage offered by distributed systems.
    A lot of it is theoretical at the moment because there are very few organizations which are ready to make the move to keeping all of their data in a lake. Many are bogged down in a “data swamp” – hard-to-navigate mishmashes of land and water where their data has been stored in various, uncoordinated ways over the years.
    And it has its critics of course – some say that the name itself is a problem (and I am inclined to agree) as it implies a lack of architectural awareness, when a more careful consideration of data architecture is what’s really needed when designing new solutions.
    But for better or worse, it is a term that you will probably be hearing more of in the near future if you’re involved in big data and business intelligence.
    Are you ready to dive head first into the data lake or do you prefer to keep your data high and dry?


    How-to: Tune Your Apache Spark Jobs (Part 1)


    Learn techniques for tuning your Apache Spark jobs for optimal efficiency.

    When you write Apache Spark code and page through the public APIs, you come across words like transformation, action, and RDD. Understanding Spark at this level is vital for writing Spark programs. Similarly, when things start to fail, or when you venture into the web UI to try to understand why your application is taking so long, you’re confronted with a new vocabulary of words like job, stage, and task. Understanding Spark at this level is vital for writing good Spark programs, and of course by good, I mean fast. To write a Spark program that will execute efficiently, it is very, very helpful to understand Spark’s underlying execution model.
    In this post, you’ll learn the basics of how Spark programs are actually executed on a cluster. Then, you’ll get some practical recommendations about what Spark’s execution model means for writing efficient programs.

    How Spark Executes Your Program


    A Spark application consists of a single driver process and a set of executor processes scattered across nodes on the cluster.

    The driver is the process that is in charge of the high-level control flow of work that needs to be done. The executor processes are responsible for executing this work, in the form of tasks, as well as for storing any data that the user chooses to cache. Both the driver and the executors typically stick around for the entire time the application is running, although dynamic resource allocation changes that for the latter. A single executor has a number of slots for running tasks, and will run many concurrently throughout its lifetime. Deploying these processes on the cluster is up to the cluster manager in use (YARN, Mesos, or Spark Standalone), but the driver and executor themselves exist in every Spark application.

    At the top of the execution hierarchy are jobs. Invoking an action inside a Spark application triggers the launch of a Spark job to fulfill it. To decide what this job looks like, Spark examines the graph of RDDs on which that action depends and formulates an execution plan. This plan starts with the farthest-back RDDs—that is, those that depend on no other RDDs or reference already-cached data–and culminates in the final RDD required to produce the action’s results.
    The execution plan consists of assembling the job’s transformations into stages. A stage corresponds to a collection of tasks that all execute the same code, each on a different subset of the data. Each stage contains a sequence of transformations that can be completed without shuffling the full data.
    What determines whether data needs to be shuffled? Recall that an RDD comprises a fixed number of partitions, each of which comprises a number of records. For the RDDs returned by so-called narrow transformations like map and filter, the records required to compute the records in a single partition reside in a single partition in the parent RDD. Each object is only dependent on a single object in the parent. Operations like coalesce can result in a task processing multiple input partitions, but the transformation is still considered narrow because the input records used to compute any single output record can still only reside in a limited subset of the partitions.
    However, Spark also supports transformations with wide dependencies such as groupByKey and reduceByKey. In these dependencies, the data required to compute the records in a single partition may reside in many partitions of the parent RDD. All of the tuples with the same key must end up in the same partition, processed by the same task. To satisfy these operations, Spark must execute a shuffle, which transfers data around the cluster and results in a new stage with a new set of partitions.
    For example, consider the following code:
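    A minimal sketch consistent with the description that follows, assuming a SparkContext named sc (as in spark-shell); the path and the per-record functions are placeholders:

        sc.textFile("hdfs:///some/file.txt")
          .map(line => line.toLowerCase)      // narrow transformation
          .flatMap(line => line.split(" "))   // narrow transformation
          .filter(word => word.nonEmpty)      // narrow transformation
          .count()                            // the single action that triggers the job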

    It executes a single action, which depends on a sequence of transformations on an RDD derived from a text file. This code would execute in a single stage, because none of the outputs of these three operations depend on data that can come from different partitions than their inputs.
    In contrast, this code finds how many times each character appears in all the words that appear more than 1,000 times in a text file.
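    A hedged sketch matching that description, again assuming a SparkContext sc and an illustrative input path:

        val tokenized = sc.textFile("hdfs:///some/file.txt").flatMap(_.split(" "))
        val wordCounts = tokenized.map(word => (word, 1)).reduceByKey(_ + _)   // stage boundary
        val filtered = wordCounts.filter { case (_, count) => count > 1000 }
        val charCounts = filtered
          .flatMap { case (word, _) => word.toCharArray }
          .map(char => (char, 1))
          .reduceByKey(_ + _)                                                  // stage boundary
        charCounts.collect()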

    This process would break down into three stages. The reduceByKey operations result in stage boundaries, because computing their outputs requires repartitioning the data by keys.
    Here is a more complicated transformation graph including a join transformation with multiple dependencies.
    The pink boxes show the resulting stage graph used to execute it.
    At each stage boundary, data is written to disk by tasks in the parent stages and then fetched over the network by tasks in the child stage. Because they incur heavy disk and network I/O, stage boundaries can be expensive and should be avoided when possible. The number of data partitions in the parent stage may be different than the number of partitions in the child stage. Transformations that may trigger a stage boundary typically accept a numPartitions argument that determines how many partitions to split the data into in the child stage.
    Just as the number of reducers is an important parameter in tuning MapReduce jobs, tuning the number of partitions at stage boundaries can often make or break an application’s performance. We’ll delve deeper into how to tune this number in a later section.

    Picking the Right Operators


    When trying to accomplish something with Spark, a developer can usually choose from many arrangements of actions and transformations that will produce the same results. However, not all these arrangements will result in the same performance: avoiding common pitfalls and picking the right arrangement can make a world of difference in an application’s performance. A few rules and insights will help you orient yourself when these choices come up.
    Recent work in SPARK-5097 began stabilizing SchemaRDD, which will open up Spark’s Catalyst optimizer to programmers using Spark’s core APIs, allowing Spark to make some higher-level choices about which operators to use. When SchemaRDD becomes a stable component, users will be shielded from needing to make some of these decisions.
    The primary goal when choosing an arrangement of operators is to reduce the number of shuffles and the amount of data shuffled. This is because shuffles are fairly expensive operations; all shuffle data must be written to disk and then transferred over the network. repartition, join, cogroup, and any of the *By or *ByKey transformations can result in shuffles. Not all these operations are equal, however, and a few of the most common performance pitfalls for novice Spark developers arise from picking the wrong one:

    • Avoid groupByKey when performing an associative reductive operation. For example, rdd.groupByKey().mapValues(_.sum) will produce the same results as rdd.reduceByKey(_ + _). However, the former will transfer the entire dataset across the network, while the latter will compute local sums for each key in each partition and combine those local sums into larger sums after shuffling.


    • Avoid reduceByKey when the input and output value types are different. For example, consider writing a transformation that finds all the unique strings corresponding to each key. One way would be to use map to transform each element into a Set and then combine the Sets with reduceByKey (the first snippet in the sketch after this list).
      This approach results in tons of unnecessary object creation, because a new set must be allocated for each record. It’s better to use aggregateByKey, which performs the map-side aggregation more efficiently (the second snippet in the sketch after this list).
    • Avoid the flatMap-join-groupBy pattern. When two datasets are already grouped by key and you want to join them and keep them grouped, you can just use cogroup. That avoids all the overhead associated with unpacking and repacking the groups.
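    The sketch below illustrates the two approaches from the second bullet; it assumes a pair RDD rdd of type RDD[(String, String)], with names chosen purely for illustration.

        import scala.collection.mutable

        // Assume rdd: RDD[(String, String)], e.g. (userId, pageVisited).

        // 1. The wasteful pattern: build a new Set for every record, then merge the Sets.
        val uniquesViaReduce = rdd
          .mapValues(value => Set(value))
          .reduceByKey(_ ++ _)

        // 2. The better pattern: aggregateByKey fills one mutable set per key per partition,
        //    so far fewer intermediate objects are created before the shuffle.
        val uniquesViaAggregate = rdd.aggregateByKey(mutable.HashSet.empty[String])(
          (set, value) => set += value,    // add a value into the per-partition set
          (set1, set2) => set1 ++= set2    // merge per-partition sets
        )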

    When Shuffles Don’t Happen

    It’s also useful to be aware of the cases in which the above transformations will not result in shuffles. Spark knows to avoid a shuffle when a previous transformation has already partitioned the data according to the same partitioner. Consider the following flow:
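    A hedged reconstruction of that flow, using the RDD names referred to below (someRdd and someOtherRdd are assumed to be pair RDDs with numeric values, already in scope):

        val rdd1 = someRdd.reduceByKey(_ + _)        // no partitioner passed in
        val rdd2 = someOtherRdd.reduceByKey(_ + _)   // no partitioner passed in
        val rdd3 = rdd1.join(rdd2)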

    Because no partitioner is passed to reduceByKey, the default partitioner will be used, resulting in rdd1 and rdd2 both hash-partitioned. These two reduceByKeys will result in two shuffles. If the RDDs have the same number of partitions, the join will require no additional shuffling. Because the RDDs are partitioned identically, the set of keys in any single partition of rdd1 can only show up in a single partition of rdd2. Therefore, the contents of any single output partition of rdd3 will depend only on the contents of a single partition in rdd1 and single partition in rdd2, and a third shuffle is not required.
    For example, if someRdd has four partitions, someOtherRdd has two partitions, and both the reduceByKeys use three partitions, the set of tasks that execute would look like:
    What if rdd1 and rdd2 use different partitioners or use the default (hash) partitioner with different numbers of partitions?  In that case, only one of the RDDs (the one with the fewer number of partitions) will need to be reshuffled for the join.
    Same transformations, same inputs, different number of partitions:
    One way to avoid shuffles when joining two datasets is to take advantage of broadcast variables. When one of the datasets is small enough to fit in memory in a single executor, it can be loaded into a hash table on the driver and then broadcast to every executor. A map transformation can then reference the hash table to do lookups.
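    A hedged sketch of that pattern, sometimes called a map-side or broadcast join; it assumes a SparkContext sc, a small lookup table that fits on the driver, and a large pair RDD bigRdd of (countryCode, amount) records, all names being illustrative.

        // Small dataset, loaded into a plain map on the driver.
        val smallTable: Map[String, String] =
          Map("US" -> "United States", "DE" -> "Germany", "IN" -> "India")

        // Ship one read-only copy to every executor.
        val broadcastTable = sc.broadcast(smallTable)

        // No shuffle: each task looks keys up in its local copy of the broadcast table.
        val joined = bigRdd.map { case (countryCode, amount) =>
          (broadcastTable.value.getOrElse(countryCode, "unknown"), amount)
        }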

    When More Shuffles are Better

    There is an occasional exception to the rule of minimizing the number of shuffles. An extra shuffle can be advantageous to performance when it increases parallelism. For example, if your data arrives in a few large unsplittable files, the partitioning dictated by the InputFormat might place large numbers of records in each partition, while not generating enough partitions to take advantage of all the available cores. In this case, invoking repartition with a high number of partitions (which will trigger a shuffle) after loading the data will allow the operations that come after it to leverage more of the cluster’s CPU.
    Another instance of this exception can arise when using the reduce or aggregate action to aggregate data into the driver. When aggregating over a high number of partitions, the computation can quickly become bottlenecked on a single thread in the driver merging all the results together. To loosen the load on the driver, one can first use reduceByKey or aggregateByKey to carry out a round of distributed aggregation that divides the dataset into a smaller number of partitions. The values within each partition are merged with each other in parallel, before sending their results to the driver for a final round of aggregation. Take a look at treeReduce and treeAggregate for examples of how to do that. (Note that in 1.2, the most recent version at the time of this writing, these are marked as developer APIs, but SPARK-5430 seeks to add stable versions of them in core.)
    This trick is especially useful when the aggregation is already grouped by a key. For example, consider an app that wants to count the occurrences of each word in a corpus and pull the results into the driver as a map.  One approach, which can be accomplished with the aggregate action, is to compute a local map at each partition and then merge the maps at the driver. The alternative approach, which can be accomplished with aggregateByKey, is to perform the count in a fully distributed way, and then simply collectAsMap the results to the driver.
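    A hedged sketch of the two word-count approaches just described, assuming an RDD of words such as val words = sc.textFile("hdfs:///corpus").flatMap(_.split(" ")):

        // 1. aggregate: build a partial Map per partition, then merge every map on the
        //    driver -- the final merge runs single-threaded on the driver.
        val countsOnDriver = words.aggregate(Map.empty[String, Long])(
          (acc, word) => acc + (word -> (acc.getOrElse(word, 0L) + 1L)),
          (m1, m2) => m2.foldLeft(m1) { case (acc, (word, c)) =>
            acc + (word -> (acc.getOrElse(word, 0L) + c))
          }
        )

        // 2. aggregateByKey + collectAsMap: the counting happens fully in parallel on the
        //    cluster; the driver only receives the final, already-aggregated map.
        val countsDistributed = words
          .map(word => (word, 1L))
          .aggregateByKey(0L)(_ + _, _ + _)
          .collectAsMap()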

    Secondary Sort

    Another important capability to be aware of is the repartitionAndSortWithinPartitions transformation. It’s a transformation that sounds arcane, but seems to come up in all sorts of strange situations. This transformation pushes sorting down into the shuffle machinery, where large amounts of data can be spilled efficiently and sorting can be combined with other operations.
    For example, Apache Hive on Spark uses this transformation inside its join implementation. It also acts as a vital building block in the secondary sort pattern, in which you want to both group records by key and then, when iterating over the values that correspond to a key, have them show up in a particular order. This issue comes up in algorithms that need to group events by user and then analyze the events for each user based on the order they occurred in time. 
    Taking advantage of repartitionAndSortWithinPartitions to do secondary sort currently requires a bit of legwork on the part of the user, but SPARK-3655 will simplify things vastly.

    Conclusion

    You should now have a good understanding of the basic factors involved in creating a performance-efficient Spark program! In Part 2, we’ll cover tuning resource requests, parallelism, and data structures.



    How-to: Tune Your Apache Spark Jobs (Part 2)


    In the conclusion to this series, learn how resource tuning, parallelism, and data representation affect Spark job performance.
    In this post, we’ll finish what we started in “How to Tune Your Apache Spark Jobs (Part 1)”. I’ll try to cover pretty much everything you could care to know about making a Spark program run fast. In particular, you’ll learn about resource tuning, or configuring Spark to take advantage of everything the cluster has to offer. Then we’ll move to tuning parallelism, the most difficult as well as most important parameter in job performance. Finally, you’ll learn about representing the data itself, in the on-disk form which Spark will read (spoiler alert: use Apache Avro or Apache Parquet) as well as the in-memory format it takes as it’s cached or moves through the system.

    Tuning Resource Allocation

    The Spark user list is a litany of questions to the effect of “I have a 500-node cluster, but when I run my application, I see only two tasks executing at a time. HALP.” Given the number of parameters that control Spark’s resource utilization, these questions aren’t unfair, but in this section you’ll learn how to squeeze every last bit of juice out of your cluster. The recommendations and configurations here differ a little bit between Spark’s cluster managers (YARN, Mesos, and Spark Standalone), but we’re going to focus only on YARN, which Cloudera recommends to all users.
    The two main resources that Spark (and YARN) think about are CPU and memory. Disk and network I/O, of course, play a part in Spark performance as well, but neither Spark nor YARN currently do anything to actively manage them.
    Every Spark executor in an application has the same fixed number of cores and same fixed heap size. The number of cores can be specified with the --executor-cores flag when invoking spark-submit, spark-shell, and pyspark from the command line, or by setting the spark.executor.cores property in the spark-defaults.conf file or on a SparkConf object. Similarly, the heap size can be controlled with the --executor-memory flag or the spark.executor.memory property. The cores property controls the number of concurrent tasks an executor can run. --executor-cores 5 means that each executor can run a maximum of five tasks at the same time. The memory property impacts the amount of data Spark can cache, as well as the maximum sizes of the shuffle data structures used for grouping, aggregations, and joins.
    The --num-executors command-line flag or spark.executor.instances configuration property controls the number of executors requested. Starting in CDH 5.4/Spark 1.3, you will be able to avoid setting this property by turning on dynamic allocation with the spark.dynamicAllocation.enabled property. Dynamic allocation enables a Spark application to request executors when there is a backlog of pending tasks and free up executors when idle.
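    As a hedged sketch, the two sizing styles might be expressed on a SparkConf roughly as follows; the application name is a placeholder, and the shuffle-service setting reflects the usual requirement for dynamic allocation on YARN.

        import org.apache.spark.{SparkConf, SparkContext}

        val conf = new SparkConf()
          .setAppName("resource-sizing-sketch")
          .set("spark.executor.cores", "5")
          .set("spark.executor.memory", "19g")
          // Either pin the executor count...
          //.set("spark.executor.instances", "17")
          // ...or let it grow and shrink with the task backlog:
          .set("spark.dynamicAllocation.enabled", "true")
          .set("spark.shuffle.service.enabled", "true")   // external shuffle service

        val sc = new SparkContext(conf)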
    It’s also important to think about how the resources requested by Spark will fit into what YARN has available. The relevant YARN properties are:
    • yarn.nodemanager.resource.memory-mb controls the maximum sum of memory used by the containers on each node.
    • yarn.nodemanager.resource.cpu-vcores controls the maximum sum of cores used by the containers on each node.
    Asking for five executor cores will result in a request to YARN for five virtual cores. The memory requested from YARN is a little more complex for a couple reasons: 
    • --executor-memory/spark.executor.memory controls the executor heap size, but JVMs can also use some memory off heap, for example for interned Strings and direct byte buffers. The value of the spark.yarn.executor.memoryOverhead property is added to the executor memory to determine the full memory request to YARN for each executor. It defaults to max(384, .07 * spark.executor.memory).
    • YARN may round the requested memory up a little. YARN’s yarn.scheduler.minimum-allocation-mb and yarn.scheduler.increment-allocation-mb properties control the minimum and increment request values respectively.
    The following (not to scale with defaults) shows the hierarchy of memory properties in Spark and YARN:
    And if that weren’t enough to think about, a few final concerns when sizing Spark executors:
    • The application master, which is a non-executor container with the special capability of requesting containers from YARN, takes up resources of its own that must be budgeted in. In yarn-client mode, it defaults to 1024MB and one vcore. In yarn-cluster mode, the application master runs the driver, so it’s often useful to bolster its resources with the --driver-memory and --driver-cores properties.
    • Running executors with too much memory often results in excessive garbage collection delays. 64GB is a rough guess at a good upper limit for a single executor.
    • I’ve noticed that the HDFS client has trouble with tons of concurrent threads. A rough guess is that at most five tasks per executor can achieve full write throughput, so it’s good to keep the number of cores per executor below that number.
    • Running tiny executors (with a single core and just enough memory needed to run a single task, for example) throws away the benefits that come from running multiple tasks in a single JVM. For example, broadcast variables need to be replicated once on each executor, so many small executors will result in many more copies of the data.
    To hopefully make all of this a little more concrete, here’s a worked example of configuring a Spark app to use as much of the cluster as possible: Imagine a cluster with six nodes running NodeManagers, each equipped with 16 cores and 64GB of memory. The NodeManager capacities, yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores, should probably be set to 63 * 1024 = 64512 (megabytes) and 15 respectively. We avoid allocating 100% of the resources to YARN containers because the node needs some resources to run the OS and Hadoop daemons. In this case, we leave a gigabyte and a core for these system processes. Cloudera Manager helps by accounting for these and configuring these YARN properties automatically.
    The likely first impulse would be to use --num-executors 6 --executor-cores 15 --executor-memory 63G. However, this is the wrong approach because:
    • 63GB + the executor memory overhead won’t fit within the 63GB capacity of the NodeManagers.
    • The application master will take up a core on one of the nodes, meaning that there won’t be room for a 15-core executor on that node.
    • 15 cores per executor can lead to bad HDFS I/O throughput.
    A better option would be to use --num-executors 17 --executor-cores 5 --executor-memory 19G. Why?
    • This config results in three executors on all nodes except for the one with the AM, which will have two executors.
    • --executor-memory was derived as follows: 63GB per node / 3 executors per node = 21GB per executor; the memory overhead is 21 * 0.07 ≈ 1.47GB; and 21 – 1.47 ≈ 19GB.
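    For reference, a hedged illustration of what the resulting submission might look like (the class and jar names are placeholders):

        spark-submit --master yarn --class com.example.MyApp \
          --num-executors 17 --executor-cores 5 --executor-memory 19G \
          myapp.jar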

    Tuning Parallelism

    Spark, as you have likely figured out by this point, is a parallel processing engine. What is maybe less obvious is that Spark is not a “magic” parallel processing engine, and is limited in its ability to figure out the optimal amount of parallelism. Every Spark stage has a number of tasks, each of which processes data sequentially. In tuning Spark jobs, this number is probably the single most important parameter in determining performance.
    How is this number determined? The way Spark groups RDDs into stages is described in the previous post. (As a quick reminder, transformations like repartition and reduceByKey induce stage boundaries.) The number of tasks in a stage is the same as the number of partitions in the last RDD in the stage. The number of partitions in an RDD is the same as the number of partitions in the RDD on which it depends, with a couple exceptions: the coalesce transformation allows creating an RDD with fewer partitions than its parent RDD, the union transformation creates an RDD with the sum of its parents’ number of partitions, and cartesian creates an RDD with their product.
    What about RDDs with no parents? RDDs produced by textFile or hadoopFile have their partitions determined by the underlying MapReduce InputFormat that’s used. Typically there will be a partition for each HDFS block being read. Partitions for RDDs produced by parallelize come from the parameter given by the user, or spark.default.parallelism if none is given.
    To determine the number of partitions in an RDD, you can always call rdd.partitions().size().
    The primary concern is that the number of tasks will be too small. If there are fewer tasks than slots available to run them in, the stage won’t be taking advantage of all the CPU available. 
    A small number of tasks also means that more memory pressure is placed on any aggregation operations that occur in each task. Any join, cogroup, or *ByKey operation involves holding objects in hashmaps or in-memory buffers to group or sort. join, cogroup, and groupByKey use these data structures in the tasks for the stages that are on the fetching side of the shuffles they trigger. reduceByKey and aggregateByKey use data structures in the tasks for the stages on both sides of the shuffles they trigger.
    When the records destined for these aggregation operations do not easily fit in memory, some mayhem can ensue. First, holding many records in these data structures puts pressure on garbage collection, which can lead to pauses down the line. Second, when the records do not fit in memory, Spark will spill them to disk, which causes disk I/O and sorting. This overhead during large shuffles is probably the number one cause of job stalls I have seen at Cloudera customers.
    So how do you increase the number of partitions? If the stage in question is reading from Hadoop, your options are:
    • Use the repartition transformation, which will trigger a shuffle.
    • Configure your InputFormat to create more splits.
    • Write the input data out to HDFS with a smaller block size.
    If the stage is getting its input from another stage, the transformation that triggered the stage boundary will accept a numPartitions argument, as in the sketch below.
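    A hedged example, assuming a pair RDD rdd is in scope and X is the partition count being tuned:

        val X = 200
        val reduced = rdd.reduceByKey(_ + _, X)   // same as rdd.reduceByKey(_ + _, numPartitions = X)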
    What should “X” be? The most straightforward way to tune the number of partitions is experimentation: Look at the number of partitions in the parent RDD and then keep multiplying that by 1.5 until performance stops improving. 
    There is also a more principled way of calculating X, but it’s difficult to apply a priori because some of the quantities are difficult to calculate. I’m including it here not because it’s recommended for daily use, but because it helps with understanding what’s going on. The main goal is to run enough tasks so that the data destined for each task fits in the memory available to that task.
    The memory available to each task is (spark.executor.memory * spark.shuffle.memoryFraction * spark.shuffle.safetyFraction) / spark.executor.cores. Memory fraction and safety fraction default to 0.2 and 0.8 respectively.
    The in-memory size of the total shuffle data is harder to determine. The closest heuristic is to find the ratio between the Shuffle Spill (Memory) metric and the Shuffle Spill (Disk) metric for a stage that ran, and then multiply the total shuffle write by this ratio. However, this can be somewhat compounded if the stage is doing a reduction.
    Then round up a bit because too many partitions is usually better than too few partitions.
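    To make the arithmetic concrete, here is a hedged example using the executor sized earlier and an invented shuffle-size estimate:

        // Memory available to each task, per the formula above (defaults 0.2 and 0.8).
        val executorMemoryGB = 19.0
        val memoryPerTaskGB  = executorMemoryGB * 0.2 * 0.8 / 5   // 5 cores -> ~0.61 GB per task

        // If the in-memory shuffle data for the stage is estimated at, say, 300 GB:
        val shuffleDataGB  = 300.0
        val numPartitions  = math.ceil(shuffleDataGB / memoryPerTaskGB)   // ~494; round up a bit from there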
    In fact, when in doubt, it’s almost always better to err on the side of a larger number of tasks (and thus partitions). This advice is in contrast to recommendations for MapReduce, which requires you to be more conservative with the number of tasks. The difference stems from the fact that MapReduce has a high startup overhead for tasks, while Spark does not.

    Slimming Down Your Data Structures

    Data flows through Spark in the form of records. A record has two representations: a deserialized Java object representation and a serialized binary representation. In general, Spark uses the deserialized representation for records in memory and the serialized representation for records stored on disk or being transferred over the network. There is work planned to store some in-memory shuffle data in serialized form.
    The spark.serializer property controls the serializer that’s used to convert between these two representations. The Kryo serializer, org.apache.spark.serializer.KryoSerializer, is the preferred option. It is unfortunately not the default, because of some instabilities in Kryo during earlier versions of Spark and a desire not to break compatibility, but the Kryo serializer should always be used.
    The footprint of your records in these two representations has a massive impact on Spark performance. It’s worthwhile to review the data types that get passed around and look for places to trim some fat.
    Bloated deserialized objects will result in Spark spilling data to disk more often and reduce the number of deserialized records Spark can cache (e.g. at the MEMORY storage level). The Spark tuning guide has a great section on slimming these down.
    Bloated serialized objects will result in greater disk and network I/O, as well as reduce the number of serialized records Spark can cache (e.g. at the MEMORY_SER storage level.)  The main action item here is to make sure to register any custom classes you define and pass around using the SparkConf#registerKryoClasses API.
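    A hedged sketch of that registration, where MyKey and MyRecord stand in for whatever classes your job actually passes around:

        import org.apache.spark.{SparkConf, SparkContext}

        case class MyKey(id: Long)
        case class MyRecord(key: MyKey, payload: String)

        val conf = new SparkConf()
          .setAppName("kryo-sketch")
          .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
          .registerKryoClasses(Array(classOf[MyKey], classOf[MyRecord]))

        val sc = new SparkContext(conf)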

    Data Formats

    Whenever you have the power to make the decision about how data is stored on disk, use an extensible binary format like Avro, Parquet, Thrift, or Protobuf. Pick one of these formats and stick to it. To be clear, when one talks about using Avro, Thrift, or Protobuf on Hadoop, they mean that each record is an Avro/Thrift/Protobuf struct stored in a sequence file. JSON is just not worth it.
    Every time you consider storing lots of data in JSON, think about the conflicts that will be started in the Middle East, the beautiful rivers that will be dammed in Canada, or the radioactive fallout from the nuclear plants that will be built in the American heartland to power the CPU cycles spent parsing your files over and over and over again. Also, try to learn people skills so that you can convince your peers and superiors to do this, too.



    Transforming an Analog Company into a Digital Company: The Case of BBVA (Banco Bilbao Vizcaya Argentaria)





    In the past few years we have witnessed the far-reaching effects of the ongoing technological revolution on the ways we do business. Individual industries and whole sectors have been transformed; companies seemingly conjured from thin air swiftly rise to the top of their fields, joining the ranks of the world’s most valuable businesses. Conversely, long-established industry names fall into decay or disappear altogether.
    This book provides a glimpse of the severe shocks that today’s companies are called on to withstand. Technology has changed, shifting the boundaries of production and distribution possibilities. Customers have changed, as have their requirements and the ways in which we reach them. Employees have changed, and their skills and motivation are now different. Change also takes place in organizational structures, decision-making models and forms of leadership, to meet the challenges of today and face those of tomorrow: technological progress and social development never stop, creating new uncertainties on the horizon of the business world.
    These processes of transformation are all the more far-reaching, swift and radical in information-rich domains, such as the media, culture, and entertainment. Banking has changed, too. But despite being an information-rich activity—the “raw materials” of financial services are money and information—banking has changed a lot less than other industries. Money is readily digitized: when it takes the form of electronic book entries, it becomes information that can be processed and transferred in an instant.
    Various reasons have been suggested to explain why banking has changed relatively little. First, the industry is subject to heavy regulation and government intervention. This discourages potential new entrants, so incumbent banks feel less pressure to change. Another factor often pointed to is average user age, which is higher than that seen in other industries—such as music. What’s more, most people take a conservative approach to their finances. And it may well be that the rapid growth and high earnings of the financial services industry in the years leading up to the downturn nurtured complacency and inefficiencies which in other sectors would have proved fatal.
    But all this is changing. In fact, it already has changed. After the downturn, the financial services industry finds itself in an entirely new landscape. Laws and regulations are a lot tougher in the fields of consumer protection, good practice requirements, control, and capital ratios. This means thinner margins, higher costs, and lower earnings. In addition, users are now more demanding—they want improved transparency, cheaper prices and higher service quality.
    Only a major effort of transformation will enable banks to return to profit figures capable of assuring medium- and long-term survival, and, by offering a wider, improved range of services at competitive prices, to restore their tarnished reputations in the eyes of customers and society at large.
    This transformation is increasingly urgent for two powerful reasons. First, customers are changing swiftly; secondly, new competitors are stepping onto the stage.
    A whole generation of customers have grown up with the internet—they make intensive use of social media and live in a “digital mode.” The “millennial” generation—also known as Generation Y—are now aged 25 to 40. They are approaching the peak of their professional development and making major financial decisions. By 2020, “millennials” will account for a third of the population of the United States and 75% of the workforce. 90% of them deal with their banks exclusively online, and half of them do so using their smartphones.
    Over 70% of millennials say they would be happy to pay for banking products and services provided by non-banking companies—for example, telecommunications operators, technology and internet providers, online retailers. These percentages exceed 50% even among earlier generations—aged up to 55 years.1
    What this means is that banks are losing their monopoly over people’s financial trust. And later generations—like “Generation Z,” born in or after the 1990s—will no doubt bring still greater developments which are yet to be discovered.
    The United States is in most respects at the forefront of these changes, but the trend is global. It is not only in developed countries where we can see this shift. In developing countries, too, the more affluent customers are following the same pattern. What’s more, technology is making it possible to offer financial products and services to a poorer, more scattered population which conventional banks are unable to cater for at affordable prices. This potential market encompasses up to two billion new customers.
    Change is opening up opportunities that foster the rise of a new league of competitors—mostly but not exclusively spilling over from the digital world. These new entrants can be far more efficient and agile than banks, because they are not burdened with inefficient, rigid and largely obsolete technologies or expensive brick-and-mortar distribution networks.

    From Analog to Digital: Towards Knowledge-Driven Banking

    Today banks must face a tough climate: tighter margins; overcapacity; tarnished reputations; and the pressure of new high-tech competitors who can move flexibly, unburdened by cost legacies.
    But banks do enjoy a key competitive advantage: the huge mass of information they already have about their customers. The challenge is to turn that information into knowledge, and use the knowledge to give customers what they want.
    It need hardly be said that the first thing customers want is better, quicker service on transparent terms and at an affordable price, in keeping with their own individual needs.
    One of the implications is that customers should be able to interact fully with their bank, at any time and at any place, using their mobile devices. Today, there are 5 billion mobile phones in the world but only 1.2 billion bank customers. And mobile devices support an ever-increasing range of functionalities. Mobile data traffic now stands at more than 2.5 exabytes per month, and will almost treble every two years.2
    This means on the one hand that the role of bank branches has radically changed; on the other, the potential scope of the banking market has widened immensely.
    Facing new competitors who can move flexibly, unburdened by cost legacies, banks do enjoy a key competitive advantage: the huge mass of information they already have about their customers.
    In the years to come the mobile phone will win a far greater share of interactions with banks. Technological progress—APIs, cloud computing—and increased investment in mobile banking development (now standing at about $2 billion a year in terms of venture capital alone, plus the heavy internal investment of banks themselves) will lead to a powerful rise in the operational features of mobile devices and in the range and complexity of financial transactions they will support.
    Nevertheless, many people still want to deal with their bank by other means: branch offices, ATMs, computers, conventional telephones, and an increasing number of “smart” devices. So banks need to offer their customers a genuinely “omnichannel” experience. The same value proposition, the same service, must be available at any time through any channel, and customers should be able to switch from one channel to another instantly and seamlessly.
    And of course customers will increasingly want their bank to offer content carrying higher value-added—products and services that fit their needs more closely.
    To meet these demands, banks must develop a new knowledge-based business model for the digital world.
    According to Peter Weill, chair of the MIT Sloan Center for Information Systems Research (CISR),3 the new digital model has three mainstays: first, content, the things being sold; secondly, customer experience—how the product or service is presented and used; and, thirdly, the technology platform, which shapes production and distribution.
    I like to explain the construction of this new model by analogy to building a house. The technology platform is the foundation, while internal processes, organizational structures, and corporate culture are the various floors, including the installations (insulation, electricity, heating, plumbing, etc). Finally, the channels by which customers interact with the bank are the roof of the house. All these elements together make the house comfortable and safe. They let us offer the customer a good product and a satisfying experience.
    For many banks, the technology platform is a limiting factor and a nearly insurmountable challenge. Most bank platforms were designed and built in the 1960s and 70s. Professor Weill calls them “spaghetti platforms,” because of the complexity of the connections resulting from several decades of add-ons, tweaks, and repairs.
    This is why so many banks have tried to meet the digital challenge by building their “house” from the roof down, that is, starting with the channels. But that’s a stopgap solution. Without strong foundations, the increased volume and sophistication of online banking will overburden the obsolete platforms and the house will ultimately collapse.
    This is because the improvement of content and customer experience to the standards customers want will call for systematic use of a vast volume of data.
    The issue is not limited to handling an increasing volume of transactions and customer interactions. It crucially hinges on the huge amount of data collected in the course of customer contact, combined with the immense and rapidly increasing volume of information available on the internet, largely supplied by people’s social media activity and devices within the “Internet of Things.” We must capture, store and accurately process all that information to generate the knowledge to offer customers the best possible experience, even anticipating their needs and supporting them throughout their decision-making process. This is what I call “knowledge-driven banking,” which is far superior to what we now refer to as “customer-focused banking”—which in its time was a very meaningful improvement on conventional “product-based banking.”
    Banks must take the lead in Big Data techniques if they are to make use of the competitive edge granted by their incumbent status. This can only be done with huge data-processing capabilities and a technological structure that fully and seamlessly integrates the knowledge thus generated with every customer channel and every point of contact.
    Such capabilities are still beyond the grasp of conventional banking platforms. Cloud computing, however, has created the possibility of enhancing them flexibly and efficiently. Many of the new entrants to the banking field will use cloud computing, and it can be an immensely useful tool for incumbent banks as well. But security concerns and regulatory and compliance requirements call for a very careful decision as to which data, transactions, and capabilities ought to remain on the bank’s proprietary systems. The bank, what’s more, must coordinate and integrate all cloud-based services. This highly complex task will be powerfully aided by a flexible and modern technology platform.
    Having said all this, the upgrading of technology, however necessary, is not the toughest challenge to which banks must rise. To succeed in the new digital world conventional banks must completely revamp their business model. We need to reinvent operations and processes, redefine organizational structures, undertake a revolution in approaches to work, and rethink the skills and talent we need our people to display. In short, we need a transformation in corporate culture, a complete reinvention of the business itself.
    Transforming a conventional “analog” bank into a new “digital” provider of knowledge-driven services can only be a protracted and complex process. We must keep up an ongoing tension of testing and reassessing what we have, trial and error, an unending search. That is to say, we can never stop innovating. Our approach to work must accordingly be far more agile and flexible, less hierarchical, rich in communication across divisions, more open and more collaborative. The new culture we are called on to develop must be compatible with keeping up, to its full extent, the operational pace of our present business, our relationship with our customers and all our stakeholders. The process might be compared to changing the tires of a truck while still in motion.
    Very few banks in the world have put themselves to the task with the necessary determination and depth. But our very survival is at stake. A new competitive landscape is taking shape in the financial services industry. A new ecosystem to which we must adapt.
    Any number of these new ventures focus on transactions—payments, transfers, financial asset sales—such as PayPal, Dwolla, Square, M-Pesa, Billtrust, Kantox, Traxpay, etc. Adjoining this field we find companies that offer alternative currencies, such as Bitcoin, Bitstamp, Xapo, BitPay, etc.
    Moving beyond the field of payments, initiatives are under way in other segments formerly monopolized by conventional banks: product and service selection advice (Bankrate, MoneySuperMarket, LendingTree, Credit Karma); personal finance management (Fintonic, Moven, MINT, etc.); investment and wealth management and advice (Betterment, Wealthfront, SigFig, Personal Capital, Nutmeg); crowdfunding capital and debt financing (Lending Club, Kickstarter, Crowdfunder, AngelList, etc.). Lending to individuals—so far thought of as the segment most resistant to disintermediation—is being addressed by the preapproved loans industry (Lending Club, Prosper, Kreditech, Lenddo and many others).
    Some companies are even trying to extract value from banking transaction data itself by providing customers with APIs to access their data, or directly supplying the tools for any business to manage its financial transactions on its own or for a bank to develop its digital offering (Bancbox, Open Bank Project, Plaid, etc.).
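    As a purely illustrative sketch of the kind of programmatic access these services expose, the snippet below pulls and aggregates transaction data from a hypothetical REST endpoint. The URL, token and response fields are invented placeholders and do not describe the actual APIs of Plaid, the Open Bank Project or any other provider named above.

```python
# Illustrative sketch only: the endpoint, token and JSON fields below are hypothetical
# placeholders, not the real API of Plaid, the Open Bank Project or any other provider.
import requests

API_BASE = "https://api.example-bank-data.test/v1"   # hypothetical base URL
ACCESS_TOKEN = "user-scoped-access-token"            # hypothetical credential

def fetch_transactions(account_id: str, since_iso_date: str) -> list[dict]:
    """Return the account's transactions posted since the given ISO date."""
    resp = requests.get(
        f"{API_BASE}/accounts/{account_id}/transactions",
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        params={"since": since_iso_date},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["transactions"]

def spend_by_category(transactions: list[dict]) -> dict[str, float]:
    """Aggregate outgoing amounts per merchant category, the kind of derived
    insight a personal-finance tool or a bank's digital offering might build."""
    totals: dict[str, float] = {}
    for tx in transactions:
        if tx["amount"] < 0:                     # outgoing payments only
            totals[tx["category"]] = totals.get(tx["category"], 0.0) - tx["amount"]
    return totals
```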
    The major online players (Google, Facebook, Amazon and Apple), leading telecoms companies and big retailers are taking a real interest in offering financial products to supplement their existing goods and services. There are several reasons for this. First, it enables them to offer their customers a fully rounded experience. Secondly, a financial relationship potentially entails multiple and recurring customer interactions through which a wealth of information can be extracted.
    These players can supply a broader range of financial products and services and, eventually, create a fully fledged banking offer. At the very least they can create “packages” that combine their own products and services with financial products and services. These are packages that conventional banks will be hard-pressed to replicate.
    We are witnessing the emergence of start-ups that focus on single segments of the value chain. These new entrants use the latest technology and lean, flexible structures to offer highly specific products. They can do so at cheap prices and offering a great customer experience by dint of speed, agility, and intensive use of Big Data technologies.
    We are witnessing the disaggregation of the financial services industry, with a multitude of highly specialized competitors operating in different segments. What’s more, major players are likely to enter the market with wider product ranges. So the banking industry—clearly burdened by overcapacity and in need of a far-reaching process of consolidation—will see an influx of competitors who will put still more pressure on incumbent banks’ growth potential and bottom line.
    Those banks that let the challenge of transformation go unmet, or fail in the attempt, are doomed to disappear. This won’t happen straight away. The regulatory framework is still a formidable barrier to some areas of banking, and many customers remain attached to established practice.
    But these barriers will undergo an inexorable decline. Disintermediation in an ever-widening portion of the value chain will drive out the incumbent banks, leaving them with the heavily regulated areas only. Elsewhere they will be relegated to providing back-end tasks and mere infrastructure, at a distance from the end-customer.
    Yet those banks that successfully achieve transformation will leverage their knowledge of their customers to remain as the main point of contact, offering a wider and better range of services, whether sourced internally or through platforms where various specialist providers and customers themselves can interact. Only the platform owner will be able to integrate all the knowledge generated about end-customers to enhance their experience and widen and improve the available product range (whether produced by the owner itself or other suppliers which the owner admits to the platform).
    None of this is “economic fiction”—this ecosystem model is already a reality in the digital world. It is yet to reach the domain of banking in anything more than embryonic form,4 but its emergence is inevitable. At some point in the course of this process there will arise a competitive face-off between “digital” banks and the leading internet-based providers. The banks, using the financial information available to them supplemented by public sources, will seek to offer a wider and better range of financial and non-financial products. Their rivals will use their information about their users to offer financial services among others.
    Against a background of swift technological progress and the rise of all kinds of new competitors, it will be hard to tell who is making the right decisions and implementing them tenaciously and imaginatively. Customer behavior will ultimately cast its light on the landscape. Those perceived as leaders in digital transformation will earn better prospects of growth and profit, which in turn will win them the technical and financial capabilities to make best use of the process of consolidation: they will attract the best talent, bolster their reputation facing customers and suppliers, and capture a larger, wider and truly global market share.

    The Transformation of BBVA

    At BBVA we soon became aware of the depth and reach of the change faced by the banking industry when many still thought our field would be perpetually shielded by regulations and user conservatism.
    Seven years ago we undertook the task of rebuilding our technology platform from scratch. We entirely transformed our technology function so that we could at one and the same time keep existing systems in full operation and develop new systems in line with the latest technological advances. We doubled our systems investment from €1.2 billion in 2006 to €2.4 billion in 2013. A substantial change took place in the proportion of funds spent to keep systems operational (“run”) to funds invested in new development (“change”), moving from the industry standard of 80%/20% to a new standard of 60%/40%.
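    A quick check on those figures (a sketch of the arithmetic, not BBVA’s reported accounting) shows that the absolute amount invested in new development roughly quadrupled over the period:

```python
# Quick check on the run/change figures quoted above (amounts in euros).
old_budget, new_budget = 1.2e9, 2.4e9    # total systems investment, 2006 vs 2013
old_change = old_budget * 0.20           # 80/20 split: ~EUR 0.24bn on new development
new_change = new_budget * 0.40           # 60/40 split: ~EUR 0.96bn on new development
print(f"'Change' investment grew by a factor of {new_change / old_change:.0f}")  # 4
```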
    After seven years of work, at BBVA we have achieved a state-of-the-art technology platform.5 As a result, while we processed 90 million transactions a day in 2006, we were able to process 250 million transactions a day by 2013. We estimate we will reach 1.2 to 1.4 billion transactions by 2020. At the same time, the new platform enables us to meet increasing security requirements. From 2006 to 2013, the number of attempted attacks on BBVA multiplied by a factor of 60. However, technology-driven fraud in 2013 was less than half what it was in 2010.
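    The transaction figures above imply the following compound annual growth rates; the short calculation below is simple arithmetic on the quoted numbers, with the 2020 range taken as stated.

```python
# Implied compound annual growth rates for daily transactions, from the figures above.
def cagr(start: float, end: float, years: int) -> float:
    return (end / start) ** (1 / years) - 1

print(f"2006-2013:             {cagr(90e6, 250e6, 7):.1%} per year")   # ~15.7%
print(f"2013-2020 (1.2bn est.): {cagr(250e6, 1.2e9, 7):.1%} per year")  # ~25.1%
print(f"2013-2020 (1.4bn est.): {cagr(250e6, 1.4e9, 7):.1%} per year")  # ~27.9%
```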
    In short, our technology platform is able to satisfy the requirements of data capture, storage and management, which are growing exponentially in step with our progress into the digital age. We are aware that this task can never be achieved completely. There will always be new and more complex demands. But we also believe we are ahead of our peers, and able to compete successfully with new digital entrants.
    Technology, though essential, is only a tool in the hands of our people to help them build a better experience for customers.
    What is needed here is a revolution in operations, processes, and organizational structures; a major shift in approaches to work and in the required skills and talent. This must signify a radical transformation of our corporate culture or, following our earlier analogy, the way in which we build the “floors” of our digital “house.”
    From the outset we identified a number of essential cultural traits that we had to encourage: agility, flexibility, the primacy of collaborative work, an entrepreneurial spirit, and support for innovation. Given this, it fell to us to advocate open innovation models as a way to overcome the limitations to which organizations are typically subject, and to place the development of value propositions in the hands of the best talent—whether it be found in employees, customers, outside partners or any other of the company’s stakeholders.
    The cultural transformation is undoubtedly even harder to achieve than the technological one, because we lack any obvious model or benchmark. We have to work with the infinite complexities of people, social relations and pre-existing cultures.
    Over the past few years, at BBVA we have comprehensively re-engineered our processes in step with our technological overhaul. And we have promoted a change of culture. To spread the new culture and help it gradually permeate the entire organization, three approaches proved particularly useful to us.
    First, leadership and top-down role models. At every public or internal presentation, the senior management of the Group stressed the need to embrace change, encourage innovation, and engage in collaborative work. We had to lose our fear of failure—which can be a rich source of learning and a driver of creativity. This approach was coupled with a major effort of internal and external communication on the bank’s developments in the digital domain. We set up models to be followed by raising awareness of our strategies and forward steps in this field, and of the people undertaking them.
    Secondly, we leveraged our selection and training policies. We have invested more than €40 million a year in this area. The bank’s training division operates physical venues that serve as a powerful point of reference for all our employees and stakeholders, and enable us to share experiences and knowledge. Our training centers teach the key subjects involved in addressing change at BBVA (strategy, marketing, finance, technology, leadership) in partnership with external institutions that are at the forefront of their respective fields: e.g., London Business School, IBM, Center for Creative Leadership, Wharton, Harvard, IESE, IE, Boston College.
    Our greatest effort, however, was to build one of the most innovative e-learning platforms in the world. This system enables us to provide over 3 million hours of online education and training, involving more than 175,000 course-takers (i.e., an average of 1.7 courses were taken annually by each Group employee).
    New technologies (e-learning platform, use of mobile devices) and new learning approaches (newsletters, on-the-job learning, MOOCs) are becoming an increasingly important way of providing a range of flexible training options that everyone can access.
    In the field of selection our goal is to be present wherever the talent and knowledge we need is likely to emerge. We have relationships in place with the leading, cutting-edge business schools, such as those referred to earlier and others. Our key “brand” as an employer is aligned with the goal of being “the best—and first—digital bank in the world.” We are leveraging intensive use of social media (LinkedIn, Facebook, Twitter) to achieve global positioning in keeping with our requirements and expectations.
    Finally, we regard the new corporate buildings now under construction as powerful instruments to accelerate change. In the countries where we have a significant presence we are bringing together BBVA employees at new headquarters. Though originally driven by financial and efficiency-related criteria, this effort is now a lever in the service of achieving our transformation.
    In this book the article produced by the BBVA New Headquarters Team6 provides a detailed account of how this project was implemented and what it was intended to achieve. Here I shall do no more than stress the point that our aim is to create a new work experience for the digital era. The design of this experience must be both global and focused on people, meeting their functional and emotional needs. Under the new approaches to work we aim to put in place, the key vector is collaborative work as a way to bring forth collective intelligence and stimulate innovation. This requires simultaneous action in three distinct but interrelated realms: physical space, technology, and culture (behaviors). Given the interrelatedness of the three settings, the new headquarters are proving to be powerful drivers of behavioral change.
    Over these years BBVA’s transformation drive has already garnered meaningful results. The Group’s active digital customers in December 2011 numbered 5 million; by mid-2014, that figure had climbed to 8.4 million. Active customers using mobile technologies grew from 0.3 million to 3.6 million (i.e., they multiplied twelvefold). We have reconfigured our branch network, too. On the one hand, we created small “convenience” branches focusing on customer self-service; on the other, we operate larger branch offices where financial specialists can provide customers with personalized advice and higher value-added. These developments and the introduction of a system supporting remote personalized advice have enabled BBVA to raise the average office time spent on sales efforts from 38% to 45%, while the proportion of sales staff to total Group employees has risen from 28% to 38%, coupled with significant growth in our cross-selling success rate.
    We have also set in motion a highly ambitious Big Data project. After a period of getting things ready and attracting the right talent, the initiative is enjoying real success as to customer segmentation, improved credit risk scoring, and fraud reduction, among other areas.
    We have taken steps to encourage the emergence of an open innovation platform and community. Developers come together to present, critique and improve their ideas, and help one another develop new concepts and prototypes in a process of co-creation.
    BBVA is already launching new products designed and produced specially for the digital world, such as BBVA Wallet and Wizzo. The new products are a good “test bench” and have proved a powerful way of building teams and helping them learn. They are also doing very well in the market.
    While building our own capabilities and talent, we also keep an eye on outside talent. BBVA Ventures is a San Francisco-based venture capital firm with a global reach that invests in start-ups that develop innovative financial services.
    BBVA Ventures helps us stay on top of what is happening in the realm of digital banking. It also enables us to form alliances involving promising teams and initiatives, and may open the door to acquiring talent, technologies or business models with a disruptive potential in the industry. Specifically, this was the case with our recent acquisition of Simple, a pioneering start-up that focuses squarely on the user experience in mobile banking. Our active presence in the domain of digital start-ups and our bid to support open platforms have led to another first-of-its-kind deal: the recent agreement between BBVA Compass and Dwolla, a payment system start-up. Bank customers can now use Dwolla’s real-time payment network to make their money transfers.
    These are all meaningful forward steps that carry the organization and its people onward to a new corporate culture. In fact, at BBVA we are confident of having gained a competitive edge over competing conventional banks both technologically and in terms of revamping processes, organizational structures, and corporate culture.
    But the pace of change in the digital realm and the ongoing acceleration of the innovation cycle prompted us to speed up our transformation and turn around our organizational structure in radical ways to place the digital world at the center of our vision for the future. It was for this reason that in 2014 we created the Digital Banking Area.

    BBVA Digital Banking: a Radical Organizational Change to Accelerate Transformation

    The main purpose of this new business Area is to speed up the Group’s transformation into a digital bank. The Area is directly responsible for developing existing distribution channels, adapting internal processes and designing a new range of digital products and services capable of delivering the best possible customer experience.
    The guiding idea is that the Area will, in addition to enhancing BBVA’s digital business and presence, work as a catalyst to transform the entire Group. In step with the digitization of the business, the Digital Banking Area will spread throughout every unit of the organization the relevant procedures, work methods, and culture. BBVA Digital Banking is the “BBVA version” of what John Kotter—in his paper collected in this book—has called the “dual model” for driving change forward.
    The new Digital Banking Area’s mission has four key angles: a new, enhanced customer experience; knowledge-driven personalization using the best data analysis technologies; communication in clear, concise language; and access to products and services at any time and from any place.
    This mission specification calls for a new set of behaviors. First, the customer must be the focal point of every decision. Secondly, we must be able to experiment and react quickly, launch initiatives and end them promptly if unsuccessful, and use the iteration method to improve. Thirdly, we must target what’s truly important for customers and the business—do fewer but more relevant things, and do them better. Fourthly, we must be accountable: we should set specific, quantifiable objectives, constantly measure our forward and backward steps, take any necessary corrective action promptly, and quickly take stock of any failures.
    BBVA Digital Banking brings together all digital ventures and initiatives throughout the Group and, in light of the above principles, drives forward an ambitious project for change, which has two branches: transformation in project management and transformation in human resource management.
    In project management we have opted for the Agile methodology. In a nutshell, this involves forming time-limited multidisciplinary teams specifically tailored to the requirements of the given project; team members make intensive use of online tools to work collaboratively. To allow them the widest flexibility in their approach teams are given full decision-making independence—although they are of course accountable for their decisions—and use iterative trial and error to test, improve, or reject the initial concept.
    This new project management model entails a new human resource management model. The Digital Banking Area has been granted full authority to make its own hiring decisions. But we also want to encourage the entire Group to develop, search for, and recognize talent both within the organization and outside. We need our people to have the right know-how but also, and more importantly, we need them to have the cultural features that fit in with this new model—flexible, non-hierarchical, highly mobile, and subject to very quick project progress assessment cycles.
    The Area is sub-divided into several functional units: Marketing, Customer Experience and Business Intelligence, Omni Channel, Technology, Strategy and Planning, and Talent and Culture. These units provide support for six business divisions. Four of those divisions are geographical, corresponding to the Group’s main regions: Spain and Portugal, United States, Mexico, and South America. The other two divisions are global: Forms of Payment and New Digital Businesses.
    This organizational structure reflects the Digital Banking Area’s goal of transforming BBVA’s existing activity and finding new, knowledge-driven lines of business in the digital realm.
    In the business units, objectives—and, accordingly, decisions and priorities—are shaped by the features of each market, particularly the extent of its digitization and the specific opportunities available.
    So in the more developed markets, such as Spain and the United States, the Digital Banking Area’s scope of action extends to digital transformation of the entire franchise and the development of a new business model. This means the Digital Banking Area is directly responsible for the entire market offering, the distribution model, and process design. As a result, the Area takes on joint responsibility with the local business unit for the income statement as a whole.
    In Mexico and South America, the focus lies on developing the digital offering, whether at the request of the local unit or on the Area’s own initiative. In addition the Area manages all digital functions—product development, channels, marketing, processes, technology, etc. These efforts are aimed at the existing “digitized” population, but also seek to develop low-cost models capable of rolling out a profitable financial product range targeting low-income segments (financial inclusiveness). In these regions, the Digital Banking Area is co-responsible, in conjunction with the local business unit, for the “digital” income statement.
    The Digital Banking Area is still at an early stage of development. It is attracting and integrating talent, building teams, specifying projects, and creating ties, agreements, and operational schemes with other business and support areas within the Group.
    In the field of New Digital Businesses the aim is to develop new business models and value propositions beyond the scope of conventional banking. This unit—which is 100% digital in its organizational structure and culture—is entirely independent from the bank and operates as a global business line with its own income statement. It takes its projects forward internally or in partnership with others, with a view to maximizing the return on BBVA’s capabilities and assets and on external talent. New Digital Businesses is accordingly in charge of the BBVA Group’s interactions with the digital ecosystem. One of the entities reporting to it is BBVA Ventures. It is also in charge of executing BBVA’s mergers and acquisitions in the digital sector.
    But even at this incipient phase results are coming through. The Area has already launched almost fifty Agile-driven projects across all business areas involving close to 500 people. One key concern is to initiate and accelerate new projects; another highly significant consideration is to continue projects scheduled or started previously. Highly ambitious targets have been set for 2015 which involve a very significant portion of Group human and technological resources.
    Leaving this aside, however, I believe that the Area’s most meaningful impact is that its principles and work approaches are permeating the organization as a whole—through its ties with other areas, its visibility in internal communications, and the backing it finds in the organization’s leadership.
    This is why I think that the creation of BBVA Digital Banking was a bold and insightful decision in aid of speeding up the Group’s digital transformation. And I also believe we will shortly see the tangible results of the projects now being set in motion and of the Area’s role as a catalyst for change in the organization’s working approaches and culture.

    Final Thoughts: Change Requires Leadership

    It was some time ago that BBVA perceived the risks and opportunities inherent in technological change, and for several years we have worked towards reinventing ourselves and moving on from analog banking—however efficient and profitable it might have been by the standards of the twentieth century—into a knowledge-driven digital business of the twenty-first century.
    This article focuses on two of the milestones in this process. First, the construction over the past seven years of an entirely new technology platform capable of supporting the data capture, storage and processing requirements of digital banking, which are far more demanding than those seen in conventional banking. Our platform now places us clearly ahead of our peers.
    Secondly, in the domain of cultural transformation we have worked hard on several fronts and undertaken many significant projects. Our overarching goal has been to shape an organization that nurtures change and innovation—not as ends in themselves, but as a means to deliver the best possible customer experience. Very recently, the launch of the Digital Banking Area has marked a turning-point in the transformation of our processes, structures, approaches to work, capabilities and mindset, in alignment with the demands of the digital world.
    We have come far, and are now in a position to lead the process of transformation of the banking industry and so become the first—and best—knowledge-based bank, fully in alignment with the digital ecosystem.
    But we are aware that there is still a long road ahead. Our transformation, as we now envision it, is in progress and far from complete. Far more importantly, technological change continues apace, and society is changing with it. We are witnessing the dawn of Big Data technology. The Internet of Things is only just taking off, but is set to grow exponentially. In these realms, as in so many others—some of which we can as yet barely imagine—“more is different,” in the words of Kenneth Cukier in his article for this book, “Big Data and the Future of Business.”
    So we are running a race which, for as long as the present stage of scientific and technological progress accelerates, has no discernible finish line. If we are not to lose our way, if we are not to become complacent or resign ourselves to being second-best, we must modify people’s attitude to change—we must not merely accept, but embrace and promote change. This calls for strong and cohesive leadership throughout the organization. Our leaders must advocate change, encourage change by example, recognize those who support change, and take steps to remove the practices and structures that stand in the way of change.
    This is surely the key to BBVA’s successful transformation so far. And this is the kind of leadership we need in future if we are to achieve our goal that BBVA should become the foremost figure in transforming the best of analog banking into the best of knowledge-based banking for the twenty-first century.



    Bluetooth Disc Plays Your Digital Music Like a Vinyl Record






    JESSE ENGLAND HAS an evocative way of describing record players: He considers them “altars” for music. “There’s no contemporary media environment that I know of in the past 50 years that requires that amount of reverence and that amount of care,” he says.
    It’s hard to argue with him. Compared to Spotify and iTunes, playing vinyl is an elaborate process: You have to flip through your collection, select a record and carefully remove it from the sleeve. If you’re really obsessive, you’ll give it a wipe with a record brush before placing it on the turntable and delicately lowering the stylus. Listening is similarly involved: You have to be careful not to bump the table, and you’ve got to get up every 15 to 20 minutes or so and flip the disc.
    “Universal Record,” England’s latest project, revives that ritual for the modern age. It’s a vibrating plastic disc that lets you play music from any digital source, via Bluetooth, on any record player. It’s a clever hack, but it’s even more interesting as a piece of media archaeology, focusing our attention not on the sound quality of vinyl but on the experience of using it.



    England, who lives in Pittsburgh, Pennsylvania, and holds an MFA degree from Carnegie Mellon University, is fond of examining media through unlikely technological mash-ups. With “Sincerity Machine,” he modified a vintage typewriter to print text in Comic Sans. He used a laser engraving machine to cut letterforms in acrylic and glued them onto the typewriter’s strikers. Universal Record was even more straightforward: Inside the chunky plastic disc is a transducer—essentially “a speaker without a cone,” England says—which receives audio via Bluetooth. The vibrations from the transducer are picked up by the stylus. Thus, through vibration, the digital music is made analog.
    The point isn’t simply to fetishize vinyl. “I am critical of unbridled and unchecked nostalgia as a marker of credibility,” England says. Rather, it’s about exposing folks to different ways of seeing and listening. The project encourages us to consider both what’s lost and gained as we shed old forms of media. I doubt anyone would think to describe iTunes as an “altar” to music (other than a sacrificial one, perhaps).

    England thinks there’s additional value in reacquainting people with vinyl—a more permanent form of media whose embedded values are increasingly rare in our dematerialized, digital world. “I am concerned about the trend of all media and all information, by extension, only being available through a networked source,” England says. “I think the experience of listening to music today is exponentially richer and more fulfilling in terms of access and the sheer amount of content, but with that there’s an associated volatility.”
    With services like Netflix and Spotify, entertainment is built on the ever-shifting tectonics of licensing agreements. For music listeners, it’s easy to imagine rare material falling through those cracks. But the problems are even more profound when you consider societies in which the free flow of information isn’t a given. Networked media is dangerously susceptible to censorship and suppression. You don’t have to burn e-books; you just need to get the right person to type a command, and all the copies disappear. “I’m concerned with people in any part of the world being comfortable with media being redefined from something that you have into something that you have access to,” England says.
    Vinyl records, England says, have a built-in reliability. “You can play them back with the most basic of playback tools. You can just take a needle and a paper cone and you’d be able to recall the sound.” And how is the Universal Record’s sound quality? “All in all, it’s listenable,” England says. “But it really exists for the concept.”






    Wearable Device Helps Vision Impaired Avoid Collision


    People who have lost some of their peripheral vision, such as those with retinitis pigmentosa, glaucoma, or brain injury causing loss of half the visual field, often face mobility challenges and an increased likelihood of falls and collisions. As therapeutic vision restoration treatments are still in their infancy, rehabilitation approaches using assistive technologies are often viable alternatives for addressing mobility challenges related to vision loss.
    Researchers from Massachusetts Eye and Ear/Schepens Eye Research Institute used an obstacle course to evaluate a wearable collision warning device they developed for patients with peripheral vision loss. They found the device may help patients with a wide range of vision loss avoid collisions with high-level obstacles. Their findings are featured in Investigative Ophthalmology & Visual Science (IOVS).
    “We developed this pocket-sized collision warning device, which can predict impending collisions based on time to collision rather than proximity. It gives warnings only when the user is approaching an obstacle, not when the user is simply standing close to an object, and not when a moving object merely passes by. So the auditory collision warnings given by the device are simple and intuitively understandable. We tested the device in a dense obstacle course to evaluate its effect on collision avoidance in people with peripheral vision loss. To show its beneficial effect, we compared the patients’ mobility performance with the device and without it. Just demonstrating that the device can give warnings for obstacles while walking would not prove the device is useful; we have to compare with a baseline, which in this case is walking without the device,” said senior author Gang Luo, Ph.D., Associate Scientist at Mass. Eye and Ear/Schepens and Assistant Professor of Ophthalmology at Harvard Medical School.
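    The difference between a time-to-collision criterion and a simple proximity threshold can be sketched in a few lines of code. The distances, speeds and two-second threshold below are illustrative assumptions; the actual device estimates collision risk in a more sophisticated way.

```python
# Minimal sketch of a time-to-collision (TTC) warning rule, as opposed to a proximity
# threshold. Distances, speeds and the 2-second threshold are illustrative assumptions;
# the device described above estimates collision risk rather differently in practice.

def time_to_collision(distance_m: float, closing_speed_mps: float) -> float:
    """Seconds until contact if the current closing speed were maintained."""
    if closing_speed_mps <= 0:        # not approaching: standing near an object,
        return float("inf")           # or an object merely passing by
    return distance_m / closing_speed_mps

def should_warn(distance_m: float, closing_speed_mps: float, threshold_s: float = 2.0) -> bool:
    return time_to_collision(distance_m, closing_speed_mps) < threshold_s

print(should_warn(distance_m=1.0, closing_speed_mps=0.0))  # False: close but not approaching
print(should_warn(distance_m=1.5, closing_speed_mps=1.2))  # True: ~1.25 s to collision
```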
    [Photo caption: Gang Luo, Ph.D., Associate Scientist at Mass. Eye and Ear/Schepens, and Assistant Professor of Ophthalmology at Harvard Medical School, adjusts the wearable device that his team created to help those who are visually impaired avoid collision while walking. Image credit: Peter Mallen, Mass. Eye and Ear.]
    Twenty-five patients with tunnel vision or hemianopia completed the obstacle course study, during which the number of collisions and walking speed were measured.
    Compared to walking without the device, collisions were reduced significantly, by about 37%, while walking speed barely changed. No patient had more collisions when using the device than when not using it.

    This video shows a pilot test of the portable collision warning device developed for the visually impaired. A normally sighted person wore goggles with a small opening, giving him only a 10-degree field of view. A warning device in his pocket gave auditory warnings when an imminent collision risk was detected.


    “We are excited about the device’s potential value for helping visually impaired and completely blind people walk around safely. Our next job is to test its usefulness in patients’ daily lives in a clinical trial study,” Dr. Luo said.

    This video demonstrates an end-user prototype of the collision warning device the researchers developed for visually impaired and blind people.

    This study is entitled “Evaluation of a portable collision warning device for patients with peripheral vision loss in an obstacle course.” Other authors are Shrinivas Pundlik and Matteo Tomasi.




    How the Nepal Earthquake Happened




    A little before noon Saturday in Nepal, a chunk of rock about 9 miles below the earth’s surface shifted, unleashing a shock wave—described as being as powerful as the explosion of more than 20 thermonuclear weapons—that ripped through the Katmandu Valley.
    In geological terms, the tremor occurred like clockwork, 81 years after the region’s last earthquake of such a magnitude, in 1934.
    Records dating to 1255 indicate the region—known as the Indus-Yarlung suture zone—experiences a magnitude-8 earthquake approximately every 75 years, according to a report by Nepal’s National Society for Earthquake Technology.
    The reason is the regular movement of the fault line that runs along Nepal’s southern border, where the Indian subcontinent collided with the Eurasia plate 40 million to 50 million years ago.
    “The collision between India and Eurasia is a showcase for geology,” said Lung S. Chan, a geophysicist at the University of Hong Kong. The so-called India plate is pushing its way north toward Asia at a rate of about 5 centimeters, or 2 inches, a year, he said. “Geologically speaking, that’s very fast.”
    As the plates push against each other, friction generates stress and energy that builds until the crust ruptures, said Dr. Chan, who compared the quake to a thermonuclear weapons explosion. In the case of Saturday’s quake, the plate jumped forward about 2 meters, or 6.5 feet, said Hongfeng Yang, an earthquake expert at the Chinese University of Hong Kong.
    Saturday’s quake was also relatively shallow, according to the U.S. Geological Survey. Such quakes tend to cause more damage and more aftershocks than those that occur deeper below the earth’s surface.
    After an earthquake, the plates resume moving and the clock resets. “Earthquakes dissipate energy, like lifting the lid off a pot of boiling water,” said Dr. Chan. “But it builds back up after you put the lid back on.”
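    A back-of-envelope check, using only the figures quoted in this article, gives a sense of the scales involved. It is illustrative only: faults do not store and release strain this uniformly.

```python
# Back-of-envelope check on the figures quoted in this article. Illustrative only:
# real faults do not accumulate and release strain this uniformly.
convergence_rate_m_per_yr = 0.05   # ~5 cm/year of India-Eurasia convergence
slip_in_quake_m = 2.0              # ~2 m jump reported for Saturday's quake
years_since_last_quake = 81        # 1934 to 2015

accumulated_m = convergence_rate_m_per_yr * years_since_last_quake
print(f"Convergence accumulated since 1934: ~{accumulated_m:.1f} m")     # ~4.0 m
print(f"Years of convergence equivalent to a 2 m slip: "
      f"~{slip_in_quake_m / convergence_rate_m_per_yr:.0f}")             # ~40 years
```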
    Nepal is prone to destructive earthquakes, not only because of the massive forces involved in the tectonic collision, but also because of the type of fault line the country sits on. Normal faults create space when the ground cracks and separates. Nepal lies on a so-called thrust fault, where one tectonic plate forces itself on top of another.
    The most visible result of this is the Himalayan mountain range. The fault runs along the 1,400-mile range, and the constant collision of the India and Eurasia plates pushes up the height of the peaks by about a centimeter each year.
    Despite the seeming regularity of severe earthquakes in Nepal, it isn’t possible to predict when one will happen. Historic records and modern measurements of tectonic plate movement show that if the pressure builds in the region in a way that is “generally consistent and homogenous,” the region should expect a severe earthquake every four to five decades, Dr. Yang said.
    The complexity of the forces applying pressure at the fault means scientists are incapable of predicting more than an average number of earthquakes that a region will experience in a century, experts say.
    Still, earthquakes in Nepal are more predictable than most, because of the regular movement of the plates. Scientists aren’t sure why this is.
    The earth’s tectonic plates are constantly in motion. Some faults release built-up stress in the form of earthquakes. Others release that energy quietly. “Some areas, like Nepal, release energy as a large earthquake, once in a while,” said Dr. Chan. “These regions all have different natures for reasons geologists don’t really know.”



    Preserving Innovation Flair




    Budding start-ups often aim at the big prize of either going public or getting acquired, but both avenues can hurt innovation. What’s the best path to growth while maintaining your firm’s creative flair?
    Facebook’s bumpy first year as a public company recently sparked debate about the creative benefits of private ownership. The social giant drew heavy criticism for what some felt was an awkward adaptation to the increasing importance of mobile, as well as for a series of copyright and privacy controversies. Only in recent weeks did Facebook’s share price top its IPO price.
    Before its inevitable listing, the company is said to have worried about the effects of public scrutiny on its innovative potential. Now it wrestles with the beast that is the public marketplace and is grilled on new projects and tweaks to its model at every turn.
    Meanwhile, Dell’s innovation slump and its founder’s buyout proposal have been widely attributed to the pressures the company has felt at the hands of a demanding public market.
    When entrepreneurs ask themselves whether they should take their startups public or sell to the highest bidder, they are in fact putting their firm’s innovative potential in question, according to a new paper by INSEAD professor Vikas A. Aggarwal and Wharton professor David H. Hsu.
    In Entrepreneurial Exits and Innovation, the first paper to address how entrepreneurs should evaluate the alternative liquidity paths available to them, Aggarwal and Hsu find that going public or being acquired does in fact influence a firm’s innovation output.
    By measuring the number of patents filed by venture-backed biotechnology firms that were founded between 1980 and 2000, along with the associated citations to these patents, the authors found three different innovation consequences for firms transitioning from being start-ups to being public or acquired firms.
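    To make that measurement concrete, the sketch below shows the kind of before-and-after patent and citation tally such a study might compute. The data frame, its columns and the numbers in it are hypothetical; this is not the authors’ dataset or code.

```python
# Hypothetical illustration of the measurement described above: patent counts and
# forward citations per firm, split around its exit (IPO or acquisition) year.
# The columns and the toy data are invented; this is not the authors' dataset or code.
import pandas as pd

patents = pd.DataFrame({
    "firm":        ["A", "A", "A", "B", "B"],
    "filing_year": [1995, 1999, 2001, 1998, 2003],
    "citations":   [12, 4, 7, 20, 2],      # forward citations, a rough quality proxy
    "exit_year":   [2000, 2000, 2000, 2001, 2001],
})

patents["period"] = (patents["filing_year"] >= patents["exit_year"]).map(
    {False: "pre-exit", True: "post-exit"}
)

summary = patents.groupby(["firm", "period"]).agg(
    patent_count=("filing_year", "size"),
    mean_citations=("citations", "mean"),
)
print(summary)
```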
    The IPO effect
    According to the study, going public caused innovation quality to suffer the most.
    Aggarwal and Hsu’s research determines that this is mostly due to information disclosure. As public companies must disclose their inventions as well as their results, managers may opt to back safer projects in order to produce results in the short term.
    “What happens in the case of private companies is that you’re able to operate under the radar screen, and that allows you to select projects that may have a higher risk of failure. That then allows you to make investments where you’re not under the constant scrutiny of larger owners or the public market,” said Aggarwal in an interview with INSEAD Knowledge.
    “The competitive aspect of disclosure can be quite important, particularly when you have to disclose what’s in your pipeline and who you are partnering with; this influences the types of projects you will select,” he added.
    This mechanism of information disclosure is accelerated under analyst scrutiny, as more information is uncovered and divulged to investors hungry to know about pipelines and future share prices.
    “Analyst scrutiny and the number of products a firm has in its early stage pipeline are both metrics for which there is greater oversight and risk. When analysts are scrutinising a company that has a lot of early stage projects in the pipeline, those are the conditions under which we would expect the disclosure mechanism to be most salient,” Aggarwal said.
    Getting acquired
    In a merger, the effects are also negative when the company in question is bought out by a publicly listed one. While acquired firms saw an increase in the quantity of their innovations, they experienced a decline in overall innovation quality. This has to do with managers of the acquiring firm pushing for short-term and observable outcomes. Such a focus however may be detrimental to the long-term innovation potential of the organisation, according to Aggarwal.
    It’s not all bad news for firms that get acquired though. In the study Aggarwal and Hsu found that companies being bought by a private entity rather than a public one see an increase in innovation quality. This is because private acquirers maintain more information confidentiality relative to their public counterparts. Lower technology overlap between the two firms also helps insulate the acquired firm, protecting its ability to innovate.
    The Silver (Lake) Lining
    “This has important implications for private equity ownership,” says Aggarwal. “In particular, it says a lot about the value that private equity firms can create. Dell is a great example: One of the reasons they’ve been less innovative over the past decade or so is because they’ve been under constant public scrutiny. Part of the motivation behind the buyout is to spur innovation at all levels of the company.”
    Of course, Aggarwal readily admits that it’s possible to go public without losing the capacity to innovate (think Amazon). “I think the question is: Would Amazon have been more innovative under a private ownership regime? We know that Facebook held off on its IPO for as long as possible, in part because it feared the consequences of being in the public eye and the possible innovation implications that would have; while we of course never know the counterfactual, what our research tells us is that private relative to public ownership for a given company is likely to spur innovation.”



    First Quantum Music Composition Unveiled



    Physicists have mapped out how to create quantum music, an experience that will be profoundly different for every member of the audience, they say

    One of the features of 20th century art is its increasing level of abstraction, from cubism and surrealism in the early years to abstract expressionism and mathematical photography later. So an interesting question is: what further abstractions can we look forward to in the 21st century?

    Today we get an answer thanks to the work of Karl Svozil, a theoretical physicist at the University of Technology in Vienna and his pal Volkmar Putz. These guys have mapped out a way of representing music using the strange features of quantum theory. The resulting art is the quantum equivalent of music and demonstrates many of the bizarre properties of the quantum world.


    Svozil and Putz begin by discussing just how it might be possible to represent a note or octave of notes in quantum form and by developing the mathematical tools for handling quantum music.

    They begin by thinking of the seven notes in a quantum octave as independent events whose probabilities add up to one. In this scenario, quantum music can be represented by a mathematical structure known as a seven-dimensional Hilbert space.

    A pure quantum musical state would then be made up of a linear combination of the seven notes with a specific probability associated with each. And a quantum melody would be the evolution of such a state over time.

    An audience listening to such a melody would have a bizarre experience. In the classical world, every member of the audience hears the same sequence of notes. But when a quantum musical state is observed, it can collapse into any one of the notes that make it up. The note that is formed is entirely random but the probability that it occurs depends on the precise linear makeup of the state.
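    In code, the representation Svozil and Putz describe can be sketched as a normalized amplitude vector over the seven notes, with a Born-rule measurement standing in for the act of listening. The amplitudes below are arbitrary illustrative values.

```python
# Sketch of a pure quantum musical state: a normalized amplitude vector over the seven
# notes of the octave, measured ("listened to") via the Born rule. Amplitudes are
# arbitrary illustrative values.
import numpy as np

NOTES = ["C", "D", "E", "F", "G", "A", "B"]

amplitudes = np.array([3.0, 1.0, 0.0, 2.0, 1.0, 0.0, 1.0])
amplitudes /= np.linalg.norm(amplitudes)        # normalize so probabilities sum to 1
probabilities = np.abs(amplitudes) ** 2         # Born rule: |amplitude|^2

def listen(rng: np.random.Generator) -> str:
    """One listener's measurement: the state collapses to a single note."""
    return str(rng.choice(NOTES, p=probabilities))

rng = np.random.default_rng(seed=0)
print([listen(rng) for _ in range(10)])          # each listener may hear a different note
```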

    And since this process is random for every observer, the resulting note will not, in general, be the same for each member of the audience.

    Svozil and Putz call this “quantum parallel musical rendition.” “A classical audience may perceive one and the same quantum musical composition very differently,” they say.

    As an example they describe the properties of a quantum composition created using two notes: C and G. They show how in one case, a listener might perceive a note as a C in 64 percent of cases and as a G in 36 percent of cases.

    They go on to show how a quantum melody of two notes leads to four possible outcomes: a C followed by a G, a G followed by a C, a C followed by a C, and a G followed by a G. And they calculate the probability of a listener experiencing each of these during a given performance. “Thereby one single quantum composition can manifest itself during listening in very different ways,” say Svozil and Putz. This is the world’s first description of a quantum melody.

    The researchers go on to discuss the strange quantum phenomenon of entanglement in the context of music. Entanglement is the deep connection between quantum objects that share the same existence even though they may be in different parts of the universe. So a measurement on one immediately influences the other, regardless of the distance between them.

    Exactly what form this might take in the quantum musical world isn’t clear. But it opens the prospect of an audience listening to a quantum melody in one part of the universe influencing a quantum melody in another part.

    Svozil and Putz also take a stab at developing a notation for quantum music.

    That takes musical composition to a new level of abstraction. “This offers possibilities of aleatorics in music far beyond the classical aleatoric methods of John Cage and his allies,” they say.

    There is one obvious problem, however. Nobody knows how to create quantum music or how a human might be able to experience it. Svozil and Putz’s work is entirely theoretical.

    That shouldn’t stop the authors or anybody else from performing a quantum musical composition. It ought to be straightforward to simulate the effect using an ordinary computer and a set of headphones. So instead of quantum music, we could experience a quantum music simulation.
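    Such a simulation is easy to sketch. The snippet below renders the two-note C/G composition described earlier for a simulated audience, assuming each note of the melody collapses independently for every listener (i.e., the notes are not entangled); the audience size and random seed are arbitrary choices.

```python
# Classical simulation of the two-note composition described above: C heard with
# probability 0.64 and G with 0.36. Each listener's two-note melody is assumed to
# collapse independently note by note (i.e., the notes are not entangled).
from collections import Counter
import random

def render_melody(length: int = 2) -> str:
    """One listener's perceived rendition of the quantum melody."""
    return "".join(random.choices(["C", "G"], weights=[0.64, 0.36], k=length))

random.seed(1)
audience = [render_melody() for _ in range(10_000)]
print(Counter(audience))
# Expected long-run frequencies: CC ~41%, CG ~23%, GC ~23%, GG ~13%
```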

    That’s interesting work that has implications for other art forms too. How about quantum sculpture that changes for each observer, or a quantum mobile that is entangled with another elsewhere in the universe?

    One thing seems clear. Quantum art is coming, or at least the simulation of it. So don’t be surprised if you find a quantum melody playing at an auditorium near you someday soon.






    Four Questions to Revolutionise Your Business Model



    Innovation is about more than groundbreaking technology. Rigorous, systematic questioning of risks in your business model can unleash opportunities for game changing performance improvements.
    For a decade or more, listings website Craigslist seemed a rare exception to the Internet’s innovate-or-die rule. Very early in the new millennium, the San Francisco-based site nestled comfortably into the space once solely occupied by local newspapers’ classified sections, thanks to a singular, free-ad-based business model predicated on low operating costs. It has since expanded into more than 70 countries without making any major changes to either how it does business or its infamously drab design.

    A recent spate of copyright-related legal actions, however, may reveal that Craigslist’s fountain of youth is at risk of running dry. In April 2013, a U.S. district court dismissed its claim that it held exclusive license to its listings. The year before, Craigslist added language to its terms of service asserting exclusive ownership of all user posts, prompting outcry from some consumer groups. (The language was deleted after just three weeks). At least to some, the company was starting to display a punitive side not in keeping with its stated mission to provide a public service.

    It may be time for Craigslist to rethink its long-standing refusal to innovate, but this particular issue can’t be resolved with a raft of fancy new features. To satisfy its critics while keeping startups at bay, the company would have to turn a cold eye to what has been the unquestioned basis for its success: the vaunted Craigslist business model. In their new book The Risk-Driven Business Model: Four Questions That Will Define Your Company, Karan Girotra, INSEAD Professor of Technology and Operations Management, and Serguei Netessine, INSEAD Timken Chaired Professor of Global Technology and Innovation, make a forceful case for this sort of business model innovation (or “BMI”) and map out how Craigslist – or any company in danger of dulling its competitive edge – could use it to their advantage.

    Four W’s to Manage Risk

    The “four questions” of the book’s title -- Who, What, When, and Why – are hardly unique to the business world, but according to the authors, few firms subject their business model to such basic scrutiny frequently enough. There’s no substitute, Girotra and Netessine say, for the fundamental questions, such as “What should we sell?” and “When should we introduce our new products?”

    “I like to compare it to financial auditing, which every organisation does every year, many times,” Netessine said in an interview with INSEAD Knowledge. “Often, a public company will do it once a quarter. But then you ask the same company how often [it examines] its own business models, they’ll tell you, ‘Well, I don’t know. Twenty years ago? Thirty years ago?’”

    When business models are allowed to gather dust, the authors contend, hidden risks accumulate that could unravel companies. Though these risks come in many varieties, the book concentrates on two main kinds: information risk and incentive alignment risk. As Girotra explains, information risk “arises out of not knowing something, for instance which colour of iPhone 5 will be popular.” Incentive alignment risk occurs when the best interests of stakeholders diverge. It was awareness of this risk, for example, that led entertainment rental company Blockbuster to change its contracts with movie studios in the 1990s so that stores could order more copies of the most valuable new releases. The result: a big boost in overall profits, and a new (if temporary) lease on life for Blockbuster.

    Blockbuster’s more recent difficulties, the authors suggest, perhaps could have been avoided had the firm kept the “4W”s top of mind. “The 4Ws anchor our framework,” they write, “because they are the innovator’s focal point for reducing both of our characteristic types of risk. By changing models to address the effects of these risks, you can limit the inefficiencies they cause and thereby unlock new value.”

    Model Innovators

    Many companies have turned to BMI techniques in times of existential crisis, but Amazon stands out to Girotra and Netessine for prioritising proactive, rather than reactive, reinvention of its business model. The company underwent multiple shifts in its business model in its first 15 years, they say, taking it from a “sell all, carry few” system heavily dependent on book wholesalers and publishers to a major wholesaler in its own right with a far-flung, ever-expanding network of warehouses. “It is amazing, I think, how Amazon has kept far more discipline than almost any organisation you could think of,” Girotra said.

    But BMI isn’t just for the big boys. To escape the deepening shadow of Amazon, for example, smaller online retail companies could use the 4Ws to carve out a niche for themselves. That was the case with Diapers.com, which launched in 2005 with a business model aimed squarely at new parents. “Diapers had one amazing thing going for them,” Netessine enthused. “Demand for them is extremely easy to predict…For the next two, three years, you know exactly how much your customers are going to buy, and that makes it very, very easy to manage at very, very low cost and much higher efficiency than, say, for Amazon.”

    Small wonder, then, that when Amazon bought Diapers.com in 2010, it allowed the new acquisition to operate at a respectful distance from the parent brand. “Perhaps because they wanted to keep both business models separate, so that both strengths continue to be strengthened,” Girotra said.

    Enter the “Insurgency”

    Effective implementation of BMI comes in three phases, the authors said. “The first phase is generating ideas of what kind of innovations [a company] might be able to do. Next phase is selecting between these innovations. And the final phase is really refining and testing them out, seeing if they really work or not,” Girotra said.

    When gathering ideas, the authors recommend including as much input as possible from throughout the organisation. But for efficiency’s sake, it may be best for top management to hand-pick a diverse team to spearhead the refinement and experimentation phases. Girotra explained, “You don’t really make war on the existing business model, you really have an insurgency of a few people sitting outside the traditional structure who start developing the model.”

    But unlike insurgencies that overthrow governments, these would be focused on “evidence-based experiential evaluation” that “eliminates a lot of the ideology around the existing and the new,” the authors said. Ideological agnosticism as well as broad representation on the team will help soothe any sore spots within the organisation as thoroughgoing change commences.

    Though it may appear risky to loosen attachments to business-as-usual, Netessine stressed that experimentation is a key driver of innovation, even when it yields short-term losses. “If you want to innovate, many of those innovations will fail. Many more will fail than succeed. The important thing is to keep the process running.”


    0 0

    What's The New Quantum Renaissance?





    "Perhaps the quantum computer will change our everyday lives in this century in the same radical way as the classical computer did in the last century": these were the words of the Nobel committee upon awarding Serge Haroche and David Wineland the Nobel Prize for Physics for their work on Quantum Systems. Quantum computers can model complex molecules which can contribute to improvements in health and medicine through quantum chemistry. They are also capable of modelling complex materials which can impact energy efficiency and storage through room temperature superconductivity, as well as solving complex mathematical problems which will benefit safety, security and simulation.

    New Renaissance
    We are at the beginning of a new renaissance that explores the quantum nature of our shared reality. This new age is fast forwarding us to an era of Quantum engines, devices and systems which are beginning to deliver nascent solutions in:
    1. Q-bit mathematics
    2. Q-bit algorithms & simulation
    3. Quantum clocks
    4. Quantum sensors
    5. Quantum precision components
    6. Quantum cryptography
    7. Quantum telecommunications
    8. Quantum computing
    9. Quantum healthcare
    10. Quantum energy devices
    The most significant innovations and inventions of our time are increasingly likely to be manifest at Quantum levels. Multiple paradigm-changing Quantum Technologies provide significant solutions to current global challenges in many critical areas of human endeavour and may also present outstanding opportunities for proactive, brave and technologically savvy inventors, innovators and investors.
    The Quantum Age Begins
    The dawn of the Quantum Age -- full of promising new Quantum Technologies -- demonstrates that not only is Quantum Mechanics relevant but every literate person can appreciate its profound beauty at many subtle levels. It is estimated that about 30 percent of the US Gross Domestic Product (GDP) already stems from inventions based on quantum physics: from lasers through to microprocessors and mobile phones. 
    The discovery that subtle effects of Quantum Mechanics allow fundamentally new modes of information processing is outmoding the classical theories of computation, information, cryptography and telephony which are now being superseded by their Quantum equivalents. Quantum Technologies are essentially about the coherent control of individual photons and atoms and explore both the theory and the practical possibilities of inventing and constructing Quantum Mechanisms and Quantum Devices spanning the 3Cs: Computing, Cryptography and Communications.
    Quantum Entanglement & Quantum Coherence
    Quantum Entanglement occurs when two entities or systems appear to us to be separate but through Quantum Coherence act as one system, with states being able to be transferred wholesale from one entity to the other but without a known signal being transferred. Quantum Entanglement is at the heart of our understanding how significant events across the universe operate at the macro- and micro- level in synchronicity despite considerable distance between them. Quantum Entanglement suggests that information is exchanged instantaneously between Quantum Entangled particles regardless of the distance between them.
    Fascinating Action at a Distance
    In Quantum Mechanics, non-locality refers to "action at a distance" arising from measurement correlations on Quantum Entangled states. The dividing line between the micro world of Quantum Processes and the macro world of classical physics is now fading faster than ever before. Evidence is quickly mounting that nature makes use of Quantum properties and processes, including Quantum Entanglement. Recent science suggests that Quantum Coherence and Entanglement may provide a viable explanation for a host of mysteries in nature: how photosynthesis in plants works, how birds navigate while migrating, how millions of cells co-ordinate hundreds of thousands of activities simultaneously without significant errors, and more.
    Instantaneous Communication
    It did take a long time to prove that Quantum Entanglement truly existed. It wasn’t until the 1980s that it was clearly demonstrated. In 1982, at the University of Paris, a research team led by physicist Alain Aspect performed what may turn out to be one of the most important experiments of the 20th century. Aspect and his team discovered that under certain circumstances subatomic particles such as electrons are able to instantaneously communicate with each other regardless of the distance separating them.
    Holographic Universe
    Quantum Coherence and Quantum Entanglement phenomena have inspired some physicists to offer ever more radical explanations including that of the holographic universe! The implications of a holographic universe are truly mind boggling... Aspect’s findings imply that objective reality does not exist, that despite its apparent solidity the universe is at heart a phantasm, a gigantic and splendidly detailed hologram. To understand why a number of physicists including David Bohm made this startling assertion, one must first understand a little about holograms.
    Hologram
    A hologram is a three-dimensional photograph made with the aid of a laser. To make a hologram, the object to be photographed is first bathed in the light of a laser beam. Then a second laser beam is bounced off the reflected light of the first and the resulting interference pattern -- the area where the two laser beams superimpose -- is captured on film. When the film is developed, it looks like a meaningless swirl of light and dark lines. But as soon as the developed film is illuminated by another laser beam, a three-dimensional image of the original object appears!
    Fractals in Nature and Mathematics
    The three-dimensionality of such images is not the only remarkable characteristic of holograms. If a hologram of a rose is cut in half and then illuminated by a laser, each half is still found to contain the entire image of the rose. Indeed, even if the halves are divided again, each snippet of film is always found to contain a smaller but intact version of the original image. Unlike normal photographs, every part of a hologram contains all the information possessed by the whole! This is exactly like fractals in nature and mathematics.
    Holistic in Every Part
    The "whole in every part" nature of a hologram provides us with an entirely new way of understanding organisation and order. For most of its history, Western science has laboured under the bias that the best way to understand a physical phenomenon, whether a frog or an atom or a national economy, is to dissect it and to study its respective components. A hologram teaches us that some things in the universe may be understood only as integrated holistic systems. If we try to take apart something constructed holographically, we will not get the pieces of which it is made. We will only get smaller wholes or less evolved, less detailed, incomplete miniatures of the whole picture.
    Unity Consciousness: Extensions of the Same Source
    This insight suggested to some scientists, including David Bohm, another way of understanding Aspect’s discovery. Bohm believed the reason subatomic particles are able to remain in contact with one another regardless of the distance separating them is not because they are sending some sort of mysterious signal back and forth, but because their separateness is, in fact, an illusion. Bohm suggested that at some deeper level of reality such particles are not individual entities, but are actually system components of the same fundamental something!
    Investing in Future Technologies
    The same principles of Quantum Entanglement, Resonance and Coherence apply to other fields, such as telecommunications, computers, and energy. Imagine communication devices that need no cables or even a wireless infrastructure. Imagine information being able to be transported magically over distances in a holistic state-dependent way instead of bit by bit or in packets.
    Highly Secure Communications
    There are some real and amazing applications of Quantum Entanglement in the security world. It can be used to produce unbreakable encryption. If we send each half of a set of entangled pairs to either end of a communications link, then the randomly generated but linked properties can be used as a key to encrypt information. If anyone intercepts the information it will break the entanglement, and the communication can be stopped before the eavesdropper picks up any data.
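    To make that concrete, below is a minimal, purely classical simulation (a sketch in Scala) of entanglement-style key distribution in the spirit of BB84/E91. The probabilities merely stand in for quantum measurements; the 5 percent abort threshold, the number of pairs and all the names are illustrative assumptions. The point is only that interception shows up as a measurable error rate before any secret data is sent.

```scala
import scala.util.Random

// Toy classical model of entanglement-based key distribution. Real QKD
// needs quantum hardware; simple probabilities stand in for measurements
// here, just to show why an eavesdropper raises a detectable error rate.
object QkdSketch {
  case class Half(bit: Int, basis: Int) // basis: 0 = rectilinear, 1 = diagonal

  def main(args: Array[String]): Unit = {
    val rnd = new Random()
    val n = 10000
    val eavesdropperPresent = true

    // Alice's measurement outcomes and bases, one per entangled pair.
    val alice = Seq.fill(n)(Half(rnd.nextInt(2), rnd.nextInt(2)))

    // An eavesdropper who measures in a random basis and re-sends
    // randomises the bit whenever her basis differs from Alice's.
    val inTransit = alice.map { h =>
      if (!eavesdropperPresent) h
      else {
        val eveBasis = rnd.nextInt(2)
        val seen = if (eveBasis == h.basis) h.bit else rnd.nextInt(2)
        Half(seen, eveBasis)
      }
    }

    // Bob measures his half of each pair in a random basis of his own.
    val bobBases = Seq.fill(n)(rnd.nextInt(2))
    val bobBits = inTransit.zip(bobBases).map { case (h, b) =>
      if (b == h.basis) h.bit else rnd.nextInt(2)
    }

    // Sifting: keep positions where Alice's and Bob's bases agree, then
    // publicly compare a sample to estimate the error rate.
    val sifted = alice.zip(bobBits).zip(bobBases).collect {
      case ((a, bobBit), bobBasis) if a.basis == bobBasis => (a.bit, bobBit)
    }
    val errorRate = sifted.count { case (x, y) => x != y }.toDouble / sifted.size

    println(f"sifted key length: ${sifted.size}, error rate: $errorRate%.3f")
    if (errorRate > 0.05) println("Eavesdropping suspected: discard the key.")
    else println("Channel looks clean: keep the key material.")
  }
}
```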
    Infinity Manifest
    The Roman philosopher Cicero observed more than two thousand years ago, "Everything is alive; everything is interconnected!" or "Omnia vivunt, omnia inter se conexa!" We are beginning to see the entire universe as a holographically interlinked network of energy and information, organically whole and undergoing rapid evolution. The "point" of the Singularity is reached essentially when all of the scientific and technological innovation trends appear to go out of control at the human level, i.e. when they have moved beyond our event horizon and we can no longer follow any previous linear logic or understanding to comprehend their combined effects. At that point technological change becomes instantaneous and omnipresent, defining the Quantum Age. As we initiate, establish and activate the new "International Quantum: Exchange for Innovation Club", or "IQ:EI Club", within the aegis of Quantum Innovation Labs (QiLabs.net), we aim to be at the forefront of nurturing Quantum Technology ideas, inventions and innovations.

    0 0

    Quantum Code Breaker: How Prepared Is the IT Industry? Quantum Cryptography: How Close Are We to It?

    From China to the USA and from Russia to Europe, the race is on to construct the first Quantum Code Breaker, as the winner will hold the key to the entire Internet. From trans-national multibillion-dollar financial transactions to top-secret government and military communications, all would be vulnerable to the secret-code-breaking ability of a Quantum Computer that can run Shor's quantum factoring algorithm. Quantum computers that can implement the new mathematics could quickly break the most sophisticated encryption codes protecting secure information, banking and payment transactions on the internet.
    Given major governments' development of quantum information science, the race to build the world's first Quantum Computer for universal code-breaking continues to heat up. What do these governments seek? A Quantum Computer capable of solving complex mathematical problems and breaking the public-key encryption codes used to secure the Internet -- a universal 21st-century Bletchley Park, if you like, for the Enigma-type encryption of the internet. This refers to Shor's quantum factoring algorithm, which could be used to unveil the encrypted communications of the entire Internet if a Quantum Computer powerful enough to run it is ever built.
    Most of our personal data is protected by complex encryption systems such as the widely used RSA algorithm, but these systems may have to change owing to an unexpected threat from quantum physics. Chaoyang Lu at the University of Science and Technology of China in Hefei and co-workers have already demonstrated a small photonic Quantum Computer running Shor's algorithm on toy numbers -- an early step towards machines that could break RSA keys whose factoring would take hundreds of years on current supercomputers.
    In theory, the huge prime-number 'keys' hidden by RSA can be found using a routine called Shor's algorithm. However, Shor's algorithm requires many calculations to be carried out simultaneously, which is only possible with Quantum Computers whose Quantum Bits (QuBits) can occupy a superposition of multiple logical states and become entangled. Schrödinger's cat and the notion of Quantum Entanglement are at the heart of it all. Quantum Entanglement has to be viewed in the historical context of Einstein's 30-year battle with the physics community over the true meaning of Quantum Theory.
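    To see where the code-breaking actually happens, here is a toy, purely classical sketch in Scala of the number theory Shor's algorithm exploits. The brute-force period search is exactly the step a quantum computer replaces; n = 15 and a = 7 are illustrative values chosen so the method succeeds.

```scala
// Once the period r of a^x mod n is known, the factors of n fall out of two
// gcd computations. Finding r is what the quantum computer speeds up; the
// brute-force loop below is what becomes infeasible for RSA-sized moduli.
object ShorClassicalSketch {

  // Smallest r > 0 with a^r == 1 (mod n), found by brute force.
  def period(a: BigInt, n: BigInt): Int = {
    var x = a % n
    var r = 1
    while (x != BigInt(1)) { x = (x * a) % n; r += 1 }
    r
  }

  def main(args: Array[String]): Unit = {
    val n = BigInt(15)          // toy "RSA modulus" (3 * 5)
    val a = BigInt(7)           // a base coprime to n
    val r = period(a, n)        // the quantum step in the real algorithm
    val half = a.modPow(r / 2, n)
    // For even r with a^(r/2) != -1 (mod n), these gcds reveal the factors.
    val p = (half - 1).gcd(n)
    val q = (half + 1).gcd(n)
    println(s"period of $a mod $n is $r; factors of $n: $p and $q")
  }
}
```

    For a 2048-bit RSA modulus that brute-force search is hopeless, which is why efficient quantum period finding is the decisive ingredient.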
    At more or less the same time as in China, an almost identical experiment was performed independently at the University of Queensland in Australia, implying that the technique is robust. By learning to manipulate more QuBits, researchers could eventually factor larger numbers and explore an entirely new realm of mathematics. Philosophically, what are the ramifications of quantum technologies? Other key products and applications include: quantum physics simulators, synchronised clocks, quantum search engines, quantum sensors and imaging devices.
    What is the remedy to the threat posed by the Quantum code breaker? Quantum cryptography, which is unbreakable even by the Quantum Computer! 

    0 0

    Data science demands elastic infrastructure




    Those companies that try to run big data projects in data centers may be setting themselves up for failure. Matt Asay explains. 

    As companies struggle to make sense of their increasingly big data, they're laboring to figure out the morass of technologies necessary to become successful. However, many will remain stymied, because they keep trying to force a necessarily fluid process -- asking questions of one's data -- onto outmoded, rigid data infrastructure.
    Or as Amazon Web Services (AWS) data science chief Matt Wood tells it, they need the cloud.
    While the cloud isn't a panacea, its elasticity may well prove to be the essential ingredient to big data success.

    How much cloud do I need?

    The problem with trying to run big data projects within a data center revolves around rigidity. As Matt Wood told me in a recent interview, this problem "is not so much about absolute scale of data but rather relative scale of data."
    In other words, as a company's data volume takes a step function up or down, enterprise infrastructure can't keep up. In his words, "Customers will tool for the scale they're currently experiencing," which is great... until it's not.
    In a separate conversation, he elaborates:
    "Those that go out and buy expensive infrastructure find that the problem scope and domain shift really quickly. By the time they get around to answering the original question, the business has moved on. You need an environment that is flexible and allows you to quickly respond to changing big data requirements. Your resource mix is continually evolving--if you buy infrastructure, it's almost immediately irrelevant to your business because it's frozen in time. It's solving a problem you may not have or care about any more."
    Success in big data depends upon iteration, upon experimentation as you try to figure out the right questions to ask and the best way to answer them. This is hard when dealing with a calcified infrastructure.

    A eulogy for the data center?

    Of course, it's not quite so simple as "all cloud, all the time."
    Data, it would seem, has to obey fundamental laws of gravity, as Basho CTO Dave McCrory told TechRepublic in an interview:
    "Big data workloads will live in large data centers where they are most advantaged. Why will they live in specific places? Because data attracts data.
    "If I already have a large quantity of data in a specific cloud, I'm going to be inclined to store additional quantities of large data in the same place. As I do this and add workloads that interact with this data, more data will be created."
    Over time, enterprises will look to the public cloud for all the reasons Wood describes, but legacy data is unlikely to make the migration. There's simply no reason to try to house old data in new infrastructure. Not most of the time.
    But some companies will find that they're more comfortable with existing data centers and will eschew the cloud. I'm not talking about hide-bound enterprise curmudgeons that shout "Phooey!" every time AWS is mentioned, either. No, sometimes the most data center-centric of companies will be the innovators like Etsy.
    As Etsy CTO Kellan Elliott-McCrea informed TechRepublic, once Etsy had "gained confidence" in its ability to manage its Hadoop clusters (and other technology), they brought them in-house, netting a 10X increase in utilization and "very real cost savings."
    Nor is Etsy alone. Other new-school web companies like Twitter have opted to run their own data centers, finding that this gives them greater control over their data.

    You're no Twitter

    As highly as you may estimate your abilities, the reality is that you're probably not an Etsy, Twitter, or Google. As painful as it is to say it, most of us are average. By definition.
    This is what Microsoft's great genius was: rather than cater to the Übermensch of IT, Microsoft lowered the bar to becoming productive as a system administrator, developer, etc. In the process, Microsoft banked billions in profits, helping make a good sysadmin better or a decent developer good.
    Regardless, all enterprises need to establish infrastructure that helps them to iterate. Some, like Etsy, may have figured out how to do this in their data centers--but for most of us, most of the time, Wood's advice rings true: "You need an environment that is flexible and allows you to quickly respond to changing big data requirements."
    In other words, odds are that you're going to need the cloud.

    0 0


    The Value of Data Platform-as-a-Service (dPaaS)









    Data Platform-as-a-Service (dPaaS) represents a new approach to efficiently blend people, processes and technologies.  A customizable dPaaS with unified integration and data management enables organizations to harness the value of their data assets to improve decision outcomes and operating performance.
    dPaaS provides enterprise-class scalability enabling users to work with rapidly-growing and increasingly complex data sets, including big data.  Users have the flexibility to deploy any analytics tool on top of the platform to facilitate analyses in different environments and scenarios.  The platform provides data stewards full transparency and control over data to ensure adherence with GRC (governance, regulatory, compliance) programs.
    dPaaS allows enterprises to reduce the burden of maintenance requirements for hardware and software.  Companies can shift IT budgets from capex to more predictable opex, while freeing up IT teams to work on higher-return projects using market-leading technologies in collaboration with business units.
    More Data Exacerbates Bottlenecks
    Integration and analytics are the top two technologies companies are investing in as they seek to integrate big data with traditional data in their business intelligence (BI) and analytics platforms.  Their goal is to make better decisions faster to build customer loyalty, strengthen competitiveness and achieve return on investment (ROI) and risk management objectives.
    Yet Tech-Tonics estimates that 75%-80% of BI project time and spending is consumed by preparing data for analysis.  Data integration projects alone account for approximately 25% of IT budgets.  This is the result of increased cloud and mobile apps, rapid growth of new data sources and formats, fragmentation caused by departmental data silos and ongoing merger and acquisition activity.
    Despite this investment, 83% of data integration projects fail to meet ROI expectations.  Many projects still get bogged down by a high degree of manual coding that is inefficient and often not documented.  IT teams are backlogged with data integration work, including updating and fixing older projects.
    The cost of bad data is high.  Operational inefficiency, transaction losses, fines for non-compliance and lawsuits stemming from bad data that drive erroneous assumptions and models cost U.S. companies $600 billion a year.
    The sheer volume and complexity of big data only exacerbates the workflow bottlenecks caused by a lack of decision-ready data.  Traditional practices for discovering, integrating, managing and governing data have become overburdened or incapable of handling semi-structured or unstructured data.  But despite advances in technologies to collect, store, process and analyze data, most end-users still struggle to locate the data they need when they need it to allow for more accurate, efficient and timely models and decision-making.
    Data Platform-as-a-Service: A New Approach to Better Decision Outcomes
    Companies implementing dPaaS can significantly improve success rates and return on data assets (RDA) by allowing enterprises to expand the scope of integration projects and manage larger data sets more efficiently to better leverage their BI investments.
    dPaaS promotes a data first strategy for BI initiatives.  Data is integrated from multiple sources, harmonized in a consistent state and then managed to end-user requirements.  The ability to quickly and easily connect to applications and data sources is critical in handling big data, as well as rapidly integrating new applications.  The context end-users gain shortens the path to finding patterns and relationships during data analysis, resulting in faster and more actionable insights.
    dPaaS helps streamline the complexity of matching, cleaning and preparing all data for analysis.  Data cleansing tools and a specialized matching engine helps find and fix data quality issues.  A registry of all corporate data sources maps data to its location, applications and owners.  This consistent set of master data – or “golden record” – provides a common point of reference.  Versions and hierarchies are maintained to ensure that data remains in sync at all times.
    A single, consistent set of data policies and processes also helps overcome the challenges posed by data silos across the organization.  dPaaS facilitates integrating big data with traditional enterprise sources, such as transactional and operational databases, data warehouses, CRM, SCM and ERP systems.  Interactions between applications that use the data, as well as underlying systems can be monitored to alert for performance issues and user experience.  dPaaS also ensures security best practices with stringent policy, procedure and process controls. 
    A company’s data assets only have value when they can be accessed and used appropriately by employees and customers, and the underlying business processes that support them.  A strong data governance program supported by dPaaS can serve as the foundation for corporate data strategy.  Reducing costs, enhancing IT productivity and enabling faster time-to-value through improved decision-making all make dPaaS a compelling value proposition for enterprises. 


    0 0
    What is a data lake?






    You’ve probably heard of data warehousing, but now there’s a newer phrase doing the rounds, and it’s one you’re likely to hear more in the future if you’re involved in big data: ‘Data Lakes’.
    So what are they? Well, the best way to describe them is to compare them to data warehouses, because the difference is very much the same as between storing something in a warehouse and storing something in a lake.
    In a warehouse, everything is archived and ordered in a defined way – the products are inside containers, the containers on shelves, the shelves are in rows, and so on. This is the way that data is stored in a traditional data warehouse.
    In a data lake, everything is just poured in, in an unstructured way. A molecule of water in the lake is equal to any other molecule and can be moved to any part of the lake where it will feel equally at home.
    This means that data in a lake has a great deal of agility – another word which is becoming more frequently used these days – in that it can be configured or reconfigured as necessary, depending on the job you want to do with it.
    A data lake contains data in its rawest form – fresh from capture, and unadulterated by processing or analysis.
    It uses what is known as object-based storage, because each individual piece of data is treated as an object, made up of the information itself packaged together with its associated metadata, and a unique identifier.
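    A minimal sketch of what such an object might carry (the type and field names here are illustrative, not any particular product's API):

```scala
import java.util.UUID

// An "object" in object-based storage: the raw payload, free-form
// metadata, and a unique identifier.
case class DataObject(
  id: UUID,                      // unique identifier
  payload: Array[Byte],          // the raw, unprocessed data
  metadata: Map[String, String]  // e.g. source, capture time, format
)

val obj = DataObject(
  id = UUID.randomUUID(),
  payload = "raw clickstream event".getBytes("UTF-8"),
  metadata = Map("source" -> "web", "capturedAt" -> "2015-04-27T08:06Z", "format" -> "text/plain")
)
```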
    No piece of information is “higher-level” than any other, because it is not a hierarchically archived system, like a warehouse – it is basically a big free-for-all, as water molecules exist in a lake.
    The term is thought to have first been used in 2011 by Pentaho CTO James Dixon, who didn't invent the concept but gave a name to the type of innovative data architecture solutions being put to use by companies such as Google and Facebook.
    It didn’t take long for the name to make it into marketing material. Pivotal refer to their product as a “business data lake” and Hortonworks include it in the name of their service, Hortonworks Datalakes.
    It is a practice which is expected to become more popular in the future, as more organizations become aware of the increased agility afforded by storing data in data lakes rather than strict hierarchical databases.
    For example, the way that data is stored in a database (its “schema”) is often defined in the early days of the design of a data strategy.  The needs and priorities of the organization may well change as time goes on.
    One way of thinking about it is that data stored without structure can be shaped into whatever form is needed more quickly than data whose previous structure must first be disassembled and then reassembled.
    Another advantage is that the data is available to anyone in the organization, and can be analyzed and interrogated via different tools and interfaces as appropriate for each job.
    It also means that all of an organization’s data is kept in one place – rather than having separate data stores for individual departments or applications, as is often the case.
    This brings its own advantages and disadvantages – on the one hand, it makes auditing and compliance simpler, with only one store to manage. On the other, there are obvious security implications if you're keeping "all your eggs in one basket".
    Data lakes are usually built within the Hadoop framework, as the datasets they are comprised of are “big” and need the volume of storage offered by distributed systems.
    A lot of it is theoretical at the moment because there are very few organizations which are ready to make the move to keeping all of their data in a lake. Many are bogged down in a “data swamp” – hard-to-navigate mishmashes of land and water where their data has been stored in various, uncoordinated ways over the years.
    And it has its critics of course – some say that the name itself is a problem (and I am inclined to agree) as it implies a lack of architectural awareness, when a more careful consideration of data architecture is what’s really needed when designing new solutions.
    But for better or worse, it is a term that you will probably be hearing more of in the near future if you’re involved in big data and business intelligence.
    Are you ready to dive head first into the data lake or do you prefer to keep your data high and dry?

    0 0

    How-to: Tune Your Apache Spark Jobs (Part 1)


    Learn techniques for tuning your Apache Spark jobs for optimal efficiency.

    When you write Apache Spark code and page through the public APIs, you come across words like transformation, action, and RDD. Understanding Spark at this level is vital for writing Spark programs. Similarly, when things start to fail, or when you venture into the web UI to try to understand why your application is taking so long, you're confronted with a new vocabulary of words like job, stage, and task. Understanding Spark at this level is vital for writing good Spark programs, and of course by good, I mean fast. To write a Spark program that will execute efficiently, it is very, very helpful to understand Spark's underlying execution model.
    In this post, you’ll learn the basics of how Spark programs are actually executed on a cluster. Then, you’ll get some practical recommendations about what Spark’s execution model means for writing efficient programs.

    How Spark Executes Your Program


    A Spark application consists of a single driver process and a set of executor processes scattered across nodes on the cluster.

    The driver is the process that is in charge of the high-level control flow of work that needs to be done. The executor processes are responsible for executing this work, in the form of tasks, as well as for storing any data that the user chooses to cache. Both the driver and the executors typically stick around for the entire time the application is running, although dynamic resource allocation changes that for the latter. A single executor has a number of slots for running tasks, and will run many concurrently throughout its lifetime. Deploying these processes on the cluster is up to the cluster manager in use (YARN, Mesos, or Spark Standalone), but the driver and executor themselves exist in every Spark application.

    At the top of the execution hierarchy are jobs. Invoking an action inside a Spark application triggers the launch of a Spark job to fulfill it. To decide what this job looks like, Spark examines the graph of RDDs on which that action depends and formulates an execution plan. This plan starts with the farthest-back RDDs—that is, those that depend on no other RDDs or reference already-cached data–and culminates in the final RDD required to produce the action’s results.
    The execution plan consists of assembling the job’s transformations into stages. A stage corresponds to a collection of tasks that all execute the same code, each on a different subset of the data. Each stage contains a sequence of transformations that can be completed without shuffling the full data.
    What determines whether data needs to be shuffled? Recall that an RDD comprises a fixed number of partitions, each of which comprises a number of records. For the RDDs returned by so-called narrow transformations like map and filter, the records required to compute the records in a single partition reside in a single partition in the parent RDD. Each object is only dependent on a single object in the parent. Operations like coalesce can result in a task processing multiple input partitions, but the transformation is still considered narrow because the input records used to compute any single output record can still only reside in a limited subset of the partitions.
    However, Spark also supports transformations with wide dependencies such as groupByKey and reduceByKey. In these dependencies, the data required to compute the records in a single partition may reside in many partitions of the parent RDD. All of the tuples with the same key must end up in the same partition, processed by the same task. To satisfy these operations, Spark must execute a shuffle, which transfers data around the cluster and results in a new stage with a new set of partitions.
    For example, consider the following code:
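    (The code listing that belongs here did not survive extraction. Below is a minimal sketch consistent with the description that follows -- three narrow transformations on a text-file RDD capped by a single action; the file name and lambdas are illustrative, and sc is assumed to be an existing SparkContext.)

```scala
// Three narrow transformations followed by one action (count).
// None of these steps needs records from other partitions, so the
// whole chain runs as a single stage.
sc.textFile("someFile.txt")
  .map(line => line.toLowerCase)      // narrow
  .flatMap(line => line.split(" "))   // narrow
  .filter(word => word.nonEmpty)      // narrow
  .count()                            // action: triggers the job
```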

    It executes a single action, which depends on a sequence of transformations on an RDD derived from a text file. This code would execute in a single stage, because none of the outputs of these three operations depend on data that can come from different partitions than their inputs.
    In contrast, this code finds how many times each character appears in all the words that appear more than 1,000 times in a text file.
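    (Again, the original listing is missing; here is a sketch under the same assumptions -- sc is an existing SparkContext and the file name is illustrative -- that matches the description: two reduceByKey calls, hence two shuffles and three stages.)

```scala
// Count words, keep the ones appearing more than 1,000 times,
// then count the characters of those frequent words.
val tokenized  = sc.textFile("someFile.txt").flatMap(_.split(" "))
val wordCounts = tokenized.map(word => (word, 1)).reduceByKey(_ + _)   // shuffle #1
val frequent   = wordCounts.filter { case (_, count) => count > 1000 }
val charCounts = frequent
  .flatMap { case (word, _) => word.toSeq }
  .map(ch => (ch, 1))
  .reduceByKey(_ + _)                                                  // shuffle #2
charCounts.collect()
```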

    This process would break down into three stages. The reduceByKey operations result in stage boundaries, because computing their outputs requires repartitioning the data by keys.
    Here is a more complicated transformation graph including a join transformation with multiple dependencies.
    The pink boxes show the resulting stage graph used to execute it.
    At each stage boundary, data is written to disk by tasks in the parent stages and then fetched over the network by tasks in the child stage. Because they incur heavy disk and network I/O, stage boundaries can be expensive and should be avoided when possible. The number of data partitions in the parent stage may be different than the number of partitions in the child stage. Transformations that may trigger a stage boundary typically accept a numPartitions argument that determines how many partitions to split the data into in the child stage.
    Just as the number of reducers is an important parameter in tuning MapReduce jobs, tuning the number of partitions at stage boundaries can often make or break an application’s performance. We’ll delve deeper into how to tune this number in a later section.

    Picking the Right Operators


    When trying to accomplish something with Spark, a developer can usually choose from many arrangements of actions and transformations that will produce the same results. However, not all these arrangements will result in the same performance: avoiding common pitfalls and picking the right arrangement can make a world of difference in an application’s performance. A few rules and insights will help you orient yourself when these choices come up.
    Recent work in SPARK-5097 began stabilizing SchemaRDD, which will open up Spark’s Catalyst optimizer to programmers using Spark’s core APIs, allowing Spark to make some higher-level choices about which operators to use. When SchemaRDD becomes a stable component, users will be shielded from needing to make some of these decisions.
    The primary goal when choosing an arrangement of operators is to reduce the number of shuffles and the amount of data shuffled. This is because shuffles are fairly expensive operations; all shuffle data must be written to disk and then transferred over the network. repartition, join, cogroup, and any of the *By or *ByKey transformations can result in shuffles. Not all these operations are equal, however, and a few of the most common performance pitfalls for novice Spark developers arise from picking the wrong one:

    • Avoid groupByKey when performing an associative reductive operation. For example, rdd.groupByKey().mapValues(_.sum) will produce the same results as rdd.reduceByKey(_ + _). However, the former will transfer the entire dataset across the network, while the latter will compute local sums for each key in each partition and combine those local sums into larger sums after shuffling.


    • Avoid reduceByKey when the input and output value types are different. For example, consider writing a transformation that finds all the unique strings corresponding to each key. One way would be to use map to transform each element into a Set and then combine the Sets with reduceByKey (first variant in the sketch following this list).
      This approach results in tons of unnecessary object creation because a new set must be allocated for each record. It's better to use aggregateByKey, which performs the map-side aggregation more efficiently (second variant in the sketch following this list).
    • Avoid the flatMap-join-groupBy pattern. When two datasets are already grouped by key and you want to join them and keep them grouped, you can just use cogroup. That avoids all the overhead associated with unpacking and repacking the groups.
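    The two listings referenced in the second bullet above were lost in extraction. The sketch below shows both variants, assuming rdd is an RDD[(String, String)]; the variable names are illustrative.

```scala
import scala.collection.mutable

// First variant (wasteful): a new Set is allocated for every record
// before the shuffle, just so reduceByKey can union them.
val uniquePerKeySlow = rdd
  .map { case (k, v) => (k, Set(v)) }
  .reduceByKey(_ ++ _)

// Second variant (better): aggregateByKey keeps one mutable set per key
// per partition, so the map-side aggregation creates far fewer objects.
val uniquePerKeyFast = rdd.aggregateByKey(mutable.Set.empty[String])(
  (set, v)     => set += v,      // fold a value into the partition-local set
  (set1, set2) => set1 ++= set2  // merge sets coming from different partitions
)
```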

    When Shuffles Don’t Happen

    It’s also useful to be aware of the cases in which the above transformations will not result in shuffles. Spark knows to avoid a shuffle when a previous transformation has already partitioned the data according to the same partitioner. Consider the following flow:
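    (The flow listing is missing from the extracted text; the sketch below reuses the rdd1/rdd2/rdd3 and someRdd/someOtherRdd names from the surrounding discussion, with placeholder reduce functions.)

```scala
// Both inputs are reduced by key with the default hash partitioner and
// then joined. When the two sides end up partitioned identically, the
// join itself needs no third shuffle.
val rdd1 = someRdd.reduceByKey(_ + _)
val rdd2 = someOtherRdd.reduceByKey(_ + _)
val rdd3 = rdd1.join(rdd2)
```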

    Because no partitioner is passed to reduceByKey, the default partitioner will be used, resulting in rdd1 and rdd2 both hash-partitioned. These two reduceByKeys will result in two shuffles. If the RDDs have the same number of partitions, the join will require no additional shuffling. Because the RDDs are partitioned identically, the set of keys in any single partition of rdd1 can only show up in a single partition of rdd2. Therefore, the contents of any single output partition of rdd3 will depend only on the contents of a single partition in rdd1 and single partition in rdd2, and a third shuffle is not required.
    For example, if someRdd has four partitions, someOtherRdd has two partitions, and both the reduceByKeys use three partitions, the set of tasks that execute would look like:
    What if rdd1 and rdd2 use different partitioners or use the default (hash) partitioner with different numbers of partitions?  In that case, only one of the RDDs (the one with fewer partitions) will need to be reshuffled for the join.
    Same transformations, same inputs, different number of partitions:
    One way to avoid shuffles when joining two datasets is to take advantage of broadcast variables. When one of the datasets is small enough to fit in memory in a single executor, it can be loaded into a hash table on the driver and then broadcast to every executor. A map transformation can then reference the hash table to do lookups.
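    As a sketch of that broadcast-join idea (smallRdd and largeRdd are hypothetical pair RDDs, with the small one assumed to fit comfortably in driver memory):

```scala
// Collect the small side to the driver and broadcast it to every executor.
val smallLookup = smallRdd.collectAsMap()
val broadcastLookup = sc.broadcast(smallLookup)

// Map-side join: look each key up in the broadcast table; no shuffle needed.
val joined = largeRdd.flatMap { case (key, value) =>
  broadcastLookup.value.get(key).map(small => (key, (value, small))).toSeq
}
```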

    When More Shuffles are Better

    There is an occasional exception to the rule of minimizing the number of shuffles. An extra shuffle can be advantageous to performance when it increases parallelism. For example, if your data arrives in a few large unsplittable files, the partitioning dictated by the InputFormat might place large numbers of records in each partition, while not generating enough partitions to take advantage of all the available cores. In this case, invoking repartition with a high number of partitions (which will trigger a shuffle) after loading the data will allow the operations that come after it to leverage more of the cluster’s CPU.
    Another instance of this exception can arise when using the reduce or aggregate action to aggregate data into the driver. When aggregating over a high number of partitions, the computation can quickly become bottlenecked on a single thread in the driver merging all the results together. To loosen the load on the driver, one can first use reduceByKey or aggregateByKey to carry out a round of distributed aggregation that divides the dataset into a smaller number of partitions. The values within each partition are merged with each other in parallel, before sending their results to the driver for a final round of aggregation. Take a look at treeReduce and treeAggregate for examples of how to do that. (Note that in 1.2, the most recent version at the time of this writing, these are marked as developer APIs, but SPARK-5430 seeks to add stable versions of them in core.)
    This trick is especially useful when the aggregation is already grouped by a key. For example, consider an app that wants to count the occurrences of each word in a corpus and pull the results into the driver as a map.  One approach, which can be accomplished with the aggregate action, is to compute a local map at each partition and then merge the maps at the driver. The alternative approach, which can be accomplished with aggregateByKey, is to perform the count in a fully distributed way, and then simply collectAsMap the results to the driver.
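    A sketch of the two approaches, assuming tokens is an RDD[String] of words (the names are illustrative; reduceByKey is used in the distributed variant only because a plain sum needs no separate zero value, but aggregateByKey has the same shape):

```scala
// Driver-heavy: the aggregate action builds a local map per partition and
// then merges all of those maps on a single driver thread.
val countsOnDriver = tokens.aggregate(Map.empty[String, Long])(
  (acc, word) => acc + (word -> (acc.getOrElse(word, 0L) + 1L)),
  (m1, m2)    => m2.foldLeft(m1) { case (acc, (w, c)) => acc + (w -> (acc.getOrElse(w, 0L) + c)) }
)

// Fully distributed: combine per key across the cluster first, then pull
// the (much smaller) result to the driver.
val countsDistributed = tokens.map(word => (word, 1L)).reduceByKey(_ + _).collectAsMap()
```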

    Secondary Sort

    Another important capability to be aware of is the repartitionAndSortWithinPartitions transformation. It's a transformation that sounds arcane, but seems to come up in all sorts of strange situations. This transformation pushes sorting down into the shuffle machinery, where large amounts of data can be spilled efficiently and sorting can be combined with other operations.
    For example, Apache Hive on Spark uses this transformation inside its join implementation. It also acts as a vital building block in the secondary sort pattern, in which you want to both group records by key and then, when iterating over the values that correspond to a key, have them show up in a particular order. This issue comes up in algorithms that need to group events by user and then analyze the events for each user based on the order they occurred in time. 
    Taking advantage of repartitionAndSortWithinPartitions to do secondary sort currently requires a bit of legwork on the part of the user, but SPARK-3655 will simplify things vastly.
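    A sketch of the pattern, assuming an events RDD of type RDD[((String, Long), String)] keyed by (user, timestamp) with some payload as the value; the partitioner and the partition count are illustrative:

```scala
import org.apache.spark.Partitioner
import org.apache.spark.rdd.RDD

// Partition on the user part of the key only, so all of a user's events
// land in the same partition...
class UserPartitioner(override val numPartitions: Int) extends Partitioner {
  override def getPartition(key: Any): Int = key match {
    case (user: String, _) =>
      val raw = user.hashCode % numPartitions
      if (raw < 0) raw + numPartitions else raw
  }
}

// ...while the implicit ordering on the (String, Long) key sorts each
// partition by user and then by timestamp, so each user's events are
// iterated in time order.
def sortEventsByUserAndTime(events: RDD[((String, Long), String)]): RDD[((String, Long), String)] =
  events.repartitionAndSortWithinPartitions(new UserPartitioner(100))
```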

    Conclusion

    You should now have a good understanding of the basic factors involved in creating a performance-efficient Spark program! In Part 2, we’ll cover tuning resource requests, parallelism, and data structures.


    0 0


    The Perilous World of Machine Learning for Fun and Profit: Pipeline Jungles and Hidden Feedback Loops





    I haven't written a blog post in ages. And while I don't want to give anything away, the main reason I haven't been writing is that I've been too busy doing my day job at MailChimp. The data science team has been working closely with others at the company to do some fun things in the coming year.

    That said, I got inspired to write a quick post by this excellent short paper out of Google,  "Machine Learning: The High Interest Credit Card of Technical Debt."

    Anyone who plans on building production mathematical modeling systems for a living needs to keep a copy of that paper close.

    And while I don't want to recap the whole paper here, I want to highlight some pieces of it that hit close to home.
    Pipeline Jungles
    George prototyping a machine learning model.
    There was a time as a boy when my favorite book was George's Marvelous Medicine by Roald Dahl. The book is full of all that mischief and malice that makes Dahl books so much fun. 

    In the book, George wanders around his house finding chemicals to mix up into a brown soup to give to his grandmother in place of her normal medicine. And reading this bit of felony grand-matricide as a child always made me smile.

    Prototyping a new machine learning model is like George's quest for toxic chemicals. It's a chance for the data scientist to root around their company looking for data sources and engineering features that help predict an outcome.

    A little bit of these log files. A dash of Google Analytics data. Some of Marge-from-Accounting's spreadsheet.

    POOF! We have a marvelous model.

    How fun it is to regale others with tales of how you found that a combination of reddit upvotes, the lunar calendar, and the number of times your yoga instructor says FODMAPs is actually somewhat predictive!

    But now it's the job of some poor sucker dev to take your prototype model, which pulls from innumerable sources (hell, you probably scraped Twitter too just for good measure), and turn it into a production system.

    All of a sudden there's a "pipeline jungle," a jumbled up stream of data sources and glue code for feature engineering and combination, to create something programmatically and reliably in production that you only had to create once manually in your George's-Marvelous-Medicine-revelry.

    It's easy in the research and design phase of a machine learning project to over-engineer the product. Too many data sources, too many exotic and brittle features, and as a corollary, too complex a model. One trap the paper points out is leaving low-powered features in your prototype model because, well, they help a little, and they're not hurting anyone, right?

    What's the value of those features versus the cost of leaving them in? That's extra code to maintain, maybe an extra source to pull from. And as the Google paper notes, the world changes, data changes, and every model feature is a potential risk for breaking everything.

    Remember, the tech press (and vendors) would have you build a deep learning model that's fed scraped data from the internet's butthole, but it's important to exercise a little self-control. As the authors of the technical debt paper put it, "Research solutions that provide a tiny accuracy benefit at the cost of massive increases in system complexity are rarely wise practice." Preach.

    Who's going to own this model and care for it and love it and feed it and put a band-aid on its cuts when its decision thresholds start to drift? Since it's going to cost money in terms of manpower and tied up resources to maintain this model, what is the worth of this model to the business? If it's not that important of a model (and by important, I'm usually talking top line revenue), then maybe a logistic regression with a few interactions will do you nicely.
    Humans are Feedback Loop Machines
    The graveyard at Haworth
    In Haworth England, they used to bury bodies at the top of a hill above the town. When someone died, they got carted up to the overcrowded graveyard and then their choleraic juices would seep into the water supply and infect those down below, creating more bodies for the graveyard.

    Haworth had a particularly nasty feedback loop.

    Machine learning models suck up all sorts of nasty dead body water too.

    At MailChimp, if I know that a user is going to be a spammer in the future, I can shut them down now using a machine learning model and a swift kick to the user's derriere.

    But that future I'm changing will someday, maybe next week, maybe next year, be the machine learning system's present day.

    And any attempt to train on present day data, data which has now been polluted by the business's model-driven actions (dead spammers buried at the top of the hill), is fraught with peril. It's a feedback loop. All of a sudden, maybe I don't have any spammers to train my ML model on, because I've shut them all down. And now my newly trained model thinks spamming is more unlikely than I know it to be.

    Of course, such feedback loops can be mitigated in many ways. Holdout sets for example.

    But we can only mitigate a feedback loop if we know about its existence, and we as humans are awesome at generating feedback loops and terrible at recognizing them. 

    Think about time-travel in fiction. Once you have a time machine (and make no mistake, a well-suited ML model is pretty close to a forward-leaping time machine when it comes to things like sales and marketing), it's easy to jump through time and monkey with events, but it's hard to anticipate all the consequences of those changes and how they might alter your future training data.

    And yet when the outputs of ML models are put in the hands of others to act on, you can bet that the future (and the future pool of training data with it) will be altered. That's the point! I don't predict spammers to do nothing about them! Predictions are meant to be acted upon.

    And so, when the police predict that a community is full of criminals and then they start harassing that community, what do you think is going to happen? The future training data gets affected by the police's "special attention." Predictive modeling feeds back into systematic discrimination.

    But we shouldn't expect cops to understand that they're burying their dead at the top of the hill.

    This is one of my fears with the pedestrianization of data science techniques. As we put predictive models more and more in the hands of the layperson, have we considered that we might cut anyone out of the loop who even understands or cares about their misuse?
    Get Integrated, Stay Alert
    The technical debt paper makes this astute observation, "It’s worth noting that glue code and pipeline jungles are symptomatic of integration issues that may have a root cause in overly separated 'research' and 'engineering' roles."

    This is absolutely true. When data scientists treat production implementation as a black box they shove their prototypes through and when engineers treat ML packages as black boxes they shove data pipelines through, problems abound.

    Mathematical modelers need to stay close to engineers when building production data systems. Both need to keep each other in mind and keep the business in mind. The goal is not to use deep learning. The goal is not to program in Go. The goal is to create a system for the business that lives on. And in that context, accuracy, maintainability, sturdiness...they all hold equal weight.

    So as a data scientist keep your stats buds close and your colleagues from other teams (engineers, MBAs, legal, ....) closer with the goal of getting work done together. It's the only way your models will survive past prototype.


    0 0

     Shoes That Grow Five Sizes In Five Years For Kids In Developing Countries


    “I had no idea how important shoes were,” founder Kenton Lee told BuzzFeed News.

    Kenton Lee was working at an orphanage in Kenya when he noticed a little girl with the ends of her shoes cut off and her toes sticking out. It was then that he came up with the idea for The Shoe That Grows.


    “For years the idea of these growing shoes wouldn’t leave my mind,” he told BuzzFeed News.
    The first step was starting Because International with a few friends in 2006, a nonprofit devoted to “working with and helping those in extreme poverty,” their site says.

    Lee and his team at first tried to give the idea to companies like Nike, Crocs, and Toms, to no avail. Eventually they found a “shoe development company” called Proof of Concept who agreed to help them with the design.


    The shoe is made of high-quality soft leather on top and extremely durable rubber soles made from material similar to a tire, Lee said. They expand through a simple system of buckles, snaps, and pegs.

    The shoes are predicted to last a minimum of five years, and expand five sizes in that time. The small size will fit preschoolers through fifth graders, while the large will fit fifth through ninth graders.


    “I had no idea how important shoes were before I went to Kenya,” Lee said. “But kids, especially in urban areas, can get infections from cuts and scrapes on their feet from going barefoot, and contract diseases that cause them to miss school.”


    The 30-year-old, who started a church in Idaho with his wife, said he wanted to put these kids in the best possible position to succeed in their lives.
    “If I can provide a kid with protection so they stay healthy and keep going to school, I’ll have done my part.”

    The shoes cost $10 a pair, and each pair goes into a “duffle bag” that can fit 50 pairs of shoes. Once one organization’s duffle bag is full, Because International ships it to the organization that flies with them to one of seven countries.


    Donors can either buy shoes to distribute themselves, or buy a pair of shoes and choose one of five American nonprofit organizations to distribute them to orphanages and churches around the world.

    So far about 2,500 children across seven countries are wearing the shoes, including in Ghana, Haiti, Peru, Colombia, and Kenya.


    “We have about 500 left of our first order, currently being stored in a room in my house where my son sometimes chews on them,” said Lee, referring to his 11-month-old son (he also has another one on the way). He said they have an order of 3,000 more pairs coming in July for people to donate.

    “We considered making even larger ones for teenagers,” Lee added, “but we were told that they didn’t want to wear ‘charity shoes,’ they wanted to wear something cooler.”


    He said he’s now being flooded with requests, mostly from Americans, to make adult-sized versions.
