The information age is producing enormous amounts of data, which calls for a paradigm shift in how we think about storing and processing it. Traditional architectures run into cost overruns, scalability limits, and poor ROI. Distributed computing is the answer. In traditional high-performance computing settings, one often assumes a "well-behaved" system: no faults or failures, minimal security requirements, consistent state among application components, availability of global information, and simple resource-sharing policies. While those assumptions are arguably valid in tightly coupled systems, they break down as systems become more distributed.
This presentation covers cloud computing at Yahoo!, with an emphasis on the Hadoop grid. The Grid Computing group at Yahoo! Bangalore focuses on grid frameworks that scale to thousands of machines and handle petabytes of data. The group is especially involved in the development of the open-source Hadoop platform and its deployment within Yahoo!