SQL Server Parallel Data Warehouse
Organizations need actionable and timely business insights from rapidly growing data. Take advantage of Microsoft SQL Server Parallel Data Warehouse and its massively parallel processing (MPP) architecture to gain scalable performance, flexibility, and hardware choices with the most comprehensive data warehouse solution available.
The biggest gain in data management now available in SQL Server is the parallel management of large volumes of data. In parallel processing, data is stored on more than one device so that more than one processor can access and manage it at the same time. In order to understand the value of parallel management of data
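The idea of spreading data across several devices so that several processors can work on it at once can be sketched in a few lines of Python. This is only an illustration of the general scatter-and-gather pattern, not PDW's actual implementation: the function names, the sample rows, and the use of threads as stand-ins for compute nodes are all assumptions made for the example.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def distribute(rows, node_count):
    """Assign each row to a node by hashing its key, so every node owns one slice."""
    nodes = [[] for _ in range(node_count)]
    for key, amount in rows:
        node = zlib.crc32(key.encode("utf-8")) % node_count
        nodes[node].append((key, amount))
    return nodes


def local_totals(partition):
    """Work done independently on one node: sum the amounts for each key it owns."""
    totals = defaultdict(int)
    for key, amount in partition:
        totals[key] += amount
    return dict(totals)


def parallel_totals(rows, node_count=4):
    """Scatter rows to nodes, aggregate each slice concurrently, then gather."""
    partitions = distribute(rows, node_count)
    # Threads stand in for separate compute nodes (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(local_totals, partitions))
    merged = {}
    for partial in partials:
        # Hash distribution puts each key on exactly one node, so keys never collide.
        merged.update(partial)
    return merged


# Hypothetical sample data: (region, sale amount).
sales = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]
totals = parallel_totals(sales)
```

Because every key lands on exactly one node, each node can finish its share of the aggregation without consulting the others, and the final step only has to combine the partial results.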
Key Design Elements
· Increased speed and scalability requiring fewer computing resources and less IT effort
· More high-density storage and compute power with Microsoft Windows Server Storage Spaces and Hyper-V
· Up to 15 times compression of data to help save up to 70 percent of storage requirements
· 50-percent reduction in hardware cost along with 50-percent reduction in energy consumption
Built for Big Data: Enables integrated query across Hadoop and relational data with PolyBase, a fundamental breakthrough in the data processing engine.
Next-generation performance at scale:
Engineered for optimal value: Redefines the PDW appliance with software innovations such as xVelocity columnstore, PolyBase, Windows Server 2012 Hyper-V, Storage Spaces, and automatic memory tuning, while providing a monitoring pack for integration with existing deployments of Microsoft System Center.
PolyBase can accept a standard Transact-SQL query that joins tables from a relational source with tables from a Hadoop cluster referencing a non-relational source, and seamlessly return the results to the user. IT is not required to pre-load data from Hadoop into the warehouse, and users are not required to learn MapReduce to write the query. This gives organizations the ability to perform interactive analysis on data of virtually any velocity, complexity, and size. With the export option in PolyBase, organizations can store their historical web log files in a directory on a Hadoop cluster running on low-cost commodity hardware, and then process them with PolyBase by specifying the directory location in the query.
BI Integration: Native Microsoft BI integration enables users to quickly create compelling visualizations and make key business decisions from nearly any type of data (relational or non-relational) using familiar tools such as Microsoft Excel. Power View is built into Excel, enabling users to quickly and easily visualize data, while sharing and collaborating on business insights in a familiar Microsoft SharePoint Server environment. Additionally, Power View eliminates the need to build cubes and enables organizations to query much more data without any constraints on tabular model limits.
SQL Server 2012 PDW has a feature called PolyBase that enables you to integrate Hadoop data with PDW data. By using PDW with PolyBase capabilities, a user can:
- Use an external table to define a table structure for Hadoop data.
- Query Hadoop data by running SQL statements.
- Integrate Hadoop data with PDW data by running a PDW query that joins Hadoop data to a relational PDW table.
- Persist Hadoop data in PDW by querying Hadoop and saving the results to a PDW table.
- Use Hadoop as an online data archive by exporting PDW data to Hadoop. Since the data is stored online in Hadoop, users can retrieve it by querying it from PDW.
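The five capabilities listed above can be mimicked on a small scale with Python's standard library. The sketch below is only an analogy: an in-memory SQLite database plays the role of PDW, and a pipe-delimited string plays the role of a file on a Hadoop cluster. It does not use PolyBase or its Transact-SQL syntax, and every table name, column name, and data value is hypothetical.

```python
import csv
import io
import sqlite3

# Stand-in for a pipe-delimited web log file on a Hadoop cluster (hypothetical data).
HADOOP_FILE = "1|/home|2013-01-05\n2|/cart|2013-01-05\n1|/checkout|2013-01-06\n"

conn = sqlite3.connect(":memory:")

# A relational table, playing the role of a PDW table.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Avery"), (2, "Blake")])

# Step 1: define a table structure for the file data. PolyBase would use an
# external table and leave the data in Hadoop; SQLite has to load the rows.
conn.execute("CREATE TABLE clickstream (customer_id INTEGER, url TEXT, event_date TEXT)")
rows = csv.reader(io.StringIO(HADOOP_FILE), delimiter="|")
conn.executemany("INSERT INTO clickstream VALUES (?, ?, ?)", rows)

# Steps 2 and 3: query the file data with SQL and join it to the relational table.
joined = conn.execute(
    "SELECT c.name, COUNT(*) FROM clickstream AS s "
    "JOIN customers AS c ON c.customer_id = s.customer_id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()

# Step 4: persist the result of a query over the file data in a relational table.
conn.execute(
    "CREATE TABLE clicks_per_customer AS "
    "SELECT customer_id, COUNT(*) AS clicks FROM clickstream GROUP BY customer_id"
)

# Step 5: export relational data to a delimited file that could serve as an archive.
archive = io.StringIO()
csv.writer(archive, delimiter="|", lineterminator="\n").writerows(
    conn.execute("SELECT customer_id, name FROM customers ORDER BY customer_id")
)
```

The main difference from the real feature is in step 1: here the rows are copied into the database before they can be queried, whereas the text above describes PolyBase querying the data where it sits in Hadoop.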
For more details, see "Use SSIS for ETL from Hadoop":
http://sqlmag.com/blog/use-ssis-etl-hadoop