Apache Spark is a fast, general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python, and R, along with an optimized engine that supports general execution graphs. It also offers a rich set of higher-level tools, including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing.
At Sumanas Technologies, we use Spark Streaming and ETL (extract, transform, load) tools for batch processing in data warehouse environments, where they read data, convert it to a database-compatible format, and write it to the target database. We also perform tasks such as data enrichment, trigger event detection, and complex session analysis. Using Spark's integrated framework for advanced analytics and its Machine Learning Library (MLlib), we work in areas such as clustering, classification, and dimensionality reduction to solve problems such as predictive intelligence, customer segmentation for marketing, and sentiment analysis.
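The extract, transform, load flow described above (read data, convert it to a database-compatible format, write it to the target database) can be sketched in a few lines. This is only an illustration of the pattern, not our production pipeline and not Spark code: it uses plain Python with an in-memory SQLite database standing in for the target, and the sample data, the `orders` table, and the function names are all hypothetical. A Spark job would express the same three steps with DataFrame reads, column transformations, and writes.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; a real job would read from files or a source system.
RAW_CSV = """order_id,customer,amount,currency
1, alice ,19.99,usd
2,Bob,5.00,USD
3,carol,not-a-number,usd
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert fields to database-compatible types.

    Rows whose amount cannot be parsed as a number are skipped.
    """
    records = []
    for row in rows:
        try:
            amount_cents = round(float(row["amount"]) * 100)
        except ValueError:
            continue  # drop rows that cannot be converted
        records.append((
            int(row["order_id"]),
            row["customer"].strip().lower(),
            amount_cents,
            row["currency"].strip().upper(),
        ))
    return records


def load(conn, records):
    """Load: write the converted records to the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, "
        "amount_cents INTEGER, currency TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?, ?)", records)
    conn.commit()


def run_etl(conn, text=RAW_CSV):
    """Run extract, transform, and load; return the number of rows loaded."""
    records = transform(extract(text))
    load(conn, records)
    return len(records)
```

Calling `run_etl(sqlite3.connect(":memory:"))` loads the two valid sample rows and skips the one with an unparseable amount. Keeping each stage as its own function makes it straightforward to swap the source or the target without touching the conversion logic.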