# Apache Spark Reviews 2026. Verified Reviews, Pros & Cons | Capterra

> Is Apache Spark the right Data Analysis solution for you? Explore 16 verified user reviews from people in industries like yours to make a confident choice.

Source: https://www.capterra.com/p/233703/Apache-Spark/reviews

---

Apache Spark

4.6 (16)


Provider data verified by our Software Research team, and reviews moderated by our Reviews Verification team. [Learn more](https://www.capterra.com/our-story/)

* * *

Last updated March 13th, 2026

# Reviews of Apache Spark

Ease of use

4.2

Customer Service

4.3

## Showing most helpful reviews

Showing 1-16 of 16 Reviews


Filippo C.  
Senior Data Scientist  
Marketing and Advertising  
Used the software for: 2+ years

### "The ultimate tool for big data processing!"

October 2, 2023

5.0

Apache Spark is helping me normalize and process hundreds of millions of records at a time. Its machine learning library allows further processing to extract valuable insights from advertising and marketing data. As open source software, it offers hundreds of connectors to read and write data from and to the most common cloud databases, regardless of the cloud provider (AWS, Microsoft, Google Cloud).

Pros

Apache Spark is an open source big data engine that can process millions and even billions of records. It supports common programming languages (R, Python, Scala, SQL), which makes it quite easy to use. It's used worldwide for data science and engineering, and it relies on parallel in-memory computation that boosts your code's performance. It really helped me process every kind of data I had.

Cons

It takes time to learn how Spark works and how operations are distributed between the driver and the workers. Also, error messages are not always clear, and debugging is sometimes not straightforward.

Switched from

[The Jupyter Notebook](https://www.capterra.com/p/234967/The-Jupyter-Notebook/)

Jupyter Notebook is based on serial computation and it doesn't fit large data processing.


Vidya P.  
Cloud engineer  
Information Technology and Services  
Used the software for: 2+ years

### "Spark does sparkle with data processing"

April 27, 2025

5.0

Overall, I have been working with Spark for 8 years now. It's the most widely used framework for analytics and data pipelines, but it needs expertise to get the most out of it.

Pros

Spark computes in memory and is very capable at large-scale data processing, and its integration with different data sources makes it special.

Cons

Apache Spark needs more memory, which can be costly, and it needs a good understanding of the framework to get the most out of it.

Switched from

[Apache Hive](https://www.capterra.com/p/170238/Apache-Hive/)

Hive was not performant, and Spark evolved from Hive.


Verified Reviewer  
Software Engineer  
Information Technology and Services  
Used the software for: 6-12 months

### "How Apache Spark is necessary to learn for a data engineer."

June 27, 2022

3.0

I have used this in project work at my organization to perform ETL operations on large-scale data sets for clients.

Pros

It's a framework that lets you process large amounts of data and store the results in any database. It uses in-memory data processing, which is much faster than other frameworks like Hadoop. It supports a wide range of languages, such as Scala, Java, and Python, so a user who already knows one of them can adapt to the framework easily.

Cons

Processing BLOB files is difficult, and the framework is a bit tricky to understand because of the lack of documentation on Apache Spark.

Reason for choosing Apache Spark

It's much faster than other ETL frameworks on the market.


Verified Reviewer  
Data Analytics Manager  
Logistics and Supply Chain  
Used the software for: 2+ years

### "The best big data processing engine for both batch and streaming workloads"

June 20, 2022

4.0

We used Spark to process large data sets for both ETL and machine learning use cases. We used a managed service on top of it to lessen the operational burden, and we were able to deploy powerful and reliable data and ML pipelines.

Pros

I like that it can handle both batch and streaming workloads within technically a single framework. It lessens the technical burden for us in learning 2 separate APIs.

Cons

Spark is quite expensive to use, so there are a lot of manual operations and optimizations to do to make sure you don't incur unnecessary costs.

Alternatives considered

[Presto](https://www.capterra.com/p/206764/Presto/)

Reason for choosing Apache Spark

Great ecosystem overall for Apache Spark.

Switched from

[Apache Hive](https://www.capterra.com/p/170238/Apache-Hive/)

Spark has a much better ecosystem today compared to Hive.


Samuel A.  
Software  
Telecommunications  
Used the software for: 6-12 months

### "Apache Spark: Ignite Your Data Processing Power"

May 18, 2023

5.0

Pros

Apache Spark is a powerful, free tool that handles big data efficiently, works well with other programming languages, and offers a wide variety of libraries and techniques for high performance

Cons

Apache Spark struggles with small data sets, lacks automatic optimization, and could improve in providing more diverse machine learning libraries and managing memory shortage

Switched from

[Apache Hive](https://www.capterra.com/p/170238/Apache-Hive/)

Spark offers a more robust and diverse ecosystem today compared to Hive


Shaunak P.  
Data Platform Engineer  
Information Technology and Services  
Used the software for: 2+ years

### "World's best distributed compute engine for big data!"

January 17, 2023

5.0

I work on data-intensive applications, so Spark was the best pick. It is open source and is maintained by some of the top engineers in the world. The best thing is that it is free, and it stays free unless you use a third-party SaaS product like Azure Databricks or Synapse Analytics.

Pros

I have been using Apache Spark daily for more than 2 years. It handles large datasets easily by distributing the data across a cluster of nodes, each of which has its own memory and compute. This unification makes it very efficient for big data projects. We can submit jobs to Spark interactively or in batch. Developers and engineers can write code without needing to know the underlying technology that handles the massively parallel operations.

Cons

There are no cons at all. Performance is all about how you fine-tune your clusters, and a lot of API configs are available to achieve that. Languages like Python make it much easier than Scala to get started with Apache Spark.
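As a rough illustration of the cluster model this review describes (data split across nodes, each doing its own work, with the results combined afterwards), here is a minimal plain-Python sketch. It does not use the Spark API: the function names are hypothetical, and threads on one machine stand in for worker nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, num_partitions):
    """Split a list into contiguous, roughly equal chunks, one per worker."""
    size, remainder = divmod(len(records), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        # The first `remainder` partitions each take one extra record.
        end = start + size + (1 if i < remainder else 0)
        partitions.append(records[start:end])
        start = end
    return partitions


def process_partition(partition):
    """Work done independently on one partition (here: a sum of squares)."""
    return sum(x * x for x in partition)


def distributed_sum_of_squares(records, num_workers=4):
    """Driver-side logic: partition the data, fan out, combine the partial results."""
    partitions = split_into_partitions(records, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_results = list(pool.map(process_partition, partitions))
    return sum(partial_results)


total = distributed_sum_of_squares(list(range(1, 101)))
```

The caller only supplies the per-partition function. Splitting, scheduling, and combining are handled for it, which is the convenience the reviewer points to.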


Verified Reviewer  
BigData Engineer  
Information Technology and Services  
Used the software for: 1-2 years

### "Apache Spark brings that spark in your Data Engineering life"

February 18, 2023

4.0

It processes large data in a distributed fashion, which increases efficiency and delivers results that cannot be achieved with traditional programming.

Pros

Of course it has a lot of pros, but to name a few I liked: first, it is very fast at distributed computing, with a fault-tolerant architecture. Second, it is great for solving complex logical transformations, with support for Python, Scala, and other popular languages. Third, you can tune Spark for the best performance using persistence, parallelism, and caching techniques. Lastly, it has a good set of libraries to help run daily jobs.

Cons

Because Spark's optimizations are designed for large data, it does not perform well on a small amount of data. Another downside is that it depends on external storage systems, because it only does in-memory computation and lacks a storage management system. It also hits out-of-memory errors when memory is short.
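One common explanation for the small-data complaint is fixed overhead: a distributed job pays a startup and scheduling cost before any record is processed. The toy cost model below sketches that trade-off. Every number in it is hypothetical and chosen only for illustration, not measured from Spark.

```python
def estimated_runtime(num_records, per_record_seconds, fixed_overhead_seconds, num_workers):
    """Toy cost model: a fixed startup cost plus work divided evenly across workers."""
    return fixed_overhead_seconds + (num_records * per_record_seconds) / num_workers


# Hypothetical figures, for illustration only.
PER_RECORD = 0.000001    # seconds of work per record
CLUSTER_OVERHEAD = 5.0   # seconds of job startup and scheduling on a cluster
CLUSTER_WORKERS = 100


def single_machine(num_records):
    """One worker, no distributed startup cost."""
    return estimated_runtime(num_records, PER_RECORD, 0.0, 1)


def cluster(num_records):
    """Many workers, but the fixed overhead is paid on every job."""
    return estimated_runtime(num_records, PER_RECORD, CLUSTER_OVERHEAD, CLUSTER_WORKERS)


small, large = 100_000, 1_000_000_000
small_data_favors_single_machine = single_machine(small) < cluster(small)
large_data_favors_cluster = cluster(large) < single_machine(large)
```

Under these assumed figures the overhead dominates at 100,000 records, while at a billion records the parallel speedup more than pays for it.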


Verified Reviewer  
Data Scientist  
Automotive  
Used the software for: 2+ years

### "Best Product in the Market for Big Data Distributed Computing!"

September 9, 2022

5.0

It has been a great experience using Apache Spark with Python. It has helped us reduce our data processing time by 90%, thanks to its master-worker node architecture, which has enabled distributed computing.

Pros

1. In-memory computation makes it faster
2. Can be deployed on scalable platforms like Kubernetes, making it suitable for big data processing
3. Supports SDKs in multiple languages
4. Easy-to-understand architecture
5. Good support for machine learning modules in PySpark

Cons

1. Manual optimization is needed to achieve highly parallel processing
2. Limited documentation makes it difficult to solve runtime issues
3. Fewer customization options are available

Reason for choosing Apache Spark

Because in-memory computation makes Spark relatively fast at processing high volumes of data.


Verified Reviewer  
Software Engineer  
Computer Software  
Used the software for: 1-2 years

### "Powerful and Open-source"

March 18, 2023

4.0

Pros

For me processing was so fast, I compared it with Hadoop on really big data, and it performed better.

Cons

One thing is that there is no automatic optimization process. There is also not as much variety in machine learning libraries and algorithms as in other big data frameworks.


Paweł W.  
Software Engineer  
Computer Software  
Used the software for: 6-12 months

### "Apache Spark for Big Data processing"

September 18, 2022

5.0

Fantastic Open Source tool with pretty much no competitor

Pros

Apache Spark is a fantastic open source project, rich in features for big data processing. Setting up jobs for processing data streams and creating batch jobs is super easy. It is very efficient and provides APIs in a number of programming languages.

Cons

The more advanced concepts are quite hard to grasp


Elena P.  
Principal  
Management Consulting  
Used the software for: 1-2 years

### "Optimizing big data storage space and analysis"

October 15, 2023

5.0

Spark allowed us to optimize memory space, solving the problem of having to purchase large (and expensive) storage. In addition, its security and data protection features guarantee the integrity of what is saved.

Pros

I love the data processing speed, the versatility of the processing approach, and the incredible memory usage optimization that can be achieved with Spark. The ecosystem of integrations available through APIs is remarkable. It is easy to use, and integration and implementation with the company's IT systems are just as simple. Customer service support is excellent, but community support is even better!

Cons

With highly complex data, it is difficult to find and fix bugs, especially when processing big data. It is not suited to working with small databases.


Ashok Kumar D.  
Functional Analyst  
Information Technology and Services  
Used the software for: 1-2 years

### "Analyze Your Data with Apache Spark"

September 3, 2023

5.0

Pros

Excellent at processing real-time data. Stream processing is good.

Cons

Machine learning should be improved further.


Shilpa N.  
Pricing Lead  
Information Technology and Services  
Used the software for: 1-2 years

### "Boon of Data-Apache Spark"

September 18, 2023

5.0

Pros

It is like several new technologies built in together, working without fail. It collects data from various sources and works in real time.

Cons

Needs more memory.


Meale Y.  
Seller  
Construction  
Used the software for: 6-12 months

### "Data transformation"

August 25, 2023

3.0

Pros

Apache Spark is good software when it comes to transforming and analysing data.

Cons

The software is not perfect when I need many joins.


Vinoth D.  
Quality Analyst  
Information Technology and Services  
Used the software for: 6-12 months

### "Handle Big Data with Apache Spark"

September 5, 2023

5.0

Pros

What I like most is machine learning on larger data. It handles more data very fast. Error detection is very prominent.

Cons

Lack of memory. Difficult to learn. Needs customer support.


Arun T.  
Business Analyst  
Information Technology and Services  
Used the software for: 6-12 months

### "For Big Data-Apache Spark"

October 5, 2023

5.0

Pros

It helps process large data quickly. It has a good memory queue. It processes in real time.

Cons

Initially difficult to learn for beginners.
