Showing posts with the label Apache Spark

RDD Collect Issue

I configured a new system (Spark 2.3.0, Python 3.6.0); DataFrame read and other operations working a…

Error PythonUDFRunner: Python Worker Exited Unexpectedly (crashed)

I am running a PySpark job that calls UDFs. I know UDFs are bad with memory and slow due to seriali…

SparkException: Python Worker Failed To Connect Back When Executing Spark Action

When I try to execute this command in pyspark: arquivo = sc.textFile('dataset_analise_senti…

Sum In Spark Gone Bad

Based on "Unbalanced factor of KMeans?", I am trying to compute the Unbalanced Factor, but I fail. Ev…

Cartesian Product Of Two RDDs In Spark

I am completely new to Apache Spark and I am trying to compute the Cartesian product of two RDDs. As an example I have…

PySpark OutOfMemoryErrors When Performing Many DataFrame Joins

There are many posts about this issue, but none have answered my question. I'm running into O…