What's New in Spark looks at the activity in the Spark commit logs every week and attempts to summarize the new features and bug fixes that have landed. It is not intended to cover everything, mostly the things that might be useful to application developers. Without further ado, let's get started:
- zipPartitions was added to the Scala & Java APIs. zipPartitions combines up to 4 RDDs using a user-supplied function, which is applied to the corresponding partitions of each RDD. It requires that all of the RDDs have the same number of partitions, but places no requirement on the number of elements within each partition. https://github.com/mesos/spark/commit/c9c4954d994c5ba824e71c1c5cd8d5de531caf78
- unpersist() was added to the Scala & Java APIs to allow the removal of an RDD from persistence once it is no longer needed. https://github.com/mesos/spark/commit/93091f6936262a4006d875bf69b3f8c31c291617
- A bug fix was added to validate that local directories can be created when they are added. https://github.com/mesos/spark/commit/c9c4954d994c5ba824e71c1c5cd8d5de531caf78
- Spark’s block UI manager had a bug with Spark Streaming blocks, which has now been fixed. https://spark-project.atlassian.net/browse/SPARK-740 https://github.com/mesos/spark/commit/538ee755b41585c638935a93ec838b635149f659
- The shuffle writer now looks at spark.shuffle.file.buffer.kb to determine the size of the buffer to use. Previously the default buffer was 8 KB, which could cause a lot of unnecessary disk seeks; the new default is 100 KB.
  https://github.com/mesos/spark/commit/1055785a836ab2361239f0937a1a22fee953e029
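
To make the zipPartitions contract above more concrete, here is a minimal plain-Java sketch of its semantics. It is not Spark's implementation and does not use the Spark API: an RDD is modeled as a list of partitions, and the class `ZipPartitionsSketch` with its `zipPartitions` and `concat` helpers are hypothetical names made up for illustration. It only covers the two-input case and is meant to show that the partition counts must match while the partition sizes may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical model of zipPartitions semantics; NOT Spark code.
public class ZipPartitionsSketch {

    // Models an RDD as a list of partitions, each partition a list of elements.
    // The user function receives one iterator per input for the partition at
    // the same index and returns an iterator over that partition's output.
    public static <A, B, V> List<List<V>> zipPartitions(
            List<List<A>> left,
            List<List<B>> right,
            BiFunction<Iterator<A>, Iterator<B>, Iterator<V>> f) {
        // The only structural requirement: equal numbers of partitions.
        if (left.size() != right.size()) {
            throw new IllegalArgumentException(
                "Inputs must have the same number of partitions");
        }
        List<List<V>> out = new ArrayList<>();
        for (int i = 0; i < left.size(); i++) {
            Iterator<V> it = f.apply(left.get(i).iterator(), right.get(i).iterator());
            List<V> partition = new ArrayList<>();
            while (it.hasNext()) {
                partition.add(it.next());
            }
            out.add(partition);
        }
        return out;
    }

    // Example user function: emits every element of the first iterator,
    // then every element of the second. Sizes do not need to match.
    public static Iterator<String> concat(Iterator<Integer> a, Iterator<String> b) {
        List<String> merged = new ArrayList<>();
        while (a.hasNext()) {
            merged.add(String.valueOf(a.next()));
        }
        while (b.hasNext()) {
            merged.add(b.next());
        }
        return merged.iterator();
    }

    public static void main(String[] args) {
        // Two partitions on each side, with different sizes inside them.
        List<List<Integer>> numbers =
            Arrays.asList(Arrays.asList(1, 2, 3), Arrays.asList(4));
        List<List<String>> letters =
            Arrays.asList(Arrays.asList("a"), Arrays.asList("b", "c"));

        List<List<String>> zipped =
            zipPartitions(numbers, letters, ZipPartitionsSketch::concat);
        System.out.println(zipped); // prints [[1, 2, 3, a], [4, b, c]]
    }
}
```

Passing inputs with different partition counts to this sketch throws an IllegalArgumentException, which mirrors the requirement described above. How Spark itself reports that mismatch is not covered here.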




