zuo_z_
Beginner

Running Spark k-means with DAAL throws a JVM error

I run SampleKmeans.java on a Spark cluster with DAAL (data file Kmeans10w.csv, 174 MB):

bin/spark-submit \
--class daal.SampleKmeans \
--master yarn-client \
--num-executors 10 \
--executor-memory 30g \
--driver-memory 30g \
--executor-cores 2 \
--jars /opt/zzb/algebird-core_2.10-0.5.0.jar,/opt/zzb/daal.jar \
/opt/zzb/grider_2.10-0.1.0.jar \
yarn-client /hdfs/matrix/Kmeans10w.csv 100 20 5 1000
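For readability, here is how the positional arguments are consumed, based on the modified main method quoted later in the thread:

```shell
# Mapping of the positional arguments (per the modified main method):
#   args[0] = yarn-client                 -> Spark master
#   args[1] = /hdfs/matrix/Kmeans10w.csv  -> HDFS input path
#   args[2] = 100                         -> nBlocks
#   args[3] = 20                          -> nClusters
#   args[4] = 5                           -> nIterations
#   args[5] = 1000                        -> nVectorsInBlock
```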

It throws this error:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fc7f0d9892f, pid=19166, tid=140499757270784
#
# JRE version: Java(TM) SE Runtime Environment (7.0_71-b14) (build 1.7.0_71-b14)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.71-b01 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libJavaAPI.so+0x255b92f]  daal::algorithms::kmeans::init::internal::KMeansinitStep2MasterKernel<(daal::algorithms::kmeans::init::Method)1, double, (daal::CpuType)3>::compute(unsigned long, daal::data_management::interface1::NumericTable const* const*, unsigned long, daal::data_management::interface1::NumericTable const* const*, daal::algorithms::kmeans::init::interface1::Parameter const*)+0x35f
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /opt/spark/hs_err_pid19166.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
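As the crash report itself notes, core dumps were disabled. A minimal sketch of enabling them in the launching shell (assuming a bash shell on the host where the JVM crashes), so the native crash in libJavaAPI.so leaves a core file to inspect:

```shell
# Remove the core-file size limit for this shell, as the JRE message suggests,
# before starting Java / spark-submit again.
ulimit -c unlimited
ulimit -c   # verify; should print "unlimited"
```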

Heap
PSYoungGen      total 1529856K, used 759185K [0x0000000795500000, 0x0000000800000000, 0x0000000800000000)
  eden space 1311744K, 55% used [0x0000000795500000,0x00000007c19ba6a0,0x00000007e5600000)
  from space 218112K, 15% used [0x00000007e5600000,0x00000007e76aa088,0x00000007f2b00000)
  to   space 218112K, 0% used [0x00000007f2b00000,0x00000007f2b00000,0x0000000800000000)
ParOldGen       total 3495424K, used 80K [0x00000006bff80000, 0x0000000795500000, 0x0000000795500000)
  object space 3495424K, 0% used [0x00000006bff80000,0x00000006bff94020,0x0000000795500000)
PSPermGen       total 56320K, used 56063K [0x00000006aff80000, 0x00000006b3680000, 0x00000006bff80000)
  object space 56320K, 99% used [0x00000006aff80000,0x00000006b363fd68,0x00000006b3680000)
 

Then I set the JVM parameters:

bin/spark-submit \
--class daal.SampleKmeans \
--master yarn-client \
--num-executors 10 \
--executor-memory 30g \
--driver-memory 30g \
--executor-cores 2 \
--conf "spark.driver.extraJavaOptions=-Xms5g -Xmx5g -XX:PermSize=1g -XX:MaxPermSize=1g" \
--jars /opt/zzb/algebird-core_2.10-0.5.0.jar,/opt/zzb/daal.jar \
/opt/zzb/grider_2.10-0.1.0.jar \

It still throws the same error. In the log I can see:

GC Heap History (2 events):
Event: 34.345 GC heap before
{Heap before GC invocations=1 (full 0):
 PSYoungGen      total 1529856K, used 1311744K [0x0000000795500000, 0x0000000800000000, 0x0000000800000000)
  eden space 1311744K, 100% used [0x0000000795500000,0x00000007e5600000,0x00000007e5600000)
  from space 218112K, 0% used [0x00000007f2b00000,0x00000007f2b00000,0x0000000800000000)
  to   space 218112K, 0% used [0x00000007e5600000,0x00000007e5600000,0x00000007f2b00000)
 ParOldGen       total 3495424K, used 0K [0x00000006bff80000, 0x0000000795500000, 0x0000000795500000)
  object space 3495424K, 0% used [0x00000006bff80000,0x00000006bff80000,0x0000000795500000)
 PSPermGen       total 1048576K, used 40067K [0x000000067ff80000, 0x00000006bff80000, 0x00000006bff80000)
  object space 1048576K, 3% used [0x000000067ff80000,0x00000006826a0d88,0x00000006bff80000)
Event: 34.382 GC heap after
Heap after GC invocations=1 (full 0):
 PSYoungGen      total 1529856K, used 33582K [0x0000000795500000, 0x0000000800000000, 0x0000000800000000)
  eden space 1311744K, 0% used [0x0000000795500000,0x0000000795500000,0x00000007e5600000)
  from space 218112K, 15% used [0x00000007e5600000,0x00000007e76cba88,0x00000007f2b00000)
  to   space 218112K, 0% used [0x00000007f2b00000,0x00000007f2b00000,0x0000000800000000)
 ParOldGen       total 3495424K, used 80K [0x00000006bff80000, 0x0000000795500000, 0x0000000795500000)
  object space 3495424K, 0% used [0x00000006bff80000,0x00000006bff94020,0x0000000795500000)
 PSPermGen       total 1048576K, used 40067K [0x000000067ff80000, 0x00000006bff80000, 0x00000006bff80000)
  object space 1048576K, 3% used [0x000000067ff80000,0x00000006826a0d88,0x00000006bff80000)
}

Then I set the JVM parameters bigger, like: --conf "spark.driver.extraJavaOptions=-Xms10g -Xmx10g -XX:PermSize=1g -XX:MaxPermSize=1g"

It still throws the same error. In the log I can see:

Event: 39.652 GC heap before
{Heap before GC invocations=1 (full 0):
 PSYoungGen      total 3058688K, used 2621952K [0x000000072aa80000, 0x0000000800000000, 0x0000000800000000)
  eden space 2621952K, 100% used [0x000000072aa80000,0x00000007cab00000,0x00000007cab00000)
  from space 436736K, 0% used [0x00000007e5580000,0x00000007e5580000,0x0000000800000000)
  to   space 436736K, 0% used [0x00000007cab00000,0x00000007cab00000,0x00000007e5580000)
 ParOldGen       total 6990848K, used 0K [0x000000057ff80000, 0x000000072aa80000, 0x000000072aa80000)
  object space 6990848K, 0% used [0x000000057ff80000,0x000000057ff80000,0x000000072aa80000)
 PSPermGen       total 1048576K, used 42037K [0x000000053ff80000, 0x000000057ff80000, 0x000000057ff80000)
  object space 1048576K, 4% used [0x000000053ff80000,0x000000054288d580,0x000000057ff80000)
Event: 39.695 GC heap after
Heap after GC invocations=1 (full 0):
 PSYoungGen      total 3058688K, used 35054K [0x000000072aa80000, 0x0000000800000000, 0x0000000800000000)
  eden space 2621952K, 0% used [0x000000072aa80000,0x000000072aa80000,0x00000007cab00000)
  from space 436736K, 8% used [0x00000007cab00000,0x00000007ccd3b9a8,0x00000007e5580000)
  to   space 436736K, 0% used [0x00000007e5580000,0x00000007e5580000,0x0000000800000000)
 ParOldGen       total 6990848K, used 80K [0x000000057ff80000, 0x000000072aa80000, 0x000000072aa80000)
  object space 6990848K, 0% used [0x000000057ff80000,0x000000057ff94020,0x000000072aa80000)
 PSPermGen       total 1048576K, used 42037K [0x000000053ff80000, 0x000000057ff80000, 0x000000057ff80000)
  object space 1048576K, 4% used [0x000000053ff80000,0x000000054288d580,0x000000057ff80000)
}
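One variant worth noting (an assumption on my part, not something verified in this thread): spark.driver.extraJavaOptions only affects the driver JVM, and spark.executor.extraJavaOptions is the executor-side analogue; heap size is normally controlled by --driver-memory / --executor-memory rather than -Xms/-Xmx inside these options. A sketch passing the PermGen settings to both sides:

```shell
# Sketch (assumption): pass the PermGen options to both the driver and the
# executor JVMs, leaving heap sizing to --driver-memory / --executor-memory.
bin/spark-submit \
--class daal.SampleKmeans \
--master yarn-client \
--num-executors 10 \
--executor-memory 30g \
--driver-memory 30g \
--executor-cores 2 \
--conf "spark.driver.extraJavaOptions=-XX:PermSize=1g -XX:MaxPermSize=1g" \
--conf "spark.executor.extraJavaOptions=-XX:PermSize=1g -XX:MaxPermSize=1g" \
--jars /opt/zzb/algebird-core_2.10-0.5.0.jar,/opt/zzb/daal.jar \
/opt/zzb/grider_2.10-0.1.0.jar \
yarn-client /hdfs/matrix/Kmeans10w.csv 100 20 5 1000
```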

SampleKmeans.java is the example included in the DAAL project. The size of the test data "Kmeans10w.csv" is 174 MB, and I can run the same data with MLlib without any problem.

It seems DAAL does not work well here. Can anybody help me? Thanks!

Ilya_B_Intel
Employee

Hi Zuo,

Can you also tell us how you modified SampleKmeans.java to load your data file?

Did you modify SparkKmeans.java in the following part?

    private static final long nBlocks   = 1;
    private static final long nClusters = 20;
    private static final int nIterations = 5;
    private static final int nVectorsInBlock = 10000;

Do I understand correctly that your data has 174*10^6 rows? How many columns are there?

zuo_z_
Beginner

Hi Ilya B., thanks for your reply.

I modified the main method of SampleKmeans.java as follows:

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        DaalContext context = new DaalContext();

        /* Create a JavaSparkContext that loads defaults from system properties and the classpath and sets the name */
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("Spark Kmeans").setMaster(args[0]));

        /* Read from distributed HDFS data set from path */
        StringDataSource templateDataSource = new StringDataSource( context, "" );
        DistributedHDFSDataSet dd = new DistributedHDFSDataSet(args[1], templateDataSource );

        long nBlocks   = Integer.valueOf(args[2]);
        long nClusters = Integer.valueOf(args[3]);
        int nIterations = Integer.valueOf(args[4]);
        int nVectorsInBlock = Integer.valueOf(args[5]);

        JavaPairRDD<Integer, HomogenNumericTable> dataRDD = dd.getAsPairRDDPartitioned(sc, Integer.valueOf(nBlocks + ""), nVectorsInBlock);
        long start1 = System.currentTimeMillis();

       // JavaPairRDD<Integer, HomogenNumericTable> repartition = dataRDD.repartition(Integer.valueOf(nBlocks + ""));

        /* Compute k-means for dataRDD */
        SparkKmeans.nBlocks = nBlocks;
        SparkKmeans.nClusters = nClusters;
        SparkKmeans.nIterations = nIterations;
        SparkKmeans.nVectorsInBlock = nVectorsInBlock;

        SparkKmeans.KmeansResult result = SparkKmeans.runKmeans(context, dataRDD);
        long end = System.currentTimeMillis();

        /* Print results */
        //HomogenNumericTable Centroids  = result.centroids;
        //printNumericTable("First 10 dimensions of centroids:", Centroids, 20, 10);
        System.out.println("load time " + (start1 - start)/1000 + "s.");
        System.out.println("run time " + (end - start1)/1000 + "s.");
        System.out.println("part " + dataRDD.partitions().size());

        sc.stop();
        context.dispose();
    }

and SparkKmeans.java as follows:

    static long nBlocks   = 1;
    static long nClusters = 20;
    static int nIterations = 5;
    static int nVectorsInBlock = 10000;

My data has 100 * 1000 rows, and there are 100 features per row. The size of the data file is 174.3 MB.
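As a rough sanity check on those numbers (my own arithmetic, not from the sample):

```java
// Rough arithmetic check: 100 * 1000 = 100,000 rows with 100 features each,
// stored as a 174.3 MB CSV, gives the average text width per value.
public class DataSizeCheck {
    public static void main(String[] args) {
        long rows = 100L * 1000L;        // 100,000 rows ("10w" in the file name)
        long cols = 100L;                // features per row
        long fileBytes = 174_300_000L;   // ~174.3 MB
        double bytesPerValue = (double) fileBytes / (rows * cols);
        System.out.printf("~%.1f bytes per CSV value%n", bytesPerValue);
        // At roughly 17 bytes of text per value, 174 MB is consistent with
        // 100,000 rows of 100 doubles, not with 174*10^6 rows.
    }
}
```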

Are there any errors in my code? Thanks!