Understanding RDDs in Spark 4.0.0: The Foundational Architecture of Big Data Processing
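In today's data-driven era, efficient data processing has become an essential skill for every big data developer. Spark is one of the most popular distributed computing frameworks in use today, and the Resilient Distributed Dataset (RDD) is its core building block. From a simple initial concept to the capable abstraction it is now, the RDD offers a flexible, high-performance way to store, manipulate, and transform large-scale data.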
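1. What is an RDD, and why does it matter?
An RDD (Resilient Distributed Dataset) can be understood as a resilient, distributed, read-only collection of data. Its biggest advantage is the "resilient" part: even when a node fails, the lost data can be rebuilt from lineage information. Compared with traditional Hadoop MapReduce, RDDs are faster, largely because intermediate results can stay in memory, and they expose a richer set of operations, which greatly reduces the complexity of big data programming.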
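Immutability: once created, the data in an RDD cannot be changed. This rules out in-place modification, but it brings better fault tolerance and parallel performance.
Partitioning: data is split into multiple partitions across the cluster, so operations run in parallel and processing is much faster.
Persistence: an RDD can be stored in memory or on disk so that later operations can reuse it, which improves efficiency.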
Lineage-based fault tolerance: using the lineage graph, an RDD can quickly recompute lost data without relying on replication.
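Several of the properties above can be observed directly on a small RDD. The following is a minimal sketch rather than code from the original article: it assumes a local-mode SparkContext (`local[2]`), and the object name `RddProperties`, the helper `numbers`, and the choice of 4 partitions are illustrative only.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object RddProperties {
  // Builds a small RDD with an explicit partition count (4 is an arbitrary choice).
  def numbers(sc: SparkContext): RDD[Int] = sc.parallelize(1 to 10, numSlices = 4)

  def main(args: Array[String]): Unit = {
    // local[2] runs Spark inside this JVM with two worker threads.
    val sc = new SparkContext(new SparkConf().setAppName("rdd-properties").setMaster("local[2]"))
    try {
      val base = numbers(sc)
      // Immutability: map returns a new RDD and leaves `base` unchanged.
      val doubled = base.map(_ * 2)
      println(s"partitions: ${base.getNumPartitions}")
      println(s"base:    ${base.collect().mkString(",")}")
      println(s"doubled: ${doubled.collect().mkString(",")}")
      // Lineage: toDebugString shows the chain of parent RDDs that Spark
      // would replay to rebuild a lost partition.
      println(doubled.toDebugString)
    } finally sc.stop()
  }
}
```

Because `doubled` only records how it is derived from `base`, Spark can recompute a lost partition from that recipe instead of keeping replicas of the data.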
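Transformations: operations such as map, filter, flatMap, and reduceByKey produce a new RDD and form the basis for evolving a data flow step by step. They are lazily evaluated: nothing executes until an action is triggered.
Actions: operations such as collect, count, and reduce trigger the actual computation and either return a result or write it to storage.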
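Lazy evaluation is easy to check with a counter. The sketch below is an illustration under stated assumptions, not code from the original: it uses a local-mode SparkContext and a `LongAccumulator` to count how often the function passed to `map` has run. The names `LazyEvaluation` and `run` are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LazyEvaluation {
  // Returns how many times the map function had run before and after the action.
  def run(sc: SparkContext): (Long, Long) = {
    val calls = sc.longAccumulator("map calls")
    // Transformation: only recorded in the lineage, nothing executes here.
    val mapped = sc.parallelize(1 to 5).map { x => calls.add(1); x * 10 }
    val before: Long = calls.value
    // Action: triggers the job, so the map function finally runs.
    mapped.count()
    val after: Long = calls.value
    (before, after)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("lazy-evaluation").setMaster("local[2]"))
    try {
      val (before, after) = run(sc)
      println(s"map calls before action: $before, after action: $after")
    } finally sc.stop()
  }
}
```

Accumulators updated inside transformations can be incremented more than once if a task is retried, so this pattern suits demonstrations and debugging better than exact production metrics.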
ËÄ¡¢´ÓÈëÃŵ½ÐÑÄ¿£º¹¹½¨ÄãµÄµÚÒ»¸öRDD³ÌÐò¹ØÓÚÐÂÊÖÀ´Ëµ£¬Ã÷È·RDDµÄʵÀý²Ù×÷ÓÈΪÖ÷Òª¡£Ê¾Àý£º¶ÁÈ¡Îı¾Îļþ£¬¾ÙÐе¥´Ê¼ÆÊý
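// sparkContext is an existing SparkContext; the HDFS path is a placeholder
val lines = sparkContext.textFile("hdfs://path/file")
// split each line on spaces into individual words
val words = lines.flatMap(_.split(" "))
// pair each word with 1, then sum the counts per word
val wordCounts = words.map(word => (word, 1)).reduceByKey(_ + _)
// collect brings all results to the driver, so use it only on small outputs
wordCounts.collect().foreach(println)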
Simple as it is, the same code can run as a distributed computation over massive datasets in a real production environment, which shows how powerful and convenient RDDs are.
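Partitioning strategy: set a sensible number of partitions and avoid data skew.
Persistence strategy: choose memory or disk storage according to need.
Caching: cache RDDs that are reused, to cut down on recomputation.
Resource configuration: allocate Spark cluster resources sensibly to improve job performance.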
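The effect of caching a reused RDD can be sketched as follows. This is an illustrative example, not the article's own code: it assumes a local-mode SparkContext with enough memory to hold the cached partitions, and the names `CacheReuse` and `run` are hypothetical. An accumulator counts how often the "expensive" step actually executes across two actions.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CacheReuse {
  // Returns (times the map function ran, sum of squares, largest square).
  def run(sc: SparkContext): (Long, Int, Int) = {
    val computed = sc.longAccumulator("squares computed")
    val squares = sc.parallelize(1 to 100, numSlices = 4)
      .map { x => computed.add(1); x * x }
      .persist(StorageLevel.MEMORY_AND_DISK) // keep in memory, spill to disk if needed
    val total = squares.reduce(_ + _) // first action: computes and caches the partitions
    val largest = squares.max()       // second action: reads the cached partitions
    val runs: Long = computed.value
    squares.unpersist()               // release the cached blocks when done
    (runs, total, largest)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("cache-reuse").setMaster("local[2]"))
    try {
      val (runs, total, largest) = run(sc)
      println(s"map ran $runs times; sum = $total; max = $largest")
    } finally sc.stop()
  }
}
```

Without the `persist` call, each action would recompute the `map` step from scratch, so the counter would typically end at 200 instead of 100.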
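6. Looking ahead: the place of RDDs in the Spark ecosystem
Although higher-level APIs such as DataFrame and Dataset have become increasingly popular in recent years, RDDs still play a role that is hard to replace in low-level operations and complex data processing scenarios. Spark 4.0.0 continues to improve the performance and usability of RDDs, giving big data analysis a more solid foundation.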
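Summary: understanding and mastering the RDD, the core of Spark, is the first step for any big data practitioner who wants to become an expert in data processing. From the basic concepts to the rich set of operations and on to performance tuning, studying every aspect of RDDs lays a solid foundation for your Spark development. In the next part, we look at how to use RDDs efficiently in real projects, drawing on the latest features of Spark 4.0.0.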
A Practical Guide: Using RDDs Efficiently in Spark 4.0.0 to Unlock the Value of Big Data
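In the previous part, we covered the core concepts and basic operations of RDDs from a theoretical angle. In this part, we turn to practical application and help you get the most out of RDDs in real projects. Drawing on the latest features of Spark 4.0.0, we explore how to optimize performance, simplify the development workflow, and handle complex data scenarios.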
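1. Revisiting RDD usage in light of Spark 4.0.0
With the release of Spark 4.0.0, many aspects of low-level performance and API usage have been enhanced, especially lineage tracking, fault tolerance, and resource scheduling. RDDs in the new version better support the varied demands of big data environments. In particular, they work together with DataFrame and Dataset, which makes flexible data processing easier still.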
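Moving between an RDD and a DataFrame might look like the sketch below. It is written against the long-standing `SparkSession` API (`createDataFrame`, `Dataset.rdd`) and assumes a classic, non-Connect local session; it has not been checked against Spark 4.0.0 specifically. The object name `RddDataFrameInterop`, the helper `frequentWords`, and the sample data are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object RddDataFrameInterop {
  // Starts from an RDD, filters with the DataFrame API, and returns to an RDD.
  def frequentWords(spark: SparkSession): Seq[(String, Int)] = {
    val counts = spark.sparkContext.parallelize(Seq(("spark", 3), ("rdd", 2), ("hdfs", 1)))
    // RDD -> DataFrame, with explicit column names.
    val df = spark.createDataFrame(counts).toDF("word", "count")
    // Column-based filter expressed through the DataFrame API.
    val frequent = df.filter(col("count") >= 2)
    // DataFrame -> RDD[Row], then back to plain tuples.
    frequent.rdd
      .map(row => (row.getString(0), row.getInt(1)))
      .collect().toSeq.sortBy(_._1)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rdd-dataframe-interop").master("local[2]").getOrCreate()
    try frequentWords(spark).foreach(println)
    finally spark.stop()
  }
}
```

A common division of labor is to keep structured, column-oriented steps in the DataFrame API and drop down to an RDD only for logic that is awkward to express with columns.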
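Partition tuning: set the number of partitions according to data volume and cluster resources. Too few partitions can leave nodes idle, while too many add scheduling overhead. repartition() or coalesce() can adjust the count as needed.
Persistence strategy: for RDDs that are used several times, cache them in memory or on disk. For example:
import org.apache.spark.storage.StorageLevel
val cachedRDD = largeRDD.persist(StorageLevel.MEMORY_AND_DISK)
Avoiding data skew: some operations (such as groupByKey) easily cause data skew and hurt overall performance. This can be addressed by adjusting the partitioning or by pre-aggregating.
Compression and serialization: compress the data in RDDs to reduce memory usage, and choose an efficient serialization format to improve I/O performance.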
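The tuning points above can be combined in one small sketch. It is an illustration under assumptions, not the article's own code: it runs in local mode, the partition counts (2, 8, 2) are arbitrary, and the names `TuningSketch`, `tunedConf`, and `countByKey` are hypothetical. The configuration keys `spark.serializer` and `spark.rdd.compress` are standard Spark properties.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object TuningSketch {
  // Serialization and compression are configured through Spark properties.
  def tunedConf(): SparkConf = new SparkConf()
    .setAppName("rdd-tuning").setMaster("local[2]")
    // Kryo is usually more compact and faster than Java serialization.
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    // Compresses RDD partitions that are stored in serialized form.
    .set("spark.rdd.compress", "true")

  // reduceByKey combines values within each partition before the shuffle
  // (pre-aggregation), unlike groupByKey, which ships every value.
  def countByKey(pairs: RDD[(String, Int)]): Map[String, Int] =
    pairs.reduceByKey(_ + _).collect().toMap

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(tunedConf())
    try {
      val pairs = sc.parallelize(Seq("a" -> 1, "b" -> 1, "a" -> 1, "a" -> 1), numSlices = 2)
      val wide = pairs.repartition(8) // full shuffle; can raise the partition count
      val narrow = wide.coalesce(2)   // merges partitions; no shuffle by default
      println(s"partitions: ${pairs.getNumPartitions} -> ${wide.getNumPartitions} -> ${narrow.getNumPartitions}")
      println(countByKey(narrow))
    } finally sc.stop()
  }
}
```

Pre-aggregation reduces the volume of shuffled data, but it does not by itself fix a single very hot key; that case usually needs a change in how keys are distributed.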
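3. Building complex data processing pipelines
Chain multiple operations to implement complex processing logic. For example, clean the data, then extract features, and finally store the result in a data warehouse. Example: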
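// keep valid records, transform them into key-value pairs, and sum the values per key
val cleanedData = rawData.filter(_.isValid).map(transform).reduceByKey(_ + _)
// join with a second keyed dataset, then reshape the joined records
val enrichedData = cleanedData.join(otherData).map(....)
// write the result to HDFS (placeholder path)
enrichedData.saveAsTextFile("hdfs://path/output-directory")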
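To make the shape of such a pipeline concrete, here is a self-contained sketch with invented inputs. Everything beyond the filter, map, reduceByKey, join, and save chain is an assumption: the `Event` record type, its `isValid` rule, the region lookup standing in for `otherData`, the output format, and the temporary local output directory are all hypothetical and not taken from the original.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object PipelineSketch {
  // Hypothetical record type; the original does not show what rawData holds.
  final case class Event(userId: String, amount: Int) {
    def isValid: Boolean = userId.nonEmpty && amount >= 0
  }

  def sampleEvents(sc: SparkContext): RDD[Event] = sc.parallelize(Seq(
    Event("u1", 10), Event("u1", 5), Event("", 7), Event("u2", -1), Event("u2", 3)))

  def sampleRegions(sc: SparkContext): RDD[(String, String)] =
    sc.parallelize(Seq("u1" -> "EU", "u2" -> "US"))

  // Clean, aggregate per user, then enrich with the region lookup.
  def enrich(raw: RDD[Event], regions: RDD[(String, String)]): RDD[String] = {
    val totals = raw.filter(_.isValid)
      .map(e => (e.userId, e.amount)) // plays the role of `transform`
      .reduceByKey(_ + _)
    totals.join(regions)               // inner join on userId
      .map { case (user, (total, region)) => s"$user,$region,$total" }
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("pipeline-sketch").setMaster("local[2]"))
    try {
      val result = enrich(sampleEvents(sc), sampleRegions(sc))
      // saveAsTextFile requires a directory that does not exist yet.
      val out = Files.createTempDirectory("rdd-pipeline").resolve("out").toString
      result.saveAsTextFile(out)
      println(s"wrote ${result.count()} records to $out")
    } finally sc.stop()
  }
}
```

Because the join is an inner join, users without a matching region would be dropped; a `leftOuterJoin` would keep them at the cost of handling the missing value.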
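4. Working with external storage and modern tools
RDDs are not limited to HDFS. They can be combined with Kafka, Cassandra, Elasticsearch, and similar systems to build real-time or near-real-time processing pipelines.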
Optimized lineage tracking
Improved resource allocation and scheduling
Stronger fault tolerance
Support for more flexible API extensions
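Log analysis
User behavior analysis
Real-time recommendation
Large-scale text processing
Building on the resilience of RDDs, these workloads get fast distributed processing of massive data, so a wide range of business requirements can be met efficiently.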
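Plan up front: identify the data sources, the processing logic, and the target storage.
Design sensible partitioning and persistence.
Tune scheduling parameters for the specific application.
Introduce monitoring tools to observe job performance in real time.
Maintain and upgrade the Spark cluster regularly, and use new features to improve performance.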
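8. Future trends: RDD + AI + cloud
As artificial intelligence and cloud computing develop, the role of the RDD keeps evolving. From basic data processing, to machine learning with MLlib, to elastic scaling in the cloud, RDDs will hold an increasingly important place in the big data ecosystem.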
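Summary: once you have the theoretical foundations of RDDs, applying them to real scenarios together with the new features of Spark 4.0.0 lets you work through large volumes of data with confidence. Whether the job is batch processing, stream processing, or a complex ETL pipeline, RDDs give you a solid foundation. Keep exploring and keep optimizing, and you will achieve more and more in the big data field.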