窗口转换算子预览
之前扒源码看到过Flink的窗口有很多种:
packageorg.apache.flink.streaming.api.windowing.assigners;/**
* A {@code WindowAssigner} assigns zero or more {@link Window Windows} to an element.
*
* <p>In a window operation, elements are grouped by their key (if available) and by the windows to
* which it was assigned. The set of elements with the same key and window is called a pane. When a
* {@link Trigger} decides that a certain pane should fire the {@link
* org.apache.flink.streaming.api.functions.windowing.WindowFunction} is applied to produce output
* elements for that pane.
*
* @param <T> The type of elements that this WindowAssigner can assign windows to.
* @param <W> The type of {@code Window} that this assigner assigns.
*/@PublicEvolvingpublicabstractclassWindowAssigner<T,WextendsWindow>implementsSerializable{privatestaticfinallong serialVersionUID =1L;/**
* Returns a {@code Collection} of windows that should be assigned to the element.
*
* @param element The element to which windows should be assigned.
* @param timestamp The timestamp of the element.
* @param context The {@link WindowAssignerContext} in which the assigner operates.
*/publicabstractCollection<W>assignWindows(T element,long timestamp,WindowAssignerContext context);/** Returns the default trigger associated with this {@code WindowAssigner}. */publicabstractTrigger<T,W>getDefaultTrigger(StreamExecutionEnvironment env);/**
* Returns a {@link TypeSerializer} for serializing windows that are assigned by this {@code
* WindowAssigner}.
*/publicabstractTypeSerializer<W>getWindowSerializer(ExecutionConfig executionConfig);/**
* Returns {@code true} if elements are assigned to windows based on event time, {@code false}
* otherwise.
*/publicabstractbooleanisEventTime();/**
* A context provided to the {@link WindowAssigner} that allows it to query the current
* processing time.
*
* <p>This is provided to the assigner by its containing {@link
* org.apache.flink.streaming.runtime.operators.windowing.WindowOperator}, which, in turn, gets
* it from the containing {@link org.apache.flink.streaming.runtime.tasks.StreamTask}.
*/publicabstractstaticclassWindowAssignerContext{/** Returns the current processing time. */publicabstractlonggetCurrentProcessingTime();}}
窗口算子需要传的参数WindowAssigner有如下实现类:
源码:
packageorg.apache.flink.streaming.api.datastream;publicclassKeyedStream<T, KEY>extendsDataStream<T>{// ------------------------------------------------------------------------// Windowing// ------------------------------------------------------------------------/**
* Windows this {@code KeyedStream} into tumbling time windows.
*
* <p>This is a shortcut for either {@code .window(TumblingEventTimeWindows.of(size))} or {@code
* .window(TumblingProcessingTimeWindows.of(size))} depending on the time characteristic set
* using {@link
* org.apache.flink.streaming.api.environment.StreamExecutionEnvironment#setStreamTimeCharacteristic(org.apache.flink.streaming.api.TimeCharacteristic)}
*
* @param size The size of the window.
* @deprecated Please use {@link #window(WindowAssigner)} with either {@link
* TumblingEventTimeWindows} or {@link TumblingProcessingTimeWindows}. For more information,
* see the deprecation notice on {@link TimeCharacteristic}
*/@DeprecatedpublicWindowedStream<T, KEY,TimeWindow>timeWindow(Time size){if(environment.getStreamTimeCharacteristic()==TimeCharacteristic.ProcessingTime){returnwindow(TumblingProcessingTimeWindows.of(size));}else{returnwindow(TumblingEventTimeWindows.of(size));}}/**
* Windows this {@code KeyedStream} into sliding time windows.
*
* <p>This is a shortcut for either {@code .window(SlidingEventTimeWindows.of(size, slide))} or
* {@code .window(SlidingProcessingTimeWindows.of(size, slide))} depending on the time
* characteristic set using {@link
* org.apache.flink.streaming.api.environment.StreamExecutionEnvironment#setStreamTimeCharacteristic(org.apache.flink.streaming.api.TimeCharacteristic)}
*
* @param size The size of the window.
* @deprecated Please use {@link #window(WindowAssigner)} with either {@link
* SlidingEventTimeWindows} or {@link SlidingProcessingTimeWindows}. For more information,
* see the deprecation notice on {@link TimeCharacteristic}
*/@DeprecatedpublicWindowedStream<T, KEY,TimeWindow>timeWindow(Time size,Time slide){if(environment.getStreamTimeCharacteristic()==TimeCharacteristic.ProcessingTime){returnwindow(SlidingProcessingTimeWindows.of(size, slide));}else{returnwindow(SlidingEventTimeWindows.of(size, slide));}}/**
* Windows this {@code KeyedStream} into tumbling count windows.
*
* @param size The size of the windows in number of elements.
*/publicWindowedStream<T, KEY,GlobalWindow>countWindow(long size){returnwindow(GlobalWindows.create()).trigger(PurgingTrigger.of(CountTrigger.of(size)));}/**
* Windows this {@code KeyedStream} into sliding count windows.
*
* @param size The size of the windows in number of elements.
* @param slide The slide interval in number of elements.
*/publicWindowedStream<T, KEY,GlobalWindow>countWindow(long size,long slide){returnwindow(GlobalWindows.create()).evictor(CountEvictor.of(size)).trigger(CountTrigger.of(slide));}/**
* Windows this data stream to a {@code WindowedStream}, which evaluates windows over a key
* grouped stream. Elements are put into windows by a {@link WindowAssigner}. The grouping of
* elements is done both by key and by window.
*
* <p>A {@link org.apache.flink.streaming.api.windowing.triggers.Trigger} can be defined to
* specify when windows are evaluated. However, {@code WindowAssigners} have a default {@code
* Trigger} that is used if a {@code Trigger} is not specified.
*
* @param assigner The {@code WindowAssigner} that assigns elements to windows.
* @return The trigger windows data stream.
*/@PublicEvolvingpublic<WextendsWindow>WindowedStream<T, KEY,W>window(WindowAssigner<?superT,W> assigner){returnnewWindowedStream<>(this, assigner);}}
与Window有关的算子有已经过时的2种timeWindow时间窗口、countWindow计数窗口,以及一个window窗口分配器算子。
过时的时间窗口算子
以处理时间为例子。事件时间的情况与之类似。之后尽量使用新API,但不会深究老API与老的DataSet批处理。
滚动时间窗口
packagecom.zhiyong.flinkStream;importcom.zhiyong.flinkStudy.FlinkWordCountDemo2FlatMapFunction;importcom.zhiyong.flinkStudy.WordCountSource1ps;importorg.apache.flink.api.common.eventtime.WatermarkStrategy;importorg.apache.flink.api.common.functions.FlatMapFunction;importorg.apache.flink.api.java.functions.KeySelector;importorg.apache.flink.api.java.tuple.Tuple2;importorg.apache.flink.streaming.api.TimeCharacteristic;importorg.apache.flink.streaming.api.datastream.DataStreamSource;importorg.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;importorg.apache.flink.streaming.api.environment.StreamExecutionEnvironment;importorg.apache.flink.streaming.api.windowing.time.Time;importorg.apache.flink.table.planner.expressions.In;importorg.apache.flink.util.Collector;/**
* @program: study
* @description: Flink窗口转换算子
* @author: zhiyong
* @create: 2022-03-23 23:59
**/publicclassWindowDemo{publicstaticvoidmain(String[] args)throwsException{StreamExecutionEnvironment env =StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(1);
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);//过时方法,必须设置,才能正常使用算子,否则报错DataStreamSource<String> data = env.addSource(newWordCountSource1ps());SingleOutputStreamOperator<Tuple2<String,Integer>> result1 = data.flatMap(newFlatMapFunction1()).keyBy(newKeySelector<Tuple2<String,Integer>,Object>(){@OverridepublicObjectgetKey(Tuple2<String,Integer> value)throwsException{return value.f0;}}).timeWindow(Time.seconds(30))//过时的滚动时间窗口.sum(1);
result1.print("过时的滚动时间窗口");
env.execute("streaming");}privatestaticclassFlatMapFunction1implementsFlatMapFunction<String,Tuple2<String,Integer>>{@OverridepublicvoidflatMap(String value,Collector<Tuple2<String,Integer>> out)throwsException{for(String cell:value.split("\\s+")){
out.collect(Tuple2.of(cell,1));}}}}
由于自定义数据源不带时间戳,不使用上述过时方法设置env会报错:
Exception in thread "main"org.apache.flink.runtime.client.JobExecutionException:Job execution failed.
at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
at org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:137)
at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$1(AkkaInvocationHandler.java:258)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
at org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1389)
at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$null$1(ClassLoadingUtils.java:93)
at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
at org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$1.onComplete(AkkaFutureUtils.java:47)
at akka.dispatch.OnComplete.internal(Future.scala:300)
at akka.dispatch.OnComplete.internal(Future.scala:297)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:224)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:221)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
at org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$DirectExecutionContext.execute(AkkaFutureUtils.java:65)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68)
at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284)
at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284)
at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:621)
at akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:24)
at akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:23)
at scala.concurrent.Future.$anonfun$andThen$1(Future.scala:532)
at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:29)
at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:63)
at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:100)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81)
at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:100)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:49)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:48)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1067)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1703)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:172)Caused by:org.apache.flink.runtime.JobException:Recovery is suppressed by NoRestartBackoffTimeStrategy
at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:138)
at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:82)
at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:252)
at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:242)
at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:233)
at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:684)
at org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:79)
at org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:444)
at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethod)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$1(AkkaRpcActor.java:316)
at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:314)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:217)
at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:78)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)
at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
at akka.actor.Actor.aroundReceive(Actor.scala:537)
at akka.actor.Actor.aroundReceive$(Actor.scala:535)
at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
at akka.actor.ActorCell.invoke(ActorCell.scala:548)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
at akka.dispatch.Mailbox.run(Mailbox.scala:231)
at akka.dispatch.Mailbox.exec(Mailbox.scala:243)...4 more
Caused by:java.lang.RuntimeException:Record has Long.MIN_VALUE timestamp (= no timestamp marker).Is the time characteristic set to'ProcessingTime', or did you forget tocall'DataStream.assignTimestampsAndWatermarks(...)'?
at org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows.assignWindows(TumblingEventTimeWindows.java:83)
at org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.processElement(WindowOperator.java:293)
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:233)
at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134)
at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105)
at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:496)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:203)
at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:809)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:761)
at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:937)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575)
at java.lang.Thread.run(Thread.java:750)Process finished withexit code 1
为了使用该过时的时间窗口算子,使用了该过时方法。设置后可以成功运行:
过时的滚动时间窗口>(zhiyong13,1)
过时的滚动时间窗口>(zhiyong6,1)
过时的滚动时间窗口>(zhiyong0,1)
过时的滚动时间窗口>(zhiyong1,1)
过时的滚动时间窗口>(zhiyong18,1)
过时的滚动时间窗口>(zhiyong11,1)
过时的滚动时间窗口>(zhiyong8,1)
过时的滚动时间窗口>(zhiyong4,2)
过时的滚动时间窗口>(zhiyong17,3)
过时的滚动时间窗口>(zhiyong9,2)
过时的滚动时间窗口>(zhiyong19,2)
过时的滚动时间窗口>(zhiyong15,1)
过时的滚动时间窗口>(zhiyong5,3)
过时的滚动时间窗口>(zhiyong16,3)
过时的滚动时间窗口>(zhiyong14,3)Process finished withexit code 130
滑动时间窗口
SingleOutputStreamOperator<Tuple2<String,Integer>> result2 = data.flatMap(newFlatMapFunction1()).keyBy(newKeySelector<Tuple2<String,Integer>,Object>(){@OverridepublicObjectgetKey(Tuple2<String,Integer> value)throwsException{return value.f0;}}).timeWindow(Time.seconds(10),Time.seconds(5))//过时的滑动时间窗口.sum(1);
result2.print("过时的滑动时间窗口");
不设置env同样会报错。设置后可以正常运行:
过时的滑动时间窗口>(zhiyong14,1)
过时的滑动时间窗口>(zhiyong14,1)
过时的滑动时间窗口>(zhiyong11,1)
过时的滑动时间窗口>(zhiyong4,1)
过时的滑动时间窗口>(zhiyong15,1)
过时的滑动时间窗口>(zhiyong3,1)
过时的滑动时间窗口>(zhiyong6,1)
过时的滑动时间窗口>(zhiyong6,1)
过时的滑动时间窗口>(zhiyong7,1)
过时的滑动时间窗口>(zhiyong11,1)
过时的滑动时间窗口>(zhiyong3,2)
过时的滑动时间窗口>(zhiyong16,1)
过时的滑动时间窗口>(zhiyong15,3)
过时的滑动时间窗口>(zhiyong4,1)
过时的滑动时间窗口>(zhiyong16,2)
过时的滑动时间窗口>(zhiyong3,1)
过时的滑动时间窗口>(zhiyong7,2)
过时的滑动时间窗口>(zhiyong1,1)
过时的滑动时间窗口>(zhiyong15,2)
过时的滑动时间窗口>(zhiyong11,1)
过时的滑动时间窗口>(zhiyong4,1)Process finished withexit code 130
大约每次打印5条数据,结合自定义数据源约是每秒1条,滑动窗口显然功能正确。
计数窗口算子
滚动计数窗口
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);//过时方法,必须设置,才能正常使用算子,否则报错SingleOutputStreamOperator<Tuple2<String,Integer>> result3 = data.flatMap(newFlatMapFunction1()).keyBy(newKeySelector<Tuple2<String,Integer>,Object>(){@OverridepublicObjectgetKey(Tuple2<String,Integer> value)throwsException{return value.f0;}}).countWindow(5).sum(1);
result3.print("滚动计数窗口");
不设置env同样会报错。设置后可以正常运行:
滚动计数窗口>(zhiyong8,5)
滚动计数窗口>(zhiyong12,5)
滚动计数窗口>(zhiyong4,5)
滚动计数窗口>(zhiyong11,5)
滚动计数窗口>(zhiyong6,5)
滚动计数窗口>(zhiyong19,5)Process finished withexit code 130
每次都是凑齐5个才会打印。
滑动计数窗口
SingleOutputStreamOperator<Tuple2<String,Integer>> result4 = data.flatMap(newFlatMapFunction1()).keyBy(newKeySelector<Tuple2<String,Integer>,Object>(){@OverridepublicObjectgetKey(Tuple2<String,Integer> value)throwsException{return value.f0;}}).countWindow(5,3).sum(1);
result4.print("滑动计数窗口");
同样是使用过时方法设置env,可以看到:
滑动计数窗口>(zhiyong0,3)
滑动计数窗口>(zhiyong8,3)
滑动计数窗口>(zhiyong17,3)
滑动计数窗口>(zhiyong7,3)
滑动计数窗口>(zhiyong9,3)
滑动计数窗口>(zhiyong12,3)
滑动计数窗口>(zhiyong18,3)
滑动计数窗口>(zhiyong1,3)
滑动计数窗口>(zhiyong10,3)
滑动计数窗口>(zhiyong4,3)
滑动计数窗口>(zhiyong16,3)
滑动计数窗口>(zhiyong19,3)
滑动计数窗口>(zhiyong9,5)
滑动计数窗口>(zhiyong8,5)
滑动计数窗口>(zhiyong17,5)
滑动计数窗口>(zhiyong1,5)
滑动计数窗口>(zhiyong19,5)
滑动计数窗口>(zhiyong9,5)
滑动计数窗口>(zhiyong5,3)
滑动计数窗口>(zhiyong3,3)
滑动计数窗口>(zhiyong11,3)
滑动计数窗口>(zhiyong12,5)
滑动计数窗口>(zhiyong17,5)
滑动计数窗口>(zhiyong18,5)
滑动计数窗口>(zhiyong2,3)
滑动计数窗口>(zhiyong1,5)
滑动计数窗口>(zhiyong10,5)
滑动计数窗口>(zhiyong4,5)
滑动计数窗口>(zhiyong12,5)
滑动计数窗口>(zhiyong7,5)
滑动计数窗口>(zhiyong8,5)
滑动计数窗口>(zhiyong19,5)
滑动计数窗口>(zhiyong11,5)
滑动计数窗口>(zhiyong1,5)
滑动计数窗口>(zhiyong17,5)
滑动计数窗口>(zhiyong9,5)
滑动计数窗口>(zhiyong0,5)
滑动计数窗口>(zhiyong3,5)
滑动计数窗口>(zhiyong14,3)
滑动计数窗口>(zhiyong2,5)
滑动计数窗口>(zhiyong19,5)
滑动计数窗口>(zhiyong4,5)
滑动计数窗口>(zhiyong5,5)
滑动计数窗口>(zhiyong2,5)
滑动计数窗口>(zhiyong8,5)
滑动计数窗口>(zhiyong6,3)
滑动计数窗口>(zhiyong15,3)
滑动计数窗口>(zhiyong11,5)
滑动计数窗口>(zhiyong13,3)
滑动计数窗口>(zhiyong16,5)
滑动计数窗口>(zhiyong19,5)
滑动计数窗口>(zhiyong10,5)
滑动计数窗口>(zhiyong0,5)
滑动计数窗口>(zhiyong2,5)
滑动计数窗口>(zhiyong16,5)
滑动计数窗口>(zhiyong4,5)
滑动计数窗口>(zhiyong15,5)
滑动计数窗口>(zhiyong13,5)
滑动计数窗口>(zhiyong7,5)
滑动计数窗口>(zhiyong1,5)
滑动计数窗口>(zhiyong19,5)
滑动计数窗口>(zhiyong10,5)
滑动计数窗口>(zhiyong3,5)
滑动计数窗口>(zhiyong11,5)
滑动计数窗口>(zhiyong6,5)
滑动计数窗口>(zhiyong8,5)
滑动计数窗口>(zhiyong16,5)
滑动计数窗口>(zhiyong0,5)
滑动计数窗口>(zhiyong14,5)
滑动计数窗口>(zhiyong12,5)Process finished withexit code 130
正好20个word,正好20个count到结果为3的数据。其余均是20。显然,最开始累计出现3次就开始滑动,之后每来3条滑动1次,但是窗口大小只有5个,所以之后的count值永远是5个。
窗口分配器的窗口算子
会话窗口
静态时间间隔的处理时间会话窗口
先构建数据源:
packagecom.zhiyong.flinkStream;importorg.apache.flink.streaming.api.functions.source.SourceFunction;importjava.util.ArrayList;importjava.util.Random;/**
* @program: study
* @description: 20秒内随机产生数据
* @author: zhiyong
* @create: 2022-03-24 22:08
**/publicclassFlinkRandom10sSourceimplementsSourceFunction<String>{boolean needRun=true;@Overridepublicvoidrun(SourceContext<String> ctx)throwsException{while(needRun){ArrayList<String> result =newArrayList<>();for(int i =0; i <10; i++){
result.add("zhiyong"+ i);}int index=newRandom().nextInt(10);
ctx.collect(result.get(index));int delayTime =newRandom().nextInt(10)*1000;System.out.println("产生数据“"+ result.get(index));System.out.println("延时"+ delayTime +"ms");Thread.sleep(delayTime);}}@Overridepublicvoidcancel(){
needRun=false;}}
测试会话窗口:
packagecom.zhiyong.flinkStream;importorg.apache.flink.api.common.functions.FlatMapFunction;importorg.apache.flink.api.java.functions.KeySelector;importorg.apache.flink.api.java.tuple.Tuple2;importorg.apache.flink.streaming.api.datastream.DataStreamSource;importorg.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;importorg.apache.flink.streaming.api.environment.StreamExecutionEnvironment;importorg.apache.flink.streaming.api.windowing.assigners.ProcessingTimeSessionWindows;importorg.apache.flink.streaming.api.windowing.time.Time;importorg.apache.flink.util.Collector;/**
* @program: study
* @description: Flink会话窗口
* @author: zhiyong
* @create: 2022-03-24 22:07
**/publicclassSessionWindowDemo{publicstaticvoidmain(String[] args)throwsException{StreamExecutionEnvironment env =StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(1);DataStreamSource<String> data = env.addSource(newFlinkRandom10sSource());SingleOutputStreamOperator<Tuple2<String,Integer>> result1 = data.flatMap(newFlatMapFunction1()).keyBy(newKeySelector<Tuple2<String,Integer>,Object>(){@OverridepublicObjectgetKey(Tuple2<String,Integer> value)throwsException{return value.f0;}}).window(ProcessingTimeSessionWindows.withGap(Time.seconds(5))).sum(1);
result1.print("静态时间间隔的处理时间会话窗口");
env.execute("会话窗口");}privatestaticclassFlatMapFunction1implementsFlatMapFunction<String,Tuple2<String,Integer>>{@OverridepublicvoidflatMap(String value,Collector<Tuple2<String,Integer>> out)throwsException{for(String cell:value.split("\\s+")){
out.collect(Tuple2.of(cell,1));}}}}
结果:
产生数据“zhiyong4
延时6000ms
静态时间间隔的处理时间会话窗口>(zhiyong4,1)
产生数据“zhiyong0
延时4000ms
产生数据“zhiyong0
延时9000ms
静态时间间隔的处理时间会话窗口>(zhiyong0,2)
产生数据“zhiyong4
延时7000ms
静态时间间隔的处理时间会话窗口>(zhiyong4,1)
产生数据“zhiyong3
延时3000ms
产生数据“zhiyong8
延时0ms
产生数据“zhiyong1
延时4000ms
静态时间间隔的处理时间会话窗口>(zhiyong3,1)
产生数据“zhiyong6
延时2000ms
静态时间间隔的处理时间会话窗口>(zhiyong8,1)
静态时间间隔的处理时间会话窗口>(zhiyong1,1)
产生数据“zhiyong5
延时5000ms
静态时间间隔的处理时间会话窗口>(zhiyong6,1)
产生数据“zhiyong0
延时0ms
产生数据“zhiyong9
延时6000ms
静态时间间隔的处理时间会话窗口>(zhiyong5,1)
静态时间间隔的处理时间会话窗口>(zhiyong0,1)
静态时间间隔的处理时间会话窗口>(zhiyong9,1)
产生数据“zhiyong7
延时6000ms
静态时间间隔的处理时间会话窗口>(zhiyong7,1)
产生数据“zhiyong5
延时8000ms
静态时间间隔的处理时间会话窗口>(zhiyong5,1)
产生数据“zhiyong4
延时7000ms
静态时间间隔的处理时间会话窗口>(zhiyong4,1)
产生数据“zhiyong3
延时2000ms
产生数据“zhiyong8
延时9000ms
静态时间间隔的处理时间会话窗口>(zhiyong3,1)
静态时间间隔的处理时间会话窗口>(zhiyong8,1)
产生数据“zhiyong0
延时4000ms
产生数据“zhiyong6
延时0ms
产生数据“zhiyong6
延时3000ms
静态时间间隔的处理时间会话窗口>(zhiyong0,1)
产生数据“zhiyong0
延时4000ms
静态时间间隔的处理时间会话窗口>(zhiyong6,2)
产生数据“zhiyong2
延时9000ms
静态时间间隔的处理时间会话窗口>(zhiyong0,1)
静态时间间隔的处理时间会话窗口>(zhiyong2,1)
产生数据“zhiyong5
延时6000ms
静态时间间隔的处理时间会话窗口>(zhiyong5,1)
产生数据“zhiyong7
延时3000ms
产生数据“zhiyong3
延时8000ms
静态时间间隔的处理时间会话窗口>(zhiyong7,1)
静态时间间隔的处理时间会话窗口>(zhiyong3,1)
产生数据“zhiyong2
延时4000ms
Process finished withexit code 130
动态时间间隔的处理时间会话窗口
SingleOutputStreamOperator<Tuple2<String,Integer>> result2 = data.flatMap(newFlatMapFunction1()).keyBy(newKeySelector<Tuple2<String,Integer>,Object>(){@OverridepublicObjectgetKey(Tuple2<String,Integer> value)throwsException{return value.f0;}}).window(ProcessingTimeSessionWindows.withDynamicGap(newSessionWindowTimeGapExtractor<Tuple2<String,Integer>>(){@Overridepubliclongextract(Tuple2<String,Integer> element){return(Long.parseLong(element.f0.toString().split("g")[1])+5)*1000;}})).sum(1);
result2.print("动态时间间隔的处理时间会话窗口");
执行后:
产生数据“zhiyong2
延时4000ms
产生数据“zhiyong7
延时4000ms
动态时间间隔的处理时间会话窗口>(zhiyong2,1)
产生数据“zhiyong7
延时0ms
产生数据“zhiyong9
延时8000ms
产生数据“zhiyong9
延时1000ms
产生数据“zhiyong9
延时3000ms
产生数据“zhiyong6
延时2000ms
动态时间间隔的处理时间会话窗口>(zhiyong7,2)
产生数据“zhiyong1
延时0ms
产生数据“zhiyong6
延时2000ms
产生数据“zhiyong7
延时8000ms
动态时间间隔的处理时间会话窗口>(zhiyong1,1)
动态时间间隔的处理时间会话窗口>(zhiyong9,3)
产生数据“zhiyong8
延时9000ms
动态时间间隔的处理时间会话窗口>(zhiyong6,2)
动态时间间隔的处理时间会话窗口>(zhiyong7,1)
产生数据“zhiyong2
延时0ms
产生数据“zhiyong7
延时9000ms
动态时间间隔的处理时间会话窗口>(zhiyong8,1)
动态时间间隔的处理时间会话窗口>(zhiyong2,1)
产生数据“zhiyong0
延时8000ms
动态时间间隔的处理时间会话窗口>(zhiyong7,1)
动态时间间隔的处理时间会话窗口>(zhiyong0,1)
产生数据“zhiyong2
延时9000ms
动态时间间隔的处理时间会话窗口>(zhiyong2,1)
产生数据“zhiyong7
延时3000ms
产生数据“zhiyong9
延时6000ms
产生数据“zhiyong3
延时3000ms
产生数据“zhiyong7
延时0ms
产生数据“zhiyong8
延时6000ms
动态时间间隔的处理时间会话窗口>(zhiyong7,1)
动态时间间隔的处理时间会话窗口>(zhiyong9,1)
动态时间间隔的处理时间会话窗口>(zhiyong3,1)
产生数据“zhiyong7
延时4000ms
产生数据“zhiyong9
延时1000ms
产生数据“zhiyong7
延时0ms
产生数据“zhiyong4
延时3000ms
动态时间间隔的处理时间会话窗口>(zhiyong8,1)
产生数据“zhiyong8
延时6000ms
产生数据“zhiyong4
延时1000ms
动态时间间隔的处理时间会话窗口>(zhiyong4,1)
产生数据“zhiyong9
延时8000ms
动态时间间隔的处理时间会话窗口>(zhiyong7,3)
动态时间间隔的处理时间会话窗口>(zhiyong8,1)
产生数据“zhiyong5
延时9000ms
动态时间间隔的处理时间会话窗口>(zhiyong4,1)Process finished withexit code 130
可以看出这种情况时间间隔可以动态变化,同样是超时便关闭窗口。
静态时间间隔的事件时间会话窗口
.window(EventTimeSessionWindows.withGap(Time.seconds(5)))
算子层面区别不大。
动态时间间隔的事件时间会话窗口
.window(EventTimeSessionWindows.withDynamicGap(newSessionWindowTimeGapExtractor<Tuple2<String,Integer>>(){@Overridepubliclongextract(Tuple2<String,Integer> element){return(Long.parseLong(element.f0.toString().split("g")[1])+5)*1000;}}))
算子层面区别不大。
事件时间的这2种情况与处理时间类似,不再赘述。
时间窗口
.timeWindow(Time.seconds(30))//这种是滚动窗口.timeWindow(Time.seconds(30),Time.seconds(30))//这种是滑动窗口
这种算子已经过时。但是时间窗口使用频率偏偏最高。新API可以设置偏移量来消除时区的影响。比如东八区。。。
滚动事件时间窗口
SingleOutputStreamOperator<Tuple2<String,Integer>> result5 = data.flatMap(newFlatMapFunction1()).keyBy(newKeySelector<Tuple2<String,Integer>,Object>(){@OverridepublicObjectgetKey(Tuple2<String,Integer> value)throwsException{return value.f0;}}).window(TumblingEventTimeWindows.of(Time.seconds(10)))//.window(TumblingEventTimeWindows.of(Time.seconds(10)),Time.hours(-8))//东八区.sum(1);
result5.print("滚动事件时间窗口");
滚动处理时间窗口
SingleOutputStreamOperator<Tuple2<String,Integer>> result6 = data.flatMap(newFlatMapFunction1()).keyBy(newKeySelector<Tuple2<String,Integer>,Object>(){@OverridepublicObjectgetKey(Tuple2<String,Integer> value)throwsException{return value.f0;}}).window(TumblingProcessingTimeWindows.of(Time.seconds(10)))//.window(TumblingProcessingTimeWindows.of(Time.seconds(10)),Time.hours(-8))//东八区.sum(1);
result6.print("滚动处理时间窗口");
滑动事件时间窗口
SingleOutputStreamOperator<Tuple2<String,Integer>> result7 = data.flatMap(newFlatMapFunction1()).keyBy(newKeySelector<Tuple2<String,Integer>,Object>(){@OverridepublicObjectgetKey(Tuple2<String,Integer> value)throwsException{return value.f0;}}).window(SlidingEventTimeWindows.of(Time.seconds(5),Time.seconds(3)))//.window(SlidingEventTimeWindows.of(Time.seconds(5),Time.seconds(3),Time.hours(-8)))//东八区.sum(1);
result7.print("滑动事件时间窗口");
滑动处理时间窗口
SingleOutputStreamOperator<Tuple2<String,Integer>> result8 = data.flatMap(newFlatMapFunction1()).keyBy(newKeySelector<Tuple2<String,Integer>,Object>(){@OverridepublicObjectgetKey(Tuple2<String,Integer> value)throwsException{return value.f0;}}).window(SlidingProcessingTimeWindows.of(Time.seconds(5),Time.seconds(3)))//.window(SlidingProcessingTimeWindows.of(Time.seconds(5),Time.seconds(3),Time.hours(-8)))//东八区.sum(1);
result8.print("滑动处理时间窗口");
版权归原作者 虎鲸不是鱼 所有, 如有侵权,请联系我们删除。