0


Flink窗口转换算子

窗口转换算子预览

之前扒源码看到过Flink的窗口有很多种:

packageorg.apache.flink.streaming.api.windowing.assigners;/**
 * A {@code WindowAssigner} assigns zero or more {@link Window Windows} to an element.
 *
 * <p>In a window operation, elements are grouped by their key (if available) and by the windows to
 * which it was assigned. The set of elements with the same key and window is called a pane. When a
 * {@link Trigger} decides that a certain pane should fire the {@link
 * org.apache.flink.streaming.api.functions.windowing.WindowFunction} is applied to produce output
 * elements for that pane.
 *
 * @param <T> The type of elements that this WindowAssigner can assign windows to.
 * @param <W> The type of {@code Window} that this assigner assigns.
 */@PublicEvolvingpublicabstractclassWindowAssigner<T,WextendsWindow>implementsSerializable{privatestaticfinallong serialVersionUID =1L;/**
     * Returns a {@code Collection} of windows that should be assigned to the element.
     *
     * @param element The element to which windows should be assigned.
     * @param timestamp The timestamp of the element.
     * @param context The {@link WindowAssignerContext} in which the assigner operates.
     */publicabstractCollection<W>assignWindows(T element,long timestamp,WindowAssignerContext context);/** Returns the default trigger associated with this {@code WindowAssigner}. */publicabstractTrigger<T,W>getDefaultTrigger(StreamExecutionEnvironment env);/**
     * Returns a {@link TypeSerializer} for serializing windows that are assigned by this {@code
     * WindowAssigner}.
     */publicabstractTypeSerializer<W>getWindowSerializer(ExecutionConfig executionConfig);/**
     * Returns {@code true} if elements are assigned to windows based on event time, {@code false}
     * otherwise.
     */publicabstractbooleanisEventTime();/**
     * A context provided to the {@link WindowAssigner} that allows it to query the current
     * processing time.
     *
     * <p>This is provided to the assigner by its containing {@link
     * org.apache.flink.streaming.runtime.operators.windowing.WindowOperator}, which, in turn, gets
     * it from the containing {@link org.apache.flink.streaming.runtime.tasks.StreamTask}.
     */publicabstractstaticclassWindowAssignerContext{/** Returns the current processing time. */publicabstractlonggetCurrentProcessingTime();}}

窗口算子需要传的参数WindowAssigner有如下实现类:

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-vBqttzeL-1648137764194)(E:\study\flink\Flink窗口转换算子.assets\image-20220324223444576.png)]

源码:

packageorg.apache.flink.streaming.api.datastream;publicclassKeyedStream<T, KEY>extendsDataStream<T>{// ------------------------------------------------------------------------//  Windowing// ------------------------------------------------------------------------/**
     * Windows this {@code KeyedStream} into tumbling time windows.
     *
     * <p>This is a shortcut for either {@code .window(TumblingEventTimeWindows.of(size))} or {@code
     * .window(TumblingProcessingTimeWindows.of(size))} depending on the time characteristic set
     * using {@link
     * org.apache.flink.streaming.api.environment.StreamExecutionEnvironment#setStreamTimeCharacteristic(org.apache.flink.streaming.api.TimeCharacteristic)}
     *
     * @param size The size of the window.
     * @deprecated Please use {@link #window(WindowAssigner)} with either {@link
     *     TumblingEventTimeWindows} or {@link TumblingProcessingTimeWindows}. For more information,
     *     see the deprecation notice on {@link TimeCharacteristic}
     */@DeprecatedpublicWindowedStream<T, KEY,TimeWindow>timeWindow(Time size){if(environment.getStreamTimeCharacteristic()==TimeCharacteristic.ProcessingTime){returnwindow(TumblingProcessingTimeWindows.of(size));}else{returnwindow(TumblingEventTimeWindows.of(size));}}/**
     * Windows this {@code KeyedStream} into sliding time windows.
     *
     * <p>This is a shortcut for either {@code .window(SlidingEventTimeWindows.of(size, slide))} or
     * {@code .window(SlidingProcessingTimeWindows.of(size, slide))} depending on the time
     * characteristic set using {@link
     * org.apache.flink.streaming.api.environment.StreamExecutionEnvironment#setStreamTimeCharacteristic(org.apache.flink.streaming.api.TimeCharacteristic)}
     *
     * @param size The size of the window.
     * @deprecated Please use {@link #window(WindowAssigner)} with either {@link
     *     SlidingEventTimeWindows} or {@link SlidingProcessingTimeWindows}. For more information,
     *     see the deprecation notice on {@link TimeCharacteristic}
     */@DeprecatedpublicWindowedStream<T, KEY,TimeWindow>timeWindow(Time size,Time slide){if(environment.getStreamTimeCharacteristic()==TimeCharacteristic.ProcessingTime){returnwindow(SlidingProcessingTimeWindows.of(size, slide));}else{returnwindow(SlidingEventTimeWindows.of(size, slide));}}/**
     * Windows this {@code KeyedStream} into tumbling count windows.
     *
     * @param size The size of the windows in number of elements.
     */publicWindowedStream<T, KEY,GlobalWindow>countWindow(long size){returnwindow(GlobalWindows.create()).trigger(PurgingTrigger.of(CountTrigger.of(size)));}/**
     * Windows this {@code KeyedStream} into sliding count windows.
     *
     * @param size The size of the windows in number of elements.
     * @param slide The slide interval in number of elements.
     */publicWindowedStream<T, KEY,GlobalWindow>countWindow(long size,long slide){returnwindow(GlobalWindows.create()).evictor(CountEvictor.of(size)).trigger(CountTrigger.of(slide));}/**
     * Windows this data stream to a {@code WindowedStream}, which evaluates windows over a key
     * grouped stream. Elements are put into windows by a {@link WindowAssigner}. The grouping of
     * elements is done both by key and by window.
     *
     * <p>A {@link org.apache.flink.streaming.api.windowing.triggers.Trigger} can be defined to
     * specify when windows are evaluated. However, {@code WindowAssigners} have a default {@code
     * Trigger} that is used if a {@code Trigger} is not specified.
     *
     * @param assigner The {@code WindowAssigner} that assigns elements to windows.
     * @return The trigger windows data stream.
     */@PublicEvolvingpublic<WextendsWindow>WindowedStream<T, KEY,W>window(WindowAssigner<?superT,W> assigner){returnnewWindowedStream<>(this, assigner);}}

与Window有关的算子有已经过时的2种timeWindow时间窗口、countWindow计数窗口,以及一个window窗口分配器算子。

过时的时间窗口算子

以处理时间为例子。事件时间的情况与之类似。之后尽量使用新API,但不会深究老API与老的DataSet批处理。

滚动时间窗口

packagecom.zhiyong.flinkStream;importcom.zhiyong.flinkStudy.FlinkWordCountDemo2FlatMapFunction;importcom.zhiyong.flinkStudy.WordCountSource1ps;importorg.apache.flink.api.common.eventtime.WatermarkStrategy;importorg.apache.flink.api.common.functions.FlatMapFunction;importorg.apache.flink.api.java.functions.KeySelector;importorg.apache.flink.api.java.tuple.Tuple2;importorg.apache.flink.streaming.api.TimeCharacteristic;importorg.apache.flink.streaming.api.datastream.DataStreamSource;importorg.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;importorg.apache.flink.streaming.api.environment.StreamExecutionEnvironment;importorg.apache.flink.streaming.api.windowing.time.Time;importorg.apache.flink.table.planner.expressions.In;importorg.apache.flink.util.Collector;/**
 * @program: study
 * @description: Flink窗口转换算子
 * @author: zhiyong
 * @create: 2022-03-23 23:59
 **/publicclassWindowDemo{publicstaticvoidmain(String[] args)throwsException{StreamExecutionEnvironment env =StreamExecutionEnvironment.getExecutionEnvironment();

        env.setParallelism(1);

        env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);//过时方法,必须设置,才能正常使用算子,否则报错DataStreamSource<String> data = env.addSource(newWordCountSource1ps());SingleOutputStreamOperator<Tuple2<String,Integer>> result1 = data.flatMap(newFlatMapFunction1()).keyBy(newKeySelector<Tuple2<String,Integer>,Object>(){@OverridepublicObjectgetKey(Tuple2<String,Integer> value)throwsException{return value.f0;}}).timeWindow(Time.seconds(30))//过时的滚动时间窗口.sum(1);

        result1.print("过时的滚动时间窗口");

        env.execute("streaming");}privatestaticclassFlatMapFunction1implementsFlatMapFunction<String,Tuple2<String,Integer>>{@OverridepublicvoidflatMap(String value,Collector<Tuple2<String,Integer>> out)throwsException{for(String cell:value.split("\\s+")){
                out.collect(Tuple2.of(cell,1));}}}}

由于自定义数据源不带时间戳,不使用上述过时方法设置env会报错:

Exception in thread "main"org.apache.flink.runtime.client.JobExecutionException:Job execution failed.
    at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
    at org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:137)
    at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
    at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
    at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$1(AkkaInvocationHandler.java:258)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
    at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
    at org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1389)
    at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$null$1(ClassLoadingUtils.java:93)
    at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
    at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
    at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
    at org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$1.onComplete(AkkaFutureUtils.java:47)
    at akka.dispatch.OnComplete.internal(Future.scala:300)
    at akka.dispatch.OnComplete.internal(Future.scala:297)
    at akka.dispatch.japi$CallbackBridge.apply(Future.scala:224)
    at akka.dispatch.japi$CallbackBridge.apply(Future.scala:221)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
    at org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$DirectExecutionContext.execute(AkkaFutureUtils.java:65)
    at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68)
    at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284)
    at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284)
    at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284)
    at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:621)
    at akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:24)
    at akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:23)
    at scala.concurrent.Future.$anonfun$andThen$1(Future.scala:532)
    at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:29)
    at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
    at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:63)
    at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:100)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
    at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81)
    at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:100)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:49)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:48)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1067)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1703)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:172)Caused by:org.apache.flink.runtime.JobException:Recovery is suppressed by NoRestartBackoffTimeStrategy
    at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:138)
    at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:82)
    at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:252)
    at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:242)
    at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:233)
    at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:684)
    at org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:79)
    at org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:444)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethod)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$1(AkkaRpcActor.java:316)
    at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:314)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:217)
    at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:78)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163)
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)
    at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
    at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
    at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
    at akka.actor.Actor.aroundReceive(Actor.scala:537)
    at akka.actor.Actor.aroundReceive$(Actor.scala:535)
    at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
    at akka.actor.ActorCell.invoke(ActorCell.scala:548)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
    at akka.dispatch.Mailbox.run(Mailbox.scala:231)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:243)...4 more
Caused by:java.lang.RuntimeException:Record has Long.MIN_VALUE timestamp (= no timestamp marker).Is the time characteristic set to'ProcessingTime', or did you forget tocall'DataStream.assignTimestampsAndWatermarks(...)'?
    at org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows.assignWindows(TumblingEventTimeWindows.java:83)
    at org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.processElement(WindowOperator.java:293)
    at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:233)
    at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134)
    at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105)
    at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:496)
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:203)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:809)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:761)
    at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
    at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:937)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575)
    at java.lang.Thread.run(Thread.java:750)Process finished withexit code 1

为了使用该过时的时间窗口算子,使用了该过时方法。设置后可以成功运行:

过时的滚动时间窗口>(zhiyong13,1)
过时的滚动时间窗口>(zhiyong6,1)
过时的滚动时间窗口>(zhiyong0,1)
过时的滚动时间窗口>(zhiyong1,1)
过时的滚动时间窗口>(zhiyong18,1)
过时的滚动时间窗口>(zhiyong11,1)
过时的滚动时间窗口>(zhiyong8,1)
过时的滚动时间窗口>(zhiyong4,2)
过时的滚动时间窗口>(zhiyong17,3)
过时的滚动时间窗口>(zhiyong9,2)
过时的滚动时间窗口>(zhiyong19,2)
过时的滚动时间窗口>(zhiyong15,1)
过时的滚动时间窗口>(zhiyong5,3)
过时的滚动时间窗口>(zhiyong16,3)
过时的滚动时间窗口>(zhiyong14,3)Process finished withexit code 130

滑动时间窗口

SingleOutputStreamOperator<Tuple2<String,Integer>> result2 = data.flatMap(newFlatMapFunction1()).keyBy(newKeySelector<Tuple2<String,Integer>,Object>(){@OverridepublicObjectgetKey(Tuple2<String,Integer> value)throwsException{return value.f0;}}).timeWindow(Time.seconds(10),Time.seconds(5))//过时的滑动时间窗口.sum(1);

        result2.print("过时的滑动时间窗口");

不设置env同样会报错。设置后可以正常运行:

过时的滑动时间窗口>(zhiyong14,1)
过时的滑动时间窗口>(zhiyong14,1)
过时的滑动时间窗口>(zhiyong11,1)
过时的滑动时间窗口>(zhiyong4,1)
过时的滑动时间窗口>(zhiyong15,1)
过时的滑动时间窗口>(zhiyong3,1)
过时的滑动时间窗口>(zhiyong6,1)
过时的滑动时间窗口>(zhiyong6,1)
过时的滑动时间窗口>(zhiyong7,1)
过时的滑动时间窗口>(zhiyong11,1)
过时的滑动时间窗口>(zhiyong3,2)
过时的滑动时间窗口>(zhiyong16,1)
过时的滑动时间窗口>(zhiyong15,3)
过时的滑动时间窗口>(zhiyong4,1)
过时的滑动时间窗口>(zhiyong16,2)
过时的滑动时间窗口>(zhiyong3,1)
过时的滑动时间窗口>(zhiyong7,2)
过时的滑动时间窗口>(zhiyong1,1)
过时的滑动时间窗口>(zhiyong15,2)
过时的滑动时间窗口>(zhiyong11,1)
过时的滑动时间窗口>(zhiyong4,1)Process finished withexit code 130

大约每次打印5条数据,结合自定义数据源约是每秒1条,滑动窗口显然功能正确。

计数窗口算子

滚动计数窗口

        env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);//过时方法,必须设置,才能正常使用算子,否则报错SingleOutputStreamOperator<Tuple2<String,Integer>> result3 = data.flatMap(newFlatMapFunction1()).keyBy(newKeySelector<Tuple2<String,Integer>,Object>(){@OverridepublicObjectgetKey(Tuple2<String,Integer> value)throwsException{return value.f0;}}).countWindow(5).sum(1);

        result3.print("滚动计数窗口");

不设置env同样会报错。设置后可以正常运行:

滚动计数窗口>(zhiyong8,5)
滚动计数窗口>(zhiyong12,5)
滚动计数窗口>(zhiyong4,5)
滚动计数窗口>(zhiyong11,5)
滚动计数窗口>(zhiyong6,5)
滚动计数窗口>(zhiyong19,5)Process finished withexit code 130

每次都是凑齐5个才会打印。

滑动计数窗口

SingleOutputStreamOperator<Tuple2<String,Integer>> result4 = data.flatMap(newFlatMapFunction1()).keyBy(newKeySelector<Tuple2<String,Integer>,Object>(){@OverridepublicObjectgetKey(Tuple2<String,Integer> value)throwsException{return value.f0;}}).countWindow(5,3).sum(1);

        result4.print("滑动计数窗口");

同样是使用过时方法设置env,可以看到:

滑动计数窗口>(zhiyong0,3)
滑动计数窗口>(zhiyong8,3)
滑动计数窗口>(zhiyong17,3)
滑动计数窗口>(zhiyong7,3)
滑动计数窗口>(zhiyong9,3)
滑动计数窗口>(zhiyong12,3)
滑动计数窗口>(zhiyong18,3)
滑动计数窗口>(zhiyong1,3)
滑动计数窗口>(zhiyong10,3)
滑动计数窗口>(zhiyong4,3)
滑动计数窗口>(zhiyong16,3)
滑动计数窗口>(zhiyong19,3)
滑动计数窗口>(zhiyong9,5)
滑动计数窗口>(zhiyong8,5)
滑动计数窗口>(zhiyong17,5)
滑动计数窗口>(zhiyong1,5)
滑动计数窗口>(zhiyong19,5)
滑动计数窗口>(zhiyong9,5)
滑动计数窗口>(zhiyong5,3)
滑动计数窗口>(zhiyong3,3)
滑动计数窗口>(zhiyong11,3)
滑动计数窗口>(zhiyong12,5)
滑动计数窗口>(zhiyong17,5)
滑动计数窗口>(zhiyong18,5)
滑动计数窗口>(zhiyong2,3)
滑动计数窗口>(zhiyong1,5)
滑动计数窗口>(zhiyong10,5)
滑动计数窗口>(zhiyong4,5)
滑动计数窗口>(zhiyong12,5)
滑动计数窗口>(zhiyong7,5)
滑动计数窗口>(zhiyong8,5)
滑动计数窗口>(zhiyong19,5)
滑动计数窗口>(zhiyong11,5)
滑动计数窗口>(zhiyong1,5)
滑动计数窗口>(zhiyong17,5)
滑动计数窗口>(zhiyong9,5)
滑动计数窗口>(zhiyong0,5)
滑动计数窗口>(zhiyong3,5)
滑动计数窗口>(zhiyong14,3)
滑动计数窗口>(zhiyong2,5)
滑动计数窗口>(zhiyong19,5)
滑动计数窗口>(zhiyong4,5)
滑动计数窗口>(zhiyong5,5)
滑动计数窗口>(zhiyong2,5)
滑动计数窗口>(zhiyong8,5)
滑动计数窗口>(zhiyong6,3)
滑动计数窗口>(zhiyong15,3)
滑动计数窗口>(zhiyong11,5)
滑动计数窗口>(zhiyong13,3)
滑动计数窗口>(zhiyong16,5)
滑动计数窗口>(zhiyong19,5)
滑动计数窗口>(zhiyong10,5)
滑动计数窗口>(zhiyong0,5)
滑动计数窗口>(zhiyong2,5)
滑动计数窗口>(zhiyong16,5)
滑动计数窗口>(zhiyong4,5)
滑动计数窗口>(zhiyong15,5)
滑动计数窗口>(zhiyong13,5)
滑动计数窗口>(zhiyong7,5)
滑动计数窗口>(zhiyong1,5)
滑动计数窗口>(zhiyong19,5)
滑动计数窗口>(zhiyong10,5)
滑动计数窗口>(zhiyong3,5)
滑动计数窗口>(zhiyong11,5)
滑动计数窗口>(zhiyong6,5)
滑动计数窗口>(zhiyong8,5)
滑动计数窗口>(zhiyong16,5)
滑动计数窗口>(zhiyong0,5)
滑动计数窗口>(zhiyong14,5)
滑动计数窗口>(zhiyong12,5)Process finished withexit code 130

正好20个word,正好20个count到结果为3的数据。其余均是20。显然,最开始累计出现3次就开始滑动,之后每来3条滑动1次,但是窗口大小只有5个,所以之后的count值永远是5个。

窗口分配器的窗口算子

会话窗口

静态时间间隔的处理时间会话窗口

先构建数据源:

packagecom.zhiyong.flinkStream;importorg.apache.flink.streaming.api.functions.source.SourceFunction;importjava.util.ArrayList;importjava.util.Random;/**
 * @program: study
 * @description: 20秒内随机产生数据
 * @author: zhiyong
 * @create: 2022-03-24 22:08
 **/publicclassFlinkRandom10sSourceimplementsSourceFunction<String>{boolean needRun=true;@Overridepublicvoidrun(SourceContext<String> ctx)throwsException{while(needRun){ArrayList<String> result =newArrayList<>();for(int i =0; i <10; i++){
                result.add("zhiyong"+ i);}int index=newRandom().nextInt(10);
            ctx.collect(result.get(index));int delayTime =newRandom().nextInt(10)*1000;System.out.println("产生数据“"+ result.get(index));System.out.println("延时"+ delayTime +"ms");Thread.sleep(delayTime);}}@Overridepublicvoidcancel(){
        needRun=false;}}

测试会话窗口:

packagecom.zhiyong.flinkStream;importorg.apache.flink.api.common.functions.FlatMapFunction;importorg.apache.flink.api.java.functions.KeySelector;importorg.apache.flink.api.java.tuple.Tuple2;importorg.apache.flink.streaming.api.datastream.DataStreamSource;importorg.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;importorg.apache.flink.streaming.api.environment.StreamExecutionEnvironment;importorg.apache.flink.streaming.api.windowing.assigners.ProcessingTimeSessionWindows;importorg.apache.flink.streaming.api.windowing.time.Time;importorg.apache.flink.util.Collector;/**
 * @program: study
 * @description: Flink会话窗口
 * @author: zhiyong
 * @create: 2022-03-24 22:07
 **/publicclassSessionWindowDemo{publicstaticvoidmain(String[] args)throwsException{StreamExecutionEnvironment env =StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);DataStreamSource<String> data = env.addSource(newFlinkRandom10sSource());SingleOutputStreamOperator<Tuple2<String,Integer>> result1 = data.flatMap(newFlatMapFunction1()).keyBy(newKeySelector<Tuple2<String,Integer>,Object>(){@OverridepublicObjectgetKey(Tuple2<String,Integer> value)throwsException{return value.f0;}}).window(ProcessingTimeSessionWindows.withGap(Time.seconds(5))).sum(1);

        result1.print("静态时间间隔的处理时间会话窗口");

        env.execute("会话窗口");}privatestaticclassFlatMapFunction1implementsFlatMapFunction<String,Tuple2<String,Integer>>{@OverridepublicvoidflatMap(String value,Collector<Tuple2<String,Integer>> out)throwsException{for(String cell:value.split("\\s+")){
                out.collect(Tuple2.of(cell,1));}}}}

结果:

产生数据“zhiyong4
延时6000ms
静态时间间隔的处理时间会话窗口>(zhiyong4,1)
产生数据“zhiyong0
延时4000ms
产生数据“zhiyong0
延时9000ms
静态时间间隔的处理时间会话窗口>(zhiyong0,2)
产生数据“zhiyong4
延时7000ms
静态时间间隔的处理时间会话窗口>(zhiyong4,1)
产生数据“zhiyong3
延时3000ms
产生数据“zhiyong8
延时0ms
产生数据“zhiyong1
延时4000ms
静态时间间隔的处理时间会话窗口>(zhiyong3,1)
产生数据“zhiyong6
延时2000ms
静态时间间隔的处理时间会话窗口>(zhiyong8,1)
静态时间间隔的处理时间会话窗口>(zhiyong1,1)
产生数据“zhiyong5
延时5000ms
静态时间间隔的处理时间会话窗口>(zhiyong6,1)
产生数据“zhiyong0
延时0ms
产生数据“zhiyong9
延时6000ms
静态时间间隔的处理时间会话窗口>(zhiyong5,1)
静态时间间隔的处理时间会话窗口>(zhiyong0,1)
静态时间间隔的处理时间会话窗口>(zhiyong9,1)
产生数据“zhiyong7
延时6000ms
静态时间间隔的处理时间会话窗口>(zhiyong7,1)
产生数据“zhiyong5
延时8000ms
静态时间间隔的处理时间会话窗口>(zhiyong5,1)
产生数据“zhiyong4
延时7000ms
静态时间间隔的处理时间会话窗口>(zhiyong4,1)
产生数据“zhiyong3
延时2000ms
产生数据“zhiyong8
延时9000ms
静态时间间隔的处理时间会话窗口>(zhiyong3,1)
静态时间间隔的处理时间会话窗口>(zhiyong8,1)
产生数据“zhiyong0
延时4000ms
产生数据“zhiyong6
延时0ms
产生数据“zhiyong6
延时3000ms
静态时间间隔的处理时间会话窗口>(zhiyong0,1)
产生数据“zhiyong0
延时4000ms
静态时间间隔的处理时间会话窗口>(zhiyong6,2)
产生数据“zhiyong2
延时9000ms
静态时间间隔的处理时间会话窗口>(zhiyong0,1)
静态时间间隔的处理时间会话窗口>(zhiyong2,1)
产生数据“zhiyong5
延时6000ms
静态时间间隔的处理时间会话窗口>(zhiyong5,1)
产生数据“zhiyong7
延时3000ms
产生数据“zhiyong3
延时8000ms
静态时间间隔的处理时间会话窗口>(zhiyong7,1)
静态时间间隔的处理时间会话窗口>(zhiyong3,1)
产生数据“zhiyong2
延时4000ms

Process finished withexit code 130

动态时间间隔的处理时间会话窗口

SingleOutputStreamOperator<Tuple2<String,Integer>> result2 = data.flatMap(newFlatMapFunction1()).keyBy(newKeySelector<Tuple2<String,Integer>,Object>(){@OverridepublicObjectgetKey(Tuple2<String,Integer> value)throwsException{return value.f0;}}).window(ProcessingTimeSessionWindows.withDynamicGap(newSessionWindowTimeGapExtractor<Tuple2<String,Integer>>(){@Overridepubliclongextract(Tuple2<String,Integer> element){return(Long.parseLong(element.f0.toString().split("g")[1])+5)*1000;}})).sum(1);

        result2.print("动态时间间隔的处理时间会话窗口");

执行后:

产生数据“zhiyong2
延时4000ms
产生数据“zhiyong7
延时4000ms
动态时间间隔的处理时间会话窗口>(zhiyong2,1)
产生数据“zhiyong7
延时0ms
产生数据“zhiyong9
延时8000ms
产生数据“zhiyong9
延时1000ms
产生数据“zhiyong9
延时3000ms
产生数据“zhiyong6
延时2000ms
动态时间间隔的处理时间会话窗口>(zhiyong7,2)
产生数据“zhiyong1
延时0ms
产生数据“zhiyong6
延时2000ms
产生数据“zhiyong7
延时8000ms
动态时间间隔的处理时间会话窗口>(zhiyong1,1)
动态时间间隔的处理时间会话窗口>(zhiyong9,3)
产生数据“zhiyong8
延时9000ms
动态时间间隔的处理时间会话窗口>(zhiyong6,2)
动态时间间隔的处理时间会话窗口>(zhiyong7,1)
产生数据“zhiyong2
延时0ms
产生数据“zhiyong7
延时9000ms
动态时间间隔的处理时间会话窗口>(zhiyong8,1)
动态时间间隔的处理时间会话窗口>(zhiyong2,1)
产生数据“zhiyong0
延时8000ms
动态时间间隔的处理时间会话窗口>(zhiyong7,1)
动态时间间隔的处理时间会话窗口>(zhiyong0,1)
产生数据“zhiyong2
延时9000ms
动态时间间隔的处理时间会话窗口>(zhiyong2,1)
产生数据“zhiyong7
延时3000ms
产生数据“zhiyong9
延时6000ms
产生数据“zhiyong3
延时3000ms
产生数据“zhiyong7
延时0ms
产生数据“zhiyong8
延时6000ms
动态时间间隔的处理时间会话窗口>(zhiyong7,1)
动态时间间隔的处理时间会话窗口>(zhiyong9,1)
动态时间间隔的处理时间会话窗口>(zhiyong3,1)
产生数据“zhiyong7
延时4000ms
产生数据“zhiyong9
延时1000ms
产生数据“zhiyong7
延时0ms
产生数据“zhiyong4
延时3000ms
动态时间间隔的处理时间会话窗口>(zhiyong8,1)
产生数据“zhiyong8
延时6000ms
产生数据“zhiyong4
延时1000ms
动态时间间隔的处理时间会话窗口>(zhiyong4,1)
产生数据“zhiyong9
延时8000ms
动态时间间隔的处理时间会话窗口>(zhiyong7,3)
动态时间间隔的处理时间会话窗口>(zhiyong8,1)
产生数据“zhiyong5
延时9000ms
动态时间间隔的处理时间会话窗口>(zhiyong4,1)Process finished withexit code 130

可以看出这种情况时间间隔可以动态变化,同样是超时便关闭窗口。

静态时间间隔的事件时间会话窗口

.window(EventTimeSessionWindows.withGap(Time.seconds(5)))

算子层面区别不大。

动态时间间隔的事件时间会话窗口

.window(EventTimeSessionWindows.withDynamicGap(newSessionWindowTimeGapExtractor<Tuple2<String,Integer>>(){@Overridepubliclongextract(Tuple2<String,Integer> element){return(Long.parseLong(element.f0.toString().split("g")[1])+5)*1000;}}))

算子层面区别不大。

事件时间的这2种情况与处理时间类似,不再赘述。

时间窗口

.timeWindow(Time.seconds(30))//这种是滚动窗口.timeWindow(Time.seconds(30),Time.seconds(30))//这种是滑动窗口

这种算子已经过时。但是时间窗口使用频率偏偏最高。新API可以设置偏移量来消除时区的影响。比如东八区。。。

滚动事件时间窗口

SingleOutputStreamOperator<Tuple2<String,Integer>> result5 = data.flatMap(newFlatMapFunction1()).keyBy(newKeySelector<Tuple2<String,Integer>,Object>(){@OverridepublicObjectgetKey(Tuple2<String,Integer> value)throwsException{return value.f0;}}).window(TumblingEventTimeWindows.of(Time.seconds(10)))//.window(TumblingEventTimeWindows.of(Time.seconds(10)),Time.hours(-8))//东八区.sum(1);
        result5.print("滚动事件时间窗口");

滚动处理时间窗口

SingleOutputStreamOperator<Tuple2<String,Integer>> result6 = data.flatMap(newFlatMapFunction1()).keyBy(newKeySelector<Tuple2<String,Integer>,Object>(){@OverridepublicObjectgetKey(Tuple2<String,Integer> value)throwsException{return value.f0;}}).window(TumblingProcessingTimeWindows.of(Time.seconds(10)))//.window(TumblingProcessingTimeWindows.of(Time.seconds(10)),Time.hours(-8))//东八区.sum(1);
        result6.print("滚动处理时间窗口");

滑动事件时间窗口

SingleOutputStreamOperator<Tuple2<String,Integer>> result7 = data.flatMap(newFlatMapFunction1()).keyBy(newKeySelector<Tuple2<String,Integer>,Object>(){@OverridepublicObjectgetKey(Tuple2<String,Integer> value)throwsException{return value.f0;}}).window(SlidingEventTimeWindows.of(Time.seconds(5),Time.seconds(3)))//.window(SlidingEventTimeWindows.of(Time.seconds(5),Time.seconds(3),Time.hours(-8)))//东八区.sum(1);
        result7.print("滑动事件时间窗口");

滑动处理时间窗口

SingleOutputStreamOperator<Tuple2<String,Integer>> result8 = data.flatMap(newFlatMapFunction1()).keyBy(newKeySelector<Tuple2<String,Integer>,Object>(){@OverridepublicObjectgetKey(Tuple2<String,Integer> value)throwsException{return value.f0;}}).window(SlidingProcessingTimeWindows.of(Time.seconds(5),Time.seconds(3)))//.window(SlidingProcessingTimeWindows.of(Time.seconds(5),Time.seconds(3),Time.hours(-8)))//东八区.sum(1);
        result8.print("滑动处理时间窗口");
标签: hadoop apache flink

本文转载自: https://blog.csdn.net/qq_41990268/article/details/123725097
版权归原作者 虎鲸不是鱼 所有, 如有侵权,请联系我们删除。

“Flink窗口转换算子”的评论:

还没有评论