Why JRuby is faster

I think that will resolve the caller portion of your bug, but I'm unclear whether there are performance issues outside of caller that we need to resolve. Can you clarify what remains slow in JRuby 9k, assuming we have fixed the full-stack caller performance? The stack trace performance issues on Java 11 may be resolved now. Ok, well, this is good news on the caller front. The StackWalker performance is still poor, however, so we'll use getStackTrace when a full trace is requested and consider options for partial traces.

StackWalker is a clear win when inside a deep trace, but for a shallow trace getStackTrace will be faster. I'm not sure how we can detect that, though. For the time being, I cannot pinpoint a single issue. I just see that the application as a whole, which is pretty large and complex, does not perform that well, but I don't know yet whether there are specific heavily used language features that make it slower. Your original report only talks about the performance of caller, which I believe should be fixed by my pull request.
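To illustrate the full-versus-partial distinction: Ruby's caller accepts a start offset and a length, so a call site that only needs a few frames never has to pay for the whole stack. A minimal sketch (the deep helper is invented for the example):

```ruby
# Build an artificially deep stack, then compare a full trace with a
# partial one. A bounded request like caller(0, 3) is the case where a
# partial stack walk can beat materializing the full trace.
def deep(n, &blk)
  n.zero? ? blk.call : deep(n - 1, &blk)
end

deep(50) do
  full    = caller        # every frame on the stack
  partial = caller(0, 3)  # only the top three frames
  puts "full: #{full.size} frames, partial: #{partial.size} frames"
end
```
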

That is unrelated to any other performance issues we've discussed. In your comment you mention looking at a couple of other things. So I think that's where we stand. A next step would be to run profiling so we can get an idea of where the time is going (I saw and closed your --sample issue).

However, if you profile with heavy caller use, that's going to massively dominate the profile results and we won't get much actionable data. Please try to profile without caller in play. Just a short comment to answer your questions because, as you said, they don't really belong to the 'caller' theme:

We don't strictly need eval, but currently have it in our code for performance reasons. We have strings in our application which are syntactically a very restricted subset of Ruby expressions. I also wrote my own evaluator and used eval for comparison, and it turned out that eval was faster. I will redo these benchmarks with JRuby 9; maybe the result will be different. Aside from this, I do my measurements with logging in place. Still, for the whole application, JRuby 9 is slower for us, but I understand that we need to isolate certain constructs in order to investigate it.
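As a sketch of that comparison (the helper names and the tiny expression grammar here are invented for illustration; the report doesn't show the actual code), one can pit Kernel#eval against a hand-rolled evaluator for a restricted subset:

```ruby
# Evaluate "x op y" (op is + or -) two ways: via eval on a binding, and
# via a minimal hand-written evaluator. The reporter found eval faster
# on the old JRuby; rerunning on JRuby 9 may give different results.
def eval_expr(expr, vars)
  bind = binding
  vars.each { |name, value| bind.local_variable_set(name, value) }
  bind.eval(expr)
end

def tiny_eval(expr, vars)
  left, op, right = expr.split
  l = vars.fetch(left.to_sym) { Integer(left) }
  r = vars.fetch(right.to_sym) { Integer(right) }
  op == "+" ? l + r : l - r
end

vars = { a: 40, b: 2 }
p eval_expr("a + b", vars)  # => 42
p tiny_eval("a + b", vars)  # => 42
```

Note that eval re-parses the expression on every call, which is exactly the kind of cost that can differ sharply between JRuby versions.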

Unfortunately, as I wrote in the other thread (which you have closed), I can't get profiling to work. Regarding eval: I will reopen the sampling issue, since it seems there are still some issues getting it to work with your app. I finally have new findings on this issue, including a new benchmark which does not involve caller but still runs slower on JRuby 9, also when running it with jruby-complete. I would like to add another observation, which may or may not be helpful for this problem: our example application (the one we now use for performance evaluation, a complete execution of our Ruby program on a real data sample) takes pretty reproducibly 10 minutes with the old JRuby.

But when I run our application using the flat profiler, the relationship flips: under the profiler, the new JRuby is faster than the old one, while without the profiler it is the other way around. It is difficult to obtain meaningful profiling data under these conditions. The profiler may simply be more efficient in the newer JRuby, so I wouldn't expect comparing profile results between these versions to be very accurate.

I could point out that we're no longer "terribly" slow, but I want to be faster anyway. This is a very large number of samples landing in dynamic call plumbing code, which may indicate some method is not inlining, or that a slow-path invocation never inlines or optimizes to a monomorphic call. The difficulty for me is to create typical code which we can actually use for benchmarking.

So while we still seem to have regressed on this latest benchmark, we're still way faster than CRuby.

The async-profiler project provides a JVM plugin that does very low-overhead sampling of call stacks, object allocations, and other metrics.

It can attach to a running VM or profile the entire process from beginning to end; I typically profile entire runs by passing its agent flag at JVM startup. It works by recording all interesting events for a period of time and then letting you view a dump of that data.

Another set of interesting results: these track allocations. The absolute amounts aren't important, because the runs covered different amounts of time, but the ratio between the different types is quite different. The allocation of SearchMatchTask was eliminated, but there is now a much more drastic ratio of ByteCodeMachine allocations to everything else.

Altering the benchmark to use a single-character string rather than a regexp (along with the above patch) also has an impact, and brings our performance to about where the old release was. Some part of this is because of the new way we handle interpolating values into a string; some part of the difference is the cost of constructing the dynamic string. I've added logic to optimize that call, but the extra string object is still a killer. I'm pushing my optimizations now.
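The single-character change can be reproduced with a toy search like this (a hypothetical micro-benchmark, not the reporter's actual code):

```ruby
require "benchmark"

HAYSTACK = ("y" * 100 + "x") * 10

# Searching for one character via a String argument uses a plain string
# scan, while the regexp form has to run the regexp engine (Joni's
# ByteCodeMachine) for every match attempt.
N = 100_000
t_regexp = Benchmark.realtime { N.times { HAYSTACK =~ /x/ } }
t_string = Benchmark.realtime { N.times { HAYSTACK.index("x") } }
puts format("regexp: %.3fs  string: %.3fs", t_regexp, t_string)
```

Both forms find the same position; only the machinery behind them differs.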

Ok, after all the recent optimizations we are within the margin of error on my most recent run of your smaller benchmark! I think I'm going to close this bug, since the original performance issues were mostly surrounding the (now fixed) implementation of caller. I will open a new one for your new bug, since it's largely unrelated to the original issue: we found it's all strings.

Any future reduced benchmarks might want to be new bugs, but you can run them by us; if they're again related to strings, it could just be another case of the same bug. Most of the original degradation was from caller, which has been greatly improved in JRuby and also performs better on more recent JDKs. I have opened a new issue to track the reduced script.

So we pay twice the cost, and startup time becomes an even bigger challenge. We are closing the gap! We have some baseline numbers now, based on the -e 1, gem list, gem install, and rails console command lines.

Sounds like just the magic we need, right? For some cases, it may be! An alternative Ruby implementation called TruffleRuby — part of the same GraalVM project — uses AOT and a prebooted image of the heap to improve their baseline startup substantially. So how does TruffleRuby fare on running our three common Ruby commands above? Things get a little murky here. Unfortunately, since TruffleRuby still parses, compiles, and executes Ruby code from source, they still see poor startup time in comparison to CRuby.

Ahead-of-time compilation may still be an option for JRuby, however. Our interpreter is much simpler than the one found in TruffleRuby, so precompiling JRuby to native code should run pretty well. We hope to explore this option in the next few months.

OpenJ9 is one of the few world-class, fully compliant JVM implementations out there, with a completely different array of optimizations, garbage collectors, and supported platforms. One of the cooler features of OpenJ9 is its ability to share pre-processed class data across runs.

When you pass the -Xshareclasses flag, OpenJ9 will create a shared archive containing pre-parsed, pre-verified JVM bytecode and class data. An additional flag, -Xquickstart, reduces how much optimization OpenJ9 does (similar to the Hotspot TieredStopAtLevel flag shown above) to allow short-running commands to get up and going more quickly.

And recent JRuby 9k releases improve on this further. The third command, rails console, is oddly slower…we look forward to working with the OpenJ9 team to get that one optimized as well. With our simple example, we see some very nice improvements. Frustratingly, the rails console is again slower than on Hotspot 8…what is it about Rails that continues to confound optimizing VMs? We will be exploring how best to take advantage of these improvements. We're also polling a set of per-thread fields to handle the unsafe "kill" and "raise" operations on each thread. Let's turn all that off.
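The "raise" half of that per-thread checkpointing exists to support cross-thread interrupts. A minimal sketch of what it has to make work:

```ruby
# Thread#raise from another thread is why each thread must keep polling
# its per-thread state: the target has to notice, at a safe point, that
# another thread has asked it to raise.
target = Thread.new do
  sleep # park until interrupted from outside
rescue RuntimeError => e
  e.message
end

sleep 0.05 until target.status == "sleep"
target.raise RuntimeError, "interrupted from outside"
p target.value # => "interrupted from outside"
```

Turning the checkpoint off means giving up exactly this kind of asynchronous interruption.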

The experimental optimizations up to this point (other than threadless) comprise the set of options behind JRuby's --fast flag. The --fast option additionally tries to statically inspect code to determine whether these optimizations are safe.

For example, if you're running with --fast but still access backrefs, we're going to create a frame for you anyway. We're not done yet. I mentioned earlier the JVM gets some of its best optimizations from its ability to profile and inline code at runtime. Unfortunately in current JRuby, there's no way to inline dynamic calls.
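The backref caveat is easy to trip over. Compare a method that reads $1 with one that uses an explicit MatchData (method names invented for the example):

```ruby
# $~ and $1 live on the calling frame, so touching them forces a heap
# frame to be allocated even under --fast.
def first_word_backref(str)
  str =~ /(\w+)/
  $1
end

# An explicit MatchData keeps the result in an ordinary local variable,
# with no frame-local state involved.
def first_word_matchdata(str)
  md = /(\w+)/.match(str)
  md && md[1]
end

p first_word_backref("hello world")    # => "hello"
p first_word_matchdata("hello world")  # => "hello"
```
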

There's too much plumbing involved. The upcoming "invokedynamic" work in Java 7 will give us an easier path forward, making dynamic calls as natural to the JVM as static calls, but of course we want to support Java 5 and Java 6 for a long time. So naturally, I have been maintaining an experimental patch that eliminates most of that plumbing and makes dynamic calls inline on Java 5 and Java 6.

Performance Optimization, JRuby-style

The truth is it's actually very easy to make small snippets of Ruby code run really fast, especially if you optimize for the benchmark.
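A classic instance of such a snippet is the recursive-fib micro-benchmark; a sketch of its general shape (the original post's timing figures are omitted here):

```ruby
require "benchmark"

# Recursive fib: tiny, numeric-heavy, and very easy to "optimize for
# the benchmark" without saying much about real application code.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

# Run several iterations so the JIT has a chance to warm up.
5.times do
  puts Benchmark.realtime { fib(25) }
end
```
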

So let's try "heap frame elimination". Next up, we'll turn on some optimizations for math operators.


