Tampilkan postingan dengan label NDK. Tampilkan semua postingan
Tampilkan postingan dengan label NDK. Tampilkan semua postingan

Rabu, 13 April 2016

Optimize, Develop, and Debug with Vulkan Developer Tools

Posted by Shannon Woods, Technical Program Manager



Today we're pleased to bring you a preview of Android development tools for Vulkan™. Vulkan is a new 3D rendering API which we’ve helped to develop as a member of Khronos, geared at providing explicit, low-overhead GPU (Graphics Processor Unit) control to developers. Vulkan’s reduction of CPU overhead allows some synthetic benchmarks to see as much as 10 times the draw call throughput on a single core as compared to OpenGL ES. Combined with a threading-friendly API design which allows multiple cores to be used in parallel with high efficiency, this offers a significant boost in performance for draw-call heavy applications.



Vulkan support is available now via the Android N Preview on devices which support it, including Nexus 5X and Nexus 6P. (Of course, you will still be able to use OpenGL ES as well!)




To help developers start coding quickly, we’ve put together a set of samples and guides that illustrate how to use Vulkan effectively.



You can see Vulkan in action running on an Android device with Robert Hodgin’s Fish Tornado demo, ported by Google’s Art, Copy, and Code team:






Optimization: The Vulkan API



There are many similarities between OpenGL ES and Vulkan, but Vulkan offers new features for developers who need to make every millisecond count.




  • Application control of memory allocation. Vulkan provides mechanisms for fine-grained control of how and when memory is allocated on the GPU. This allows developers to use their own allocation and recycling policies to fit their application, ultimately reducing execution and memory overhead and allowing applications to control when expensive allocations occur.

  • Asynchronous command generation. In OpenGL ES, draw calls are issued to the GPU as soon as the application calls them. In Vulkan, the application instead submits draw calls to command buffers, which allows the work of forming and recording the draw call to be separated from the act of issuing it to the GPU. By spreading command generation across several threads, applications can more effectively make use of multiple CPU cores. These command buffers can also be reused, reducing the overhead involved in command creation and issuance.

  • No hidden work. One OpenGL ES pitfall is that some commands may trigger work at points which are not explicitly spelled out in the API specification or made obvious to the developer. Vulkan makes performance more predictable and consistent by specifying which commands will explicitly trigger work and which will not.

  • Multithreaded design, from the ground up. All OpenGL ES applications must issue commands for a context only from a single thread in order to render predictably and correctly. By contrast, Vulkan doesn’t have this requirement, allowing applications to do work like command buffer generation in parallel— but at the same time, it doesn’t make implicit guarantees about the safety of modifying and reading data from multiple threads at the same time. The power and responsibility of managing thread synchronization is in the hands of the application.

  • Mobile-friendly features. Vulkan includes features particularly helpful for achieving high performance on tiling GPUs, used by many mobile devices. Applications can provide information about the interaction between separate rendering passes, allowing tiling GPUs to make effective use of limited memory bandwidth, and avoid performing off-chip reads.

  • Offline shader compilation. Vulkan mandates support for SPIR-V, an intermediate language for shaders. This allows developers to compile shaders ahead of time, and ship SPIR-V binaries with their applications. These binaries are simpler to parse than high-level languages like GLSL, which means less variance in how drivers perform this parsing. SPIR-V also opens the door for third parties to provide compilers for specialized or cross-platform shading languages.

  • Optional validation. OpenGL ES validates every command you call, checking that arguments are within expected ranges, and objects are in the correct state to be operated upon. Vulkan doesn’t perform any of this validation itself. Instead, developers can use optional debug tools to ensure their calls are correct, incurring no run-time overhead in the final product.




Debugging: Validation Layers


As noted above, Vulkan’s lack of implicit validation requires developers to make use of tools outside the API in order to validate their code. Vulkan’s layer mechanism allows validation code and other developer tools to inspect every API call during development, without incurring any overhead in the shipping version. Our guides show you how to build the validation layers for use with the Android NDK, giving you the tools necessary to build bug-free Vulkan code from start to finish.




Develop: Shader toolchain



The Shaderc collection of tools provides developers with build-time and run-time tools for compiling GLSL into SPIR-V. Shaders can be compiled at build time using glslc, a command-line compiler, for easy integration into existing build systems. Or, for shaders which are generated or edited during execution, developers can use the Shaderc library to compile GLSL shaders to SPIR-V via a C interface. Both tools are built on top of Khronos’s reference compiler.




Additional Resources


The Vulkan ecosystem is a broad one, and the resources to get you started don’t end here. There is a wealth of material to explore, including:


























Jumat, 16 Oktober 2015

Game Performance: Vertex Array Objects

Posted by Shanee Nishry



Previously, we showed how you can use vertex layout qualifiers to increase the performance and determinism of your OpenGL application. In this post, we’ll show another useful technique that will help you produce increased performance and cleaner code when drawing objects.



Binding the vertex buffer



Before drawing onto the screen, you need to bind your vertex data (e.g. positions, normals, UVs) to the corresponding vertex shader attributes. To do that, you need to bind the vertex buffer, enable the generic vertex attribute, and use glVertexAttribPointer to describe the layout of the buffer.



Therefore, a draw call might look like this:



const GLuint ATTRIBUTE_LOCATION_POSITIONS   = 0;
const GLuint ATTRIBUTE_LOCATION_TEXTUREUV = 1;
const GLuint ATTRIBUTE_LOCATION_NORMALS = 2;

// Bind shader program, uniforms and textures
// ...

// Bind the vertex buffer
glBindBuffer( GL_ARRAY_BUFFER, vertex_buffer_object );

// Set the vertex attributes
glEnableVertexAttribArray( ATTRIBUTE_LOCATION_POSITIONS );
glVertexAttribPointer( ATTRIBUTE_LOCATION_POSITIONS, 3, GL_FLOAT, GL_FALSE, 32, 0 );

glEnableVertexAttribArray( ATTRIBUTE_LOCATION_TEXTUREUV );
glVertexAttribPointer( ATTRIBUTE_LOCATION_TEXTUREUV, 2, GL_FLOAT, GL_FALSE, 32, 12 );

glEnableVertexAttribArray( ATTRIBUTE_LOCATION_NORMALS );
glVertexAttribPointer( ATTRIBUTE_LOCATION_NORMALS, 3, GL_FLOAT, GL_FALSE, 32, 20 );

// Draw elements
glDrawElements( GL_TRIANGLES, count, GL_UNSIGNED_SHORT, 0 );


There are several reasons why we might not like this code very much. The first is that we need to cache the layout of the vertex buffer to enable and disable the right attributes before drawing. This means we are either hard-coding or saving some amount of data for a nearly meaningless task.



The second reason is performance. Having to tell the drivers which attributes to individually activate is suboptimal. It would be best if we could precompile this information and deliver it all at once.



Lastly, and purely for aesthetics, our draw call is cluttered by long boilerplate code. It would be nice to get rid of it.








Did you know there is another reason why someone might frown on this code? The code is making use of layout qualifiers which is great! But, since it’s already using OpenGL ES 3+, it would be even better if the code also used Geometry Instancing. By batching many instances of a mesh into a single draw call, you can really boost performance.




So how can we improve on the above code?



Vertex Array Objects (VAOs)



If you are using OpenGL ES 3 or higher, you should use Vertex Array Objects (or "VAOs") to store your vertex attribute state.



Using a VAO allows the drivers to compile the vertex description format for repeated use. In addition, this frees you from having to cache the vertex format needed for glVertexAttribPointer, and it also results in less per-draw boilerplate code.



Creating Vertex Array Objects



The first thing you need to do is create your VAO. This is created once per mesh, alongside the vertex buffer object and is done like this:



const GLuint ATTRIBUTE_LOCATION_POSITIONS   = 0;
const GLuint ATTRIBUTE_LOCATION_TEXTUREUV = 1;
const GLuint ATTRIBUTE_LOCATION_NORMALS = 2;

// Bind the vertex buffer object
glBindBuffer( GL_ARRAY_BUFFER, vertex_buffer_object );

// Create a VAO
GLuint vao;
glGenVertexArrays( 1, &vao );
glBindVertexArray( vao );

// Set the vertex attributes as usual
glEnableVertexAttribArray( ATTRIBUTE_LOCATION_POSITIONS );
glVertexAttribPointer( ATTRIBUTE_LOCATION_POSITIONS, 3, GL_FLOAT, GL_FALSE, 32, 0 );

glEnableVertexAttribArray( ATTRIBUTE_LOCATION_TEXTUREUV );
glVertexAttribPointer( ATTRIBUTE_LOCATION_TEXTUREUV, 2, GL_FLOAT, GL_FALSE, 32, 12 );

glEnableVertexAttribArray( ATTRIBUTE_LOCATION_NORMALS );
glVertexAttribPointer( ATTRIBUTE_LOCATION_NORMALS, 3, GL_FLOAT, GL_FALSE, 32, 20 );

// Unbind the VAO to avoid accidentally overwriting the state
// Skip this if you are confident your code will not do so
glBindVertexArray( 0 );


You have probably noticed that this is very similar to our previous code section except that we now have the addition of:



// Create a vertex array object
GLuint vao;
glGenVertexArrays( 1, &vao );
glBindVertexArray( vao );


These lines create and bind the VAO. All glEnableVertexAttribArray and glVertexAttribPointer calls after that are recorded in the currently bound VAO, and that greatly simplifies our per-draw procedure as all you need to do is use the newly created VAO.



Using the Vertex Array Object



The next time you want to draw using this mesh all you need to do is bind the VAO using glBindVertexArray.



// Bind shader program, uniforms and textures
// ...

// Bind Vertex Array Object
glBindVertexArray( vao );

// Draw elements
glDrawElements( GL_TRIANGLES, count, GL_UNSIGNED_SHORT, 0 );


You no longer need to go through all the vertex attributes. This makes your code cleaner, makes per-frame calls shorter and more efficient, and allows the drivers to optimize the binding stage to increase performance.








Did you notice we are no longer calling glBindBuffer? This is because calling glVertexAttribPointer while recording the VAO references the currently bound buffer even though the VAO does not record glBindBuffer calls on itself.



Want to learn more how to improve your game performance? Check out our Game Performance article series. If you are building on Android you might also be interested in the Android Performance Patterns.



Kamis, 15 Oktober 2015

Android Support Library 23.1

Posted by Ian Lake, Developer Advocate



The Android Support Library is a collection of libraries available on a wide array of API levels that help you focus on the unique parts of your app, providing pre-built components, new functionality, and compatibility shims.



With the latest release of the Android Support Library (23.1), you will see improvements across the Support V4, Media Router, RecyclerView, AppCompat, Design, Percent, Custom Tabs, Leanback, and Palette libraries. Let’s take a closer look.



Support V4



The Support V4 library focuses on making supporting a wide variety of API levels straightforward with compatibility shims and backporting specific functionality.



NestedScrollView is a ScrollView that supports nested scrolling back to API 4. You’ll now be able to set a OnScrollChangeListener to receive callbacks when the scroll X or Y positions change.



There are a lot of pieces that make up a fully functioning media playback app, with much of it centered around MediaSessionCompat. A media button receiver, a key part to handling playback controls from hardware or bluetooth controls, is now formalized in the new MediaButtonReceiver class. This class makes it possible to forward received playback controls to a Service which is managing your MediaSessionCompat, reusing the Callback methods already required for API 21+, centralizing support on all API levels and all media control events in one place. A simplified constructor for MediaSessionCompat is also available, automatically finding a media button receiver in your manifest for use with MediaSessionCompat.



Media Router



The Media Router Support Library is the key component for connecting and sending your media playback to remote devices, such as video and audio devices with Google Cast support. It also provides the mechanism, via MediaRouteProvider, to enable any application to create and manage a remote media playback device connection.



In this release, MediaRouteChooserDialog (the dialog that controls selecting a valid remote device) and MediaRouteControllerDialog (the dialog to control ongoing remote playback) have both received a brand new design and additional functionality as well. You’ll find the chooser dialog sorts devices by frequency of use and includes a device type icon for easy identification of different devices while the controller dialog now shows current playback information (including album art).





To feel like a natural part of your app, the content color for both dialogs is now based on the colorPrimary of your alert dialog theme:



<!-- Your app theme set on your Activity -->
<style name="AppTheme" parent="Theme.AppCompat.Light.DarkActionBar">
<item name="colorPrimary">@color/primary</item>
<item name="colorPrimaryDark">@color/primaryDark</item>
<item name="alertDialogTheme">@style/AppTheme.Dialog</item>
</style>


<!-- Theme for the dialog itself -->
<style name="AppTheme.Dialog" parent="Theme.AppCompat.Light.Dialog.Alert">
<item name="colorPrimary">@color/primary</item>
<item name="colorPrimaryDark">@color/primaryDark</item>
</style>



RecyclerView



RecyclerView is an extremely powerful and flexible way to show a list, grid, or any view of a large set of data. One advantage over ListView or GridView is the built in support for animations as items are added, removed, or repositioned.



This release significantly changes the animation system for the better. By using the new ItemAnimator’s canReuseUpdatedViewHolder() method, you’ll be able to choose to reuse the existing ViewHolder, enabling item content animation support. The new ItemHolderInfo and associated APIs give the ItemAnimator the flexibility to collect any data it wants at the correct point in the layout lifecycle, passing that information into the animate callbacks.



Note that this new API is not backward compatible. If you previously implemented an ItemAnimator, you can instead extend SimpleItemAnimator, which provides the old API by wrapping the new API. You’ll also notice that some methods have been entirely removed from ItemAnimator. For example, if you were calling recyclerView.getItemAnimator().setSupportsChangeAnimations(false), this code won’t compile anymore. You can replace it with:



ItemAnimator animator = recyclerView.getItemAnimator();
if (animator instanceof SimpleItemAnimator) {
((SimpleItemAnimator) animator).setSupportsChangeAnimations(false);
}


AppCompat



One component of the AppCompat Support Library has been in providing a consistent set of widgets across all API levels, including the ability to tint those widgets to match your branding and accent colors.



This release adds tint aware versions of SeekBar (for tinting the thumb) as well as ImageButton and ImageView (providing backgroundTint support) which will automatically be used when you use the platform versions in your layouts. You’ll also find that SwitchCompat has been updated to match the styling found in Android 6.0 Marshmallow.



Design



The Design Support Library includes a number of components to help implement the latest in the Google design specifications.



TextInputLayout expands its existing functionality of floating hint text and error indicators with new support for character counting.



AppBarLayout supports a number of scroll flags which affect how children views react to scrolling (e.g. scrolling off the screen). New to this release is SCROLL_FLAG_SNAP, ensuring that when scrolling ends, the view is not left partially visible. Instead, it will be scrolled to its nearest edge, making fully visible or scrolled completely off the screen. You’ll also find that AppBarLayout now allows users to start scrolling from within the AppBarLayout rather than only from within your scrollable view - this behavior can be controlled by adding a DragCallback.



NavigationView provides a convenient way to build a navigation drawer, including the ability to creating menu items using a menu XML file. We’ve expanded the functionality possible with the ability to set custom views for items via app:actionLayout or using MenuItemCompat.setActionView().



Percent



The Percent Support Library provides percentage based dimensions and margins and, new to this release, the ability to set a custom aspect ratio via app:aspectRatio. By setting only a single width or height and using aspectRatio, the PercentFrameLayout or PercentRelativeLayout will automatically adjust the other dimension so that the layout uses a set aspect ratio.



Custom Tabs



The Custom Tabs Support Library allows your app to utilize the full features of compatible browsers including using pre-existing cookies while still maintaining a fast load time (via prefetching) and a custom look and actions.



In this release, we’re adding a few additional customizations such as hiding the url bar when the page is scrolled down with the new enableUrlBarHiding() method. You’ll also be able to update the action button in an already launched custom tab via your CustomTabsSession with setActionButton() - perfect for providing a visual indication of a state change.



Navigation events via CustomTabsCallback#onNavigationEvent() have also been expanded to include the new TAB_SHOWN and TAB_HIDDEN events, giving your app more information on how the user interacts with your web content.



Leanback



The Leanback library makes it easy to build user interfaces on TV devices. This release adds GuidedStepSupportFragment for a support version of GuidedStepFragment as well as improving animations and transitions and allowing GuidedStepFragment to be placed on top of existing content.



You’ll also be able to annotate different types of search completions in SearchFragment and staggered slide transition support for VerticalGridFragment.



Palette



Palette, used to extract colors from images, now supports extracting from a specific region of a Bitmap with the new setRegion() method.



SDK available now!



There’s no better time to get started with the Android Support Library. You can get started developing today by updating the Android Support Repository from the Android SDK Manager.



For an in depth look at every API change in this release, check out the full API diff.



To learn more about the Android Support Library and the APIs available to you through it, visit the Support Library section on the Android Developer site.


Rabu, 09 September 2015

Play Games Loot Drop for Developers

Posted by Ben Frenkel, Product Manager Google Play Games


Launched last March, Player Analytics is already becoming an important tool for many game developers, helping them to manage their games businesses and optimize in-game player behavior. Today we’re expanding Player Analytics with two new analytics reports that give you better visibility into time-based player activity and custom game events. We’re also introducing a new Player Stats API to let you tune your game experience for specific segments of players across the game lifecycle. Along with those, we’re rolling out a new version of our C++/iOS SDKs and Unity plug-in and giving you better tools to manage repeating Quests.


New useful reports for developers



We are launching two new reports later this week in the Play Games developer console: the Player Time Series Explorer and the Events Viewer. We’ve also made improvements to our player retention report.


Player Time Series Explorer



Ever wondered what your players are doing in the first few minutes of gameplay? What happens just before players spend or churn? The time-series explorer lets you understand what happens in these critical moments for your players.


For example, you carefully built out the first set of experiences in your game, but are surprised by how many players never get through even the first set of challenges. With the Player Time Series Explorer, you can now see which challenges are impeding player progress most, and make targeted improvements to decrease the rate of churn. Learn more.


Customize settings to explore player time series




Select from a list of preset questions




Find out what happens before your players spend for the first time


Select “What happens before first spend” to see what happens just before your players spend for the first time. Time series are aligned by first spend event so you can easily explore what happened just before and after first purchase.




Find out what happens before your players churn


Select “What happens before churn” to see what happens before your players stop playing. In the example below, all the churn events are right aligned to make it easier to compare player time series.




Hovering over events shows you additional details

You can see more details for all event types by holding your cursor over the event’s shape. In this example, you can see that “Player 03” spent $4.99 after earning six achievements. Hovering over the achievement shapes will show you which specific achievements were earned.




Events Viewer



Now you can create your own reports based on your custom Play Games’ events. You can select multiple events to display and bookmark the report for easy access. Learn more.


Here’s an example showing how a developer can compare the rates at which Players are entering contests, winning, and almost winning. This report would identify opportunities to improve the balance of its contest modes. You can then bookmark the settings so you can easily track improvements.




28x28 day retention grid



We added a 28-day-by-28-day retention grid to help you compare retention rates across a larger number of new user cohorts.




Tailor player experiences with the Player Stats API



Stats and reports give you insights into your what your players are doing, but wouldn’t it be nice to take action on those insights in your game? That’s what the Player Stats API is all about. The Player Stats API lets you tailor player experiences to specific segments of players across the game lifecycle. Player segments are based on player progression, spend, and engagement.


Here are some examples of what you can do with Player Stats API:


  • For highly engaged players that just aren’t spending, you can show them special bonuses that are aimed at recruiting others to play instead of spending

  • For your most prolific spenders, you can provide occasional free gifts and upgrades

  • For users that haven’t found their stride in your game, you can show them a video that directs them to community features, like clan attacks or alliances, that drive deeper engagement

  • For players that have been away from the game for a while, you can give them a welcome back message that acknowledges impressive accomplishments, and award a badge designed to encourage return play


The Player Stats API is launching in the next few weeks.


C++/iOS SDK and Unity Plug-in updates



iOS support for Play game services just got a lot better. This update includes improved CocoaPods support, which will make it easier to configure Play game services in Xcode. This also means you’ll have a much easier time building for iOS using the Unity plug-in as well.


The latest build of the C++/iOS SDKs is now built on the new Google Sign-In framework, which adds support for authentication via multiple Google apps, including Gmail and YouTube. More importantly, if a player does not have any applicable Google apps installed, the Sign-In framework will bring up a webview within the app for authentication. Opening up a webview inside the app, instead of switching to a separate browser instance, makes for a much better user experience, and addresses a top developer request. For more on the new Google Sign-in library on iOS, check out this video. Learn more.


Improved Quests



Quests are a great way of engaging your players with new goals, and with this update we have made managing Quests easier with the introduction of repeating Quests. You can create Quests that run weekly or monthly by checking the repeating quest box. This will make it easier for you to engage your players with regularly occurring challenges. Repeating Quests will be launching in the next few weeks.


If you have previously integrated Quests, you can easily convert them into repeating quests by following two easy steps.


1. Go to Quests section of developer console, and open up an existing Quest. Click the copy Quest button at the top of the page




2. Scroll down to the Schedule section of the Quest form, check the “Repeating quest” box, select between monthly and weekly quests under “Repeats”, and leave the “Ends:” field set to “Never”. After hitting save, you are done! From then on, the quest will run weekly or monthly until you decide to end it.




Google Play game services (GPGS) docs and SDK downloads


  • GPGS overview: https://developers.google.com/games/services/?utm_campaign=cwag-voice-actions-94

  • Gettings started with GPGS guide: https://developers.google.com/games/services/console/enabling?utm_campaign=cwag-voice-actions-94

  • Downloads

    • Android, iOS, and C++ SDK

    • Unity Plugin


Kamis, 30 Juli 2015

Get your hands on Android Studio 1.3





Posted by Jamal Eason, Product Manager, Android



Previewed earlier this summer at Google I/O, Android Studio 1.3 is now available on the stable release channel. We appreciated the early feedback from those developers on our canary and beta channels to help ship a great product.



Android Studio 1.3 is our biggest feature release for the year so far, which includes a new memory profiler, improved testing support, and full editing and debugging support for C++. Let’s take a closer look.




New Features in Android Studio 1.3




Performance & Testing Tools



  • Android Memory (HPROF) Viewer

    Android Studio now allows you to capture and analyze memory snapshots in the native Android HPROF format.





  • Allocation Tracker

    In addition to displaying a table of memory allocations that your app uses, the updated allocation tracker now includes a visual way to view the your app allocations.





  • APK Tests in Modules

    For more flexibility in app testing, you now have the option to place your code tests in a separate module and use the new test plugin (‘com.android.test’) instead of keeping your tests right next to your app code. This feature does require your app project to use the Gradle Plugin 1.3.






Code and SDK Management




  • App permission annotations

    Android Studio now has inline code annotation support to help you manage the new app permissions model in the M release of Android. Learn more about code annotations.





  • Data Binding Support

    New data brinding features allow you to create declarative layouts in order to minimize boilerplate code by binding your application logic into your layouts. Learn more about data binding.





  • SDK Auto Update & SDK Manager

    Managing Android SDK updates is now a part of the Android Studio. By default, Android Studio will now prompt you about new SDK & Tool updates. You can still adjust your preferences with the new & integrated Android SDK Manager.







  • C++ Support

    As a part of the Android 1.3 stable release, we included an Early Access Preview of the C++ editor & debugger support paired with an experimental build plugin. See the Android C++ Preview page for information on how to get started. Support for more complex projects and build configurations is in development, but let us know your feedback.






Time to Update




An important thing to remember is that an update to Android Studio does not require you to change your Android app projects. With updating, you get the latest features but still have control of which build tools and app dependency versions you want to use for your Android app.



For current developers on Android Studio, you can check for updates from the navigation menu. For new users, you can learn more about Android Studio on the product overview page or download the stable version from the Android Studio download site.



We are excited to launch this set of features in Android Studio and we are hard at work developing the next set of tools to make develop Android development easier on Android Studio. As always we welcome feedback on how we can help you. Connect with the Android developer tools team on Google+.




Kamis, 02 Juli 2015

Game Performance: Data-Oriented Programming

Posted by Shanee Nishry, Game Developer Advocate



To improve game performance, we’d like to highlight a programming paradigm that will help you maximize your CPU potential, make your game more efficient, and code smarter.



Before we get into detail of data-oriented programming, let’s explain the problems it solves and common pitfalls for programmers.



Memory


The first thing a programmer must understand is that memory is slow and the way you code affects how efficiently it is utilized. Inefficient memory layout and order of operations forces the CPU idle waiting for memory so it can proceed doing work.



The easiest way to demonstrate is by using an example. Take this simple code for instance:



char data[1000000]; // One Million bytes
unsigned int sum = 0;

for ( int i = 0; i < 1000000; ++i )
{
sum += data[ i ];
}


An array of one million bytes is declared and iterated on one byte at a time. Now let's change things a little to illustrate the underlying hardware. Changes marked in bold:



char data[16000000]; // Sixteen Million bytes
unsigned int sum = 0;

for ( int i = 0; i < 16000000; i += 16 )
{
sum += data[ i ];
}


The array is changed to contain sixteen million bytes and we iterate over one million of them, skipping 16 at a time.



A quick look suggests there shouldn't be any effect on performance as the code is translated to the same number of instructions and runs the same number of times, however that is not the case. Here is the difference graph. Note that this is on a logarithmic scale--if the scale were linear, the performance difference would be too large to display on any reasonably-sized graph!




Graph in logarithmic scale


The simple change making the loop skip 16 bytes at a time makes the program run 5 times slower!



The average difference in performance is 5x and is consistent when iterating 1,000 bytes up to a million bytes, sometimes increasing up to 7x. This is a serious change in performance.



Note: The benchmark was run on multiple hardware configurations including a desktop with Intel 5930K 3.50GHz CPU, a Macbook Pro Retina laptop with 2.6 GHz Intel i7 CPU and Android Nexus 5 and Nexus 6 devices. The results were pretty consistent.



If you wish to replicate the test, you might have to ensure the memory is out of the cache before running the loop because some compilers will cache the array on declaration. Read below to understand more on how it works.



Explanation


What happens in the example is quite simply explained when you understand how the CPU accesses data. The CPU can’t access data in RAM; the data must be copied to the cache, a smaller but extremely fast memory line which resides near the CPU chip.



When the program starts, the CPU is set to run an instruction on part of the array but that data is still not in the cache, therefore causing a cache miss and forcing the CPU to wait for the data to be copied into the cache.



For simplicity sake, assume a cache size of 16 bytes for the L1 cache line, this means 16 bytes will be copied starting from the requested address for the instruction.



In the first code example, the program next tries to operate on the following byte, which is already copied into the cache following the initial cache miss, therefore continuing smoothly. This is also true for the next 14 bytes. After 16 bytes, since the first cache miss the loop, will encounter another cache miss and the CPU will again wait for data to operate on, copying the next 16 bytes into the cache.



In the second code sample, the loop skips 16 bytes at a time but hardware continues to operate the same. The cache copies the 16 subsequent bytes each time it encounters a cache miss which means the loop will trigger a cache miss with each iteration and cause the CPU to wait idle for data each time!



Note: Modern hardware implements cache prefetch algorithms to prevent incurring a cache miss per frame, but even with prefetching, more bandwidth is used and performance is lower in our example test.





In reality the cache lines tend to be larger than 16 bytes, the program would run much slower if it were to wait for data at every iteration. A Krait-400 found in the Nexus 5 has a L0 data cache of 4 KB with 64 Bytes per line.



If you are wondering why cache lines are so small, the main reason is that making fast memory is expensive.



Data-Oriented Design



The way to solve such performance issues is by designing your data to fit into the cache and have the program to operate on the entire data continuously.



This can be done by organizing your game objects inside Structures of Arrays (SoA) instead of Arrays of Structures (AoS) and pre-allocating enough memory to contain the expected data.



For example, a simple physics object in an AoS layout might look like this:



struct PhysicsObject
{
Vec3 mPosition;
Vec3 mVelocity;

float mMass;
float mDrag;
Vec3 mCenterOfMass;

Vec3 mRotation;
Vec3 mAngularVelocity;

float mAngularDrag;
};


This is a common way way to present an object in C++.



On the other hand, using SoA layout looks more like this:



class PhysicsSystem
{
private:
size_t mNumObjects;
std::vector< Vec3 > mPositions;
std::vector< Vec3 > mVelocities;
std::vector< float > mMasses;
std::vector< float > mDrags;

// ...
};


Let’s compare how a simple function to update object positions by their velocity would operate.



For the AoS layout, a function would look like this:



void UpdatePositions( PhysicsObject* objects, const size_t num_objects, const float delta_time )
{
for ( int i = 0; i < num_objects; ++i )
{
objects[i].mPosition += objects[i].mVelocity * delta_time;
}
}


The PhysicsObject is loaded into the cache but only the first 2 variables are used. Being 12 bytes each amounts to 24 bytes of the cache line being utilised per iteration and causing a cache miss with every object on a 64 bytes cache line of a Nexus 5.



Now let’s look at the SoA way. This is our iteration code:



void PhysicsSystem::SimulateObjects( const float delta_time )
{
for ( int i = 0; i < mNumObjects; ++i )
{
mPositions[ i ] += mVelocities[i] * delta_time;
}
}


With this code, we immediately cause 2 cache misses, but we are then able to run smoothly for about 5.3 iterations before causing the next 2 cache misses resulting in a significant performance increase!



The way data is sent to the hardware matters. Be aware of data-oriented design and look for places it will perform better than object-oriented code.



We have barely scratched the surface. There is still more to data-oriented programming than structuring your objects. For example, the cache is used for storing instructions and function memory so optimizing your functions and local variables affects cache misses and hits. We also did not mention the L2 cache and how data-oriented design makes your application easier to multithread.



Make sure to profile your code to find out where you might want to implement data-oriented design. You can use different profilers for different architecture, including the NVIDIA Tegra System Profiler, ARM Streamline Performance Analyzer, Intel and PowerVR PVRMonitor.



If you want to learn more on how to optimize for your cache, read on cache prefetching for various CPU architectures.



Senin, 25 Mei 2015

Game Performance: Geometry Instancing

Posted by Shanee Nishry, Games Developer Advocate



Imagine a beautiful virtual forest with countless trees, plants and vegetation, or a stadium with countless people in the crowd cheering. If you are heroic you might like the idea of an epic battle between armies.



Rendering a lot of meshes is desired to create a beautiful scene like a forest, a cheering crowd or an army, but doing so is quite costly and reduces the frame rate. Fortunately this is possible using a simple technique called Geometry Instancing.



Geometry instancing can be used in 2D games for rendering a large number of sprites, or in 3D for things like particles, characters and environment.



The NDK code sample More Teapots demoing the content of this article can be found with the ndk inside the samples folder and in the git repository.



Support and Extensions



Geometry instancing is available from OpenGL ES 3.0 and to OpenGL 2.0 devices which support the GL_NV_draw_instanced or GL_EXT_draw_instanced extensions. More information on how to using the extensions is shown in the More Teapots demo.



Overview



Submitting draw calls causes OpenGL to queue commands to be sent to the GPU, this has an expensive overhead which may affect performance. This overhead grows when changing states such as alpha blending function, active shader, textures and buffers.



Geometry Instancing is a technique that combines multiple draws of the same mesh into a single draw call, resulting in reduced overhead and potentially increased performance. This works even when different transformations are required.



The algorithm



To explain how Geometry Instancing works let’s quickly overview traditional drawing.



Traditional Drawing



To a draw a mesh you’d usually prepare a vertex buffer and an index buffer, bind your shader and buffers, set your uniforms such as a World View Projection matrix and make a draw call.





To draw multiple instances using the same mesh you set new uniform values for the transformations and other data and call draw again. This is repeated for every instance.





Drawing with Geometry Instancing



Geometry Instancing reduces CPU overhead by reducing the sequence described above into a single buffer and draw call.



It works by using an additional buffer which contains custom per-instance data needed by your shader, such as transformations, color, light data.



The first change to your workflow is to create the additional buffer on initialization stage.







To put it into code let’s define an example per-instance data that includes a world view projection matrix and a color:



C++



struct PerInstanceData
{
Mat4x4 WorldViewProj;
Vector4 Color;
};


You also need to the structure to your shader. The easiest way is by creating a Uniform Block with an array:



GLSL



#define MAX_INSTANCES 512

layout(std140) uniform PerInstanceData {
struct
{
mat4 uMVP;
vec4 uColor;
} Data[ MAX_INSTANCES ];
};



Note that uniform blocks have limited sizes. You can find the maximum number of bytes you can use by querying for GL_MAX_UNIFORM_BLOCK_SIZE using glGetIntegerv.



Example:



GLint max_block_size = 0;
glGetIntegerv( GL_MAX_UNIFORM_BLOCK_SIZE, &max_block_size );



Bind the uniform block on the CPU in your program’s initialization stage:




C++



#define MAX_INSTANCES 512
#define BINDING_POINT 1
GLuint shaderProgram; // Compiled shader program

// Bind Uniform Block
GLuint blockIndex = glGetUniformBlockIndex( shaderProgram, "PerInstanceData" );
glUniformBlockBinding( shaderProgram, blockIndex, BINDING_POINT );


And create a corresponding uniform buffer object:



C++


// Create Instance Buffer
GLuint instanceBuffer;

glGenBuffers( 1, &instanceBuffer );
glBindBuffer( GL_UNIFORM_BUFFER, instanceBuffer );
glBindBufferBase( GL_UNIFORM_BUFFER, BINDING_POINT, instanceBuffer );

// Initialize buffer size
glBufferData( GL_UNIFORM_BUFFER, MAX_INSTANCES * sizeof( PerInstanceData ), NULL, GL_DYNAMIC_DRAW );


The next step is to update the instance data every frame to reflect changes to the visible objects you are going to draw. Once you have your new instance buffer you can draw everything with a single call to glDrawElementsInstanced.





You update the instance buffer using glMapBufferRange. This function locks the buffer and retrieves a pointer to the byte data allowing you to copy your per-instance data. Unlock your buffer using glUnmapBuffer when you are done.



Here is a simple example for updating the instance data:



const int NUM_SCENE_OBJECTS = …; // number of objects visible in your scene which share the same mesh

// Bind the buffer
glBindBuffer( GL_UNIFORM_BUFFER, instanceBuffer );

// Retrieve pointer to map the data
PerInstanceData* pBuffer = (PerInstanceData*) glMapBufferRange( GL_UNIFORM_BUFFER, 0,
NUM_SCENE_OBJECTS * sizeof( PerInstanceData ),
GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT );

// Iterate the scene objects
for ( int i = 0; i < NUM_SCENE_OBJECTS; ++i )
{
pBuffer[ i ].WorldViewProj = ... // Copy World View Projection matrix
pBuffer[ i ].Color = … // Copy color
}

glUnmapBuffer( GL_UNIFORM_BUFFER ); // Unmap the buffer


And finally you can draw everything with a single call to glDrawElementsInstanced or glDrawArraysInstanced (depending if you are using an index buffer):



glDrawElementsInstanced( GL_TRIANGLES, NUM_INDICES, GL_UNSIGNED_SHORT, 0,
NUM_SCENE_OBJECTS );


You are almost done! There is just one more step to do. In your shader you need to make use of the new uniform buffer object for your transformations and colors. In your shader main program:



void main()
{

gl_Position = PerInstanceData.Data[ gl_InstanceID ].uMVP * inPosition;
outColor = PerInstanceData.Data[ gl_InstanceID ].uColor;
}


You might have noticed the use gl_InstanceID. This is a predefined OpenGL vertex shader variable that tells your program which instance it is currently drawing. Using this variable your shader can properly iterate the instance data and match the correct transformation and color for every vertex.



That’s it! You are now ready to use Geometry Instancing. If you are drawing the same mesh multiple times in a frame make sure to implement Geometry Instancing in your pipeline! This can greatly reduce overhead and improve performance.