
How-To Tutorials

7019 Articles

MongoDB is partnering with Alibaba

Richard Gall
30 Oct 2019
3 min read
Tensions between the U.S. and China have been frosty at best where trade is concerned. But MongoDB, with offices in Palo Alto in the heart of Silicon Valley and headquarters in New York, today announced that it was partnering with Chinese conglomerate Alibaba to bring Alibaba Cloud users MongoDB-as-a-Service. While it's probably not going to bring Trump's ongoing trade war to an end, it could help MongoDB position itself as the leading NoSQL database on the planet.

What does the MongoDB and Alibaba partnership actually mean?

In practical terms, it means that Alibaba's cloud customers will now have access to a fully supported version of MongoDB in Alibaba's data centers: complete access to all existing features of MongoDB, plus Alibaba's support in escalating any issues that arise while using it. With MongoDB 4.2.0 released back in August, Alibaba users will also be able to take advantage of some of the database's newest features, such as distributed transactions and client-side field-level encryption.

But that's just for Alibaba users. From MongoDB's perspective, this partnership cements its already impressive position in the Chinese market. "Over the past four years the most downloads of MongoDB have been from China," said Dev Ittycheria, MongoDB's President and CEO.

For Alibaba, meanwhile, the partnership will likely only strengthen its position within the cloud market. Feifei Li, Vice President of the Alibaba Group, spoke of supporting "a wide range of customer needs from open-source developers to enterprise IT teams of all sizes." Li didn't say anything much more revealing than that, choosing instead to focus on Alibaba's pitch to users: "Combined with Alibaba Cloud's native data analytics capabilities, working with partners like MongoDB will empower our customers to generate more business insights from their daily operations."

A new direction for MongoDB?

The partnership is particularly interesting in the context of MongoDB's licensing struggles over the last 12 months. After initially putting forward its Server Side Public License (SSPL), the project later withdrew the license from Open Source Initiative consideration over what CTO Eliot Horowitz described as a lack of "community consensus." The SSPL was intended to protect MongoDB - and other projects like it - from "large cloud vendors... [that] capture all of the value but contribute nothing back to the community." It would appear that MongoDB is trying a new approach to this problem: instead of trying to outflank the vendors, it's joining them.

Explore Packt's newest MongoDB eBooks and videos.


How to write effective Stored Procedures in PostgreSQL

Amey Varangaonkar
12 Dec 2017
8 min read
[box type="note" align="" class="" width=""]This article is an excerpt from the book Mastering PostgreSQL 9.6 written by Hans-Jürgen Schönig. PostgreSQL is an open source database used not only for handling large datasets (Big Data) but also as a JSON document database. It finds applications in the software as well as the web domains. This book will enable you to build better PostgreSQL applications and administer databases more efficiently. [/box] In this article, we explain the concept of Stored Procedures, and how to write them effectively in PostgreSQL 9.6. When it comes to stored procedures, PostgreSQL differs quite significantly from other database systems. Most database engines force you to use a certain programming language to write server-side code. Microsoft SQL Server offers Transact-SQL while Oracle encourages you to use PL/SQL. PostgreSQL does not force you to use a certain language but allows you to decide on what you know best and what you like best. The reason PostgreSQL is so flexible is actually quite interesting too in a historical sense. Many years ago, one of the most well-known PostgreSQL developers (Jan Wieck), who had written countless patches back in its early days, came up with the idea of using TCL as the server-side programming language. The trouble was simple—nobody wanted to use TCL and nobody wanted to have this stuff in the database engine. The solution to the problem was to make the language interface so flexible that basically any language can be integrated with PostgreSQL easily. Then, the CREATE LANGUAGE clause was born: test=# h CREATE LANGUAGE Command: CREATE LANGUAGE Description: define a new procedural language Syntax: CREATE [ OR REPLACE ] [ PROCEDURAL ] LANGUAGE name CREATE [ OR REPLACE ] [ TRUSTED ] [ PROCEDURAL ] LANGUAGE name HANDLER call_handler [ INLINE inline_handler ] [ VALIDATOR valfunction ] Nowadays, many different languages can be used to write stored procedures. The flexibility added to PostgreSQL back in the early days has really paid off, and so you can choose from a rich set of programming languages. How exactly does PostgreSQL handle languages? If you take a look at the syntax of the CREATE LANGUAGE clause, you will see a couple of keywords: HANDLER: This function is actually the glue between PostgreSQL and any external language you want to use. It is in charge of mapping PostgreSQL data structures to whatever is needed by the language and helps to pass the code around. VALIDATOR: This is the policeman of the infrastructure. If it is available, it will be in charge of delivering tasty syntax errors to the end user. Many languages are able to parse the code before actually executing it. PostgreSQL can use that and tell you whether a function is correct or not when you create it. Unfortunately, not all languages can do this, so in some cases, you will still be left with problems showing up at runtime. INLINE: If it is present, PostgreSQL will be able to run anonymous code blocks in this function. The anatomy of a stored procedure Before actually digging into a specific language, I want to talk a bit about the anatomy of a typical stored procedure. For demo purposes, I have written a function that just adds up two numbers: test=# CREATE OR REPLACE FUNCTION mysum(int, int) RETURNS int AS ' SELECT $1 + $2; ' LANGUAGE 'sql'; CREATE FUNCTION The first thing you can see is that the procedure is written in SQL. PostgreSQL has to know which language we are using, so we have to specify that in the definition. 
Note that the code of the function is passed to PostgreSQL as a string ('). That is somewhat noteworthy because it allows a function to become a black box to the execution machinery. In other database engines, the code of the function is not a string but is directly attached to the statement. This simple abstraction layer is what gives the PostgreSQL function manager all its power.

Inside the string, you can basically use everything the programming language of your choice has to offer. In my example, I am simply adding up two numbers passed to the function; two integer parameters are in use. The important part here is that PostgreSQL provides you with function overloading: the mysum(int, int) function is not the same as the mysum(int8, int8) function. PostgreSQL sees these as two distinct functions. Function overloading is a nice feature; however, you have to be very careful not to accidentally deploy too many functions if your parameter list happens to change from time to time. Always make sure that functions that are no longer needed are really deleted. The CREATE OR REPLACE FUNCTION clause will not change the parameter list, so you can use it only if the signature does not change; otherwise, it will either error out or simply deploy an additional function.

Let's run the function:

test=# SELECT mysum(10, 20);
 mysum
-------
    30
(1 row)

The result is not really surprising.

Introducing dollar quoting

Passing code to PostgreSQL as a string is very flexible. However, using single quotes can be an issue: in many programming languages, single quotes show up frequently, and to be able to use them, people had to escape them when passing the string to PostgreSQL. For many years this was the standard procedure. Fortunately, those old times have passed, and new means of passing the code to PostgreSQL are available:

test=# CREATE OR REPLACE FUNCTION mysum(int, int)
RETURNS int AS
$$
    SELECT $1 + $2;
$$
LANGUAGE 'sql';
CREATE FUNCTION

The solution to the problem of quoting strings is called dollar quoting. Instead of using quotes to start and end strings, you can simply use $$. Currently, I am only aware of two languages that have assigned a meaning to $$: in Perl as well as in bash scripts, $$ represents the process ID. To overcome even this little obstacle, you can place almost any tag between the dollar signs to start and end the string. The following example shows how that works:

test=# CREATE OR REPLACE FUNCTION mysum(int, int)
RETURNS int AS
$body$
    SELECT $1 + $2;
$body$
LANGUAGE 'sql';
CREATE FUNCTION

All this flexibility allows you to really overcome the problem of quoting once and for all: as long as the start string and the end string match, there won't be any problems left.

Making use of anonymous code blocks

So far, you have learned to write the most simplistic stored procedures possible, and you have learned to execute code. However, there is more to code execution than just full-blown stored procedures. In addition to those, PostgreSQL allows the use of anonymous code blocks. The idea is to run code that is needed only once; this kind of code execution is especially useful for administrative tasks. Anonymous code blocks don't take parameters and are not permanently stored in the database, as they don't have a name anyway.
Here is a simple example:

test=# DO $$
BEGIN
    RAISE NOTICE 'current time: %', now();
END;
$$ LANGUAGE 'plpgsql';
NOTICE:  current time: 2016-12-12 15:25:50.678922+01
CONTEXT:  PL/pgSQL function inline_code_block line 3 at RAISE
DO

In this example, the code only issues a message and quits. Again, the code block has to know which language it uses; the string is again passed to PostgreSQL using simple dollar quoting.

Using functions and transactions

As you know, everything that PostgreSQL exposes in user land is a transaction. The same, of course, applies if you are writing stored procedures. A procedure is always part of the transaction you are in. It is not autonomous; it is just like an operator or any other operation. Suppose a single statement calls a function three times: all three function calls happen in the same transaction. This is important to understand because it implies that you cannot do too much transactional flow control inside a function. Suppose the second function call wanted to commit - it simply cannot work.

However, Oracle has a mechanism that allows for autonomous transactions. The idea is that even if a transaction rolls back, some parts might still be needed and should be kept. The classical example is as follows:

- Start a function to look up secret data
- Add a log line stating that somebody has modified this important secret data
- Commit the log line but roll back the change
- You still want to know that somebody attempted to change data

To solve problems like this one, autonomous transactions can be used. The idea is to be able to commit a transaction inside the main transaction independently. In this case, the entry in the log table will prevail while the change will be rolled back.

As of PostgreSQL 9.6, autonomous transactions are not supported. However, I have already seen patches floating around that implement this feature; we will see when it makes it to the core. To give you an impression of how things will most likely work, here is a code snippet based on the first patches:

AS $$
DECLARE
    PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
    FOR i IN 0..9 LOOP
        START TRANSACTION;
        INSERT INTO test1 VALUES (i);
        IF i % 2 = 0 THEN
            COMMIT;
        ELSE
            ROLLBACK;
        END IF;
    END LOOP;
    RETURN 42;
END;
$$;
...

The point in this example is that we can decide on the fly whether to commit or to roll back the autonomous transaction.

More about stored procedures - how to write them in different languages and how you can improve their performance - can be found in the book Mastering PostgreSQL 9.6. Make sure you order your copy now!


Troubleshooting in SQL Server

Sunith Shetty
15 Mar 2018
16 min read
[box type="note" align="" class="" width=""]This article is an excerpt from a book SQL Server 2017 Administrator's Guide written by Marek Chmel and Vladimír Mužný. This book will help you learn to implement and administer successful database solution with SQL Server 2017.[/box] Today, we will perform SQL Server analysis, and also learn ways for efficient performance monitoring and tuning. Performance monitoring and tuning Performance monitoring and tuning is a crucial part of your database administration skill set so as to keep the performance and stability of your server great, and to be able to find and fix the possible issues. The overall system performance can decrease over time; your system may work with more data or even become totally unresponsive. In such cases, you need the skills and tools to find the issue to bring the server back to normal as fast as possible. We can use several tools on the operating system layer and, then, inside the SQL Server to verify the performance and the possible root cause of the issue. The first tool that we can use is the performance monitor, which is available on your Windows Server: Performance monitor can be used to track important SQL Server counters, which can be very helpful in evaluating the SQL Server Performance. To add a counter, simply right-click on the monitoring screen in the Performance monitoring and tuning section and use the Add Counters item. If the SQL Server instance that you're monitoring is a default instance, you will find all the performance objects listed as SQL Server. If your instance is named, then the performance objects will be listed as MSSQL$InstanceName in the list of performance objects. We can split the important counters to watch between the system counters for the whole server and specific SQL Server counters. The list of system counters to watch include the following: Processor(_Total)% Processor Time: This is a counter to display the CPU load. Constant high values should be investigated to verify whether the current load does not exceed the performance limits of your HW or VM server, or if your workload is not running with proper indexes and statistics, and is generating bad query plans. MemoryAvailable MBytes: This counter displays the available memory on the operating system. There should always be enough memory for the operating system. If this counter drops below 64MB, it will send a notification of low memory and the SQL Server will reduce the memory usage. Physical Disk—Avg. Disk sec/Read: This disk counter provides the average latency information for your storage system; be careful if your storage is made of several different disks to monitor the proper storage system. Physical Disk: This indicates the average disk writes per second. Physical Disk: This indicates the average disk reads per second. Physical Disk: This indicates the number of disk writes per second. System—Processor Queue Length: This counter displays the number of threads waiting on a system CPU. If the counter is above 0, this means that there are more requests than the CPU can handle, and if the counter is constantly above 0, this may signal performance issues. Network interface: This indicates the total number of bytes per second. 
Once you have added all these system counters in Performance Monitor, you can watch the values in real time, or you can configure a data collection, which will run for a selected time span and periodically collect the information. With SQL Server-specific counters, we can dig deeper into the CPU, memory, and storage utilization to see what the SQL Server is doing and how it is utilizing the subsystems.

SQL Server memory monitoring and troubleshooting

Important counters to watch for SQL Server memory utilization include counters from the SQL Server:Buffer Manager performance object and from SQL Server:Memory Manager:

- SQLServer:Buffer Manager—buffer cache hit ratio: This counter displays the ratio of how often the SQL Server can find the requested data in the cache. If the data is not found in the cache, it has to be read from the disk. The higher the counter, the better the overall performance, since memory access is usually faster than the disk subsystem.
- SQLServer:Buffer Manager—page life expectancy: This counter measures how long a page can stay in memory, in seconds. The longer a page can stay in memory, the less likely it is that the SQL Server will need to access the disk to get the data into memory again.
- SQL Server:Memory Manager—total server memory (KB): This is the amount of memory the server has committed using the memory manager.
- SQL Server:Memory Manager—target server memory (KB): This is the ideal amount of memory the server can consume. On a stable system, target and total should be equal unless you face memory pressure. Once the memory is utilized after the warm-up of your server, these two counters should not drop significantly; that would be another indication of system-level memory pressure, where the SQL Server memory manager has to deallocate memory.
- SQL Server:Memory Manager—memory grants pending: This counter displays the total number of SQL Server processes that are waiting to be granted memory from the memory manager.

To check the performance counters, you can also use a T-SQL query against the sys.dm_os_performance_counters DMV:

SELECT [counter_name] AS [Counter Name],
       [cntr_value]/1024 AS [Server Memory (MB)]
FROM sys.dm_os_performance_counters
WHERE [object_name] LIKE '%Memory Manager%'
  AND [counter_name] IN ('Total Server Memory (KB)', 'Target Server Memory (KB)')

This query will return two values: one for target memory and one for total memory. These two should be close to each other on a warmed-up system.
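The page life expectancy counter mentioned above can be read from the same DMV. A small sketch (healthy thresholds depend on the amount of memory installed, so treat the raw number as a trend to watch rather than a fixed target):

SELECT [object_name],
       [cntr_value] AS [Page life expectancy (s)]
FROM sys.dm_os_performance_counters
WHERE [object_name] LIKE '%Buffer Manager%'
  AND [counter_name] = 'Page life expectancy';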
Another query you can use gets its information from a DMV named sys.dm_os_sys_memory:

SELECT total_physical_memory_kb/1024/1024 AS [Physical Memory (GB)],
       available_physical_memory_kb/1024/1024 AS [Available Memory (GB)],
       system_memory_state_desc AS [System Memory State]
FROM sys.dm_os_sys_memory WITH (NOLOCK)
OPTION (RECOMPILE)

This query will display the available physical memory and the total physical memory of your server, with several possible memory states:

- Available physical memory is high (the state you would like to see on your system, indicating there is no lack of memory)
- Physical memory usage is steady
- Available physical memory is getting low
- Available physical memory is low

The memory grants can be verified with a T-SQL query:

SELECT [object_name] AS [Object name],
       cntr_value AS [Memory Grants Pending]
FROM sys.dm_os_performance_counters WITH (NOLOCK)
WHERE [object_name] LIKE N'%Memory Manager%'
  AND counter_name = N'Memory Grants Pending'
OPTION (RECOMPILE);

If you face memory issues, there are several steps you can take for improvements:

- Check and configure your SQL Server max memory usage
- Add more RAM to your server; the limit for Standard Edition is 128 GB and there is no limit for Enterprise Edition
- Use Lock Pages in Memory
- Optimize your queries

SQL Server storage monitoring and troubleshooting

The important counters to watch for SQL Server storage utilization come from the SQL Server:Access Methods performance object:

- SQL Server:Access Methods—Full Scans/sec: This counter displays the number of full scans per second, which can be either table or full-index scans
- SQL Server:Access Methods—Index Searches/sec: This counter displays the number of index searches per second
- SQL Server:Access Methods—Forwarded Records/sec: This counter displays the number of forwarded records per second

Monitoring the disk system is crucial, since your disk is used as storage for the following:

- Data files
- Log files
- The tempDB database
- The page file
- Backup files

To verify the disk latency and IOPS metrics of your drives, you can use Performance Monitor, or T-SQL commands querying the sys.dm_os_volume_stats and sys.dm_io_virtual_file_stats DMFs. A simple place to start is a T-SQL script utilizing the first DMF to check the space available within the database files:

SELECT f.database_id, f.file_id, volume_mount_point, total_bytes, available_bytes
FROM sys.master_files AS f
CROSS APPLY sys.dm_os_volume_stats(f.database_id, f.file_id);

To check the I/O file stats with the second DMF, you can use a T-SQL script checking the information about the tempDB data files:

SELECT *
FROM sys.dm_io_virtual_file_stats(NULL, NULL) vfs
JOIN sys.master_files mf ON mf.database_id = vfs.database_id
  AND mf.file_id = vfs.file_id
WHERE mf.database_id = 2 AND mf.type = 0

To measure the disk performance, we can use a tool named DiskSpd, a replacement for the older SQLIO tool, which was used for a long time. DiskSpd is an external utility, not available on the operating system by default; it can be downloaded from GitHub at https://github.com/microsoft/diskspd or from the TechNet Gallery.
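As an aside, the virtual file stats DMF above also records cumulative stall times, so a rough per-file read latency can be derived in plain T-SQL before reaching for an external tool. A sketch (NULLIF avoids division by zero on files that have never been read):

SELECT DB_NAME(vfs.database_id) AS [Database],
       vfs.file_id,
       vfs.io_stall_read_ms / NULLIF(vfs.num_of_reads, 0) AS [Avg read latency (ms)]
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
ORDER BY [Avg read latency (ms)] DESC;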
The following DiskSpd example runs a test for 300 seconds using a single thread to drive 100 percent random 64 KB reads at a depth of 15 overlapped (outstanding) I/Os against a regular file:

DiskSpd -d300 -F1 -w0 -r -b64k -o15 d:\datafile.dat

Troubleshooting wait statistics

We can use the Wait Statistics approach for a thorough understanding of the SQL Server workload and undertake performance troubleshooting based on the collected data. Wait statistics are based on the fact that, any time a request has to wait for a resource, the SQL Server tracks this information, and we can use it for further analysis.

Any user process can include several threads. A thread is a single unit of execution on SQL Server, where SQLOS controls the thread scheduling instead of relying on the operating system layer. Each processor core has its own scheduler component responsible for executing such threads. To see the available schedulers in your SQL Server, you can use the following query:

SELECT * FROM sys.dm_os_schedulers

This code will return all the schedulers in your SQL Server; some will be displayed as visible online and some as hidden online. The hidden ones are for internal system tasks, while the visible ones are used by user tasks running on the SQL Server. There is one more scheduler, displayed as Visible Online (DAC). This one is used for the dedicated administrator connection, which comes in handy when the SQL Server stops responding. To use a dedicated admin connection, you can modify your SSMS connection to use the DAC, or you can use a switch with the sqlcmd.exe utility. To connect to the default instance with DAC on your server, you can use the following command:

sqlcmd.exe -E -A

Each thread can be in three possible states:

- running: the thread is running on the processor
- suspended: the thread is waiting for a resource on a waiter list
- runnable: the thread is waiting for execution on a runnable queue

Each running thread runs until it has to wait for a resource to become available or until it has exhausted its CPU time, which is set to 4 ms. This 4 ms time is visible in the output of the previous query to sys.dm_os_schedulers and is called a quantum. When a thread requires any resource, it is moved away from the processor to a waiter list, where it waits for the resource to become available. Once the resource is available, the thread is notified and moves to the bottom of the runnable queue. Any waiting thread can be found via the following code, which will display the waiting threads and the resources they are waiting for:

SELECT * FROM sys.dm_os_waiting_tasks

The threads then transition between execution on the CPU, the waiter list, and the runnable queue. There is a special case: when a thread does not need to wait for any resource and has already run for 4 ms on the CPU, it will be moved directly to the runnable queue instead of the waiter list. In the following image, we can see the thread states and the objects where the thread resides. When the thread is waiting on the waiter list, we talk about resource wait time. When the thread is waiting on the runnable queue to get onto the CPU for execution, we talk about signal wait time. The total wait time is, then, the sum of the signal and resource wait times.
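One practical note before querying them: the wait statistics are cumulative since the last instance restart, so when you want to measure a specific window you can reset them first. This clears server-wide statistics, so use it deliberately:

DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR);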
You can find the ratio of the signal to resource wait times with the following script:

SELECT signalWaitTimeMs = SUM(signal_wait_time_ms),
       '%signal waits' = CAST(100.0 * SUM(signal_wait_time_ms) / SUM(wait_time_ms) AS numeric(20,2)),
       resourceWaitTimeMs = SUM(wait_time_ms - signal_wait_time_ms),
       '%resource waits' = CAST(100.0 * SUM(wait_time_ms - signal_wait_time_ms) / SUM(wait_time_ms) AS numeric(20,2))
FROM sys.dm_os_wait_stats

When the signal waits ratio goes over 30 percent, there is serious CPU pressure and your processor(s) will have a hard time handling all the incoming requests from the threads.

The following query grabs the wait statistics and displays the most frequent wait types recorded while threads were waiting on the waiter list for a particular resource:

WITH [Waits] AS
(SELECT [wait_type],
        [wait_time_ms] / 1000.0 AS [WaitS],
        ([wait_time_ms] - [signal_wait_time_ms]) / 1000.0 AS [ResourceS],
        [signal_wait_time_ms] / 1000.0 AS [SignalS],
        [waiting_tasks_count] AS [WaitCount],
        100.0 * [wait_time_ms] / SUM([wait_time_ms]) OVER() AS [Percentage],
        ROW_NUMBER() OVER(ORDER BY [wait_time_ms] DESC) AS [RowNum]
 FROM sys.dm_os_wait_stats
 WHERE [wait_type] NOT IN (
    N'BROKER_EVENTHANDLER', N'BROKER_RECEIVE_WAITFOR', N'BROKER_TASK_STOP',
    N'BROKER_TO_FLUSH', N'BROKER_TRANSMITTER', N'CHECKPOINT_QUEUE', N'CHKPT',
    N'CLR_AUTO_EVENT', N'CLR_MANUAL_EVENT', N'CLR_SEMAPHORE', N'DIRTY_PAGE_POLL',
    N'DISPATCHER_QUEUE_SEMAPHORE', N'EXECSYNC', N'FSAGENT',
    N'FT_IFTS_SCHEDULER_IDLE_WAIT', N'FT_IFTSHC_MUTEX', N'HADR_CLUSAPI_CALL',
    N'HADR_FILESTREAM_IOMGR_IOCOMPLETION', N'HADR_LOGCAPTURE_WAIT',
    N'HADR_NOTIFICATION_DEQUEUE', N'HADR_TIMER_TASK', N'HADR_WORK_QUEUE',
    N'KSOURCE_WAKEUP', N'LAZYWRITER_SLEEP', N'LOGMGR_QUEUE',
    N'MEMORY_ALLOCATION_EXT', N'ONDEMAND_TASK_QUEUE',
    N'PREEMPTIVE_XE_GETTARGETSTATE', N'PWAIT_ALL_COMPONENTS_INITIALIZED',
    N'PWAIT_DIRECTLOGCONSUMER_GETNEXT', N'QDS_PERSIST_TASK_MAIN_LOOP_SLEEP',
    N'QDS_ASYNC_QUEUE', N'QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP',
    N'QDS_SHUTDOWN_QUEUE', N'REDO_THREAD_PENDING_WORK',
    N'REQUEST_FOR_DEADLOCK_SEARCH', N'RESOURCE_QUEUE', N'SERVER_IDLE_CHECK',
    N'SLEEP_BPOOL_FLUSH', N'SLEEP_DBSTARTUP', N'SLEEP_DCOMSTARTUP',
    N'SLEEP_MASTERDBREADY', N'SLEEP_MASTERMDREADY', N'SLEEP_MASTERUPGRADED',
    N'SLEEP_MSDBSTARTUP', N'SLEEP_SYSTEMTASK', N'SLEEP_TASK',
    N'SLEEP_TEMPDBSTARTUP', N'SNI_HTTP_ACCEPT', N'SP_SERVER_DIAGNOSTICS_SLEEP',
    N'SQLTRACE_BUFFER_FLUSH', N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
    N'SQLTRACE_WAIT_ENTRIES', N'WAIT_FOR_RESULTS', N'WAITFOR',
    N'WAITFOR_TASKSHUTDOWN', N'WAIT_XTP_RECOVERY', N'WAIT_XTP_HOST_WAIT',
    N'WAIT_XTP_OFFLINE_CKPT_NEW_LOG', N'WAIT_XTP_CKPT_CLOSE',
    N'XE_DISPATCHER_JOIN', N'XE_DISPATCHER_WAIT', N'XE_TIMER_EVENT')
   AND [waiting_tasks_count] > 0
)
SELECT MAX([W1].[wait_type]) AS [WaitType],
       CAST(MAX([W1].[WaitS]) AS DECIMAL(16,2)) AS [Wait_S],
       CAST(MAX([W1].[ResourceS]) AS DECIMAL(16,2)) AS [Resource_S],
       CAST(MAX([W1].[SignalS]) AS DECIMAL(16,2)) AS [Signal_S],
       MAX([W1].[WaitCount]) AS [WaitCount],
       CAST(MAX([W1].[Percentage]) AS DECIMAL(5,2)) AS [Percentage],
       CAST((MAX([W1].[WaitS]) / MAX([W1].[WaitCount])) AS DECIMAL(16,4)) AS [AvgWait_S],
       CAST((MAX([W1].[ResourceS]) / MAX([W1].[WaitCount])) AS DECIMAL(16,4)) AS [AvgRes_S],
       CAST((MAX([W1].[SignalS]) / MAX([W1].[WaitCount])) AS DECIMAL(16,4)) AS [AvgSig_S]
FROM [Waits] AS [W1]
INNER JOIN [Waits] AS [W2] ON [W2].[RowNum] <= [W1].[RowNum]
GROUP BY [W1].[RowNum]
HAVING SUM([W2].[Percentage]) - MAX([W1].[Percentage]) < 95
GO

This code comes from the whitepaper SQL Server Performance Tuning Using Wait Statistics by Erin Stellato and Jonathan Kehayias, published by SQLskills, which in turn uses the full query by Paul Randal available at https://www.sqlskills.com/blogs/paul/wait-statistics-or-please-tell-me-where-it-hurts/.

Some of the typical wait stats you will see are as follows:

PAGEIOLATCH

The PAGEIOLATCH wait type is used when a thread is waiting for a page to be read into the buffer pool from the disk. This wait type comes in two main forms:

- PAGEIOLATCH_SH: the page will be read from the disk
- PAGEIOLATCH_EX: the page will be modified

You may quickly assume that the storage has to be the problem, but that may not be the case. Like any other wait, these need to be considered in correlation with other wait types and other available counters to correctly find the root cause of slow SQL Server operations. The page may be read into the buffer pool because it was previously removed due to memory pressure and is needed again. So, you may also investigate the following:

- Buffer Manager: Page life expectancy
- Buffer Manager: Buffer cache hit ratio

Also, consider the following as possible factors behind PAGEIOLATCH wait types:

- Large scans versus seeks on the indexes
- Implicit conversions
- Outdated statistics
- Missing indexes

PAGELATCH

This wait type is quite frequently confused with PAGEIOLATCH, but PAGELATCH is used for pages already present in memory. The thread waits for access to such a page, again with the possible PAGELATCH_SH and PAGELATCH_EX wait types. A pretty common situation with this wait type is tempDB contention, where you need to analyze which page is being waited for and what type of query is actually waiting for such a resource. As a solution to tempDB contention, you can do the following:

- Add more tempDB data files
- Use trace flags 1118 and 1117 for tempDB on systems older than SQL Server 2016

CXPACKET

This wait type is encountered when any thread is running in parallel. The CXPACKET wait type itself does not mean that there is really any problem on the SQL Server. But if such waits accumulate very quickly, it may be a signal of skewed statistics that require an update, or of a parallel scan on a table where proper indexes are missing. The degree of parallelism is controlled via the MAXDOP setting, which can be configured at the following levels:

- The server level
- The database level
- A query level, with a hint

We learned about SQL Server analysis with the wait statistics troubleshooting methodology and the DMVs that give more insight into the problems occurring in the SQL Server. To know more about how to successfully create, design, and deploy databases using SQL Server 2017, do check out the book SQL Server 2017 Administrator's Guide.


Can a modified MIT ‘Hippocratic License’ to restrict misuse of open source software prompt a wave of ethical innovation in tech?

Savia Lobo
24 Sep 2019
5 min read
Open source licenses allow software to be freely distributed, modified, and used. These licenses give developers the additional advantage of letting others use their software under their own rules and conditions. Recently, software developer and open source advocate Coraline Ada Ehmke caused a stir in the software engineering community with 'The Hippocratic License.' Ehmke is also the original author of Contributor Covenant, a "code of conduct" for open source projects that encourages participants to use inclusive language and to refrain from personal attacks and harassment. In a tweet posted in September last year, she mentioned that, following the code of conduct, "40,000 open source projects, including Linux, Rails, Golang, and everything OSS produced by Google, Microsoft, and Apple have adopted my code of conduct."

[box type="shadow" align="" class="" width=""]The term 'Hippocratic' is derived from the Hippocratic Oath, the most widely known of the Greek medical texts. The Hippocratic Oath in literal terms requires a new physician to swear upon a number of healing gods that he will uphold a number of professional ethical standards.[/box]

Ehmke explained the license in more detail in a post published on Sunday. In it, she highlights how the idea of writing software with the goals of clarity, conciseness, readability, performance, and elegance is limiting, and potentially dangerous. "All of these technologies are inherently political," she writes. "There is no neutral political position in technology. You can't build systems that can be weaponized against marginalized people and take no responsibility for them." The concept of the Hippocratic License is relatively simple: in a tweet, Ehmke said that it "specifically prohibits the use of open-source software to harm others."

Open source software and the associated harm

Among the many freedoms that open source software allows, such as free redistribution of the software as well as the source code, the OSI also specifies that there must be no discrimination against who uses the software or where it is put to use.

A few days ago, software engineer Seth Vargo pulled his open source library, Chef Sugar, offline after finding out that Chef (the popular open source DevOps company whose product the library extends) had recently signed a contract selling $95,000 worth of licenses to U.S. Immigration and Customs Enforcement (ICE), which has faced widespread condemnation for separating children from their parents at the U.S. border and other abuses. Vargo took down Chef Sugar from both GitHub and RubyGems, the main Ruby package repository, as a sign of protest.

In May this year, Mijente, an advocacy organization, released documents stating that Palantir was responsible for the 2017 ICE operation that targeted and arrested family members of children crossing the border alone. In May 2018, Amazon employees, in a letter to Jeff Bezos, protested against the sale of the company's facial recognition tech to Palantir, writing that they "refuse to contribute to tools that violate human rights" and citing the mistreatment of refugees and immigrants by ICE. And in July, WNYC revealed that Palantir's mobile app FALCON was being used by ICE to carry out raids on immigrant communities as well as enable workplace raids in New York City in 2017.

Founder of OSI responds to Ehmke's Hippocratic License

Bruce Perens, one of the founders of the open source movement in software, responded to Ehmke in a post titled "Sorry, Ms. Ehmke, The 'Hippocratic License' Can't Work".
"The software may not be used by individuals, corporations, governments, or other groups for systems or activities that actively and knowingly endanger, harm, or otherwise threaten the physical, mental, economic, or general well-being of underprivileged individuals or groups," he quotes from the license in his post. "The terms are simply far more than could be enforced in a copyright license," he adds. "Nobody could enforce Ms. Ehmke's license without harming someone, or at least threatening to do so. And it would be easy to make a case for that person being underprivileged," he continued. He concluded by saying that, though he finds the terms of Ehmke's license unagreeable, he will "happily support Ms. Ehmke in pursuit of legal reforms meant to achieve the protection of underprivileged people."

Many have welcomed Ehmke's idea of an open source license with an ethical clause. However, the license is not OSI approved yet, and chances are slim after Perens' response. There are also many users who do not agree with the license; reaching a consensus will be hard.

https://twitter.com/seannalexander/status/1175853429325008896
https://twitter.com/AdamFrisby/status/1175867432411336704
https://twitter.com/rishmishra/status/1175862512509685760

Even though developers host their source code on open source repositories, a license may place a certain level of restriction on who is allowed to use the code. However, as Perens points out, many of the terms in Ehmke's license are hard to enforce. Irrespective of the outcome of the license's approval process, Coraline Ehmke has widely opened up the topic of long-overdue FOSS licensing reforms in the open source community. It will be interesting to see whether such a license can spur ethical reform by giving developers more authority to embed their values in their work and prevent the misuse of their software.

Read the Hippocratic License to know more in detail.

Other interesting news in Tech

- ImageNet Roulette: New viral app trained using ImageNet exposes racial biases in artificial intelligence systems
- Machine learning ethics: what you need to know and what you can do
- Facebook suspends tens of thousands of apps amid an ongoing investigation into how apps use personal data


6 index types in PostgreSQL 10 you should know

Sugandha Lahoti
28 Feb 2018
13 min read
[box type="note" align="" class="" width=""]This article is an excerpt from a book Mastering  PostgreSQL 10 written by Hans-Jürgen Schönig. This book will help you master the capabilities of PostgreSQL 10 to efficiently manage and maintain your database.[/box] In today’s post, we will learn about the different index types available for sorting in PostgreSQL and also understand how they function. What are index types and why you need them Data types can be sorted in a useful way. Just imagine a polygon. How would you sort these objects in a useful way? Sure, you can sort by the area covered, its length or so, but doing this won't allow you to actually find them using a geometric search. The solution to the problem is to provide more than just one index type. Each index will serve a special purpose and do exactly what is needed. The following six index types are available (as of PostgreSQL 10.0): test=# SELECT * FROM pg_am; amname  | amhandler   | amtype ---------+-------------+-------- btree | bthandler | i hash     | hashhandler | i GiST     | GiSThandler | i Gin | ginhandler   | i spGiST   | spghandler   | i brin | brinhandler | i (6 rows) A closer look at the 6 index types in PostgreSQL 10 The following sections will outline the purpose of each index type available in PostgreSQL. Note that there are some extensions that can be used on top of what you can see here. Additional index types available on the web are rum, vodka, and in the future, cognac. Hash indexes Hash indexes have been around for many years. The idea is to hash the input value and store it for later lookups. Having hash indexes actually makes sense. However, before PostgreSQL 10.0, it was not advised to use hash indexes because PostgreSQL had no WAL support for them. In PostgreSQL 10.0, this has changed. Hash indexes are now fully logged and are therefore ready for replication and are considered to be a 100% crash safe. Hash indexes are generally a bit larger than b-tree indexes. Suppose you want to index 4 million integer values. A btree will need around 90 MB of storage to do this. A hash index will need around 125 MB on disk. The assumption made by many people that a hash is super small on the disk is therefore, in many cases, just wrong. GiST indexes Generalized Search Tree (GiST) indexes are highly important index types because they are used for a variety of different things. GiST indexes can be used to implement R-tree behavior and it is even possible to act as b-tree. However, abusing GiST for b-tree indexes is not recommended. Typical use cases for GiST are as follows: Range types Geometric indexes (for example, used by the highly popular PostGIS extension) Fuzzy searching Understanding how GiST works To many people, GiST is still a black box. We will now discuss how GiST works internally. Consider the following diagram: Source: http://leopard.in.ua/assets/images/postgresql/pg_indexes/pg_indexes2.jpg   Take a look at the tree. You will see that R1 and R2 are on top. R1 and R2 are the bounding boxes containing everything else. R3, R4, and R5 are contained by R1. R8, R9, and R10 are contained by R3, and so on. A GiST index is therefore hierarchically organized. What you can see in the diagram is that some operations, which are not available in b-trees are supported. Some of those operations are overlaps, left of, right of, and so on. The layout of a GiST tree is ideal for geometric indexing. Extending GiST Of course, it is also possible to come up with your own operator classes. 
For custom operator classes, the following strategies are supported:

Operation                     Strategy number
Strictly left of              1
Does not extend to right of   2
Overlaps                      3
Does not extend to left of    4
Strictly right of             5
Same                          6
Contains                      7
Contained by                  8
Does not extend above         9
Strictly below                10
Strictly above                11
Does not extend below         12

If you want to write operator classes for GiST, a couple of support functions have to be provided. In the case of a b-tree, there is only the same function; GiST indexes require a lot more:

- consistent (support function 1): Determines whether a key satisfies the query qualifier. Internally, strategies are looked up and checked.
- union (2): Calculates the union of a set of keys. In the case of numeric values, simply the upper and lower values of a range are computed. It is especially important for geometries.
- compress (3): Computes a compressed representation of a key or value.
- decompress (4): The counterpart of the compress function.
- penalty (5): During insertion, the cost of inserting into the tree is calculated. The cost determines where the new entry will go inside the tree; therefore, a good penalty function is key to the good overall performance of the index.
- picksplit (6): Determines where to move entries in case of a page split. Some entries have to stay on the old page while others go to the new page being created. Having a good picksplit function is essential to good index performance.
- equal (7): Similar to the same function you have already seen in b-trees.
- distance (8): Calculates the distance (a number) between a key and the query value. The distance function is optional and is needed in case KNN search is supported.
- fetch (9): Determines the original representation of a compressed key. This function is needed to handle index-only scans, as supported by recent versions of PostgreSQL.

Implementing operator classes for GiST indexes is usually done in C. If you are interested in a good example, I advise you to check out the btree_gist module in the contrib directory. It shows how to index standard data types using GiST and is a good source of information as well as inspiration.

GIN indexes

Generalized inverted (GIN) indexes are a good way to index text. Suppose you want to index a million text documents: a certain word may occur millions of times. In a normal b-tree, this would mean that the key is stored millions of times; not so in a GIN. Each key (or word) is stored once and assigned to a document list. Keys are organized in a standard b-tree, and each entry has a document list pointing to all entries in the table having the same key. A GIN index is very small and compact. However, it lacks an important feature found in b-trees: sorted data. In a GIN, the list of item pointers associated with a certain key is sorted by the position of the row in the table and not by some arbitrary criterion.

Extending GIN

Just like any other index, GIN can be extended. The following strategies are available:

Operation        Strategy number
Overlap          1
Contains         2
Is contained by  3
Equal            4

On top of this, the following support functions are available:

- compare (support function 1): Similar to the same function you have seen in b-trees. If two keys are compared, it returns -1 (lower), 0 (equal), or 1 (higher).
- extractValue (2): Extracts keys from a value to be indexed; a value can have many keys. For example, a text value might consist of more than one word.
- extractQuery (3): Extracts keys from a query condition.
- consistent (4): Checks whether a value matches a query condition.
- comparePartial (5): Compares a partial key from a query and a key from the index. Returns -1, 0, or 1 (similar to the compare function supported by b-trees).
- triConsistent (6): Determines whether a value matches a query condition (ternary variant). It is optional if the consistent function is present.

If you are looking for a good example of how to extend GIN, consider looking at the btree_gin module in the PostgreSQL contrib directory. It is a valuable source of information and a good way to start your own implementation.

SP-GiST indexes

Space-partitioned GiST (SP-GiST) has mainly been designed for in-memory use. The reason for this is that an SP-GiST stored on disk needs a fairly high number of disk hits to function, and disk hits are way more expensive than just following a couple of pointers in RAM. The beauty is that SP-GiST can be used to implement various types of trees, such as quadtrees, k-d trees, and radix trees (tries). The following strategies are provided:

Operation          Strategy number
Strictly left of   1
Strictly right of  5
Same               6
Contained by       8
Strictly below     10
Strictly above     11

To write your own operator classes for SP-GiST, a couple of functions have to be provided:

- config (support function 1): Provides information about the operator class in use
- choose (2): Figures out how to insert a new value into an inner tuple
- picksplit (3): Figures out how to partition/split a set of values
- inner_consistent (4): Determines which subpartitions need to be searched for a query
- leaf_consistent (5): Determines whether a key satisfies the query qualifier

BRIN indexes

Block range indexes (BRIN) are of great practical use. All the indexes discussed until now need quite a lot of disk space. Although a lot of work has gone into shrinking GIN indexes and the like, they still need quite a lot, because an index pointer is needed for each entry; so, if there are 10 million entries, there will be 10 million index pointers. Space is the main concern addressed by BRIN indexes. A BRIN index does not keep an index entry for each tuple but stores only the minimum and the maximum value of 128 (by default) blocks of data (1 MB). The index is therefore very small but lossy: scanning the index will return more data than we asked for, and PostgreSQL has to filter out these additional rows in a later step.

The following example demonstrates how small a BRIN index really is:

test=# CREATE INDEX idx_brin ON t_test USING brin(id);
CREATE INDEX
test=# \di+ idx_brin
                    List of relations
 Schema |   Name   | Type  | Owner | Table  | Size
--------+----------+-------+-------+--------+-------
 public | idx_brin | index | hs    | t_test | 48 kB
(1 row)

In my example, the BRIN index is around 2,000 times smaller than a standard b-tree. The question naturally arising now is: why don't we always use BRIN indexes? To answer this question, it is important to reflect on the layout of BRIN: the minimum and maximum value per 1 MB are stored. If the data is sorted (high correlation), BRIN is pretty efficient because we can fetch 1 MB of data, scan it, and be done. However, what if the data is shuffled? In this case, BRIN won't be able to exclude chunks of data anymore, because it is very likely that something close to the overall high and the overall low is within every 1 MB of data. Therefore, BRIN is mostly made for highly correlated data. In reality, correlated data is quite likely in data warehousing applications.
Often, data is loaded every day, and therefore dates can be highly correlated.

Extending BRIN indexes

BRIN supports the same strategies as a b-tree and therefore needs the same set of operators. The code can be reused nicely:

Operation              Strategy number
Less than              1
Less than or equal     2
Equal                  3
Greater than or equal  4
Greater than           5

The support functions needed by BRIN are as follows:

- opcInfo (support function 1): Provides internal information about the indexed columns
- add_value (2): Adds an entry to an existing summary tuple
- consistent (3): Checks whether a value matches a condition
- union (4): Calculates the union of two summary entries (minimum/maximum values)

Adding additional indexes

Since PostgreSQL 9.6, there has been an easy way to deploy entirely new index types as extensions. This is pretty cool because, if the index types provided by PostgreSQL are not enough, it is possible to add additional ones serving precisely your purpose. The instruction to do this is CREATE ACCESS METHOD:

test=# \h CREATE ACCESS METHOD
Command:     CREATE ACCESS METHOD
Description: define a new access method
Syntax:
CREATE ACCESS METHOD name
    TYPE access_method_type
    HANDLER handler_function

Don't worry too much about this command; just in case you ever deploy your own index type, it will come as a ready-to-use extension.

One of these extensions implements bloom filters. Bloom filters are probabilistic data structures: they sometimes return too many rows but never too few. Therefore, a bloom filter is a good method to pre-filter data. How does it work? A bloom filter is defined on a couple of columns; a bitmask is calculated based on the input values, which is then compared to your query. The upside of a bloom filter is that you can index as many columns as you want. The downside is that the entire bloom filter has to be read. Of course, the bloom filter is smaller than the underlying data, so in many cases it is very beneficial.

To use bloom filters, just activate the extension, which is a part of the PostgreSQL contrib package:

test=# CREATE EXTENSION bloom;
CREATE EXTENSION

As stated previously, the idea behind a bloom filter is that it allows you to index as many columns as you want. In many real-world applications, the challenge is to index many columns without knowing which combinations the user will actually need at runtime. In the case of a large table, it is totally impossible to create standard b-tree indexes on, say, 80 fields or more. A bloom filter might be an alternative in this case:

test=# CREATE TABLE t_bloom (x1 int, x2 int, x3 int, x4 int, x5 int, x6 int, x7 int);
CREATE TABLE

Creating the index is easy:

test=# CREATE INDEX idx_bloom ON t_bloom USING bloom(x1, x2, x3, x4, x5, x6, x7);
CREATE INDEX

If sequential scans are turned off, the index can be seen in action:

test=# SET enable_seqscan TO off;
SET
test=# explain SELECT * FROM t_bloom WHERE x5 = 9 AND x3 = 7;
                              QUERY PLAN
-------------------------------------------------------------------------
 Bitmap Heap Scan on t_bloom  (cost=18.50..22.52 rows=1 width=28)
   Recheck Cond: ((x3 = 7) AND (x5 = 9))
   ->  Bitmap Index Scan on idx_bloom  (cost=0.00..18.50 rows=1 width=0)
         Index Cond: ((x3 = 7) AND (x5 = 9))

Note that I have queried a random combination of columns; they are not related to the actual order in the index, and the bloom filter will still be beneficial. If you are interested in bloom filters, consider checking out https://en.wikipedia.org/wiki/Bloom_filter.
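Looking back at BRIN for a moment, one tuning knob worth knowing: the 128-block granularity mentioned above is only the default and can be changed at creation time via the pages_per_range storage parameter. A sketch, reusing the earlier t_test table (a smaller range yields a larger but more selective index):

test=# CREATE INDEX idx_brin_small ON t_test USING brin(id) WITH (pages_per_range = 32);
CREATE INDEX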
We learned how to use the indexing features in PostgreSQL and fine-tune the performance of our queries. If you liked this article, check out the book Mastering PostgreSQL 10 to implement advanced administrative tasks such as server maintenance and monitoring, replication, recovery, and high availability in PostgreSQL 10.


20 lessons on bias in machine learning systems by Kate Crawford at NIPS 2017

Aarthi Kumaraswamy
08 Dec 2017
9 min read
Kate Crawford is a Principal Researcher at Microsoft Research and a Distinguished Research Professor at New York University. She has spent the last decade studying the social implications of data systems, machine learning, and artificial intelligence. Her recent publications address data bias and fairness, and social impacts of artificial intelligence among others. This article attempts to bring our readers to Kate’s brilliant Keynote speech at NIPS 2017. It talks about different forms of bias in Machine Learning systems and the ways to tackle such problems. By the end of this article, we are sure you would want to listen to her complete talk on the NIPS Facebook page. All images in this article come from Kate's presentation slides and do not belong to us. The rise of Machine Learning is every bit as far reaching as the rise of computing itself.  A vast new ecosystem of techniques and infrastructure are emerging in the field of machine learning and we are just beginning to learn their full capabilities. But with the exciting things that people can do, there are some really concerning problems arising. Forms of bias, stereotyping and unfair determination are being found in machine vision systems, object recognition models, and in natural language processing and word embeddings. High profile news stories about bias have been on the rise, from women being less likely to be shown high paying jobs to gender bias and object recognition datasets like MS COCO, to racial disparities in education AI systems. 20 lessons on bias in machine learning systems Interest in the study of bias in ML systems has grown exponentially in just the last 3 years. It has more than doubled in the last year alone. We are speaking different languages when we talk about bias. I.e., it means different things to different people/groups. Eg: in law, in machine learning, in geometry etc. Read more on this in the ‘What is bias?’ section below. In the simplest terms, for the purpose of understanding fairness in machine learning systems, we can consider ‘bias’ as a skew that produces a type of harm. Bias in MLaaS is harder to identify and also correct as we do not build them from scratch and are not always privy to how it works under the hood. Data is not neutral. Data cannot always be neutralized. There is no silver bullet for solving bias in ML & AI systems. There are two main kinds of harms caused by bias: Harms of allocation and harms of representation. The former takes an economically oriented view while the latter is more cultural. Allocative harm is when a system allocates or withholds certain groups an opportunity or resource. To know more, jump to the ‘harms of allocation’ section. When systems reinforce the subordination of certain groups along the lines of identity like race, class, gender etc., they cause representative harm. This is further elaborated in the ‘Harms of representation’ section. Harm can further be classified into five types: stereotyping, recognition, denigration, under-representation and ex-nomination.  There are many technical approaches to dealing with the problem of bias in a training dataset such as scrubbing to neutral, demographic sampling etc among others. But they all still suffer from bias. Eg: who decides what is ‘neutral’. When we consider bias purely as a technical problem, which is hard enough, we are already missing part of the picture. Bias in systems is commonly caused by bias in training data. 
We can only gather data about the world we have which has a long history of discrimination. So, the default tendency of these systems would be to reflect our darkest biases.  Structural bias is a social issue first and a technical issue second. If we are unable to consider both and see it as inherently socio-technical, then these problems of bias are going to continue to plague the ML field. Instead of just thinking about ML contributing to decision making in say hiring or criminal justice, we also need to think of the role of ML in the harmful representation of human identity. While technical responses to bias are very important and we need more of them, they won’t get us all the way to addressing representational harms to group identity. Representational harms often exceed the scope of individual technical interventions. Developing theoretical fixes that come from the tech world for allocational harms is necessary but not sufficient. The ability to move outside our disciplinary boundaries is paramount to cracking the problem of bias in ML systems. Every design decision has consequences and powerful social implications. Datasets reflect not only the culture but also the hierarchy of the world that they were made in. Our current datasets stand on the shoulder of older datasets building on earlier corpora. Classifications can be sticky and sometimes they stick around longer than we intend them to, even when they are harmful. ML can be deployed easily in contentious forms of categorization that could have serious repercussions. Eg: free-of-bias criminality detector that has Physiognomy at the heart of how it predicts the likelihood of a person being a criminal based on his appearance. What is bias? 14th century: an oblique or diagonal line 16th century: undue prejudice 20th century: systematic differences between the sample and a population In ML: underfitting (low variance and high bias) vs overfitting (high variance and low bias) In Law:  judgments based on preconceived notions or prejudices as opposed to the impartial evaluation of facts. Impartiality underpins jury selection, due process, limitations placed on judges etc. Bias is hard to fix with model validation techniques alone. So you can have an unbiased system in an ML sense producing a biased result in a legal sense. Bias is a skew that produces a type of harm. Where does bias come from? Commonly from Training data. It can be incomplete, biased or otherwise skewed. It can draw from non-representative samples that are wholly defined before use. Sometimes it is not obvious because it was constructed in a non-transparent way. In addition to human labeling, other ways that human biases and cultural assumptions can creep in ending up in exclusion or overrepresentation of subpopulation. Case in point: stop-and-frisk program data used as training data by an ML system.  This dataset was biased due to systemic racial discrimination in policing. Harms of allocation Majority of the literature understand bias as harms of allocation. Allocative harm is when a system allocates or withholds certain groups, an opportunity or resource. It is an economically oriented view primarily. Eg: who gets a mortgage, loan etc. Allocation is immediate, it is a time-bound moment of decision making. It is readily quantifiable. In other words, it raises questions of fairness and justice in discrete and specific transactions. Harms of representation It gets tricky when it comes to systems that represent society but don't allocate resources. 
These are representational harms: systems that reinforce the subordination of certain groups along the lines of identity like race, class, or gender. Representational harm is a long-term process that affects attitudes and beliefs. It is harder to formalize and track, being a diffuse depiction of humans and society, and it is at the root of all the other forms of allocative harm.

5 types of representational harms

Source: Kate Crawford's NIPS 2017 Keynote presentation: Trouble with Bias

Stereotyping

A 2016 paper on word embeddings looked at gender-stereotypical associations and the distances between gender pronouns and occupations (a short illustrative sketch of this kind of measurement appears at the end of this article). Google Translate assigns stereotyped genders to pronouns even when translating from a gender-neutral language like Turkish.

Recognition

When a group is erased or made invisible by a system. In a narrow sense, it is purely a technical problem: does a system recognize a face inside an image or video? Failing that is a failure to recognize someone's humanity. In the broader sense, it is about respect, dignity, and personhood; the broader harm is whether the system works for you at all. For example, systems that could not process darker skin tones, Nikon's camera software mischaracterizing Asian faces as blinking, and HP's algorithms having difficulty recognizing anyone with a darker shade of pale.

Denigration

When people use culturally offensive or inappropriate labels. For example, the autosuggestions that appeared when people typed 'jews should'.

Under-representation

An image search for 'CEOs' yielded only one woman CEO, at the bottom-most part of the page; the majority were white men.

Ex-nomination

When a dominant group or category is treated as the unmarked default, and so is never explicitly named or examined at all.

Technical responses to the problem of biases

Improve accuracy
Blacklist
Scrub to neutral
Demographics or equal representation
Awareness

Politics of classification

Where did identity categories come from? What if bias is a deeper and more consistent issue with classification?

Source: Kate Crawford's NIPS 2017 Keynote presentation: Trouble with Bias

The fact that bias issues keep creeping into our systems and manifesting in new ways suggests that we must understand classification not simply as a technical issue but as a social issue as well, one that has real consequences for the people being classified. There are two themes:

Classification is always a product of its time
We are currently in the biggest experiment of classification in human history

For example, the Labeled Faces in the Wild dataset is 77.5% male and 83.5% white. An ML system trained on this dataset will work best for that group.

What can we do to tackle these problems?

Start working on fairness forensics
Test our systems: for example, build pre-release trials to see how a system works across different populations
Track the life cycle of a training dataset to know who built it and what the demographic skews in that dataset might be

Start taking interdisciplinarity seriously
Work with people who are not in our field but have deep expertise in other areas, for example the FATE (Fairness Accountability Transparency Ethics) group at Microsoft Research
Build spaces for collaboration, like the AI Now Institute

Think harder about the ethics of classification

The ultimate question for fairness in machine learning is this: who is going to benefit from the system we are building? And who might be harmed?
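To make the word-embedding stereotyping measurement concrete, here is a minimal sketch in the spirit of the 2016 paper mentioned above. It assumes the gensim library and a locally downloaded word2vec-format embedding file (the file name is a placeholder); it is an illustration of the technique, not code from the talk or the paper.

from gensim.models import KeyedVectors

# Load a pre-trained embedding in word2vec format.
# "embeddings.bin" is a placeholder; substitute any embedding file you have.
vectors = KeyedVectors.load_word2vec_format("embeddings.bin", binary=True)

def gender_skew(word):
    # Positive values: the word sits closer to "he" than to "she"
    # in the embedding space; negative values mean the reverse.
    return vectors.similarity(word, "he") - vectors.similarity(word, "she")

for occupation in ["nurse", "engineer", "homemaker", "programmer"]:
    print(f"{occupation:12s} skew = {gender_skew(occupation):+.3f}")

On typical news-trained embeddings, occupations like 'nurse' and 'homemaker' tend to skew towards 'she' while 'engineer' and 'programmer' skew towards 'he', which is exactly the kind of stereotypical association the paper documented.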

Working with Ceph Block Device

Packt
05 Feb 2016
29 min read
In this article by Karan Singh, the author of the book Ceph Cookbook, we will see how storage space or capacity is assigned to physical or virtual servers in detail. We'll also cover the various storage formats supported by Ceph. In this article, we will cover the following recipes:

Working with the RADOS Block Device
Configuring the Ceph client
Creating RADOS Block Device
Mapping RADOS Block Device
Ceph RBD Resizing
Working with RBD snapshots
Working with RBD clones
A quick look at OpenStack
Ceph – the best match for OpenStack
Configuring OpenStack as Ceph clients
Configuring Glance for the Ceph backend
Configuring Cinder for the Ceph backend
Configuring Nova to attach the Ceph RBD
Configuring Nova to boot the instance from the Ceph RBD

(For more resources related to this topic, see here.)

Once you have installed and configured your Ceph storage cluster, the next task is performing storage provisioning. Storage provisioning is the process of assigning storage space or capacity to physical or virtual servers, either in the form of blocks, files, or object storage. A typical computer system or server comes with a limited local storage capacity that might not be enough for your data storage needs. Storage solutions such as Ceph provide virtually unlimited storage capacity to these servers, making them capable of storing all your data and making sure that you do not run out of space. Using a dedicated storage system instead of local storage gives you the much needed flexibility in terms of scalability, reliability, and performance.

Ceph can provision storage capacity in a unified way, which includes block, filesystem, and object storage. The following diagram shows the storage formats supported by Ceph; depending on your use case, you can select one or more storage options. We will discuss each of these options in detail in this article, focusing mainly on Ceph block storage.

Working with the RADOS Block Device

The RADOS Block Device (RBD), which is now known as the Ceph Block Device, provides reliable, distributed, and high-performance block storage disks to clients. A RADOS block device makes use of the librbd library and stores blocks of data in sequential form, striped over multiple OSDs in a Ceph cluster. RBD is backed by the RADOS layer of Ceph, so every block device is spread over multiple Ceph nodes, delivering high performance and excellent reliability. RBD has native support in the Linux kernel, which means that RBD drivers have been well integrated with the Linux kernel for the past few years. In addition to reliability and performance, RBD also provides enterprise features such as full and incremental snapshots, thin provisioning, copy-on-write cloning, dynamic resizing, and so on. RBD also supports in-memory caching, which drastically improves its performance.

The industry-leading open source hypervisors, such as KVM and Xen, provide full support for RBD and leverage its features in their guest virtual machines. Other proprietary hypervisors, such as VMware and Microsoft Hyper-V, will be supported very soon; there has been a lot of work going on in the community to support these hypervisors. The Ceph block device provides full support to cloud platforms such as OpenStack and CloudStack, as well as others. It has been proven successful and feature-rich for these cloud platforms. In OpenStack, you can use the Ceph block device with Cinder (block) and Glance (imaging) components.
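As an aside, besides the command-line tools used throughout this article, a configured client can also talk to the cluster programmatically through the librados Python bindings. The following is a minimal sketch of such a connection; it assumes the python-rados package is installed and that the client.rbd user and keyring created in the steps below are already in place, so treat it as an illustration rather than part of the recipe.

import rados

# Connect as the client.rbd user created in the steps below
# (assumption: its keyring is readable on this machine).
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf', rados_id='rbd')
cluster.connect()

# Basic cluster information, roughly a subset of what 'ceph -s' reports.
stats = cluster.get_cluster_stats()
print("total kB:", stats['kb'], "used kB:", stats['kb_used'])
print("pools:", cluster.list_pools())

cluster.shutdown()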
Doing so, you can spin up thousands of virtual machines (VMs) in very little time, taking advantage of the copy-on-write feature of Ceph block storage. All these features make RBD an ideal candidate for cloud platforms such as OpenStack and CloudStack. We will now learn how to create a Ceph block device and make use of it.

Configuring the Ceph client

Any regular Linux host (RHEL- or Debian-based) can act as a Ceph client. The client interacts with the Ceph storage cluster over the network to store or retrieve user data. Ceph RBD support has been added to the Linux mainline kernel, starting with 2.6.34 and later versions.

How to do it

As we have done earlier, we will set up a Ceph client machine using Vagrant and VirtualBox. We will use the Vagrantfile; Vagrant will then launch an Ubuntu 14.04 virtual machine that we will configure as a Ceph client:

From the directory where we have cloned the ceph-cookbook git repository, launch the client virtual machine using Vagrant:
$ vagrant status client-node1
$ vagrant up client-node1

Log in to client-node1:
$ vagrant ssh client-node1
Note: The username and password that Vagrant uses to configure virtual machines is vagrant, and vagrant has sudo rights. The default password for the root user is vagrant.

Check the OS and kernel release (this is optional):
$ lsb_release -a
$ uname -r

Check for RBD support in the kernel:
$ sudo modprobe rbd

Allow the ceph-node1 monitor machine to access client-node1 over ssh. To do this, copy root ssh keys from ceph-node1 to the client-node1 vagrant user. Execute the following commands from the ceph-node1 machine until otherwise specified:
## Login to ceph-node1 machine
$ vagrant ssh ceph-node1
$ sudo su -
# ssh-copy-id vagrant@client-node1
Provide the one-time vagrant user password, that is, vagrant, for client-node1. Once the ssh keys are copied from ceph-node1 to client-node1, you should be able to log in to client-node1 without a password.

Use the ceph-deploy utility from ceph-node1 to install Ceph binaries on client-node1:
# cd /etc/ceph
# ceph-deploy --username vagrant install client-node1

Copy the Ceph configuration file (ceph.conf) to client-node1:
# ceph-deploy --username vagrant config push client-node1

The client machine will require Ceph keys to access the Ceph cluster. Ceph creates a default user, client.admin, which has full access to the Ceph cluster. It's not recommended to share client.admin keys with client nodes. The better approach is to create a new Ceph user with separate keys and allow access to specific Ceph pools. In our case, we will create a Ceph user, client.rbd, with access to the rbd pool. By default, Ceph block devices are created on the rbd pool:
# ceph auth get-or-create client.rbd mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=rbd'

Add the key to the client-node1 machine for the client.rbd user:
# ceph auth get-or-create client.rbd | ssh vagrant@client-node1 sudo tee /etc/ceph/ceph.client.rbd.keyring

By this step, client-node1 should be ready to act as a Ceph client. Check the cluster status from the client-node1 machine by providing the username and secret key:
$ vagrant ssh client-node1
$ sudo su -
# cat /etc/ceph/ceph.client.rbd.keyring >> /etc/ceph/keyring
### Since we are not using the default user client.admin we need to supply the username that will connect to the Ceph cluster.
# ceph -s --name client.rbd

Creating RADOS Block Device

Till now, we have configured the Ceph client, and now we will demonstrate creating a Ceph block device from the client-node1 machine.
How to do it

Create a RADOS Block Device named rbd1 of size 10240 MB:
# rbd create rbd1 --size 10240 --name client.rbd

There are multiple options that you can use to list RBD images:
## The default pool to store block device images is 'rbd'; you can also specify the pool name with the rbd command using the -p option:
# rbd ls --name client.rbd
# rbd ls -p rbd --name client.rbd
# rbd list --name client.rbd

Check the details of the rbd image:
# rbd --image rbd1 info --name client.rbd

Mapping RADOS Block Device

Now that we have created a block device on the Ceph cluster, in order to use this block device, we need to map it to the client machine. To do this, execute the following commands from the client-node1 machine.

How to do it

Map the block device to client-node1:
# rbd map --image rbd1 --name client.rbd

Check the mapped block device:
# rbd showmapped --name client.rbd

To make use of this block device, we should create a filesystem on it and mount it:
# fdisk -l /dev/rbd1
# mkfs.xfs /dev/rbd1
# mkdir /mnt/ceph-disk1
# mount /dev/rbd1 /mnt/ceph-disk1
# df -h /mnt/ceph-disk1

Test the block device by writing data to it:
# dd if=/dev/zero of=/mnt/ceph-disk1/file1 count=100 bs=1M

To map the block device across reboots, you should add the init-rbdmap script to the system startup, add the Ceph user and keyring details to /etc/ceph/rbdmap, and finally, update the /etc/fstab file:
# wget https://raw.githubusercontent.com/ksingh7/ceph-cookbook/master/rbdmap -O /etc/init.d/rbdmap
# chmod +x /etc/init.d/rbdmap
# update-rc.d rbdmap defaults
## Make sure you use the correct keyring value in the /etc/ceph/rbdmap file, which is generally unique for an environment.
# echo "rbd/rbd1 id=rbd,keyring=AQCLEg5VeAbGARAAE4ULXC7M5Fwd3BGFDiHRTw==" >> /etc/ceph/rbdmap
# echo "/dev/rbd1 /mnt/ceph-disk1 xfs defaults,_netdev 0 0" >> /etc/fstab
# mkdir /mnt/ceph-disk1
# /etc/init.d/rbdmap start

Ceph RBD Resizing

Ceph supports thin-provisioned block devices, which means that physical storage space is not occupied until you begin storing data on the block device. The Ceph RADOS block device is very flexible; you can increase or decrease the size of an RBD on the fly from the Ceph storage end. However, the underlying filesystem must support resizing. Advanced filesystems such as XFS, Btrfs, EXT, ZFS, and others support filesystem resizing to a certain extent. Please follow the filesystem-specific documentation to learn more about resizing.

How to do it

To increase or decrease the Ceph RBD image size, use the --size <New_Size_in_MB> option with the rbd resize command; this will set the new size for the RBD image:

The original size of the RBD image that we created earlier was 10 GB. We will now increase its size to 20 GB:
# rbd resize --image rbd1 --size 20480 --name client.rbd
# rbd info --image rbd1 --name client.rbd

Grow the filesystem so that we can make use of the increased storage space. It's worth knowing that filesystem resizing is a feature of the OS as well as the device filesystem. You should read the filesystem documentation before resizing any partition. The XFS filesystem supports online resizing. Check the system messages to see the filesystem size change:
# dmesg | grep -i capacity
# xfs_growfs -d /mnt/ceph-disk1

Working with RBD snapshots

Ceph extends full support to snapshots, which are point-in-time, read-only copies of an RBD image. You can preserve the state of a Ceph RBD image by creating snapshots and restoring a snapshot to get the original data back.

How to do it

Let's see how a snapshot works with Ceph.
To test the snapshot functionality of Ceph, let's create a file on the block device that we created earlier:
# echo "Hello Ceph This is snapshot test" > /mnt/ceph-disk1/snapshot_test_file

Create a snapshot for the Ceph block device:
Syntax: rbd snap create <pool-name>/<image-name>@<snap-name>
# rbd snap create rbd/rbd1@snapshot1 --name client.rbd

To list the snapshots of an image, use the following:
Syntax: rbd snap ls <pool-name>/<image-name>
# rbd snap ls rbd/rbd1 --name client.rbd

To test the snapshot restore functionality of Ceph RBD, let's delete files from the filesystem:
# rm -f /mnt/ceph-disk1/*

We will now restore the Ceph RBD snapshot to get back the files that we deleted in the last step. Please note that a rollback operation will overwrite the current version of the RBD image and its data with the snapshot version. You should perform this operation carefully:
Syntax: rbd snap rollback <pool-name>/<image-name>@<snap-name>
# rbd snap rollback rbd/rbd1@snapshot1 --name client.rbd

Once the snapshot rollback operation is completed, remount the Ceph RBD filesystem to refresh the filesystem state. You should be able to get your deleted files back:
# umount /mnt/ceph-disk1
# mount /dev/rbd1 /mnt/ceph-disk1
# ls -l /mnt/ceph-disk1

When you no longer need snapshots, you can remove a specific snapshot using the following syntax. Deleting a snapshot will not delete your current data on the Ceph RBD image:
Syntax: rbd snap rm <pool-name>/<image-name>@<snap-name>
# rbd snap rm rbd/rbd1@snapshot1 --name client.rbd

If you have multiple snapshots of an RBD image and you wish to delete all of them with a single command, use the purge subcommand:
Syntax: rbd snap purge <pool-name>/<image-name>
# rbd snap purge rbd/rbd1 --name client.rbd

Working with RBD Clones

Ceph supports a very nice feature for creating Copy-On-Write (COW) clones from RBD snapshots. This is also known as Snapshot Layering in Ceph. Layering allows clients to create multiple instant clones of a Ceph RBD. This feature is extremely useful for cloud and virtualization platforms such as OpenStack, CloudStack, Qemu/KVM, and so on. These platforms usually protect Ceph RBD images containing an OS / VM image in the form of a snapshot. Later, this snapshot is cloned multiple times to spawn new virtual machines / instances. Snapshots are read-only, but COW clones are fully writable; this feature of Ceph provides a greater level of flexibility and is extremely useful among cloud platforms.

Every cloned image (child image) stores references to its parent snapshot to read image data. Hence, the parent snapshot should be protected before it can be used for cloning. At the time of data writing on the COW cloned image, it stores new data references to itself. COW cloned images are as good as RBD: they are quite flexible, like RBD, which means that they are writable and resizable, and support snapshots and further cloning.

In Ceph RBD, images are of two types: format-1 and format-2. The RBD snapshot feature is available on both types, that is, in format-1 as well as in format-2 RBD images. However, the layering feature (the COW cloning feature) is available only for format-2 RBD images. The default RBD image format is format-1.
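Before walking through the CLI steps, it may help to see the same create, snapshot, protect, and clone cycle expressed through the librbd Python bindings. This is a minimal sketch under the assumption that the python-rados and python-rbd packages are installed and an admin connection is available; the recipe itself uses the rbd command line.

import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')

r = rbd.RBD()
# old_format=False requests a format-2 image, which supports layering.
r.create(ioctx, 'rbd2', 10 * 1024**3, old_format=False)

with rbd.Image(ioctx, 'rbd2') as img:
    img.create_snap('snapshot_for_cloning')
    img.protect_snap('snapshot_for_cloning')  # required before cloning

# Clone the protected snapshot into a new, writable child image.
r.clone(ioctx, 'rbd2', 'snapshot_for_cloning', ioctx, 'clone_rbd2')

ioctx.close()
cluster.shutdown()

Flattening and unprotecting have direct equivalents too (Image.flatten() on the child, then unprotect_snap() on the parent), mirroring the CLI steps that follow.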
How to do it

To demonstrate RBD cloning, we will intentionally create a format-2 RBD image, then create and protect its snapshot, and finally, create COW clones out of it:

Create a format-2 RBD image and check its details:
# rbd create rbd2 --size 10240 --image-format 2 --name client.rbd
# rbd info --image rbd2 --name client.rbd

Create a snapshot of this RBD image:
# rbd snap create rbd/rbd2@snapshot_for_cloning --name client.rbd

To create a COW clone, protect the snapshot. This is an important step: we should protect the snapshot because if the snapshot gets deleted, all the attached COW clones will be destroyed:
# rbd snap protect rbd/rbd2@snapshot_for_cloning --name client.rbd

Next, we will create a cloned RBD image using this snapshot:
Syntax: rbd clone <pool-name>/<parent-image>@<snap-name> <pool-name>/<child-image-name>
# rbd clone rbd/rbd2@snapshot_for_cloning rbd/clone_rbd2 --name client.rbd

Creating a clone is a quick process. Once it's completed, check the new image's information. You will notice that its parent pool, image, and snapshot information are displayed:
# rbd info rbd/clone_rbd2 --name client.rbd

At this point, we have a cloned RBD image that is dependent upon its parent image snapshot. To make the cloned RBD image independent of its parent, we need to flatten the image, which involves copying the data from the parent snapshot to the child image. The time it takes to complete the flattening process depends on the size of the data present in the parent snapshot. Once the flattening process is completed, there is no dependency between the cloned RBD image and its parent snapshot. To initiate the flattening process, use the following:
# rbd flatten rbd/clone_rbd2 --name client.rbd
# rbd info --image clone_rbd2 --name client.rbd

After the completion of the flattening process, if you check the image information, you will notice that the parent image/snapshot name is not present and the clone is independent. You can also remove the parent image snapshot if you no longer require it. Before removing the snapshot, you first have to unprotect it:
# rbd snap unprotect rbd/rbd2@snapshot_for_cloning --name client.rbd

Once the snapshot is unprotected, you can remove it:
# rbd snap rm rbd/rbd2@snapshot_for_cloning --name client.rbd

A quick look at OpenStack

OpenStack is an open source software platform for building and managing public and private cloud infrastructure. It is governed by an independent, non-profit foundation known as the OpenStack Foundation. It has the largest and most active community, backed by technology giants such as HP, Red Hat, Dell, Cisco, IBM, Rackspace, and many more. OpenStack's idea for the cloud is that it should be simple to implement and massively scalable.

OpenStack is considered the cloud operating system, where users are allowed to instantly deploy hundreds of virtual machines in an automated way. It also provides an efficient way of hassle-free management of these machines. OpenStack is known for its dynamic scale-up, scale-out, and distributed architecture capabilities, making your cloud environment robust and future-ready. OpenStack provides an enterprise-class Infrastructure-as-a-Service (IaaS) platform for all your cloud needs. As shown in the preceding diagram, OpenStack is made up of several different software components that work together to provide cloud services. Of all these components, in this article we will focus on Cinder and Glance, which provide block storage and image services respectively.
For more information on OpenStack components, please visit http://www.openstack.org/.

Ceph – the best match for OpenStack

Over the last few years, OpenStack has been getting amazingly popular, as it's software-defined on a wide range of layers, whether it's computing, networking, or even storage. And when you talk storage for OpenStack, Ceph gets all the attention. An OpenStack user survey, conducted in May 2015, showed Ceph dominating the block storage driver market with a whopping 44% production usage. Ceph provides the robust, reliable storage backend that OpenStack was looking for. Its seamless integration with OpenStack components such as Cinder, Glance, Nova, and Keystone provides an all-in-one cloud storage backend for OpenStack. Here are some key benefits that make Ceph the best match for OpenStack:

Ceph provides an enterprise-grade, feature-rich storage backend at a very low cost per gigabyte, which helps to keep the OpenStack cloud deployment price down.
Ceph is a unified storage solution for block, file, or object storage for OpenStack, allowing applications to use storage as they need.
Ceph provides advanced block storage capabilities for OpenStack clouds, which include the easy and quick spawning of instances, as well as the backup and cloning of VMs.
It provides default persistent volumes for OpenStack instances that can work like traditional servers, where data is not flushed on rebooting the VMs.
Ceph supports OpenStack in being host-independent by supporting VM migrations and scaling up storage components without affecting VMs.
It provides the snapshot feature to OpenStack volumes, which can also be used as a means of backup.
Ceph's copy-on-write cloning feature allows OpenStack to spin up several instances at once, which helps the provisioning mechanism function faster.
Ceph supports rich APIs for both Swift and S3 object storage interfaces.

The Ceph and OpenStack communities have been working closely over the last few years to make the integration more seamless and to make use of new features as they land. In the future, we can expect OpenStack and Ceph to be even more closely associated due to Red Hat's acquisition of Inktank, the company behind Ceph; Red Hat is one of the major contributors to the OpenStack project.

OpenStack is a modular system, that is, a system that has a unique component for a specific set of tasks. There are several components that require a reliable storage backend, such as Ceph, and extend full integration to it, as shown in the following diagram. Each of these components uses Ceph in its own way to store block devices and objects. The majority of cloud deployments based on OpenStack and Ceph use the Cinder, Glance, and Swift integrations with Ceph. Keystone integration is used when you need S3-compatible object storage on the Ceph backend. Nova integration allows boot-from-Ceph-volume capabilities for your OpenStack instances.

Setting up OpenStack

The OpenStack setup and configuration is beyond the scope of this article; however, for ease of demonstration, we will use a virtual machine preinstalled with the OpenStack RDO Juno release. If you like, you can also use your own OpenStack environment and perform the Ceph integration.

How to do it

In this section, we will demonstrate setting up a preconfigured OpenStack environment using Vagrant, and accessing it via CLI and GUI:

Launch openstack-node1 using Vagrantfile.
Make sure that you are on the host machine and inside the ceph-cookbook repository before bringing up openstack-node1 with Vagrant:
# cd ceph-cookbook
# vagrant up openstack-node1

Once openstack-node1 is up, check the Vagrant status and log in to the node:
$ vagrant status openstack-node1
$ vagrant ssh openstack-node1

We assume that you have some knowledge of OpenStack and are aware of its operations. We will source the keystonerc_admin file, which has been placed under /root; to do this, we need to switch to root:
$ sudo su -
$ source keystonerc_admin

We will now run some native OpenStack commands to make sure that OpenStack is set up correctly. Please note that some of these commands do not show any information, since this is a fresh OpenStack environment and does not have instances or volumes created:
# nova list
# cinder list
# glance image-list

You can also log in to the OpenStack Horizon web interface (https://192.168.1.111/dashboard) with the username admin and the password vagrant. After logging in, the Overview page opens.

Configuring OpenStack as Ceph clients

OpenStack nodes should be configured as Ceph clients in order to access the Ceph cluster. To do this, install Ceph packages on the OpenStack nodes and make sure they can access the Ceph cluster.

How to do it

In this section, we are going to configure OpenStack as a Ceph client, which will later be used to configure Cinder, Glance, and Nova:

We will use ceph-node1 to install Ceph binaries on os-node1 using ceph-deploy. To do this, we should set up an ssh password-less login to os-node1. The root password is again the same (vagrant):
$ vagrant ssh ceph-node1
$ sudo su -
# ping os-node1 -c 1
# ssh-copy-id root@os-node1

Next, we will install Ceph packages on os-node1 using ceph-deploy:
# cd /etc/ceph
# ceph-deploy install os-node1

Push the Ceph configuration file, ceph.conf, from ceph-node1 to os-node1. This configuration file helps clients reach the Ceph monitor and OSD machines. Please note that you can also manually copy the ceph.conf file to os-node1 if you like:
# ceph-deploy config push os-node1
Make sure that the ceph.conf file that we have pushed to os-node1 has permissions of 644.

Create Ceph pools for Cinder, Glance, and Nova. You may use any available pool, but it's recommended that you create separate pools for the OpenStack components:
# ceph osd pool create images 128
# ceph osd pool create volumes 128
# ceph osd pool create vms 128

Set up client authentication by creating new users for Cinder and Glance:
# ceph auth get-or-create client.cinder mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=vms, allow rx pool=images'
# ceph auth get-or-create client.glance mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=images'

Add the keyrings to os-node1 and change their ownership:
# ceph auth get-or-create client.glance | ssh os-node1 sudo tee /etc/ceph/ceph.client.glance.keyring
# ssh os-node1 sudo chown glance:glance /etc/ceph/ceph.client.glance.keyring
# ceph auth get-or-create client.cinder | ssh os-node1 sudo tee /etc/ceph/ceph.client.cinder.keyring
# ssh os-node1 sudo chown cinder:cinder /etc/ceph/ceph.client.cinder.keyring

The libvirt process requires access to the Ceph cluster while attaching or detaching a block device from Cinder.
We should create a temporary copy of the client.cinder key, which will be needed for the Cinder and Nova configuration later in this article:
# ceph auth get-key client.cinder | ssh os-node1 tee /etc/ceph/temp.client.cinder.key

At this point, you can test the previous configuration by accessing the Ceph cluster from os-node1 using the client.glance and client.cinder Ceph users. Log in to os-node1 and run the following commands:
$ vagrant ssh openstack-node1
$ sudo su -
# cd /etc/ceph
# ceph -s --name client.glance --keyring ceph.client.glance.keyring
# ceph -s --name client.cinder --keyring ceph.client.cinder.keyring

Finally, generate a uuid, then create, define, and set the secret key for libvirt and remove the temporary keys:

Generate a uuid by using the following:
# cd /etc/ceph
# uuidgen

Create a secret file and set this uuid number in it:
cat > secret.xml <<EOF
<secret ephemeral='no' private='no'>
  <uuid>bb90381e-a4c5-4db7-b410-3154c4af486e</uuid>
  <usage type='ceph'>
    <name>client.cinder secret</name>
  </usage>
</secret>
EOF
Make sure that you use your own uuid generated for your environment.

Define the secret and keep the generated secret value safe. We will require this secret value in the next steps:
# virsh secret-define --file secret.xml

Set the secret value that was generated in the last step in virsh and delete the temporary files. Deleting the temporary files is optional; it's done just to keep the system clean:
# virsh secret-set-value --secret bb90381e-a4c5-4db7-b410-3154c4af486e --base64 $(cat temp.client.cinder.key) && rm temp.client.cinder.key secret.xml
# virsh secret-list

Configuring Glance for the Ceph backend

We have completed the configuration required from the Ceph side. In this section, we will configure OpenStack Glance to use Ceph as a storage backend.
How to do it

This section talks about configuring the Glance component of OpenStack to store virtual machine images on Ceph RBD:

Log in to os-node1, which is our Glance node, and edit /etc/glance/glance-api.conf with the following changes:

Under the [DEFAULT] section, make sure that the following lines are present:
default_store=rbd
show_image_direct_url=True

Execute the following command to verify the entries:
# cat /etc/glance/glance-api.conf | egrep -i "default_store|image_direct"

Under the [glance_store] section, make sure that the following lines are present under RBD Store Options:
stores = rbd
rbd_store_ceph_conf=/etc/ceph/ceph.conf
rbd_store_user=glance
rbd_store_pool=images
rbd_store_chunk_size=8

Execute the following command to verify the previous entries:
# cat /etc/glance/glance-api.conf | egrep -v "#|default" | grep -i rbd

Restart the OpenStack Glance services:
# service openstack-glance-api restart

Source the keystonerc_admin file for OpenStack and list the Glance images:
# source /root/keystonerc_admin
# glance image-list

Download the cirros image from the Internet, which will later be stored in Ceph:
# wget http://download.cirros-cloud.net/0.3.1/cirros-0.3.1-x86_64-disk.img

Add a new Glance image using the following command:
# glance image-create --name cirros_image --is-public=true --disk-format=qcow2 --container-format=bare < cirros-0.3.1-x86_64-disk.img

List the Glance images using the following command; you will notice there are now two Glance images:
# glance image-list

You can verify that the new image is stored in Ceph by querying the image ID in the Ceph images pool:
# rados -p images ls --name client.glance --keyring /etc/ceph/ceph.client.glance.keyring | grep -i id

Since we have configured Glance to use Ceph for its default storage, all Glance images will now be stored in Ceph. You can also try creating images from the OpenStack Horizon dashboard.

Finally, we will try to launch an instance using the image that we created earlier:
# nova boot --flavor 1 --image b2d15e34-7712-4f1d-b48d-48b924e79b0c vm1

While you are adding new Glance images or creating an instance from a Glance image stored on Ceph, you can check the IO on the Ceph cluster by monitoring it using the # watch ceph -s command.

Configuring Cinder for the Ceph backend

The Cinder program of OpenStack provides block storage to virtual machines. In this section, we will configure OpenStack Cinder to use Ceph as a storage backend. OpenStack Cinder requires a driver to interact with the Ceph block device. On the OpenStack node, edit the /etc/cinder/cinder.conf configuration file by adding the code snippet given in the following section.

How to do it

In the last section, we learned to configure Glance to use Ceph.
In this section, we will learn to use the Ceph RBD with the Cinder service of OpenStack:

Since in this demonstration we are not using multiple backend Cinder configurations, comment out the enabled_backends option in the /etc/cinder/cinder.conf file.

Navigate to the Options defined in cinder.volume.drivers.rbd section of the /etc/cinder/cinder.conf file and add the following (replace the secret uuid with your environment's value):
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = volumes
rbd_user = cinder
rbd_secret_uuid = bb90381e-a4c5-4db7-b410-3154c4af486e
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_flatten_volume_from_snapshot = false
rbd_max_clone_depth = 5
rbd_store_chunk_size = 4
rados_connect_timeout = -1
glance_api_version = 2

Execute the following command to verify the previous entries:
# cat /etc/cinder/cinder.conf | egrep "rbd|rados|version" | grep -v "#"

Restart the OpenStack Cinder services:
# service openstack-cinder-volume restart

Source the keystonerc_admin file for OpenStack:
# source /root/keystonerc_admin
# cinder list

To test this configuration, create your first Cinder volume of 2 GB, which should now be created on your Ceph cluster:
# cinder create --display-name ceph-volume01 --display-description "Cinder volume on CEPH storage" 2

Check the volume by listing the Cinder and Ceph volumes pool:
# cinder list
# rados -p volumes --name client.cinder --keyring ceph.client.cinder.keyring ls | grep -i id

Similarly, try creating another volume using the OpenStack Horizon dashboard.

Configuring Nova to attach the Ceph RBD

In order to attach the Ceph RBD to OpenStack instances, we should configure the Nova component of OpenStack by adding the rbd user and uuid information that it needs to connect to the Ceph cluster. To do this, we need to edit /etc/nova/nova.conf on the OpenStack node and perform the steps given in the following section.

How to do it

The Cinder service that we configured in the last section creates volumes on Ceph; however, to attach these volumes to OpenStack instances, we need to configure Nova:

Navigate to the Options defined in nova.virt.libvirt.volume section and add the following lines (replace the secret uuid with your environment's value):
rbd_user=cinder
rbd_secret_uuid= bb90381e-a4c5-4db7-b410-3154c4af486e

Restart the OpenStack Nova services:
# service openstack-nova-compute restart

To test this configuration, we will attach a Cinder volume to an OpenStack instance. List the instances and volumes to get their IDs:
# nova list
# cinder list

Attach the volume to the instance:
# nova volume-attach 1cadffc0-58b0-43fd-acc4-33764a02a0a6 1337c866-6ff7-4a56-bfe5-b0b80abcb281
# cinder list

You can now use this volume as a regular block disk from your OpenStack instance.

Configuring Nova to boot the instance from the Ceph RBD

In order to boot all OpenStack instances into Ceph, that is, for the boot-from-volume feature, we should configure an ephemeral backend for Nova. To do this, edit /etc/nova/nova.conf on the OpenStack node and perform the changes shown next.
How to do it

This section deals with configuring Nova to store entire virtual machines on the Ceph RBD:

Navigate to the [libvirt] section and add the following:
inject_partition=-2
images_type=rbd
images_rbd_pool=vms
images_rbd_ceph_conf=/etc/ceph/ceph.conf

Verify your changes:
# cat /etc/nova/nova.conf | egrep "rbd|partition" | grep -v "#"

Restart the OpenStack Nova services:
# service openstack-nova-compute restart

To boot a virtual machine in Ceph, the Glance image format must be RAW. We will use the same cirros image that we downloaded earlier in this article and convert it from the QCOW to the RAW format (this is important). You can also use any other image, as long as it's in the RAW format:
# qemu-img convert -f qcow2 -O raw cirros-0.3.1-x86_64-disk.img cirros-0.3.1-x86_64-disk.raw

Create a Glance image using the RAW image:
# glance image-create --name cirros_raw_image --is-public=true --disk-format=raw --container-format=bare < cirros-0.3.1-x86_64-disk.raw

To test the boot from the Ceph volume feature, create a bootable volume:
# nova image-list
# cinder create --image-id ff8d9729-5505-4d2a-94ad-7154c6085c97 --display-name cirros-ceph-boot-volume 1

List the Cinder volumes to check if the bootable field is true:
# cinder list

Now we have a bootable volume stored on Ceph, so let's launch an instance with this volume:
# nova boot --flavor 1 --block_device_mapping vda=fd56314b-e19b-4129-af77-e6adf229c536::0 --image 964bd077-7b43-46eb-8fe1-cd979a3370df vm2_on_ceph
Here, --block_device_mapping vda takes the ID of the Cinder bootable volume, and --image takes the ID of the Glance image associated with the bootable volume.

Finally, check the instance status:
# nova list

At this point, we have an instance running from a Ceph volume. Try to log in to the instance from the Horizon dashboard.

Summary

In this article, we covered the various storage formats supported by Ceph in detail and how they are assigned to physical or virtual servers.

Resources for Article:

Further resources on this subject:
Ceph Instant Deployment [article]
GNU Octave: Data Analysis Examples [article]
Interacting with GNU Octave: Operators [article]


Alarming ways governments are using surveillance tech to watch you

Neil Aitken
14 Jun 2018
12 min read
Mapquest, part of the Verizon company, is the second largest provider of mapping services in the world, after Google Maps. It provides advanced cartography services to companies like Snap and Papa John's Pizza. The company is about to release an app that users can install on their smartphone. Their new application will record and transmit video images of what's happening in front of your vehicle as you travel. Data can be sent from any phone with a camera, using the most common of tools: a simple mobile data plan, for example. In exchange, you'll get live traffic updates, among other things.

Mapquest will use the video image data they gather to provide more accurate and up-to-date maps to their partners. The real world is changing all the time: roads get added, and cities re-route traffic from time to time. The new AI-based technology Mapquest employs could well improve the reliability of driverless cars, which have to engage with this ever-changing landscape in a safe manner. No one disagrees with safety improvements.

Mapquest's solution is impressive technology. The fact that they can use AI to interpret the images they see and upload the information they receive to update maps is incredible. And, in this regard, the company is just one of the myriad daily news stories which excite and astound us. These stories do, however, often have another side to them which is rarely acknowledged. In the wrong hands, Mapquest's solution could create a surveillance database which tracked people in real time.

Surveillance technology involves the use of data and information products to capture details about individuals. The act of surveillance is usually undertaken with a view to achieving a goal. The principle is simple: the more 'they' know about you, the easier it will be to influence you towards their ends. Surveillance information can be used to find you, apprehend you or, potentially, to change your mind without you even realising that you had been watched.

Mapquest's innovation is just a single example of surveillance technology in government hands which has expanded in capability far beyond what most people realise.

Read also: What does the US government know about you? The truth beyond the Facebook scandal

Facebook's share price fell 14% in early 2018 as a result of public outcry related to the Cambridge Analytica announcements the company made. The idea that a private company had allowed detailed information about individuals to be provided to a third party without their consent appeared to genuinely shock and appall people.

Technology tools like Mapquest's tracking capabilities and Facebook's profiling techniques are being taken and used by police forces and corporate entities around the world. The reality of current private and public surveillance capabilities is that facilities exist, and are in use, to collect and analyse data on most people in the developed world. The known limits of these services may surprise even those who are on the cutting edge of technology. There are so many examples from all over the world listed below that they will genuinely make you want to consider going off grid!

Innovative, ingenious overlords: US companies have a flair for surveillance

The US is the centre for information-based technology companies. Much of what they develop is exported as well as used domestically.
The police are using human genome matching to track down criminals and can find 'any family in the country'

There have been two recent examples of police arresting a suspect after using human genome databases to investigate crimes. A growing number of private individuals have now used publicly available services such as 23andMe to sequence their genome (DNA), either to investigate their family tree further or to determine a potential predisposition to the gene-based component of a disease.

In one example, the Golden State Killer, an ex-cop, was arrested 32 years after the last reported rape in a series of 45 (in addition to 12 murders) which occurred between 1976 and 1986. To track him down, police approached sites like 23andMe with DNA found at crime scenes, established a family match and then progressed the investigation using conventional means.

More than 12 million Americans have now used a genetic sequencing service, and it is believed that investigators could find a family match for the DNA of anyone who has committed a crime in America. In simple terms, whether you want it or not, law enforcement effectively has the DNA of every individual in the country available to them.

Domain Awareness Centers (DAC) bring the Truman Show to life

The 400,000 residents of Oakland, California discovered in 2012 that they had been the subject of an undisclosed mass surveillance project, run by the local police force, for many years. Feeds from CCTV cameras installed in Oakland's suburbs were augmented with weather information feeds, social media feeds and extracted email conversations, as well as a variety of other sources. The scheme began at Oakland's port with Federal funding as part of a national response to the events of 9/11 but was extended to cover the near half million residents of the city. Hundreds of additional video cameras were installed, along with gunshot recognition microphones and some of the other surveillance technologies described in this article. The police force conducting the surveillance had no policy on what information was recorded or for how long it was kept.

Internet connected toys spy on children

The FBI has warned Americans that children's toys connected to the internet 'could put the privacy and safety of children at risk'. The children's toy Hello Barbie was specifically admonished for poor privacy controls as part of the FBI's press release. Internet connected toys could be used to record video of children at any point in the day or, conceivably, to relay a human voice, making it appear to the child that the toy was talking to them.

Oracle suggests Google's Android operating system routinely tracks users' position even when maps are turned off

In Australia, two American companies have been involved in a disagreement about the potential monitoring of Android phones. Oracle accused Google of monitoring users' location (including altitude), even when mapping software is turned off on the device. The tracking is performed in the background of the phone. In Australia alone, Oracle suggested that Google's monitoring could involve around 1 GB of additional mobile data every month, costing users nearly half a billion dollars a year, collectively.

Amazon facial recognition in real time helps US law enforcement services

Amazon is providing facial recognition services, which take a feed from public video cameras, to a number of US police forces.
Amazon can match images taken in real time to a database containing 'millions of faces'. Are there any state or Federal rules in place to govern police facial recognition? Wired reported that there are 'more or less none'. Amazon's scheme is a trial taking place in Florida. There are at least two other companies offering similar schemes to law enforcement services in the US.

Big glass microphone can help agencies keep an ear on the ground

Project 'Big Glass Microphone' uses the vibrations that the movements of cars (among other things) cause in buried fiber optic telecommunications links. A successful test of the technology has been undertaken on the fiber optic cables which run underground on the Stanford University campus, to record vehicle movements. Fiber optic links now make up the backbone of much data transport infrastructure: the way your phone and computer connect to the internet. Big Glass Microphone, as it stands, is the first step towards 'invisible' monitoring of people and their assets.

It appears the FBI now has the ability to crack/access any phone

Those in the know suggest that Apple's iPhone is the smart device most secure against government surveillance. In 2016, this was put to the test. The Justice Department came into possession of an iPhone allegedly belonging to one of the San Bernardino shooters and ultimately sued Apple in an attempt to force the company to grant access to it as part of their investigation. The case was ultimately dropped, leading some to speculate that NAND mirroring techniques were used to gain access to the phone without Apple's assistance, implying that even the most secure phones can now be accessed by the authorities.

Cornell University's lie detecting algorithm

Groundbreaking work by Cornell University will provide 'at a distance' access to information that previously required close personal access to an accused subject. Cornell's solution interprets feeds from a number of video cameras trained on subjects and analyses the results to judge their heart rate. They believe the system can be used to determine whether someone is lying from behind a screen.

University of Southern California can anticipate social unrest with social media feeds

Researchers at the University of Southern California have developed an AI tool to study social media posts and determine whether those writing them are likely to cause social unrest. The software claims to have identified an association between both the volume of tweets written and the content of those tweets, and protests turning physical. They can now offer advice to law enforcement on the likelihood of a protest turning violent, so police can be properly prepared based on this information.

The UK, an epicenter of AI progress, is not far behind in tracking people

The UK has a similarly impressive array of tools at its disposal to watch the people that representatives of the country feel may require it. Given the close levels of cooperation between the UK and US governments, it is likely that many of these UK facilities are shared with the US and other NATO partners.

Project stingray – fake cell phone/mobile phone 'towers' to intercept communications

Stingray is a brand name for an IMSI (the unique identifier on a SIM card) tracker. These devices 'spoof' real towers, presenting themselves as the closest mobile phone tower, which 'fools' phones into connecting to them. The technology has been used to spy on criminals in the UK, but it is not just the UK government which uses Stingray or its equivalents.
The Washington Post reported in June 2018 that a number of domestically compiled intelligence reports suggest that foreign governments acting on US soil, including China and Russia, have been eavesdropping on the White House using the same technology.

UK developed spyware is being used by authoritarian regimes

Gamma International is a company based in Hampshire, UK, which provided the (notably authoritarian) Egyptian government with a facility to install what was effectively spyware, delivered with a virus, on to computers in their country. Once installed, the software permitted the government to monitor private digital interactions without the need to engage the phone company or ISP offering those services. Any internet based technology could be tracked, assisting in tracking down individuals who may have negative feelings about the Egyptian government.

Individual arrested when his fingerprint was taken from a WhatsApp picture of his hand

A drug dealer was pictured holding an assortment of pills in the UK two months ago. The image of his hand was used to extract an image of his fingerprint. From that, forensic scientists used by UK police confirmed that officers had arrested the correct person and associated him with the drugs.

AI solutions to speed up evidence processing, including scanning laptops and phones

UK police forces are trying out AI software to speed up the processing of evidence from digital devices. A dozen departments around the UK are using software called Cellebrite, which employs AI algorithms to search through data found on devices, including phones and laptops. Cellebrite can recognize images that contain child abuse, accept feeds from multiple devices to see when multiple owners were in the same physical location at the same time, and read text from screenshots. Officers can even feed it photos of suspects to see if a picture of them shows up on someone's hard drive.

China takes the surveillance biscuit and may show us a glimpse of the future

There are 600 million mobile phone users in China, each producing a great deal of information about themselves. China has a notorious record of human rights abuses, and the ruling Communist Party takes a controlling interest (a board seat) in many of the country's largest technology companies to ensure the work done is in the interest of the party as well as profitable for the corporation. As a result, China is on the front foot when it comes to both AI and surveillance technology. China's surveillance tools could be a harbinger of the future in the Western world.

Chinese cities will be run by a private company

Alibaba, China's equivalent of Amazon, already has control over the traffic lights in one Chinese city, Hangzhou. Alibaba is far from shy about its ambitions. It has 120,000 developers working on the problem and intends to commercialise and sell the data it gathers about citizens. The AI-based product they're using is called CityBrain. In the future, all Chinese cities could well be run by AI from the Alibaba corporation; the idea is to use this trial as a template for every city. The technology is likely to be deployed in Kuala Lumpur next. In the areas under CityBrain's control, traffic speeds have increased by 15% already. However, some of those observing the situation have expressed concerns, not just about the (lack of) oversight of CityBrain's current capabilities but also about the potential for future abuse.

What to make of this incredible list of surveillance capabilities

Facilities like Mapquest's new mapping service are beguiling.
They’re clever ideas which create a better works. Similar technology, however, behind the scenes, is being adopted by law enforcement bodies in an ever growing list of countries. Even for someone who understands cutting edge technology, the sum of those facilities may be surprising. Literally any aspect of your behaviour, from the way you walk, to your face, your heatmap and, of course, the contents of your phone and laptops can now be monitored. Law enforcement can access and review information feeds with Artificial Intelligence software, to process and summarise findings quickly. In some cases, this is being done without the need for a warrant. Concerningly, these advances seem to be coming without policy or, in many cases any form of oversight. We must change how we think about AI, urge AI founding fathers  


The Zombie Attacks!

Packt
24 Sep 2015
9 min read
In this article by Jamie Dean, author of the book Unity Character Animation with Mecanim: RAW, we will demonstrate the process of importing and animating a rigged character in Unity. In this article, we will cover:

Starting a blank Unity project and importing the necessary packages
Importing a rigged character model in the FBX format and adjusting import settings

Typically, an enemy character such as this will have a series of different animation sequences, which will be imported separately or together from a 3D package. In this case, our animation sequences are included in separate files. We will begin by creating the Unity project.

(For more resources related to this topic, see here.)

Setting up the project

Before we start exploring the animation workflow with Mecanim's tools, we need to set up the Unity project:

Create a new project within Unity by navigating to File | New Project....
When prompted, choose an appropriate name and location for the project.
In the Unity - Project Wizard dialog that appears, check the relevant boxes for the Character Controller.unityPackage and Scripts.unityPackage packages.
Click on the Create button. It may take a few minutes for Unity to initialize.
When the Unity interface appears, import the PACKT_cawm package by navigating to Assets | Import Package | Custom Package.... The Import package... window will appear. Navigate to the location where you unzipped the project files, select the unity package, and click on Open. The assets package will take a little time to decompress.
When the Importing Package checklist appears, click on the Import button in the bottom-right of the window.

Once the assets have finished importing, you will start with a default blank scene.

Importing our enemy

Now, it is time to import our character model:

Minimize Unity.
Navigate to the location where you unzipped the project files.
Double-click on the Models folder to view its contents.
Double-click on the zombie_m subfolder to view its contents. The folder contains an FBX file containing the rigged male zombie model and a separate subfolder containing the associated textures.
Open Unity and resize the window so that both Unity and the zombie_m folder contents are visible.
In Unity, click on the Assets folder in the Project panel.
Drag the zombie_m FBX asset into the Assets panel to import it. Because the FBX file contains a normal map, a window will pop up asking if you want to set this file's import settings to read it correctly.
Click on the Fix Now button.

FBX files can contain embedded bitmap textures, which can be imported with the model. This will create subfolders containing the materials and textures within the folder where the model has been imported. Leaving the materials and textures as subfolders of the model will make them difficult to find within the project. The zombie model and two folders should now be visible in the FBX_Imports folder in the Assets panel.

In the next step, we will move the imported material and texture assets into the appropriate folders in the Unity project.

Organizing the material and textures

The material and textures associated with the zombie_m model are currently located within the FBX_Imports folder. We will move these into different folders to organize them within the hierarchy of our project:

Double-click on the Materials folder and drag the material asset contained within it into the PACKT_Materials folder in the Project panel.
Return to the FBX_Imports folder by clicking on its title at the top of the Assets panel interface.
Double-click on the textures folder. This will be named to be consistent with the model.
Drag the two bitmap textures into the PACKT_Textures folder in the Project panel.
Return to the FBX_Imports folder and delete the two empty subfolders.

The moved material and textures will still be linked to the model. We will make sure of this by instancing it in the current empty scene:

Drag the zombie_m asset into the Hierarchy panel. It may not be immediately visible within the Scene view due to the default import scale settings. We will take care of this in the next step.

Adjusting the import scale

Unity's import settings can be adjusted to account for the different tools commonly used to create 2D and 3D assets. Import settings are adjusted in the Inspector panel, which appears on the right of the Unity interface by default:

Click on the zombie_m game object within the Hierarchy panel. This will bring up the file's import settings in the Inspector panel.
Click on the Model tab.
In the Scale Factor field, highlight the current number and type 1. The character model has been modeled to scale in meters to make it compatible with Unity's units. All 3D software applications have their own native scale. Unity does a pretty good job at accommodating all of them, but it often helps to know which software was used to create them.
Scroll down until the Materials settings are visible.
Uncheck the Import Materials checkbox. Now that we have got our textures and materials organized within the project, we want to make sure they are not continuously imported into the same folder as the model.
Leave the remaining Model import settings at their default values. We will be discussing these later on in the article, when we demonstrate the animation import.
Click on the Apply button. You may need to scroll down within the Inspector panel to see this.

The zombie_m character should now be visible in the Scene view. This character model is a medium resolution model (4,410 triangles) and has a single 1024 x 1024 albedo texture and separate 1024 x 1024 specular and normal maps. The character has been rigged with a basic skeleton. The rigging process is essential if the model is to be animated.

We need to save our progress before we go any further:

Save the scene by navigating to File | Save Scene as....
Choose an appropriate filename for the scene.
Click on the Apply button.

Despite the fact that we have only added a single game object to the default scene, there are more steps that we will need to take to set up the character, and it will be convenient for us to save the current setup in case anything goes wrong.

In character animation, there are looping and single-shot animation sequences. Some animation sequences, such as walk, run, and idle, are usually seamless loops designed to play back-to-back without the player being aware of where they start and end. Other sequences, typically shooting, hitting, being injured or dying, are often single-shot animations which do not need to loop. We will start with this kind, and discuss looping animation sequences later in the article.

In order to use Mecanim's animation tools, we need to set up the character's Avatar so that the character's hierarchy of bones is recognized and can be used correctly within Unity.

Adjusting the rig import settings and creating the Avatar

Now that we have imported the model, we will need to adjust the import settings so that the character functions correctly within our scene:

Select zombie_m in the Assets panel.
2. The asset's import settings should become visible within the Inspector panel. This settings rollout contains three tabs: Model, Rig, and Animations.

Since we have already adjusted the Scale Factor within the Model Import settings, we will move on to the Rig import settings, where we can define what kind of skeleton our character has.

Choosing the appropriate rig import settings

Mecanim has three options for importing rigged models: Legacy, Generic, and Humanoid. It also has a None option that should be applied to models that are not intended to be animated.

Legacy format was previously the only option for importing skeletal animation in Unity. It is not possible to retarget animation sequences between models using Legacy, and setting up functioning state machines requires quite a bit of scripting. It is still a useful tool for importing models with fewer animation sequences and for simple mechanical animations. Legacy format animations are not compatible with Mecanim.

Generic is one of the new animation formats that are compatible with Mecanim's animator controllers. It does not have the full functionality of Mecanim's character animation tools. Animation sequences imported with the Generic format cannot be retargeted and are best used for quadrupeds, mechanical devices, and pretty much anything except a character with two arms and two legs.

The Humanoid animation type allows the full use of Mecanim's powerful toolset. It requires a minimum of 15 bones, and assumes that your rig is roughly human-shaped, with a pair of arms and legs. It can accommodate many more intermediary joints and some basic facial animation.

One of the greatest benefits of using the Humanoid type is that it allows animation sequences to be retargeted or adapted to work with different rigs. For instance, you may have a detailed player character model with a full skeletal rig (including finger and toe joints), and you may want to reuse this character's idle sequence with a background character that is much less detailed and has a simpler arrangement of bones. Mecanim makes it possible to reuse purpose-built motion sequences and even create usable sequences from motion capture data.

Now that we have introduced these three rig types, we need to choose the appropriate setting for our imported zombie character, which in this case is Humanoid:

1. In the Inspector panel, click on the Rig tab.
2. Set the Animation Type field to Humanoid to suit our character skeleton type.
3. Leave Avatar Definition set to Create From This Model.
4. Optimize Game Objects can be left checked.
5. Click on the Apply button to save the settings and transfer all of the changes that you have made to the instance in the scene.

The Humanoid animation type is the only one that supports retargeting. So if you are importing animations that are not unique and will be used for multiple characters, it is a good idea to use this setting.

Summary

In this article, we covered the major steps involved in animating a premade character using the Mecanim system in Unity. We started with the FBX import settings for the model and the rig, and covered the creation of the Avatar by defining the bones in the Avatar Definition settings.
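As a closing aside, the same import settings can be applied automatically when you have many characters to bring in. The following is a minimal sketch, not part of the project files: it assumes your characters land under a path containing FBX_Imports, and its property values simply mirror the Inspector choices made above.

  using UnityEditor;

  // Hypothetical editor script: place it in a folder named Editor.
  // It applies our Scale Factor, Materials, and Rig choices to any model
  // imported under a path containing "FBX_Imports" (an assumed convention).
  public class CharacterImportSettings : AssetPostprocessor
  {
      void OnPreprocessModel()
      {
          if (!assetPath.Contains("FBX_Imports"))
              return; // leave other models alone

          ModelImporter importer = (ModelImporter)assetImporter;
          importer.globalScale = 1.0f;                               // Scale Factor = 1
          importer.importMaterials = false;                          // Import Materials unchecked
          importer.animationType = ModelImporterAnimationType.Human; // Humanoid rig
          importer.optimizeGameObjects = true;                       // Optimize Game Objects checked
      }
  }

With a script like this in place, Unity applies the settings during import, so the Inspector steps above become a one-time verification rather than a per-asset chore.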


Spring Security 3: Tips and Tricks

Packt
28 Feb 2011
6 min read
Spring Security 3

- Make your web applications impenetrable.
- Implement authentication and authorization of users.
- Integrate Spring Security 3 with common external security providers.
- Packed full with concrete, simple, and concise examples.

Changing the default login URL

It's a good idea to change the default value of the spring_security_login page URL. Tip: Not only would the resulting URL be more user- or search-engine friendly, it'll disguise the fact that you're using Spring Security as your security implementation. Obscuring Spring Security in this way could make it harder for malicious hackers to find holes in your site in the unlikely event that a security hole is discovered in Spring Security. Although security through obscurity does not reduce your application's vulnerability, it does make it harder for standardized hacking tools to determine what types of vulnerabilities you may be susceptible to.

Evaluating authorization rules

Tip: For any given URL request, Spring Security evaluates authorization rules in top-to-bottom order. The first rule matching the URL pattern will be applied. Typically, this means that your authorization rules will be ordered from most specific to least specific. It's important to remember this when developing complicated rule sets, as developers can often get confused over which authorization rule takes effect. Just remember the top-to-bottom order, and you can easily find the correct rule in any scenario!

Using the JSTL URL tag to handle relative URLs

Tip: Use the JSTL core library's url tag to ensure that URLs you provide in your JSP pages resolve correctly in the context of your deployed web application. The url tag will resolve URLs provided as relative URLs (starting with a /) to the root of the web application. You may have seen other techniques to do this using JSP expression code (<%=request.getContextPath() %>), but the JSTL url tag allows you to avoid inline code!

Modifying the username or password and the remember me feature

Tip: You should anticipate that if the user changes their username or password, any remember me tokens set will no longer be valid. Make sure that you provide appropriate messaging to users if you allow them to change these bits of their account.

Configuration of remember me session cookies

Tip: If token-validity-seconds is set to -1, the login cookie will be set to a session cookie, which does not persist after the user closes their browser. The token will be valid (assuming the user doesn't close their browser) for a non-configurable length of 2 weeks. Don't confuse this with the cookie that stores your user's session ID—they're two different things with similar names!

Checking full authentication without expressions

Tip: If your application does not use SpEL expressions for access declarations, you can still check if the user is fully authenticated by using the IS_AUTHENTICATED_FULLY access rule (for example, access="IS_AUTHENTICATED_FULLY"). Be aware, however, that standard role access declarations aren't as expressive as SpEL ones, so you will have trouble handling complex Boolean expressions.

Debugging remember me cookies

Tip: There are two difficulties when attempting to debug issues with remember me cookies. The first is getting the cookie value at all! Spring Security doesn't offer any log level that will log the cookie value that was set. We'd suggest a browser-based tool such as Chris Pederick's Web Developer plug-in (http://chrispederick.com/work/web-developer/) for Mozilla Firefox.
Browser-based development tools typically allow selective examination (and even editing) of cookie values. The second (admittedly minor) difficulty is decoding the cookie value. You can feed the cookie value into an online or offline Base64 decoder (remember to add a trailing = sign to make it a valid Base64-encoded string!).

Making effective use of an in-memory UserDetailsService

Tip: A very common scenario for the use of an in-memory UserDetailsService and hard-coded user lists is the authoring of unit tests for secured components. Unit test authors often code or configure the minimal context to test the functionality of the component under test. Using an in-memory UserDetailsService with a well-defined set of users and GrantedAuthority values provides the test author with an easily controlled test environment.

Storing sensitive information

Tip: Many guidelines that apply to the storage of passwords apply equally to other types of sensitive information, including social security numbers and credit card information (although, depending on the application, some of these may require the ability to decrypt). It's quite common for databases storing this type of information to represent it in multiple ways; for example, a customer's full 16-digit credit card number would be stored in a highly encrypted form, but the last four digits might be stored in cleartext (for reference, think of any internet commerce site that displays XXXX XXXX XXXX 1234 to help you identify your stored credit cards).

Annotations at the class level

Tip: Be aware that the method-level security annotations can also be applied at the class level! Method-level annotations, if supplied, will always override annotations specified at the class level. This can be helpful if your business needs dictate specification of security policies for an entire class at a time. Take care to use this functionality in conjunction with good comments and coding standards, so that developers are very clear about the security characteristics of a class and its methods.

Authenticating the user against LDAP

Tip: Do not make the very common mistake of configuring an <authentication-provider> with a user-details-service-ref referring to an LdapUserDetailsService if you are intending to authenticate the user against LDAP itself!

Externalize URLs and environment-dependent settings

Tip: Coding URLs into Spring configuration files is a bad idea. Typically, storage and consistent reference to URLs is pulled out into a separate properties file, with placeholders consistent with the Spring PropertyPlaceholderConfigurer. This allows for reconfiguration of environment-specific settings via externalizable properties files without touching the Spring configuration files, and is generally considered good practice.

Summary

In this article, we took a look at some tips and tricks for Spring Security.
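To tie several of these tips together, here is a minimal sketch of a security namespace configuration. Treat it as an illustration only: the URL patterns, role names, and user accounts are invented placeholders, and the most-specific intercept-url rules are deliberately listed first because, as noted above, the first matching rule wins.

  <!-- Hypothetical sketch: ordered rules plus an in-memory user service for tests -->
  <http auto-config="true" use-expressions="true">
      <!-- a custom login page also disguises the default spring_security_login URL -->
      <form-login login-page="/login"/>
      <intercept-url pattern="/admin/**" access="hasRole('ROLE_ADMIN')"/>
      <intercept-url pattern="/account/**" access="isFullyAuthenticated()"/>
      <intercept-url pattern="/**" access="permitAll"/>
  </http>

  <authentication-manager>
      <authentication-provider>
          <user-service>
              <user name="admin" password="admin" authorities="ROLE_USER,ROLE_ADMIN"/>
              <user name="guest" password="guest" authorities="ROLE_USER"/>
          </user-service>
      </authentication-provider>
  </authentication-manager>

Reordering the intercept-url elements with /** first would silently shadow the /admin/** rule, which is exactly the kind of confusion the top-to-bottom evaluation tip warns about. The plaintext in-memory users are suitable for unit tests only, never for production.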

CORS in Node.js

Packt
20 Jun 2017
14 min read
In this article by Randall Goya and Rajesh Gunasundaram, the authors of the book CORS Essentials, we look at Node.js, a cross-platform JavaScript runtime environment that executes JavaScript code on the server side. This enables developers to have a unified language across web application development: JavaScript becomes the single language that runs both client side and server side.

In this article we will learn that:

- Node.js is a JavaScript platform for developing server-side web applications.
- Node.js can provide the web server for other frameworks including Express.js, AngularJS, Backbone.js, Ember.js, and others.
- Some other JavaScript frameworks such as ReactJS, Ember.js, and Socket.IO may also use Node.js as the web server.
- Isomorphic JavaScript can add server-side functionality for client-side frameworks.

JavaScript frameworks are evolving rapidly. This article reviews some of the current techniques, and syntax specific to some frameworks. Make sure to check the documentation for the project to discover the latest techniques. Once you understand the CORS concepts, you may create your own solutions, because JavaScript is a loosely structured language. All the examples are based on the fundamentals of CORS: the allowed origin(s), methods, and headers such as Content-Type, plus preflight where the CORS specification requires it.

JavaScript frameworks are very popular

JavaScript is sometimes called the lingua franca of the Internet, because it is cross-platform and supported by many devices. It is also a loosely structured language, which makes it possible to craft solutions for many types of applications. Sometimes an entire application is built in JavaScript. Frequently JavaScript provides a client-side front-end for applications built with Symfony, Content Management Systems such as Drupal, and other back-end frameworks. Node.js is server-side JavaScript and provides a web server as an alternative to Apache, IIS, Nginx, and other traditional web servers.

Introduction to Node.js

Node.js is an open-source, cross-platform library for developing server-side web applications. Applications written in JavaScript for Node.js can run on many operating systems, including OS X, Microsoft Windows, Linux, and many others. Node.js provides non-blocking I/O and an event-driven architecture designed to optimize an application's performance and scalability for real-time web applications. The biggest difference between PHP and Node.js is that PHP is a blocking language, where commands execute only after the previous command has completed, while Node.js is a non-blocking language, where commands execute in parallel and use callbacks to signal completion. Node.js can move files, payloads from services, and data asynchronously, without waiting for some command to complete, which improves performance. Most JS frameworks that work with Node.js use the concept of routes to manage pages and other parts of the application. Each route may have its own set of configurations. For example, CORS may be enabled only for a specific page or route. Node.js loads modules for extending functionality via the npm package manager. The developer selects which packages to load with npm, which reduces bloat. The developer community has created a large number of npm packages for specific functions. JXcore is a fork of Node.js targeting mobile devices and IoTs (Internet of Things devices).
JXcore can use both Google V8 and Mozilla SpiderMonkey as its JavaScript engine. JXcore can run Node applications on iOS devices using Mozilla SpiderMonkey. MEAN is a popular JavaScript software stack with MongoDB (a NoSQL database), Express.js, and AngularJS, all of which run on a Node.js server.

JavaScript frameworks that work with Node.js

Node.js provides a server for other popular JS frameworks, including AngularJS, Express.js, Backbone.js, Socket.IO, and Connect.js. ReactJS was designed to run in the client browser, but it is often combined with a Node.js server. As we shall see in the following descriptions, these frameworks are not necessarily exclusive, and are often combined in applications.

Express.js is a Node.js server framework

Express.js is a Node.js web application server framework, designed for building single-page, multi-page, and hybrid web applications. It is considered the "standard" server framework for Node.js. The package is installed with the command npm install express --save.

AngularJS extends static HTML with dynamic views

HTML was designed for static content, not for dynamic views. AngularJS extends HTML syntax with custom tag attributes. It provides model–view–controller (MVC) and model–view–viewmodel (MVVM) architectures in a front-end client-side framework. AngularJS is often combined with a Node.js server and other JS frameworks. AngularJS runs client-side and Express.js runs on the server; therefore, Express.js is considered more secure for functions such as validating user input, which can be tampered with client-side. AngularJS applications can use the Express.js framework to connect to databases, for example in the MEAN stack.

Connect.js provides middleware for Node.js requests

Connect.js is a JavaScript framework providing middleware to handle requests in Node.js applications. Connect.js provides middleware to handle Express.js and cookie sessions, parsers for the HTML body and cookies, vhosts (virtual hosts), error handlers, and method overrides.

Backbone.js often uses a Node.js server

Backbone.js is a JavaScript framework with a RESTful JSON interface and is based on the model–view–presenter (MVP) application design. It is designed for developing single-page web applications, and for keeping various parts of web applications (for example, multiple clients and the server) synchronized. Backbone depends on Underscore.js, plus jQuery for use of all the available features. Backbone often uses a Node.js server, for example to connect to data storage.

ReactJS handles user interfaces

ReactJS is a JavaScript library for creating user interfaces while addressing challenges encountered in developing single-page applications where data changes over time. React handles the user interface in model–view–controller (MVC) architecture. ReactJS typically runs client-side and can be combined with AngularJS. Although ReactJS was designed to run client-side, it can also be used server-side in conjunction with Node.js. PayPal and Netflix leverage the server-side rendering of ReactJS known as Isomorphic ReactJS. There are React-based add-ons that take care of the server-side parts of a web application.

Socket.IO uses WebSockets for realtime event-driven applications

Socket.IO is a JavaScript library for event-driven web applications using the WebSocket protocol, with realtime, bi-directional communication between web clients and servers. It has two parts: a client-side library that runs in the browser, and a server-side library for Node.js.
Although it can be used as simply a wrapper for WebSocket, it provides many more features, including broadcasting to multiple sockets, storing data associated with each client, and asynchronous I/O. Socket.IO provides better security than WebSocket alone, since allowed domains must be specified for its server (a sketch of this appears at the end of this article).

Ember.js can use Node.js

Ember is another popular JavaScript framework with routing that uses Handlebars (Mustache-style) templates. It can run on a Node.js server, or also with Express.js. Ember can also be combined with Rack, a component of Ruby on Rails (ROR). Ember Data is a library for modeling data in Ember.js applications.

CORS in Express.js

The following code adds the Access-Control-Allow-Origin and Access-Control-Allow-Headers headers globally to all requests on all routes in an Express.js application. A route is a path in the Express.js application, for example /user for a user page. app.all sets the configuration for all routes in the application. Specific HTTP requests such as GET or POST are handled by app.get and app.post.

  app.all('*', function(req, res, next) {
    res.header("Access-Control-Allow-Origin", "*");
    res.header("Access-Control-Allow-Headers", "X-Requested-With");
    next();
  });
  app.get('/', function(req, res, next) {
    // Handle GET for this route
  });
  app.post('/', function(req, res, next) {
    // Handle the POST for this route
  });

For better security, consider limiting the allowed origin to a single domain, or adding some additional code to validate or limit the domain(s) that are allowed. Also, consider limiting sending the headers only for routes that require CORS by replacing app.all with a more specific route and method. The following code only sends the CORS headers on a GET request on the route /user, and only allows the request from http://www.localdomain.com.

  app.get('/user', function(req, res, next) {
    res.header("Access-Control-Allow-Origin", "http://www.localdomain.com");
    res.header("Access-Control-Allow-Headers", "X-Requested-With");
    next();
  });

Since this is JavaScript code, you may dynamically manage the values of routes, methods, and domains via variables, instead of hard-coding the values.

CORS npm for Express.js using Connect.js middleware

Connect.js provides middleware to handle requests in Express.js. You can use Node Package Manager (npm) to install a package that enables CORS in Express.js with Connect.js:

  npm install cors

The package offers flexible options, which should be familiar from the CORS specification, including using credentials and preflight. It provides dynamic ways to validate an origin domain using a function or a regular expression, and handler functions to process preflight.

Configuration options for CORS npm:

origin: Configures the Access-Control-Allow-Origin CORS header with a string containing the full URL and protocol making the request, for example http://localdomain.com. Possible values for origin:
- Default value TRUE uses req.header('Origin') to determine the origin and CORS is enabled. When set to FALSE, CORS is disabled.
- It can be set to a function with the request origin as the first parameter and a callback function as the second parameter.
- It can be a regular expression, for example /localdomain.com$/, or an array of regular expressions and/or strings to match.

methods: Sets the Access-Control-Allow-Methods CORS header. Possible values for methods:
- A comma-delimited string of HTTP methods, for example GET, POST
- An array of HTTP methods, for example ['GET', 'PUT', 'POST']

allowedHeaders: Sets the Access-Control-Allow-Headers CORS header.
Possible values for allowedHeaders:
- A comma-delimited string of allowed headers, for example 'Content-Type, Authorization'
- An array of allowed headers, for example ['Content-Type', 'Authorization']
- If unspecified, it defaults to the value specified in the request's Access-Control-Request-Headers header

exposedHeaders: Sets the Access-Control-Expose-Headers header. Possible values for exposedHeaders:
- A comma-delimited string of exposed headers, for example 'Content-Range, X-Content-Range'
- An array of exposed headers, for example ['Content-Range', 'X-Content-Range']
- If unspecified, no custom headers are exposed

credentials: Sets the Access-Control-Allow-Credentials CORS header. Possible values for credentials:
- TRUE passes the header for preflight
- FALSE or unspecified omits the header, no preflight

maxAge: Sets the Access-Control-Max-Age header. Possible values for maxAge:
- An integer value in seconds for the TTL to cache the preflight response
- If unspecified, the request is not cached

preflightContinue: Passes the CORS preflight response to the next handler.

The default configuration without setting any values allows all origins and methods without preflight. Keep in mind that complex CORS requests other than GET, HEAD, POST will fail without preflight, so make sure you enable preflight in the configuration when using them. Without setting any values, the configuration defaults to:

  {
    "origin": "*",
    "methods": "GET,HEAD,PUT,PATCH,POST,DELETE",
    "preflightContinue": false
  }

Code examples for CORS npm

These examples demonstrate the flexibility of CORS npm for specific configurations. Note that the express and cors packages are always required.

Enable CORS globally for all origins and all routes

The simplest implementation of CORS npm enables CORS for all origins and all requests. The following example enables CORS for an arbitrary route /product/:id for a GET request by telling the entire app to use CORS for all routes:

  var express = require('express')
    , cors = require('cors')
    , app = express();
  app.use(cors()); // this tells the app to use CORS for all requests and all routes
  app.get('/product/:id', function(req, res, next){
    res.json({msg: 'CORS is enabled for all origins'});
  });
  app.listen(80, function(){
    console.log('CORS is enabled on the web server listening on port 80');
  });

Allow CORS for dynamic origins for a specific route

The following example uses corsOptions to check if the domain making the request is in the whitelisted array with a callback function, which returns null if it doesn't find a match. This CORS option is passed to the route /product/:id, which is the only route that has CORS enabled. The allowed origins can be dynamic by changing the value of the variable whitelist.
  var express = require('express')
    , cors = require('cors')
    , app = express();
  // define the whitelisted domains and set the CORS options to check them
  var whitelist = ['http://localdomain.com', 'http://localdomain-other.com'];
  var corsOptions = {
    origin: function(origin, callback){
      var originWhitelisted = whitelist.indexOf(origin) !== -1;
      callback(null, originWhitelisted);
    }
  };
  // add the CORS options to a specific route /product/:id for a GET request
  app.get('/product/:id', cors(corsOptions), function(req, res, next){
    res.json({msg: 'A whitelisted domain matches and CORS is enabled for route product/:id'});
  });
  // log that CORS is enabled on the server
  app.listen(80, function(){
    console.log('CORS is enabled on the web server listening on port 80');
  });

You may set different CORS options for specific routes, or sets of routes, by defining the options assigned to unique variable names, for example corsUserOptions. Pass the specific configuration variable to each route that requires that set of options.

Enabling CORS preflight

CORS requests that use an HTTP method other than GET, HEAD, POST (for example DELETE), or that use custom headers, are considered complex and require a preflight request before proceeding with the CORS requests. Enable preflight by adding an OPTIONS handler for the route:

  var express = require('express')
    , cors = require('cors')
    , app = express();
  // add the OPTIONS handler
  app.options('/products/:id', cors()); // OPTIONS is added to the route /products/:id
  // use the OPTIONS handler for the DELETE method on the route /products/:id
  app.delete('/products/:id', cors(), function(req, res, next){
    res.json({msg: 'CORS is enabled with preflight on the route /products/:id for the DELETE method for all origins!'});
  });
  app.listen(80, function(){
    console.log('CORS is enabled on the web server listening on port 80');
  });

You can enable preflight globally on all routes with the wildcard:

  app.options('*', cors());

Configuring CORS asynchronously

One of the reasons to use Node.js frameworks is to take advantage of their asynchronous abilities, handling multiple tasks at the same time. Here we use a callback function corsDelegateOptions and add it to the cors parameter passed to the route /products/:id. The callback function can handle multiple requests asynchronously.

  var express = require('express')
    , cors = require('cors')
    , app = express();
  // define the allowed origins stored in a variable
  var whitelist = ['http://example1.com', 'http://example2.com'];
  // create the callback function
  var corsDelegateOptions = function(req, callback){
    var corsOptions;
    if(whitelist.indexOf(req.header('Origin')) !== -1){
      corsOptions = { origin: true }; // the requested origin matches and is allowed
    }else{
      corsOptions = { origin: false }; // the requested origin doesn't match; CORS is disabled for this request
    }
    callback(null, corsOptions); // callback expects two parameters: error and options
  };
  // add the callback function to the cors parameter for the route /products/:id for a GET request
  app.get('/products/:id', cors(corsDelegateOptions), function(req, res, next){
    res.json({msg: 'A whitelisted domain matches and CORS is enabled for route product/:id'});
  });
  app.listen(80, function(){
    console.log('CORS is enabled on the web server listening on port 80');
  });

Summary

We have covered the essentials of applying CORS in Node.js.
Let us have a quick recap of what we have learned:

- Node.js provides a web server built with JavaScript, and can be combined with many other JS frameworks as the application server.
- Although some frameworks have specific syntax for implementing CORS, they all follow the CORS specification by specifying allowed origin(s) and method(s). More robust frameworks allow custom headers such as Content-Type, and preflight when required for complex CORS requests.
- JavaScript frameworks may depend on the jQuery XHR object, which must be configured properly to allow cross-origin requests.
- JavaScript frameworks are evolving rapidly. The examples here may become outdated. Always refer to the project documentation for up-to-date information.

With knowledge of the CORS specification, you may create your own techniques using JavaScript based on these examples, depending on the specific needs of your application.
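As promised earlier, here is a sketch of restricting allowed origins for a Socket.IO server. Treat it as an assumption-laden illustration: the origin list is a placeholder, and the exact API depends on your Socket.IO version (recent releases accept a cors option much like the cors npm package, while older releases used an origins setting instead, so check the documentation for your version).

  // Hypothetical sketch for a recent Socket.IO release
  const http = require('http');
  const { Server } = require('socket.io');

  const httpServer = http.createServer();
  const io = new Server(httpServer, {
    cors: {
      origin: ['http://localdomain.com', 'http://localdomain-other.com'], // whitelist
      methods: ['GET', 'POST']
    }
  });

  io.on('connection', function(socket){
    console.log('client connected from an allowed origin');
  });

  httpServer.listen(80);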


Knowing the SQL-injection attacks and securing our Android applications from them

Packt
20 Dec 2013
10 min read
Enumerating SQL-injection vulnerable content providers

Just like web applications, Android applications may use untrusted input to construct SQL queries, and may do so in a way that's exploitable. The most common case is when applications do not sanitize input for any SQL and do not limit access to content providers. Why would you want to stop a SQL-injection attack? Well, let's say you're in the classic situation of trying to authorize users by comparing a supplied username and password against a database. The code would look similar to the following:

  public boolean isValidUser(){
    u_username = EditText( some user value );
    u_password = EditText( some user value );
    //some un-important code here...
    String query = "select * from users_table where username = '" + u_username + "' and password = '" + u_password + "'";
    SQLiteDatabase db //some un-important code here...
    Cursor c = db.rawQuery( query, null );
    return c.getCount() != 0;
  }

What's the problem in the previous code? Well, what happens when the user supplies a password '' or '1'='1'? The query being passed to the database then looks like the following:

  select * from users_table where username = '' and password = '' or '1'='1'

The user-supplied part of this query forms what's known in Boolean algebra as a logical tautology, meaning no matter what table or data the query is targeted at, the condition will always evaluate to true, which means that all the rows in the database will meet the selection criteria. This then means that all the rows in users_table will be returned, and as a result, even if an invalid password ' or '1'=' is supplied, the c.getCount() call will always return a nonzero count, leading to an authentication bypass!

Given that not many Android developers would use the rawQuery call unless they need to pull off some really messy SQL queries, I've included another code snippet of a SQL-injection vulnerability that occurs more often in real-world applications. So when auditing Android code for injection vulnerabilities, a good idea would be to look for something that resembles the following:

  public Cursor query(Uri uri, String[] projection, String selection, String[] selectionArgs, String sortOrder) {
    SQLiteDBHelper sdbh = new StatementDBHelper(this.getContext());
    Cursor cursor;
    try {
      //some code has been omitted
      cursor = sdbh.query(projection, selection, selectionArgs, sortOrder);
    } finally {
      sdbh.close();
    }
    return cursor;
  }

In the previous code, the projection, selection, selectionArgs, and sortOrder variables are sourced directly from external applications without any sanitization. If the content provider is exported and grants URI permissions or, as we've seen before, does not require any permissions, it means that attackers will be able to inject arbitrary SQL to augment the way the malicious query is evaluated. Let's look at how you actually go about attacking SQL-injection vulnerable content providers using drozer.

How to do it...

In this recipe, I'll talk about two kinds of SQL-injection vulnerabilities: one is when the select clause of a SQL statement is injectable, and the other is when the projection is injectable. Using drozer, it is pretty easy to find select-clause-injectable content providers:

  dz> run app.provider.query [URI] --selection "1=1"

The previous command will try to inject what's called a logical tautology into the SQL statement being parsed by the content provider and eventually the database query parser.
Due to the nature of the module being used here, you can tell whether or not it actually worked, because it should return all the data from the database; that is, the select-clause criteria is applied to every row, and because it will always return true, every row will be returned! You could also try any values that would always be true:

  dz> run app.provider.query [URI] --selection "1-1=0"
  dz> run app.provider.query [URI] --selection "0=0"
  dz> run app.provider.query [URI] --selection "(1+random())*10 > 1"

The following is an example of using a purposely vulnerable content provider:

  dz> run app.provider.query content://com.example.vulnerabledatabase.contentprovider/statements --selection "1=1"

It returns the entire table being queried. You can, of course, also inject into the projection of the SELECT statement, that is, the part before FROM in the statement: SELECT [projection] FROM [table] WHERE [select clause].

Securing application components

Application components can be secured both by making proper use of the AndroidManifest.xml file and by forcing permission checks at code level. These two factors of application security make the permissions framework quite flexible and allow you to limit the number of applications accessing your components in quite a granular way. There are many measures that you can take to lock down access to your components, but what you should do before anything else is make sure you understand the purpose of your component, why you need to protect it, and what kind of risks your users face should a malicious application start firing off intents to your app and accessing its data. This is called a risk-based approach to security, and it is suggested that you first answer these questions honestly before configuring your AndroidManifest.xml file and adding permission checks to your apps. In this recipe, I have detailed some of the measures that you can take to protect generic components, whether they are activities, broadcast receivers, content providers, or services.

How to do it...

To start off, we need to review your Android application's AndroidManifest.xml file. The android:exported attribute defines whether a component can be invoked by other applications. If any of your application components do not need to be invoked by other applications, or need to be explicitly shielded from interaction with the components on the rest of the Android system—other than components internal to your application—you should add the following attribute to the application component's XML element:

  <[component name] android:exported="false">
  </[component name]>

Here the [component name] would either be an activity, provider, service, or receiver.

How it works...

Enforcing permissions via the AndroidManifest.xml file means different things to each of the application component types. This is because of the various inter-process communication (IPC) mechanisms that can be used to interact with them.
For every application component, the android:permission attribute does the following:

- Activity: Limits the application components external to your application that can successfully call startActivity or startActivityForResult to those with the required permission
- Service: Limits the external application components that can bind (by calling bindService()) or start (by calling startService()) the service to those with the specified permission
- Receiver: Limits the number of external application components that can send broadcasted intents to the receiver with the specified permission
- Provider: Limits access to data that is made accessible via the content provider

The android:permission attribute of each of the component XML elements overrides the <application> element's android:permission attribute. This means that if you haven't specified any required permissions for your components and have specified one in the <application> element, it will apply to all of the components contained in it. Specifying permissions via the <application> element is not something developers do too often, because of how it affects the friendliness of the components toward the Android system itself (that is, if you override an activity's required permissions using the <application> element, the home launcher will not be able to start your activity). That being said, if you are paranoid enough and don't need any unauthorized interaction to happen with your application or its components, you should make use of the android:permission attribute of the <application> tag.

When you define an <intent-filter> element on a component, it will automatically be exported unless you explicitly set exported="false". However, this seemed to be a lesser-known fact, as many developers were inadvertently opening their content providers to other applications. So, Google responded by changing the default behavior for <provider> in Android 4.2. If you set either android:minSdkVersion or android:targetSdkVersion to 17, the exported attribute on <provider> will default to false.

Defending against the SQL-injection attack

The previous chapter covered some of the common attacks against content providers, one of them being the infamous SQL-injection attack. This attack leverages the fact that adversaries are capable of supplying SQL statements or SQL-related syntax as part of their selection arguments, projections, or any component of a valid SQL statement. This allows them to extract more information from a content provider than they are authorized to.

The best way to make sure adversaries will not be able to inject unsolicited SQL syntax into your queries is to avoid using SQLiteDatabase.rawQuery(), instead opting for a parameterized statement. Using a compiled statement, such as SQLiteStatement, offers both binding and escaping of arguments to defend against SQL-injection attacks. Also, there is a performance benefit due to the fact that the database does not need to parse the statement for each execution. An alternative to SQLiteStatement is to use the query, insert, update, and delete methods on SQLiteDatabase, as they offer parameterized statements via their use of string arrays. When we describe a parameterized statement, we are describing a SQL statement with question mark placeholders where values will be bound. Here's an example of a parameterized SQL insert statement:

  INSERT INTO [table name] VALUES (?,?,?,?,...)

Here [table name] would be the name of the relevant table in which values have to be inserted.
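Before the full recipe below, here is a minimal sketch of the parameterized alternatives just mentioned. The table and column names are invented for illustration; the point is that user input travels through ContentValues and selectionArgs, never through concatenated SQL text.

  import android.content.ContentValues;
  import android.database.Cursor;
  import android.database.sqlite.SQLiteDatabase;

  // Hypothetical helper: parameterized insert and query on SQLiteDatabase
  public class SafeUserQueries {

      public static long addUser(SQLiteDatabase db, String name, String pass) {
          ContentValues values = new ContentValues();
          values.put("username", name); // values are bound, never concatenated
          values.put("password", pass);
          return db.insert("users_table", null, values);
      }

      public static boolean isValidUser(SQLiteDatabase db, String name, String pass) {
          // '?' placeholders plus selectionArgs keep attacker input out of the SQL itself
          Cursor c = db.query("users_table", null,
                  "username = ? AND password = ?",
                  new String[] { name, pass },
                  null, null, null);
          boolean valid = c.getCount() != 0;
          c.close();
          return valid;
      }
  }

Rewriting the vulnerable isValidUser() from earlier in this style defeats the ' or '1'='1' bypass, because the quote characters arrive as data rather than as SQL syntax.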
How to do it...

For this example, we are using a simple Data Access Object (DAO) pattern, where all of the database operations for RSS items are contained within the RssItemDAO class.

When we instantiate RssItemDAO, we compile the insertStatement object with a parameterized SQL insert statement string. This needs to be done only once and can be re-used for multiple inserts:

  public class RssItemDAO {

    private SQLiteDatabase db;
    private SQLiteStatement insertStatement;
    private static String COL_TITLE = "title";
    private static String TABLE_NAME = "RSS_ITEMS";
    private static String INSERT_SQL = "insert into " + TABLE_NAME + " (content, link, title) values (?,?,?)";

    public RssItemDAO(SQLiteDatabase db) {
      this.db = db;
      insertStatement = db.compileStatement(INSERT_SQL);
    }

The order of the columns noted in the INSERT_SQL variable is important, as it directly maps to the index when binding values. In the preceding example, content maps to index 1, link maps to index 2, and title maps to index 3, since bind indexes on SQLiteStatement start at 1. Now, when we come to insert a new RssItem object into the database, we bind each of the properties in the order they appear in the statement:

  public long save(RssItem item) {
    insertStatement.bindString(1, item.getContent());
    insertStatement.bindString(2, item.getLink());
    insertStatement.bindString(3, item.getTitle());
    return insertStatement.executeInsert();
  }

Notice that we call executeInsert, a helper method that returns the ID of the newly created row. It's as simple as that to use a SQLiteStatement statement. The following shows how to use SQLiteDatabase.query to fetch RssItems that match a given search term:

  public List<RssItem> fetchRssItemsByTitle(String searchTerm) {
    Cursor cursor = db.query(TABLE_NAME, null, COL_TITLE + " LIKE ?",
        new String[] { "%" + searchTerm + "%" }, null, null, null);
    // process cursor into list
    List<RssItem> rssItems = new ArrayList<RssItemDAO.RssItem>();
    cursor.moveToFirst();
    while (!cursor.isAfterLast()) {
      // map cursor columns to RssItem properties
      RssItem item = cursorToRssItem(cursor);
      rssItems.add(item);
      cursor.moveToNext();
    }
    return rssItems;
  }

We use LIKE and the SQL wildcard syntax to match any part of the text within the title column.

Summary

There were a lot of technical details in this article. Firstly, we learned about the components that are vulnerable to SQL-injection attacks. We then figured out how to secure our Android applications from exploitation attacks. Finally, we learned how to defend our applications from SQL-injection attacks.


Implementing lighting & camera effects in Unity 2018

Amarabha Banerjee
04 May 2018
12 min read
Today, we will explore lighting & camera effects in Unity 2018. We will start with cameras to include perspectives, frustums, and Skyboxes. Next, we will learn a few uses of multiple cameras to include mini-maps. We will also cover the different types of lighting, explore reflection probes, and conclude with a look at shadows. Working with cameras Cameras render scenes so that the user can view them. Think about the hidden complexity of this statement. Our games are 3D, but people playing our games view them on 2D displays such as television, computer monitors, or mobile devices. Fortunately for us, Unity makes implementing cameras easy work. Cameras are GameObjects and can be edited using transform tools in the Scene view as well as in the Inspector panel. Every scene must have at least one camera. In fact, when a new scene is created, Unity creates a camera named Main Camera. As you will see later in this chapter, a scene can have multiple cameras. In the Scene view, cameras are indicated with a white camera silhouette, as shown in the following screenshot: When we click our Main Camera in the Hierarchy panel, we are provided with a Camera Preview in the Scene view. This gives us a preview of what the camera sees as if it were in game mode. We also have access to several parameters via the Inspector panel. The Camera component in the Inspector panel is shown here: Let's look at each of these parameters with relation to our Cucumber Beetle game: The Clear Flags parameter lets you switch between Skybox, Solid Color, Depth Only, and Don't Clear. The selection here informs Unity which parts of the screen to clear. We will leave this setting as Skybox. You will learn more about Skyboxes later in this chapter. The Background parameter is used to set the default background fill (color) of your game world. This will only be visible after all game objects have been rendered and if there is no Skybox. Our Cucumber Beetle game will have a Skybox, so this parameter can be left with the default color. The Culling Mask parameter allows you to select and deselect the layers you want the camera to render. The default selection options are Nothing, Everything, Default, TransparentFX, Ignore Raycast, Water, and UI. For our game, we will select Everything. If you are not sure which layer a game object is associated with, select it and look at the Layer parameter in the top section of the Inspector panel. There you will see the assigned layer. You can easily change the layer as well as create your own unique layers. This gives you finite rendering control. The Projection parameter allows you to select which projection, perspective or orthographic, you want for your camera. We will cover both of those projections later in this chapter. When perspective projection is selected, we are given access to the Field of View parameter. This is for the width of the camera's angle of view. The value range is 1-179°. You can use the slider to change the values and see the results in the Camera Preview window. When orthographic projection is selected, an additional Size parameter is available. This refers to the viewport size. For our game, we will select perspective projection with the Field of View set to 60. The Clipping Planes parameters include Near and Far. These settings set the closest and furthest points, relative to the camera, that rendering will happen at. For now, we will leave the default settings of 0.3 and 1000 for the Near and Far parameters, respectively. 
The Viewport Rect parameter has four components – X, Y, W, and H – that determine where the camera will be drawn on the screen. As you would expect, the X and Y components refer to horizontal and vertical positions, and the W and H components refer to width and height. You can experiment with these values and see the changes in the Camera Preview. For our game, we will leave the default settings.

The Depth parameter is used when we implement more than one camera. We can set a value here to determine the camera's priority in relation to others. Larger values indicate a higher priority. The default setting is sufficient for our game.

The Rendering Path parameter defines what rendering methods our camera will use. The options are Use Graphics Settings, Forward, Deferred, Legacy Vertex Lit, and Legacy Deferred (light prepass). We will use the Use Graphics Settings option for our game, which also uses the default setting.

The Target Texture parameter is not something we will use in our game. When a render texture is set, the camera is not able to render to the screen.

The Occlusion Culling parameter is a powerful setting. If enabled, Unity will not render objects that are occluded, or not seen by the camera. An example would be objects inside a building. If the camera can currently only see the external walls of the building, then none of the objects inside those walls can be seen. So, it makes sense to not render those. We only want to render what is absolutely necessary to help ensure our game has smooth gameplay and no lag. We will leave this as enabled for our game.

The Allow HDR parameter is a checkbox that toggles a camera's High Dynamic Range (HDR) rendering. We will leave the default setting of enabled for our game.

The Allow MSAA parameter is a toggle that determines whether our camera will use a Multisample Anti-Aliasing (MSAA) render target. MSAA is a computer graphics optimization technique and we want this enabled for our game.

Understanding camera projections

There are two camera projections used in Unity: perspective and orthographic. With perspective projection, the camera renders a scene based on the camera angle, as it exists in the scene. Using this projection, the further away an object is from the camera, the smaller it will be displayed. This mimics how we see things in the real world. Because of the desire to produce realistic games, or games that approximate the real world, perspective projection is the most commonly used in modern games. It is also what we will use in our Cucumber Beetle game.

The other projection is orthographic. An orthographic camera renders a scene uniformly, without any perspective. This means that objects further away will not be displayed smaller than objects closer to the camera. This type of camera is commonly used for top-down games and is the default camera projection used in 2D and Unity's UI system. We will use perspective projection for our Cucumber Beetle game.

Orientating your frustum

When a camera is selected in the Hierarchy view, its frustum is visible in the Scene view. A frustum is a geometric shape that looks like a pyramid that has had its top cut off, as illustrated here:

The near, or top, plane is parallel to its base. The base is also referred to as the far plane. The frustum's shape represents the viable region of your game. Only objects in that region are rendered. Using the camera object in Scene view, we can change our camera's frustum shape.
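Incidentally, every camera parameter we have set in the Inspector can also be driven from a script. The following is a small illustrative sketch, not part of the game we are building; the values simply echo the Inspector choices discussed above.

  using UnityEngine;

  // Hypothetical sketch: setting the camera parameters discussed above from code.
  // Attach to a GameObject that has a Camera component.
  public class CameraSetup : MonoBehaviour
  {
      void Start()
      {
          Camera cam = GetComponent<Camera>();
          cam.fieldOfView = 60f;         // Field of View (perspective projection only)
          cam.nearClipPlane = 0.3f;      // Clipping Planes - Near
          cam.farClipPlane = 1000f;      // Clipping Planes - Far
          cam.orthographic = false;      // false = perspective, true = orthographic
          // cam.orthographicSize = 5f;  // the Size parameter, used when orthographic
      }
  }

Scripting these values is handy for effects such as zooming, where the Field of View is narrowed over time rather than fixed in the Inspector.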
Creating a Skybox

When we create game worlds, we typically create the ground, buildings, characters, trees, and other game objects. What about the sky? By default, there will be a textured blue sky in your Unity game projects. That sky is sufficient for testing but does not add to an immersive gaming experience. We want a bit more realism, such as clouds, and that can be accomplished by creating a Skybox. A Skybox is a six-sided cube visible to the player beyond all other objects. So, when a player looks beyond your objects, what they see is your Skybox. As we said, Skyboxes are six-sided cubes, which means you will need six separate images that can essentially be clamped to each other to form the cube. The following screenshot shows the Default Skybox that Unity projects start with as well as the completed Custom Skybox you will create in this section:

Perform the following steps to create a Skybox:

1. In the Project panel, create a Skybox subfolder in the Assets folder. We will use this folder to store our textures and materials for the Skybox.
2. Drag the provided six Skybox images, or your own, into the new Skybox folder.
3. Ensure the Skybox folder is selected in the Project panel.
4. From the top menu, select Assets | Create | Material.
5. In the Project panel, name the material Skybox.
6. With the Skybox material selected, turn your attention to the Inspector panel.
7. Select the Shader drop-down menu and select SkyBox | 6 Sided.
8. Use the Select button for each of the six images and navigate to the images you added in step 2. Be sure to match the appropriate texture to the appropriate cube face. For example, the SkyBox_Front texture matches the Front[+Z] cube face on the Skybox Material.
9. In order to assign our new Skybox to our scene, select Window | Lighting | Settings from the top menu. This will bring up the Lighting settings dialog window.
10. In the Lighting settings dialog window, click on the small circle to the right of the Skybox Material input field. Then, close the selection window and the Lighting window. Refer to the following screenshot:

You will now be able to see your Skybox in the Scene view. When you click on the Camera in the Hierarchy panel, you will also see the Skybox as it will appear from the camera's perspective. Be sure to save your scene and your project.

Using multiple cameras

Our Unity games must have at least one camera, but we are not limited to using just one. As you will see, we will attach our main camera, or primary camera, to our player character. It will be as if the camera is following the character around the game environment. This will become the eyes of our character. We will play the game through our character's view.

A common use of a second camera is to create a mini-map that can be seen in a small window on top of the game display. These mini-maps can be made to toggle on and off or be permanent/fixed display components. Implementations might consist of a fog-of-war display, a radar showing enemies, or a global top-down view of the map for orientation purposes. You are only limited by your imagination.

Another use of multiple cameras is to provide the player with the ability to switch between third-person and first-person views. You will remember that the first-person view puts the player's arms in view, while in the third-person view, the player's entire body is visible. We can use two cameras in the appropriate positions to support viewing from either camera.
In a game, you might make this a toggle—say, with the C keyboard key—that switches from one camera to the other (a sketch of such a toggle appears at the end of this article). Depending on what is happening in the game, the player might enjoy this ability.

Some single-player games feature multiple playable characters. Giving the player the ability to switch between these characters gives them greater control over the game strategy. To achieve this, we would need to have cameras attached to each playable character and then give the player the ability to swap characters. We would do this through scripting. This is a pretty advanced implementation of multiple characters.

Another use of multiple cameras is adding specialty views in a game. These specialty views might include looking through a door's peep-hole, looking through binoculars at the top of a skyscraper, or even looking through a periscope. We can attach cameras to objects and change their viewing parameters to create unique camera uses in our games. We are only limited by our own game designs and imagination.

We can also use cameras as cameras. That's right! We can use the camera game object to simulate actual in-game cameras. One example is implementing security cameras in a prison-escape game.

Working with lighting

In the previous sections, we explored the uses of cameras for Unity games. Just like in the real world, cameras need lights to show us objects. In Unity games, we use multiple lights to illuminate the game environment. In Unity, we have both dynamic lighting techniques and light baking options for better performance. We can add numerous light sources throughout our scenes and selectively enable or disable an object's ability to cast or receive shadows. This level of specificity gives us tremendous opportunity to create realistic game scenes.

Perhaps the secret behind Unity's ability to render light and shadows so realistically is that Unity models the actual behavior of lights and shadows. Real-time global illumination gives us the ability to instantiate multiple lights in each scene, each with the ability to directly or indirectly impact objects in the scene that are within range of the light sources.

We can also add and manipulate ambient light in our game scenes. This is often done with Skyboxes, a tri-colored gradient, or even a single color. Each new scene in Unity has default ambient lighting, which we can control by editing the values in the Lighting window. In that window, you have access to the following settings:

- Environment
- Real-time Lighting
- Mixed Lighting
- Lightmapping Settings
- Other Settings
- Debug Settings

No changes to these are required for our game at this time. We have already set the environmental lighting to our Skybox.

When we create our scenes in Unity, we have three options for lighting. We can use real-time dynamic light, use the baked lighting approach, or use a mixture of the two. Our games perform more efficiently with baked lighting, compared to real-time dynamic lighting, so if performance is a concern, try using baked lighting where you can.

To summarize, we have discussed how to create interesting lighting and camera effects using Unity 2018. This article is an extract from the book Getting Started with Unity 2018 written by Dr. Edward Lavieri. This book will help you create fun-filled real-world games with Unity 2018.
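As a quick illustration of the C-key camera toggle mentioned earlier, here is a minimal sketch. It is an assumption-laden example rather than project code: the two Camera references are expected to be assigned in the Inspector, one positioned for each view.

  using UnityEngine;

  // Hypothetical sketch of the first-/third-person view toggle described above
  public class CameraToggle : MonoBehaviour
  {
      public Camera firstPersonCam;  // assign in the Inspector
      public Camera thirdPersonCam;  // assign in the Inspector

      void Start()
      {
          firstPersonCam.enabled = true;   // begin in first-person view
          thirdPersonCam.enabled = false;
      }

      void Update()
      {
          if (Input.GetKeyDown(KeyCode.C))
          {
              // swap which camera renders the scene
              firstPersonCam.enabled = !firstPersonCam.enabled;
              thirdPersonCam.enabled = !thirdPersonCam.enabled;
          }
      }
  }

The same enable/disable idea extends to the mini-map: leave a second top-down camera enabled permanently and shrink its Viewport Rect so it renders into a corner of the screen.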

Debugging and Improving Code with ChatGPT

Dan MacLean
23 Nov 2023
8 min read
This article is an excerpt from the book R Bioinformatics Cookbook - Second Edition, by Dan MacLean. Discover over 80 recipes for modeling and handling real-life biological data using modern libraries from the R ecosystem.

Introduction

Embrace the power of streamlined debugging and code refinement with ChatGPT's expertise. Unravel the possibilities of effortlessly troubleshooting errors, optimizing performance, and refining code structures. This article explores how ChatGPT, armed with its extensive programming knowledge, assists in identifying and rectifying coding errors, offering tailored solutions, and fostering a deeper understanding of code logic. Dive into a journey of code enhancement, where ChatGPT becomes your indispensable companion in navigating complexities and elevating your coding experiences.

Debugging and improving code with ChatGPT

ChatGPT is capable of debugging code due to its extensive training on programming-related text. It has acquired knowledge of common programming patterns and errors, allowing it to understand code snippets, identify potential issues, and provide suggestions or improvements. It can identify minor errors by highlighting syntax issues such as missing brackets, incorrect function calls, and invalid variable assignments. ChatGPT helps troubleshooting by asking clarifying questions to better understand the problem and guide developers toward the root cause, and can aid in code optimization by suggesting efficiency improvements, identifying performance bottlenecks, and proposing more optimized implementations. Lastly, the debugging process with ChatGPT can foster learning and exploration, enabling developers to discover new concepts, programming techniques, or functionalities of specific libraries or packages. In this recipe, we'll walk through an example debugging session with ChatGPT and look at how it helps us to reflect better on our own code.

Getting ready

We just need an account at ChatGPT: https://chat.openai.com/.

How to do it...

We can debug some code as follows:

1. Set up some code that we expect to work and try it:

  data <- data.frame(
    predictor = c(1, 2, 3, 4, NA),
    response = c("A", "B", "C", "D", "E")
  )
  model <- lm(response ~ predictor, data = data)

2. Send the broken code to ChatGPT:

  Help me fix the following code:
  data <- data.frame(
    predictor = c(1, 2, 3, 4, NA),
    response = c("A", "B", "C", "D", "E")
  )
  model <- lm(response ~ predictor, data = data)

3. Read and run the resulting response in R:

  To fix the code, you need to handle the missing value (NA) in the predictor column. Here's the corrected code:

  # Fixing the code
  data <- data.frame(
    predictor = c(1, 2, 3, 4, NA),
    response = c("A", "B", "C", "D", "E")
  )
  # Replace NA in the predictor column with the mean value
  data$predictor[is.na(data$predictor)] <- mean(data$predictor, na.rm = TRUE)
  # Convert the response variable to a factor
  data$response <- as.factor(data$response)
  # Perform linear regression
  model <- lm(response ~ predictor, data = data)
  summary(model)

  In the fixed code, I replaced the NA value in the predictor column with the mean value of the remaining numeric values using the mean() function with the na.rm = TRUE argument. Then, I converted the response variable to a factor using the as.factor() function.
4. Tell ChatGPT that the code still doesn't work:

It still doesn't work - I'm getting this error:
Error in as.character.factor(x) : malformed factor
In addition: Warning message:
In Ops.factor(r, 2) : '^' not meaningful for factors

5. Read the response and run the code:

Apologies for the oversight. The error is occurring because the ^ operator is not applicable to factors in the formula used in the linear regression. To fix this, we can convert the factor to a numeric representation using numeric codes or dummy variables. Here's the updated code:

# Fixing the code
data <- data.frame(
  predictor = c(1, 2, 3, 4, NA),
  response = c("A", "B", "C", "D", "E")
)

# Replace NA in the predictor column with the mean value
data$predictor[is.na(data$predictor)] <- mean(data$predictor, na.rm = TRUE)

# Convert the response variable to a factor
data$response <- as.factor(data$response)

# Convert the factor to numeric codes
data$response <- as.numeric(data$response)

# Perform linear regression
model <- lm(response ~ predictor, data = data)
summary(model)

In the updated code, I've used as.numeric() to convert the response factor variable to numeric codes. This allows the regression model to treat it as a numerical response. Now, the lm() function should run without the error, and summary(model) will display the model summary.

6. Ask it to check whether the original code was appropriately analyzed:

Were the original data suitable for a linear model?

7. Read the response:

No, the original data provided in the code were not suitable for a linear model. Here's why:

1. The `predictor` variable was numeric, which is suitable for a linear model.
2. The `response` variable, however, was categorical with values "A", "B", "C", "D", and "E". Categorical variables cannot be directly used as the response variable in a linear regression model because they do not represent continuous numeric values.

To use a categorical variable as the response variable in a linear model, you typically need to encode it into dummy variables or use techniques like multinomial logistic regression. Dummy variable encoding represents each category as a separate binary variable (0 or 1). Once the categorical variable is properly encoded, it can be used as the response variable in a linear regression model.
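ChatGPT's final answer points toward multinomial logistic regression rather than lm() for a categorical response. The recipe stops short of showing that analysis, but a minimal sketch of it in R might look like the following, using the nnet package (our choice, not ChatGPT's; nnet ships with standard R installations as a recommended package). With this toy dataset the fit is degenerate, since each class has only one observation once the incomplete row is dropped, so treat this purely as a syntax illustration:

# A sketch of the multinomial approach ChatGPT describes (assumptions:
# nnet is installed; dropping the NA row is acceptable for the analysis)
library(nnet)

data <- data.frame(
  predictor = c(1, 2, 3, 4, NA),
  response = c("A", "B", "C", "D", "E")
)

# Drop the incomplete row instead of imputing, then tidy the factor levels
data <- data[complete.cases(data), ]
data$response <- droplevels(as.factor(data$response))

# multinom() treats the response as categories, not as fake numbers
fit <- multinom(response ~ predictor, data = data)
summary(fit)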
Again, that's how the process goes, though the output can differ from user to user; the meaning should be the same.

How it works…
In step 1, we set up a data frame of some data as a test set to use in a linear model; we wish to examine the relationship between the two variables, so we assume that a linear model will work. When we run the code in R, we get this:

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
NA/NaN/Inf in 'y'
In addition: Warning message:
In storage.mode(v) <- "double" : NAs introduced by coercion

This is a typically confounding R error message. We want help, so in step 2, we ask ChatGPT to fix the code.

Step 3 shows us ChatGPT's response, which suggests fixing the NA values that are in the predictor column. That seems reasonable, and, as it explains, ChatGPT gives us some code that imputes a new value from the mean of all the other values – again, a reasonable value to impute. When we run the code, it still doesn't work and we get a new error, so in step 4, we tell ChatGPT about it and ask it to fix the new errors.

In step 5, we see an apologetic language model attempt to correct the error. It gives us a confusing justification for some strange text-to-number conversion, along with fixed code. When we run this new code in the console, we get output like this:

## Call:
## lm(formula = response ~ predictor, data = data)
##
## Residuals:
##    1    2    3    4    5
## -0.5 -0.5 -0.5 -0.5  2.0
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   0.5000     1.5546   0.322    0.769
## predictor     1.0000     0.5774   1.732    0.182
##
## Residual standard error: 1.291 on 3 degrees of freedom
## Multiple R-squared:  0.5,  Adjusted R-squared:  0.3333
## F-statistic:     3 on 1 and 3 DF,  p-value: 0.1817

This looks a bit strange – the residuals are odd and the rest of the values look poor. We start to question whether this was the right thing to do in the first place.

In step 6, we ask ChatGPT whether the linear model was the right sort of analysis. It responds as in step 7, telling us quite clearly that it was not appropriate.

This recipe highlights that we can use ChatGPT to fix code that doesn't work, but it also shows that ChatGPT will not reason without prompting. Here, it let us pursue a piece of code that wasn't right for the task. As a language model, it can't know that, even though we believe it would be evident from the question setup. It didn't try to correct our flawed assumptions or logic. We are still responsible for the logic and applicability of our code.

Conclusion
In conclusion, ChatGPT emerges as a valuable ally in code debugging, offering insightful solutions and guiding developers toward efficient, error-free code. While it excels in identifying issues and suggesting fixes, it's crucial to recognize that ChatGPT operates within the boundaries of the information it is given. This journey underscores the importance of critical thinking in code analysis, reminding us that while ChatGPT empowers us with solutions, the responsibility for logical correctness and code applicability ultimately rests with the developer. Leveraging ChatGPT's expertise alongside mindful coding practices paves the way for seamless debugging and continual improvement in software development endeavors.

Author Bio
Professor Dan MacLean has a Ph.D. in molecular biology from the University of Cambridge and gained postdoctoral experience in genomics and bioinformatics at Stanford University in California. Dan is now Head of Bioinformatics at the world-leading Sainsbury Laboratory in Norwich, UK, where he works on bioinformatics, genomics, and machine learning. He teaches undergraduates, post-graduates, and post-doctoral students in data science and computational biology. His research group has developed numerous new methods and software in R, Python, and other languages with over 100,000 downloads combined.

Getting started with Cocos2d-x

Packt
19 Oct 2015
11 min read
In this article written by Akihiro Matsuura, author of the book Cocos2d-x Cookbook, we're going to install Cocos2d-x and set up the development environment. The following topics will be covered in this article:

- Installing Cocos2d-x
- Using the cocos command
- Building the project by Xcode
- Building the project by Eclipse

Cocos2d-x is written in C++, so it can be built on any platform. It is also open source, so we are free to read the framework's code: Cocos2d-x is not a black box, and this proves to be a big advantage for us when we use it. Cocos2d-x version 3, which supports C++11, was only recently released. It also supports 3D and has improved rendering performance.

Installing Cocos2d-x

Getting ready
To follow this recipe, you need to download the zip file from the official site of Cocos2d-x (http://www.cocos2d-x.org/download). In this article we've used version 3.4, which was the latest stable version available.

How to do it...
Unzip your file to any folder. This time, we will install it in the user's home directory. For example, if the user name is syuhari, then the install path is /Users/syuhari/cocos2d-x-3.4. We call it COCOS_ROOT. The following steps will guide you through the process of setting up Cocos2d-x:

1. Open the terminal.
2. Change the directory in the terminal to COCOS_ROOT, using the following command:
$ cd ~/cocos2d-x-v3.4
3. Run setup.py, using the following command:
$ ./setup.py
4. The terminal will ask you for NDK_ROOT. Enter the NDK_ROOT path.
5. The terminal will then ask you for ANDROID_SDK_ROOT. Enter the ANDROID_SDK_ROOT path.
6. Finally, the terminal will ask you for ANT_ROOT. Enter the ANT_ROOT path.
7. After the execution of the setup.py command, you need to execute the following command to add the system variables:
$ source ~/.bash_profile

Open the .bash_profile file, and you will find that setup.py shows how to set each path in your system. You can view the .bash_profile file using the cat command:
$ cat ~/.bash_profile

We now verify whether Cocos2d-x was installed correctly:

1. Open the terminal and run the cocos command without parameters:
$ cocos
2. If you can see a window like the following screenshot, you have successfully completed the Cocos2d-x install process.

How it works...
Let's take a look at what we did throughout the above recipe. You can install Cocos2d-x by just unzipping it; setup.py only sets up the cocos command and the paths needed for Android builds in your environment. Installing Cocos2d-x is very easy and simple. If you want to install a different version of Cocos2d-x, you can do that too. To do so, you need to follow the same steps given in this recipe, but for the different version.

There's more...
Setting up the Android environment is a bit tough. If you want to start developing with Cocos2d-x right away, you can skip the Android settings and return to them when you need to run on Android. In that case, you don't have to install the Android SDK, NDK, and Apache Ant, and when you run setup.py, you can simply press Enter without entering a path for each question.
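As a concrete illustration, the entries that setup.py appends to .bash_profile look something like the following sketch. Only the three variable names come from the recipe itself; the paths shown are placeholders and will differ on your machine:

# Illustrative sketch only - replace each path with your own install location
export NDK_ROOT=/Users/syuhari/android-ndk
export ANDROID_SDK_ROOT=/Users/syuhari/android-sdk-macosx
export ANT_ROOT=/usr/local/bin
export PATH=$NDK_ROOT:$ANDROID_SDK_ROOT:$ANDROID_SDK_ROOT/tools:$ANT_ROOT:$PATH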
Using the cocos command
The next step is using the cocos command. It is a cross-platform tool with which you can create a new project, build it, run it, and deploy it. The cocos command works for all Cocos2d-x supported platforms, and you don't need to use an IDE if you don't want to. In this recipe, we take a look at this command and explain how to use it.

How to do it...
You can view the cocos command's help by executing it with the --help parameter, as follows:
$ cocos --help

We then move on to generating our new project. Firstly, we create a new Cocos2d-x project with the cocos new command, as shown here:
$ cocos new MyGame -p com.example.mygame -l cpp -d ~/Documents/

The result of this command is shown in the following screenshot. The argument after the new parameter is the project name. The other parameters denote the following:

- MyGame is the name of your project.
- -p is the package name for Android. This is the application ID in the Google Play store, so you should use a reverse domain name to make it unique.
- -l is the programming language used for the project. You should use "cpp" because we will use C++.
- -d is the location in which to generate the new project. This time, we generate it in the user's Documents directory.

You can look up these parameters using the following command:
$ cocos new --help

Congratulations, you have generated your new project. The next step is to build and run it using the cocos command.

Compiling the project
If you want to build and run for iOS, you need to execute the following command:
$ cocos run -s ~/Documents/MyGame -p ios

The parameters mentioned are explained as follows:

- -s is the directory of the project. This can be an absolute path or a relative path.
- -p denotes which platform to run on. If you want to run on Android, you use -p android. The available options are ios, android, win32, mac, and linux.

You can run cocos run --help for more detailed information. The result of this command is shown in the following screenshot.

You can now build and run iOS applications with Cocos2d-x. However, you will have to wait a while if this is your first build, because building the Cocos2d-x library takes a long time on a first or clean build.

How it works...
The cocos command can create a new project and build it. You should use the cocos command if you want to create a new project. Of course, you can also build with Xcode or Eclipse, and they can be easier to work with when you develop and debug.

There's more...
The cocos run command has other parameters. They are the following:

- --portrait will set the project as a portrait. This option takes no argument.
- --ios-bundleid will set the bundle ID for the iOS project. However, it is not difficult to set it later.

The cocos command also includes some other commands, which are as follows:

- The compile command: This command is used to build a project. The following patterns are useful parameters. You can see all parameters and options if you execute the cocos compile [-h] command:
cocos compile [-h] [-s SRC_DIR] [-q] [-p PLATFORM] [-m MODE]
- The deploy command: This command only takes effect when the target platform is android. It will re-install the specified project to the Android device or simulator:
cocos deploy [-h] [-s SRC_DIR] [-q] [-p PLATFORM] [-m MODE]

The run command carries out the compile and deploy commands and then launches the application.
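For comparison with the iOS run above, a sketch of the equivalent Android commands is shown below. This assumes the Android SDK/NDK/Ant paths from the installation recipe are set; release as the -m value is an assumed typical MODE, not something the recipe states:

# Build and run the project on a connected Android device or emulator
$ cocos run -s ~/Documents/MyGame -p android

# Build only, selecting a build mode explicitly (mode name assumed)
$ cocos compile -s ~/Documents/MyGame -p android -m release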
Building the project by Xcode

Getting ready
Before building the project with Xcode, you require Xcode with an iOS developer account to test the project on a physical device. However, you can also test it on an iOS simulator. If you have not installed Xcode, you can get it from the Mac App Store. Once you have installed it, get it activated.

How to do it...
1. Open your project from Xcode. You can open your project by double-clicking on the file placed at ~/Documents/MyGame/proj.ios_mac/MyGame.xcodeproj.
2. Build and run with Xcode. You should select an iOS simulator or a real device on which you want to run your project.

How it works...
If this is your first time building, it will take a long time, but be patient; the long wait only happens on the first build. You can develop your game faster if you develop and debug it using Xcode rather than Eclipse.

Building the project by Eclipse

Getting ready
You must finish the first recipe before you begin this step. If you have not installed Eclipse yet, you will also need to install it.

How to do it...
1. Setting up NDK_ROOT:
   1. Open the Eclipse Preferences.
   2. Open C++ | Build | Environment.
   3. Click on Add and set the new variable: the name is NDK_ROOT and the value is the NDK_ROOT path.
2. Importing your project into Eclipse:
   1. Open the File menu and click on Import.
   2. Go to Android | Existing Android Code into Workspace.
   3. Click on Next.
   4. Import the project into Eclipse from ~/Documents/MyGame/proj.android.
3. Importing the Cocos2d-x library into Eclipse:
   1. Perform the same import steps as in step 2.
   2. Import the cocos2d lib project from ~/Documents/MyGame/cocos2d/cocos/platform/android/java.
4. Build and run:
   1. Click on the Run icon.
   2. The first time, Eclipse asks you to select a way to run your application. Select Android Application and click on OK, as shown in the following screenshot.
   3. If you have connected an Android device to your Mac, you can run your game on the real device or on an emulator. The following screenshot shows it running on a Nexus 5.

If you add .cpp files to your project, you have to modify the Android.mk file at ~/Documents/MyGame/proj.android/jni/Android.mk. This file is needed by the NDK build, and the fix is required every time you add files. The original Android.mk looks as follows:

LOCAL_SRC_FILES := hellocpp/main.cpp \
                   ../../Classes/AppDelegate.cpp \
                   ../../Classes/HelloWorldScene.cpp

If you added the TitleScene.cpp file, you have to modify it as shown in the following code:

LOCAL_SRC_FILES := hellocpp/main.cpp \
                   ../../Classes/AppDelegate.cpp \
                   ../../Classes/HelloWorldScene.cpp \
                   ../../Classes/TitleScene.cpp

The preceding example shows the case where you add the TitleScene.cpp file. If you are adding other files as well, you need to list every added file.

How it works...
You may get lots of errors when importing your project into Eclipse, but don't panic. The errors soon disappear after you import the Cocos2d-x library. Setting the NDK path allows Eclipse to compile C++. After you modify the C++ code, run your project in Eclipse; Eclipse automatically compiles the C++ code and the Java code, and then runs the project.

It is a tedious task to fix Android.mk again every time you add C++ files. The following code is the original Android.mk:

LOCAL_SRC_FILES := hellocpp/main.cpp \
                   ../../Classes/AppDelegate.cpp \
                   ../../Classes/HelloWorldScene.cpp
LOCAL_C_INCLUDES := $(LOCAL_PATH)/../../Classes

The following code is a customized Android.mk that adds C++ files automatically:

CPP_FILES := $(shell find $(LOCAL_PATH)/../../Classes -name *.cpp)
LOCAL_SRC_FILES := hellocpp/main.cpp
LOCAL_SRC_FILES += $(CPP_FILES:$(LOCAL_PATH)/%=%)
LOCAL_C_INCLUDES := $(shell find $(LOCAL_PATH)/../../Classes -type d)

The first line gathers all the C++ files under the Classes directory into the CPP_FILES variable. The second and third lines add those C++ files to the LOCAL_SRC_FILES variable, and the last line adds every directory under Classes to LOCAL_C_INCLUDES. By doing so, C++ files will be compiled automatically by the NDK. If you need to compile files with an extension other than .cpp, you will need to add them manually.
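To preview exactly which files that $(shell find ...) line will pick up, you can run the same find yourself from the jni directory. A quick sketch of this check, where TitleScene.cpp is the hypothetical file added in the example above:

# Run from ~/Documents/MyGame/proj.android/jni to preview the file list
$ find ../../Classes -name '*.cpp'
../../Classes/AppDelegate.cpp
../../Classes/HelloWorldScene.cpp
../../Classes/TitleScene.cpp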
There's more...
If you want to build the C++ code in the NDK manually, you can use the following command:
$ ./build_native.py

This script is located at ~/Documents/MyGame/proj.android. It uses ANDROID_SDK_ROOT and NDK_ROOT internally. If you want to see its options, run ./build_native.py --help.

Summary
Cocos2d-x is an open source, cross-platform game engine, which is free and mature. It can publish games for mobile devices and desktops, including iPhone, iPad, Android, Kindle, Windows, and Mac. The book Cocos2d-x Cookbook focuses on using version 3.4, which was the latest version of Cocos2d-x available at the time of writing. We focus on iOS and Android development, and we'll be using a Mac because we need one to develop iOS applications.

Resources for Article:
Further resources on this subject:
- Creating Games with Cocos2d-x is Easy and 100 percent Free [Article]
- Dragging a CCNode in Cocos2D-Swift [Article]
- Cocos2d-x: Installation [Article]