Sugandha Lahoti
13 Aug 2019
7 min read

Łukasz Langa at PyLondinium19: “If Python stays synonymous with CPython for too long, we’ll be in big trouble”

PyLondinium, the conference for Python developers, was held in London from the 14th to the 16th of June, 2019. At the Sunday keynote, Łukasz Langa, the creator of Black (a Python code formatter) and the Python 3.8 release manager, spoke about where Python could be in 2020 and why Python developers should try new browser- and mobile-friendly versions of Python.

Python is an extremely expressive language, says Łukasz. “When I first started I was amazed how much you can accomplish with just a few lines of code, especially compared to Java. But there are still languages that are even more expressive and enable even more compact notation.” So what makes Python special? Python is runnable pseudocode; it reads like English; it is very elegant. “Our responsibility as developers,” Łukasz mentions, “is to make Python’s runnable pseudocode convenient to use for new programmers.”

Python has gotten much bigger, more stable, and more complex in the last decade. However, Łukasz says, most of the low-hanging fruit has already been picked, and what's left is the maintenance of an increasingly fossilizing interpreter and a stunted library. This maintenance is both tedious and tricky, especially for a dynamic, interpreted language like Python.

Python being a community-run project is both a blessing and a curse

Łukasz talks about how Python is the biggest community-run programming language on the planet. Other programming languages with similar or larger market penetration are run either by single corporations or by multiple committees. Being a community project is both a blessing and a curse for Python, says Łukasz. It's a blessing because it's truly free from shareholder pressure and market swings. It’s a curse because almost the entire core developer team volunteers their time and effort for free, and the Python Software Foundation, while graciously funding infrastructure and events, does not currently employ any core developers.
Since both “Python” and “Software” are right in the name of the foundation, Łukasz says he would like this to change. “If you don't pay people, you have no influence over what they work on. Core developers often choose problems to tackle based on what inspires them personally. So we never had an explicit roadmap on where Python should go and what problems our developers should focus on,” he adds. Python is no longer governed by a BDFL, says Łukasz: “My personal hope is that the steering council will be providing visionary guidance from now on and will present us with an explicit roadmap on where we should go.”

Interesting and dead projects in Python

Łukasz talked about mypyc and invited people to work on and contribute to this project, and organizations to sponsor it. Mypyc is a compiler that compiles mypy-annotated, statically typed Python modules into CPython C extensions. To enable compilation, it restricts the language: mypyc supports only a subset of Python. He also mentioned MicroPython, a Kickstarter-funded subset of Python optimized to run on microcontrollers and other constrained environments. It is a compatible runtime for microcontrollers with very little memory (16 kilobytes of RAM and 256 kilobytes of code memory) and minimal computing power. He also talked about the micro:bit.

He also mentioned many dead, dying, or defunct alternative Python interpreter projects, including Unladen Swallow, Pyston, and IronPython. He talked about PyPy, the JIT Python compiler written in Python. Łukasz mentions that, being written in Python 2, it is one of the most complex Python 2 applications in the industry. “This is at risk at the moment,” says Łukasz, “since it’s a large Python 2 codebase that needs updating to Python 3. Without a tremendous investment, it is very unlikely to ever migrate to Python 3.” Also, trying to replicate CPython's quirks and bugs requires a lot of effort.
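To make the mypyc description more concrete, here is a minimal sketch of the kind of module mypyc targets: every function is fully annotated and sticks to plain, statically typeable Python. The module name fib.py is hypothetical, and the compile step in the comment assumes mypy (which ships mypyc) is installed. The same file also runs unchanged under CPython, which is the appeal of compiling a restricted, familiar subset.

```python
# fib.py: a hypothetical module written in the statically typed subset mypyc compiles.
# Compile (assumes mypy/mypyc is installed):  mypyc fib.py
# Without compiling, the file still runs as ordinary Python under CPython.


def fib(n: int) -> int:
    """Return the n-th Fibonacci number, computed iteratively."""
    a: int = 0
    b: int = 1
    for _ in range(n):
        a, b = b, a + b
    return a


def fib_table(count: int) -> list[int]:
    """Return the first `count` Fibonacci numbers."""
    return [fib(i) for i in range(count)]


if __name__ == "__main__":
    print(fib_table(10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```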
Python should be aligned with where developer trends are shifting

Łukasz believes that a stronger division between the language and the reference implementation is important in the case of Python. He declared, “If Python stays synonymous with CPython for too long, we’ll be in big trouble.” This is because CPython is not available where developer trends are shifting. For the web, the lingua franca is now JavaScript. For the two biggest mobile operating systems, there is Swift, the modern take on Objective-C, and Kotlin, the modern take on Java. For VR, AR, and 3D games, there is C#, provided by Unity. While Python is growing fast, it’s not winning ground in two big areas: the browser and mobile. Python is also slowly losing ground in the field of systems orchestration, where Go is gaining traction. He adds, “If it weren't for the rise of machine learning and artificial intelligence, Python would not have survived the transition between Python 2 and Python 3.”

Łukasz mentions that providing a clear, supported, and official option for the client-side web is what Python needs in order to satisfy the legion of people who want to use it. He says, “For Python the programming language to reach new heights, we need a new kind of Python. One that caters to where developer trends are shifting: mobile, web, VR, AR, and 3D games. There should be more projects experimenting with Python for these platforms. This especially means trying restricted versions of the language, because they are easier to optimize.”

We need a Python compiler for the web and Python on mobile

Łukasz talked about the need to go where developer trends are shifting. He says we need a Python compiler for the web: something that compiles your Python code to the web platform directly.
He also adds that, to be viable for professional production use, Python on the web must not be orders of magnitude slower than the default option (JavaScript), which is already better supported and has better documentation and training. Similarly, for mobile, he wants small Python applications that run fast and offer quick user interactions. He gives the example of the Go programming language, stating that “one of Go’s claims to fame is the fact that they shipped static binaries so you only have one file. You can choose to still use containers but it’s not necessary; you don't have virtualenvs, you don't have pip installs, and you don't have environments that you have to orchestrate.”

Łukasz further adds that the areas of modern focus where Python currently has no penetration don't require full compatibility with CPython. Starting out with a familiar subset of Python that looks like Python to the user would simplify the development of a new runtime or compiler a lot, and would potentially even fit the target platform better.

What if I want to work on CPython?

Łukasz says that developers can still work on CPython if they want to. “I'm not saying that CPython is a dead end; it will forever be an important runtime for Python. New people are still both welcome and needed, in fact. However, working on CPython today is different from working on it ten years ago; the runtime is mission-critical in many industries, which is why developers must be extremely careful.”

Łukasz sums up his talk by declaring, “I strongly believe that enabling Python on new platforms is an important job. I'm not saying Python as the entire programming language should just abandon what it is now. I would prefer for us to be able to keep Python exactly as it is and just move it to all new platforms. However, it is not possible without multi-million dollar investments over many years.” The talk was well appreciated by Twitter users, with people lauding it as ‘fantastic’ and ‘enlightening’.
https://twitter.com/WillingCarol/status/1156411772472971264
https://twitter.com/freakboy3742/status/1156365742435995648
https://twitter.com/jezdez/status/1156584209366081536

You can watch the full keynote on YouTube.
Packt
19 Jul 2010
8 min read

Configuring Apache and Nginx

There are basically two main parts involved in the configuration: one relating to Apache and one relating to Nginx. Note that while we have chosen to describe the process for Apache in particular, this method can be applied to any other HTTP server. The only point that differs is the exact configuration sections and directives that you will have to edit. Otherwise, the principle of reverse proxying can be applied regardless of the server software you are using.

Reconfiguring Apache

There are two main aspects of your Apache configuration that will need to be edited in order to allow both Apache and Nginx to work together at the same time. But let us first clarify where we are coming from, and where we are going.

Configuration overview

At this point, you probably have the following architecture set up on your server: a web server application running on port 80, such as Apache; and a dynamic server-side script processing application such as PHP, communicating with your web server via CGI, FastCGI, or as a server module.

The new configuration that we are going towards will resemble the following: Nginx running on port 80; Apache or another web server running on a different port, accepting requests coming from local sockets only; and the script processing application, whose configuration remains unchanged.

As you can tell, only two main configuration changes will be applied to Apache (or whichever other web server you are running). Firstly, change the port number in order to avoid conflicts with Nginx, which will then be running as the frontend server. Secondly (although this is optional), you may want to disallow requests coming from the outside and only allow requests forwarded by Nginx. Both configuration steps are detailed in the next sections.
Resetting the port number

Depending on how your web server was set up (manual build, automatic configuration from server panel managers such as cPanel, Plesk, and so on), you may find yourself with a lot of configuration files to edit. The main configuration file is often found in /etc/httpd/conf/ or /etc/apache2/, and there might be more depending on how your configuration is structured. Some server panel managers create extra configuration files for each virtual host. There are three main elements you need to replace in your Apache configuration:

The Listen directive is set to listen on port 80 by default. You will have to replace that port with another, such as 8080. This directive is usually found in the main configuration file.

You must make sure that the following configuration directive is present in the main configuration file: NameVirtualHost A.B.C.D:8080, where A.B.C.D is the IP address of the main network interface through which server communications go. (This applies to Apache 2.2; since Apache 2.4, NameVirtualHost is deprecated and has no effect.)

The port you just selected needs to be used in all your virtual host configuration sections, as described below. The virtual host sections must be transformed from the following template:

<VirtualHost A.B.C.D:80>
    ServerName example.com
    ServerAlias www.example.com
    [...]
</VirtualHost>

to the following:

<VirtualHost A.B.C.D:8080>
    ServerName example.com:8080
    ServerAlias www.example.com
    [...]
</VirtualHost>

In this example, A.B.C.D is the IP address of the virtual host and example.com is the virtual host's name. The port must be edited on the first two lines.

Accepting local requests only

There are many ways you can restrict Apache to accept only local requests, denying access to the outside world. But first, why would you want to do that? As an extra layer positioned between the client and Apache, Nginx provides a certain comfort in terms of security. Visitors no longer have direct access to Apache, which decreases the potential risk posed by any security issues the web server may have.
Overall, it is not necessarily a bad idea to allow access to your frontend server only.

The first method consists of changing the listening network interface in the main configuration file. The Listen directive of Apache lets you specify a port, but also an IP address. By default, no IP address is specified, so Apache accepts connections on all interfaces. All you have to do is replace the Listen 8080 directive with Listen 127.0.0.1:8080. Apache will then only listen on the local IP address. If you do not host Apache on the same server as Nginx, you will need to specify the IP address of the network interface that can communicate with the server hosting Nginx.

The second alternative is to establish per-virtual-host restrictions. The access-control directives must sit in a directory-type context, such as a <Location /> section inside the virtual host:

<VirtualHost A.B.C.D:8080>
    ServerName example.com:8080
    ServerAlias www.example.com
    [...]
    <Location />
        Order deny,allow
        Allow from 127.0.0.1
        Allow from 192.168.0.1
        Deny from all
    </Location>
</VirtualHost>

Using the Allow and Deny Apache directives, you are able to restrict the IP addresses allowed to access your virtual hosts. This allows for a finer configuration, which can be useful in case some of your websites cannot be fully served by Nginx. (Order, Allow, and Deny are Apache 2.2 directives; on Apache 2.4 and later, the equivalent is Require ip 127.0.0.1 192.168.0.1.)

Once all your changes are done, don't forget to reload the server to make sure the new configuration is applied, for example with service httpd reload or /etc/init.d/httpd reload.

Configuring Nginx

There are only a couple of simple steps to establish a working configuration of Nginx, although it can be tweaked more accurately as seen in the next section.

Enabling proxy options

The first step is to enable proxying of requests from your location blocks. Since the proxy_pass directive cannot be placed at the http or server level, you need to include it in every single place that you want to be forwarded. Usually, a location / { ... } fallback block suffices, since it encompasses all requests, except those that match location blocks containing a break statement.
Here is a simple example using a single static backend hosted on the same server:

server {
    server_name .example.com;
    root /home/example.com/www;
    [...]
    location / {
        proxy_pass http://127.0.0.1:8080;
    }
}

In the following example, we make use of an upstream block, allowing us to specify multiple servers:

upstream apache {
    server 192.168.0.1:80;
    server 192.168.0.2:80;
    server 192.168.0.3:80 weight=2;
    server 192.168.0.4:80 backup;
}

server {
    server_name .example.com;
    root /home/example.com/www;
    [...]
    location / {
        proxy_pass http://apache;
    }
}

So far, with such a configuration, all requests are proxied to the backend server. We are now going to separate the content into two categories:

Dynamic files: files that require processing before being sent to the client, such as PHP, Perl, and Ruby scripts, will be served by Apache.
Static files: all other content that does not require additional processing, such as images, CSS files, static HTML files, and media, will be served directly by Nginx.

We thus have to separate the content so that each category is served by the appropriate server.

Separating content

In order to establish this separation, we can simply use two different location blocks: one that will match the dynamic file extensions and another one encompassing all the other files. This example passes requests for .php files to the proxy:

server {
    server_name .example.com;
    root /home/example.com/www;
    [...]
    location ~* \.php.?$ {
        # Proxy all requests with a URI ending with .php*
        # (includes PHP, PHP3, PHP4, PHP5...)
        proxy_pass http://127.0.0.1:8080;
    }
    location / {
        # Your other options here for static content
        # for example cache control, alias...
        expires 30d;
    }
}

This method, although simple, will cause trouble with websites using URL rewriting. Most Web 2.0 websites now use links that hide file extensions, such as http://example.com/articles/us-economy-strengthens/; some even replace file extensions with links resembling the following: http://example.com/us-economy-strengthens.html.
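A quick way to see why extension-based routing breaks on rewritten URLs is to test the location regex outside Nginx. The Python sketch below approximates the case-insensitive ~* match with re.IGNORECASE and an unanchored search, assuming the pattern \.php.?$. It is only an approximation: Nginx matches the normalized URI and weighs other location blocks too, and backend_for is a hypothetical helper, not an Nginx API.

```python
import re

# Approximation of `location ~* \.php.?$`: ~* is a case-insensitive regex match,
# and Nginx regex locations are not anchored at the start, hence re.search.
PHP_LOCATION = re.compile(r"\.php.?$", re.IGNORECASE)


def backend_for(uri: str) -> str:
    """Return which server would handle `uri` under the two-location setup."""
    return "apache" if PHP_LOCATION.search(uri) else "nginx"


for uri in [
    "/index.php",
    "/legacy/page.PHP5",
    "/styles/site.css",
    "/articles/us-economy-strengthens/",  # rewritten URL: no .php visible
    "/us-economy-strengthens.html",       # extension replaced by .html
]:
    print(f"{uri:40} -> {backend_for(uri)}")
```

The last two URIs fall through to the static block even though Apache must process them, which is exactly the problem that porting rewrite rules or adding a fallback is meant to solve.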
When building a reverse-proxy configuration, you have two options:

Port your Apache rewrite rules to Nginx (usually found in the .htaccess file at the root of the website), in order for Nginx to know the actual file extension of the request and proxy it to Apache correctly.
If you do not wish to port your Apache rewrite rules, note that the default behavior of Nginx is to return 404 errors for such requests. However, you can alter this behavior in multiple ways, for example, by handling 404 requests with the error_page directive or by testing the existence of files before serving them. Both solutions are detailed below.

Here is an implementation of this mechanism, using the error_page directive:

server {
    server_name .example.com;
    root /home/example.com/www;
    [...]
    location / {
        # Your static files are served here
        expires 30d;
        [...]
        # For 404 errors, submit the query to the @proxy
        # named location block
        error_page 404 @proxy;
    }
    location @proxy {
        proxy_pass http://127.0.0.1:8080;
    }
}

Alternatively, making use of the if directive from the Rewrite module:

server {
    server_name .example.com;
    root /home/example.com/www;
    [...]
    location / {
        # If the requested file extension ends with .php,
        # forward the query to Apache
        if ($request_filename ~* \.php.?$) {
            break; # prevents further rewrites
            proxy_pass http://127.0.0.1:8080;
        }
        # If the requested file does not exist,
        # forward the query to Apache
        if (!-f $request_filename) {
            break; # prevents further rewrites
            proxy_pass http://127.0.0.1:8080;
        }
        # Your static files are served here
        expires 30d;
    }
}

There is no real performance difference between the two solutions, as they forward the same number of requests to the backend server. You should work on porting your Apache rewrite rules to Nginx if you are looking to get optimal performance.
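The decision made by the if-based location / block can be modeled in a few lines of Python, which helps when reasoning about which requests end up at Apache. route_request is a hypothetical helper that mirrors the two tests in order: a PHP-like extension first, then whether the file exists under the document root. It does not model other Nginx details such as index files, URI normalization, or the error_page variant.

```python
import os
import re
import tempfile

PHP_PATTERN = re.compile(r"\.php.?$", re.IGNORECASE)


def route_request(root: str, uri: str) -> str:
    """Hypothetical model of the if-based `location /` block.

    Returns "apache" when Nginx would proxy the request and "nginx" when it
    would serve a static file itself.
    """
    # Roughly what $request_filename holds: document root plus the URI path
    filename = os.path.join(root, uri.lstrip("/"))
    if PHP_PATTERN.search(filename):
        return "apache"  # first if: PHP-like extension
    if not os.path.isfile(filename):
        return "apache"  # second if: !-f, the file is not on disk
    return "nginx"  # static file served directly (expires 30d)


# Demo: a document root containing a single static file
with tempfile.TemporaryDirectory() as demo_root:
    with open(os.path.join(demo_root, "logo.png"), "wb") as handle:
        handle.write(b"fake image bytes")
    decisions = {
        uri: route_request(demo_root, uri)
        for uri in ["/logo.png", "/index.php", "/articles/us-economy-strengthens/"]
    }

print(decisions)
```

Only the file that really exists on disk stays with Nginx; the PHP script and the extensionless rewritten URL are both handed to Apache, which is why this fallback works without porting rewrite rules.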
Bhagyashree R
10 Jan 2019
15 min read

Preparing and automating a task in Python [Tutorial]

To properly automate tasks, we need a platform so that they run automatically at the proper times. A task that needs to be run manually is not really fully automated. But, in order to be able to leave them running in the background while worrying about more pressing issues, the task will need to be suitable to run in fire-and-forget mode. We should be able to monitor that it runs correctly, be sure that we are capturing future actions (such as receiving notifications if something interesting arises), and know whether there have been any errors while running it.

Ensuring that a piece of software runs consistently with high reliability is actually a very big deal and is one area that, to be done properly, requires specialized knowledge and staff, which typically go by the names of sysadmin, operations, or SRE (Site Reliability Engineering).

In this article, we will learn how to prepare and automatically run tasks. It covers how to program tasks to be executed when they should, instead of running them manually, and how to be notified if there has been an error in an automated process.

This article is an excerpt from a book written by Jaime Buelta titled Python Automation Cookbook. The Python Automation Cookbook helps you develop a clear understanding of how to automate your business processes using Python, including detecting opportunities by scraping the web, analyzing information to generate automatic spreadsheet reports with graphs, and communicating with automatically generated emails. To follow along with the examples implemented in the article, you can find the code on the book's GitHub repository.

Preparing a task

It all starts with defining exactly what task needs to be run and designing it in a way that doesn't require human intervention to run. Some ideal characteristics are as follows:

1. Single, clear entry point: No confusion on what the task to run is.
2. Clear parameters: If there are any parameters, they should be very explicit.
3. No interactivity: Stopping the execution to request information from the user is not possible.
4. The result should be stored: To be able to be checked at a different time than when it runs.
5. Clear result: If we are working interactively, we accept more verbose results or progress reports. But, for an automated task, the final result should be as concise and to the point as possible.
6. Errors should be logged: To analyze what went wrong.

A command-line program has a lot of those characteristics already. It has a clear way of running, with defined parameters, and the result can be stored, even if just in text format. But it can be improved with a config file to clarify the parameters and an output file.

Getting ready

We'll start by following a structure in which the main function will serve as the entry point, and all parameters are supplied to it. The definition of the main function with all the explicit arguments covers points 1 and 2. Point 3 is not difficult to achieve. To improve points 2 and 4, we'll look at retrieving the configuration from a file and storing the result in another.

How to do it...
1. Prepare the following task and save it as prepare_task_step1.py:

import argparse


def main(number, other_number):
    result = number * other_number
    print(f'The result is {result}')


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-n1', type=int, help='A number', default=1)
    parser.add_argument('-n2', type=int, help='Another number', default=1)
    args = parser.parse_args()
    main(args.n1, args.n2)

2. Update the file to define a config file that contains both arguments, and save it as prepare_task_step2.py:

import argparse
import configparser


def main(number, other_number):
    result = number * other_number
    print(f'The result is {result}')


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-n1', type=int, help='A number', default=1)
    parser.add_argument('-n2', type=int, help='Another number', default=1)
    parser.add_argument('--config', '-c', type=argparse.FileType('r'),
                        help='config file')
    args = parser.parse_args()
    if args.config:
        config = configparser.ConfigParser()
        config.read_file(args.config)
        # Transforming values into integers (config values are strings)
        args.n1 = int(config['DEFAULT']['n1'])
        args.n2 = int(config['DEFAULT']['n2'])
    main(args.n1, args.n2)

3. Create the config file config.ini. The values go in the [DEFAULT] section, which is the section the code reads:

[DEFAULT]
n1=5
n2=7

4. Run the command with the config file. Note that, as written, the values from the config file override the ones passed on the command line:

$ python3 prepare_task_step2.py -c config.ini
The result is 35
$ python3 prepare_task_step2.py -c config.ini -n1 2 -n2 3
The result is 35

5. Add a parameter to store the result in a file, and save it as prepare_task_step5.py:

import argparse
import sys
import configparser


def main(number, other_number, output):
    result = number * other_number
    print(f'The result is {result}', file=output)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-n1', type=int, help='A number', default=1)
    parser.add_argument('-n2', type=int, help='Another number', default=1)
    parser.add_argument('--config', '-c', type=argparse.FileType('r'),
                        help='config file')
    parser.add_argument('-o', dest='output', type=argparse.FileType('w'),
                        help='output file', default=sys.stdout)
    args = parser.parse_args()
    if args.config:
        config = configparser.ConfigParser()
        config.read_file(args.config)
        # Transforming values into integers
        args.n1 = int(config['DEFAULT']['n1'])
        args.n2 = int(config['DEFAULT']['n2'])
    main(args.n1, args.n2, args.output)

6. Run it to check that it's sending the output to the defined file:

$ python3 prepare_task_step5.py -n1 3 -n2 5 -o result.txt
$ cat result.txt
The result is 15
$ python3 prepare_task_step5.py -c config.ini -o result2.txt
$ cat result2.txt
The result is 35

How it works...

Note that the argparse module allows us to define files as parameters with the argparse.FileType type, and opens them automatically. This is very handy and will raise an error if the file is not valid.

The configparser module allows us to use config files with ease. As demonstrated in Step 2, parsing the file is as simple as the following:

config = configparser.ConfigParser()
config.read_file(file)

The config will then be accessible as a dictionary divided by sections, and then values. Note that the values are always stored in string format, so they need to be transformed into other types, such as integers.

Python 3 allows us to pass a file parameter to the print function, which will write to that file. Step 5 shows how to use it to redirect all the printed information to a file.

Note that the default parameter is sys.stdout, which will print the value to the Terminal (standard output). This makes it so that calling the script without an -o parameter will display the information on the screen, which is helpful in debugging:

$ python3 prepare_task_step5.py -c config.ini
The result is 35
$ python3 prepare_task_step5.py -c config.ini -o result.txt
$ cat result.txt
The result is 35

Setting up a cron job

Cron is an old-fashioned but reliable way of executing commands.
It has been around since the 70s in Unix, and it's an old favorite in system administration to perform maintenance, such as freeing space, rotating logs, making backups, and other common operations.

Getting ready

We will produce a script called cron.py:

import argparse
import sys
from datetime import datetime
import configparser


def main(number, other_number, output):
    result = number * other_number
    # isoformat(sep=" ") separates date and time with a space, as in the sample output
    print(f'[{datetime.utcnow().isoformat(sep=" ")}] The result is {result}', file=output)


if __name__ == '__main__':
    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument('--config', '-c', type=argparse.FileType('r'),
                        help='config file', default='/etc/automate.ini')
    parser.add_argument('-o', dest='output', type=argparse.FileType('a'),
                        help='output file', default=sys.stdout)
    args = parser.parse_args()
    if args.config:
        config = configparser.ConfigParser()
        config.read_file(args.config)
        # Transforming values into integers
        args.n1 = int(config['DEFAULT']['n1'])
        args.n2 = int(config['DEFAULT']['n2'])
    main(args.n1, args.n2, args.output)

Note the following details:

The config file is, by default, /etc/automate.ini. Reuse config.ini from the previous recipe.
A timestamp has been added to the output. This will make it explicit when the task is run.
The result is appended to the file, as the file is opened in 'a' (append) mode.
The ArgumentDefaultsHelpFormatter parameter automatically adds information about default values when printing the help using the -h argument.

Check that the task is producing the expected result and that you can log to a known file:

$ python3 cron.py
[2018-05-15 22:22:31.436912] The result is 35
$ python3 cron.py -o /path/automate.log
$ cat /path/automate.log
[2018-05-15 22:28:08.833272] The result is 35

How to do it...

1. Obtain the full path of the Python interpreter. This is the interpreter that's in your virtual environment:

$ which python
/your/path/.venv/bin/python

2. Prepare the cron to be executed.
Get the full path and check that it can be executed with no problem. Execute it a couple of times:

$ /your/path/.venv/bin/python /your/path/cron.py -o /path/automate.log
$ /your/path/.venv/bin/python /your/path/cron.py -o /path/automate.log

3. Check that the result is being added correctly to the result file:

$ cat /path/automate.log
[2018-05-15 22:28:08.833272] The result is 35
[2018-05-15 22:28:10.510743] The result is 35

4. Edit the crontab file to run the task once every five minutes:

$ crontab -e

*/5 * * * * /your/path/.venv/bin/python /your/path/cron.py -o /path/automate.log

Note that this opens an editing Terminal with your default command-line editor.

5. Check the crontab contents. Note that this displays the crontab contents, but doesn't open it for editing:

$ crontab -l
*/5 * * * * /your/path/.venv/bin/python /your/path/cron.py -o /path/automate.log

6. Wait and check the result file to see how the task is being executed:

$ tail -F /path/automate.log
[2018-05-17 21:20:00.611540] The result is 35
[2018-05-17 21:25:01.174835] The result is 35
[2018-05-17 21:30:00.886452] The result is 35

How it works...

The crontab line consists of a description of how often to run the task (the first five elements), followed by the task itself. Each of the five initial elements represents a different unit of time. Most of them are stars, meaning any:

* * * * *
| | | | |
| | | | +---- Day of the Week (range: 0-7, 0 and 7 standing for Sunday, 1 for Monday)
| | | +------ Month of the Year (range: 1-12)
| | +-------- Day of the Month (range: 1-31)
| +---------- Hour (range: 0-23)
+------------ Minute (range: 0-59)

Therefore, our line, */5 * * * *, means every time the minute is divisible by 5, in all hours, on all days of every month.
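The field-by-field reading of a crontab line can be sketched in Python. The helpers field_matches and cron_matches are hypothetical and deliberately simplified: they support only *, */N, single values, A-B ranges, and comma lists, and they ignore cron's special rule that, when both the day-of-month and day-of-week fields are restricted, a day matches if either one matches.

```python
from datetime import datetime


def field_matches(field: str, value: int, low: int) -> bool:
    """Check one crontab field against a value (simplified subset of cron syntax).

    `low` is the field's lowest legal value; */N steps are counted from it,
    so */5 in the minute field (low=0) means "minute divisible by 5".
    """
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            if (value - low) % int(part[2:]) == 0:
                return True
        elif "-" in part:
            start, end = (int(x) for x in part.split("-"))
            if start <= value <= end:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expression: str, when: datetime) -> bool:
    """Return True if the five-field crontab `expression` fires at `when`."""
    minute, hour, day, month, weekday = expression.split()
    # Python: Monday=0 ... Sunday=6; cron: Sunday=0 (or 7), Monday=1 ... Saturday=6
    cron_weekday = (when.weekday() + 1) % 7
    weekday_ok = field_matches(weekday, cron_weekday, 0) or (
        cron_weekday == 0 and field_matches(weekday, 7, 0)
    )
    return (
        field_matches(minute, when.minute, 0)
        and field_matches(hour, when.hour, 0)
        and field_matches(day, when.day, 1)
        and field_matches(month, when.month, 1)
        and weekday_ok
    )


print(cron_matches("*/5 * * * *", datetime(2018, 5, 17, 21, 25)))  # True
print(cron_matches("*/5 * * * *", datetime(2018, 5, 17, 21, 27)))  # False
print(cron_matches("0 0 * * 1", datetime(2018, 5, 21, 0, 0)))      # True (a Monday)
```

For real schedules, a cheat sheet or the crontab(5) manual remains the authority; this sketch only mirrors the basic matching rule.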
Here are some examples:

30 15 * * * means "every day at 15:30"
30 * * * * means "every hour, at 30 minutes"
0,30 * * * * means "every hour, at 0 minutes and 30 minutes"
*/30 * * * * means "every half hour"
0 0 * * 1 means "every Monday at 00:00"

Do not try to guess too much. Use a cheat sheet like crontab guru for examples and tweaks. Most of the common usages will be described there directly. You can also edit a formula and get a descriptive text on how it's going to run.

The time description is followed by the line that executes the task, as prepared in Step 2 of the How to do it… section.

Capturing errors and problems

An automated task's main characteristic is its fire-and-forget quality. We are not actively looking at the result, but making it run in the background. This recipe will present an automated task that will safely store unexpected behaviors in a log file that can be checked afterward.

Getting ready

As a starting point, we'll use a task that will divide two numbers, as described in the command line.

How to do it...
Create the task_with_error_handling_step1.py file, as follows:

```python
import argparse
import sys


def main(number, other_number, output):
    result = number / other_number
    print(f'The result is {result}', file=output)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-n1', type=int, help='A number', default=1)
    parser.add_argument('-n2', type=int, help='Another number', default=1)

    parser.add_argument('-o', dest='output', type=argparse.FileType('w'),
                        help='output file', default=sys.stdout)

    args = parser.parse_args()

    main(args.n1, args.n2, args.output)
```

Execute it a couple of times to see that it divides two numbers:

```
$ python3 task_with_error_handling_step1.py -n1 3 -n2 2
The result is 1.5
$ python3 task_with_error_handling_step1.py -n1 25 -n2 5
The result is 5.0
```

Check that dividing by 0 produces an error and that the error is not written to the result file:

```
$ python3 task_with_error_handling_step1.py -n1 5 -n2 1 -o result.txt
$ cat result.txt
The result is 5.0
$ python3 task_with_error_handling_step1.py -n1 5 -n2 0 -o result.txt
Traceback (most recent call last):
  File "task_with_error_handling_step1.py", line 20, in <module>
    main(args.n1, args.n2, args.output)
  File "task_with_error_handling_step1.py", line 6, in main
    result = number / other_number
ZeroDivisionError: division by zero
$ cat result.txt
```

Create the task_with_error_handling_step4.py file:

```python
import argparse
import logging
import sys

LOG_FORMAT = '%(asctime)s %(name)s %(levelname)s %(message)s'
LOG_LEVEL = logging.DEBUG


def main(number, other_number, output):
    logging.info(f'Dividing {number} between {other_number}')
    result = number / other_number
    print(f'The result is {result}', file=output)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-n1', type=int, help='A number', default=1)
    parser.add_argument('-n2', type=int, help='Another number', default=1)

    parser.add_argument('-o', dest='output', type=argparse.FileType('w'),
                        help='output file', default=sys.stdout)
    parser.add_argument('-l', dest='log', type=str, help='log file',
                        default=None)

    args = parser.parse_args()
    if args.log:
        # Store the logs in the file given with -l
        logging.basicConfig(format=LOG_FORMAT, filename=args.log,
                            level=LOG_LEVEL)
    else:
        # Otherwise, display them on stderr
        logging.basicConfig(format=LOG_FORMAT, level=LOG_LEVEL)

    try:
        main(args.n1, args.n2, args.output)
    except Exception as exc:
        logging.exception("Error running task")
        exit(1)
```

Run it to check that it displays the proper INFO and ERROR logs, and that it stores them in the log file:

```
$ python3 task_with_error_handling_step4.py -n1 5 -n2 0
2018-05-19 14:25:28,849 root INFO Dividing 5 between 0
2018-05-19 14:25:28,849 root ERROR division by zero
Traceback (most recent call last):
  File "task_with_error_handling_step4.py", line 31, in <module>
    main(args.n1, args.n2, args.output)
  File "task_with_error_handling_step4.py", line 10, in main
    result = number / other_number
ZeroDivisionError: division by zero
$ python3 task_with_error_handling_step4.py -n1 5 -n2 0 -l error.log
$ python3 task_with_error_handling_step4.py -n1 5 -n2 0 -l error.log
$ cat error.log
2018-05-19 14:26:15,376 root INFO Dividing 5 between 0
2018-05-19 14:26:15,376 root ERROR division by zero
Traceback (most recent call last):
  File "task_with_error_handling_step4.py", line 33, in <module>
    main(args.n1, args.n2, args.output)
  File "task_with_error_handling_step4.py", line 11, in main
    result = number / other_number
ZeroDivisionError: division by zero
2018-05-19 14:26:19,960 root INFO Dividing 5 between 0
2018-05-19 14:26:19,961 root ERROR division by zero
Traceback (most recent call last):
  File "task_with_error_handling_step4.py", line 33, in <module>
    main(args.n1, args.n2, args.output)
  File "task_with_error_handling_step4.py", line 11, in main
    result = number / other_number
ZeroDivisionError: division by zero
```

How it works...

To properly capture any unexpected exceptions, the main function should be wrapped in a try-except block, as done in Step 4 in the How to do it… section.
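The wrap-and-log pattern can also be exercised without a shell. The following is a minimal sketch built on an assumption: the entry point is refactored into a hypothetical run() helper that returns the exit status instead of calling exit() directly. The logs go to an in-memory stream rather than a file so that the result is easy to inspect:

```python
import io
import logging


def main(number, other_number, output):
    logging.info(f'Dividing {number} between {other_number}')
    result = number / other_number
    print(f'The result is {result}', file=output)


def run(number, other_number, output):
    """Hypothetical wrapper: returns an exit status instead of calling exit()."""
    try:
        main(number, other_number, output)
    except Exception:
        # logging.exception() logs at ERROR level and appends the traceback
        logging.exception('Error running task')
        return 1
    return 0


# Collect log records in memory instead of a file (force=True needs Python 3.8+)
log_buffer = io.StringIO()
logging.basicConfig(stream=log_buffer, level=logging.DEBUG,
                    format='%(levelname)s %(message)s', force=True)

results = io.StringIO()
ok_status = run(3, 2, results)     # 3 / 2 succeeds
error_status = run(5, 0, results)  # 5 / 0 raises ZeroDivisionError

print(ok_status, error_status)     # 0 1
print(results.getvalue(), end='')  # The result is 1.5
```

In the script itself, the entry point would then become sys.exit(run(args.n1, args.n2, args.output)). The operating system still receives status 1 on failure, but the error handling can now be checked directly from Python.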
Step 1 calls main() directly, without any protection, whereas Step 4 wraps the call like this:

```python
try:
    main(...)
except Exception as exc:
    # Something went wrong
    logging.exception("Error running task")
    exit(1)
```

Exiting with status 1 through the exit(1) call tells the operating system that something went wrong with the script.

The logging module handles the logging. Note the basic configuration, which includes an optional file to store the logs, the log format, and the level of the logs to display.

Creating logs is easy: call logging.<logging level>(), where the logging level is debug, info, and so on. logging.exception() is a special case. It creates an ERROR log, but it also includes information about the exception, such as the stack trace.

Remember to check the logs to discover errors. A useful reminder is to add a note to the results file, like this:

```python
try:
    main(args.n1, args.n2, args.output)
except Exception as exc:
    logging.exception(exc)
    print('There has been an error. Check the logs', file=args.output)
```

In this article, we saw how to define and design a task so that no human intervention is needed to run it. We learned how to use cron to automate a task. We also presented an automated task that safely stores unexpected behavior in a log file that can be checked afterward.

If you found this post useful, do check out the book Python Automation Cookbook to develop a clear understanding of how to automate your business processes using Python. This includes detecting opportunities by scraping the web, analyzing information to generate automatic spreadsheet reports with graphs, and communicating through automatically generated emails.

Write your first Gradle build script to start automating your project [Tutorial]
Ansible 2 for automating networking tasks on Google Cloud Platform [Tutorial]
Automating OpenStack Networking and Security with Ansible 2 [Tutorial]

Packt
02 Nov 2016
9 min read

Setting Up the Environment for ASP.NET MVC 6

In this article, Mugilan TS Raghupathi, author of the book Learning ASP.NET Core MVC Programming, explains the setup for getting started with programming in ASP.NET MVC 6. In any development project, it is vital to set up the right kind of development environment so that you can concentrate on developing the solution rather than on solving environment or configuration problems. For .NET, Visual Studio is the de facto standard IDE (Integrated Development Environment) for building web applications.

In this article, you'll learn about the following topics:

Purpose of IDE
Different offerings of Visual Studio
Installation of Visual Studio Community 2015
Creating your first ASP.NET 5 project and project structure

Purpose of IDE

First of all, let us see why we need an IDE when you can type the code in Notepad, compile, and execute it. When you develop a web application, you might need the following things to be productive:

Code editor: This is the text editor where you type your code. Your code editor should be able to recognize the different constructs of your programming language, such as if conditions and for loops. In Visual Studio, all of your keywords are highlighted in blue.

IntelliSense: IntelliSense is a context-aware code-completion feature available in most modern IDEs, including Visual Studio. For example, when you type a dot after an object, IntelliSense lists all the methods available on that object. This helps developers write code faster and more easily.

Build/Publish: It is helpful if you can build or publish the application with a single click or a single command. Visual Studio provides several options out of the box to build a single project or the complete solution in a single click. This makes building and deploying your application easier.
Templates: Depending on the type of application, you might have to create different folders and files along with boilerplate code, so it is very helpful if your IDE supports different kinds of templates. Visual Studio generates templates with code for ASP.NET Web Forms, MVC, and Web API to get you up and running.

Ease of adding items: Your IDE should allow you to add different kinds of items with ease. For example, you should be able to add an XML file without any issues. If there is a problem with the structure of your XML file, the IDE should highlight the issue and provide information to help you fix it.

Visual Studio offerings

Different versions of Visual Studio 2015 are available to satisfy the various needs of developers and organizations. Primarily, there are four versions of Visual Studio 2015:

Visual Studio Community
Visual Studio Professional
Visual Studio Enterprise
Visual Studio Test Professional

System requirements

Visual Studio can be installed on computers running Windows 7 Service Pack 1 or later. The complete list of requirements is available at the following URL: https://www.visualstudio.com/en-us/downloads/visual-studio-2015-system-requirements-vs.aspx

Visual Studio Community 2015

This is a fully featured IDE for building desktop applications, web applications, and cloud services. It is available free of cost for individual users. You can download Visual Studio Community from the following URL: https://www.visualstudio.com/en-us/products/visual-studio-community-vs.aspx

Throughout this book, we will be using the Visual Studio Community version for development, as it is available free of cost to individual developers.

Visual Studio Professional

As the name implies, Visual Studio Professional is targeted at professional developers. It contains features such as CodeLens for improving your team's productivity.
It also has features for greater collaboration within the team.

Visual Studio Enterprise

Visual Studio Enterprise is the full-blown version of Visual Studio. It has a complete set of features for collaboration, including Team Foundation Server, modeling, and testing.

Visual Studio Test Professional

Visual Studio Test Professional is primarily aimed at the testing team or the people involved in testing, which might include developers. In any software development methodology, whether the waterfall model or agile, developers need to execute the test cases of the development suite for the code they are developing.

Installation of Visual Studio Community

Follow the given steps to install Visual Studio Community 2015:

Visit the following link to download Visual Studio Community 2015: https://www.visualstudio.com/en-us/products/visual-studio-community-vs.aspx
Click on the Download Community 2015 button.
Save the file in a folder where you can retrieve it easily later.
Run the downloaded executable file.
Click on Run and the following screen will appear:

There are two types of installation: default and custom. The default installation installs the most commonly used features and covers most of a developer's use cases. The custom installation lets you choose the components you want to install, such as the following:

Click on the Install button after selecting the installation type. Depending on your memory and processor speed, the installation will take 1 to 2 hours. Once all the components are installed, you will see the following Setup completed screen:

Installation of ASP.NET 5

ASP.NET 5 is not installed by default when we install the Visual Studio Community 2015 edition. Because an ASP.NET MVC 6 application runs on top of ASP.NET 5, we need to install ASP.NET 5.
There are a couple of ways to install ASP.NET 5:

Get ASP.NET 5 from https://get.asp.net/
Install it from the New Project template in Visual Studio

The second option is a bit easier, as you don't need to search for and install anything separately. The following are the detailed steps:

Create a new project by selecting File | New Project or using the shortcut Ctrl + Shift + N.
Select ASP.NET Web Application, enter the project name, and click on OK.
The following window will appear to select the template. Select the Get ASP.NET 5 RC option as shown in the following screenshot:
When you click on OK in the preceding screen, the following window will appear:
When you click on the Run or Save button in the preceding dialog, you will get the following screen asking for ASP.NET 5 Setup. Select the I agree to the license terms and conditions checkbox and click on the Install button:
Installation of ASP.NET 5 might take a couple of hours, and once it is completed you'll get the following screen:

During the installation of ASP.NET 5 RC1 Update 1, you might be asked to close Visual Studio. If asked, please do so.

Project structure in ASP.NET 5 application

Once ASP.NET 5 RC1 is successfully installed, open Visual Studio, create a new project, and select ASP.NET 5 Web Application as shown in the following screenshot:

A new project will be created and the structure will be like the following:

File-based project

Whenever you add a file or folder in your file system (inside your ASP.NET 5 project folder), the changes are automatically reflected in your project structure.

Support for full .NET and .NET Core

You can see a couple of references in the preceding project: DNX 4.5.1 and DNX Core 5.0. DNX 4.5.1 provides the functionality of the full-blown .NET, whereas DNX Core 5.0 supports only the core functionality. You would use DNX Core 5.0 if you are deploying the application to other platforms, such as Apple OS X and Linux.
The development and deployment of an ASP.NET MVC 6 application on a Linux machine will be explained in the book.

The project.json file

In a traditional ASP.NET web application, assemblies are added as references, and the list of references is kept in the C# project file. In an ASP.NET 5 application, however, there is a JSON file named project.json. It contains all the necessary configuration, with all of the application's .NET dependencies in the form of NuGet packages. This makes dependency management easier. NuGet is a package manager provided by Microsoft that makes installing and uninstalling packages easier. Prior to NuGet, all dependencies had to be installed manually.

The dependencies section lists the packages the application depends on.
The frameworks section lists the frameworks the application supports.
The scripts section identifies the scripts to be executed during the build process of the application.
Include and exclude properties can be used in any section to include or exclude any item.

Controllers

This folder contains all of your controller files. Controllers handle the incoming requests, communicate with the models, and generate the corresponding views.

Models

All of your classes representing the domain data are present in this folder.

Views

Views are files that contain your frontend components and are presented to the end users of the application. This folder contains all of your Razor view files.

Migrations

Any database-related migrations are available in this folder. Database migrations are C# files that contain the history of the database changes made through Entity Framework (an ORM framework). This will be explained in detail in the book.

The wwwroot folder

This folder acts as the root folder, and it is the ideal container for all of your static files, such as CSS and JavaScript files.
All the files placed in the wwwroot folder can be accessed directly by their path, without going through a controller.

Other files

The appsettings.json file is the configuration file where you can configure application-level settings. Bower, npm (Node Package Manager), and gulpfile.js are client-side technologies supported by ASP.NET 5 applications.

Summary

In this article, you have learned about the offerings in Visual Studio. Step-by-step instructions were provided for installing the Visual Studio Community version, which is freely available to individual developers. We have also discussed the new project structure of an ASP.NET 5 application and the changes compared to previous versions.

In the book, we are going to discuss controllers and their roles and functionality. We'll also build a controller and associated action methods and see how they work.

Resources for Article:

Further resources on this subject:

Designing your very own ASP.NET MVC Application [article]
Debugging Your .NET Application [article]
Using ASP.NET Controls in SharePoint [article]

Savia Lobo
02 May 2019
3 min read

Announcing Docker Enterprise 3.0 Public Beta!

Update: On July 22, 2019, the Docker team announced that Docker Enterprise 3.0 will be generally available. The team also noted that more than 2,000 people had tried the Docker Enterprise 3.0 public beta program.

On April 24, the team at Docker announced Docker Enterprise 3.0, an end-to-end container platform that enables developers to quickly build and share any type of application (from legacy to cloud-native) and securely run it anywhere, from hybrid cloud to the edge. It is now available in Public Beta.

Docker Enterprise 3.0 delivers new desktop capabilities, advanced development productivity tools, a simplified and secure Kubernetes stack, and a managed service option, making it the platform for digital transformation.

Jay Lyman, Principal Analyst for 451 Research, said: “Docker’s new Enterprise 3.0 promises to automate the 'development to production' experience with new tooling that aims to reduce the friction between dev and ops teams.”

What can you do with the new Docker Enterprise 3.0?

Integrated Docker Desktop Enterprise

Docker Desktop Enterprise provides a consistent development-to-production experience with a set of automation tools. This makes it possible to start with the developer desktop, deliver an integrated and secure image registry with access to the Hub ecosystem, and then deploy to an enterprise-ready, Kubernetes-conformant environment.

Docker Kubernetes Services (DKS) can simplify the scaling and deployment of applications

DKS is compatible with Docker Compose, Kubernetes YAML, and Helm charts. It provides an automated and repeatable way to install, configure, manage, and scale Kubernetes-based applications across hybrid and multi-cloud environments. DKS includes enhanced security, access controls, and automated lifecycle management, bringing a new level of security to Kubernetes that integrates seamlessly with the Docker Enterprise platform. Customers will also have the option to use Docker Swarm Services (DSS) as part of the platform’s orchestration services.

Docker Applications for high-velocity innovation

Docker Applications are based on the CNAB open standard. They remove the friction between Dev and Ops by letting teams collaborate on an application, defined as a group of related containers that work together. They also eliminate configuration overhead by integrating and automating the creation of Docker Compose files, Kubernetes YAML files, Helm charts, and so on. Docker Applications also include Application Templates, Application Designer, and Version Packs. Together these enable flexible deployment across different environments, delivering on the “code once, deploy anywhere” promise.

With the announcement of Docker Enterprise 3.0, Docker also introduced Docker Enterprise-as-a-Service, a fully managed service on-premises or in the cloud.

To learn more about this news, head over to Docker’s official announcement.

DockerHub database breach exposes 190K customer data including tokens for GitHub and Bitbucket repositories
Are Debian and Docker slowly losing popularity?
Creating a Continuous Integration commit pipeline using Docker [Tutorial]

Packt
30 Dec 2016
12 min read

Introduction to Functional Programming in PHP

This article by Gilles Crettenand, author of the book Functional PHP, covers some of the concepts explained in the book in a concise manner. We will look at the following:

Declarative programming
Functions
Recursion
Composing functions
Benefits of functional programming

Functional programming has gained a lot of traction in the last few years. Various big tech companies have started using functional languages, for example:

Twitter on Scala (http://www.artima.com/scalazine/articles/twitter_on_scala.html)
WhatsApp being written in Erlang (http://www.fastcompany.com/3026758/inside-erlang-the-rare-programming-language-behind-whatsapps-success)
Facebook using Haskell (https://code.facebook.com/posts/302060973291128/open-sourcing-haxl-a-library-for-haskell/1)

There has been some really wonderful and successful work on functional languages that compile to JavaScript, such as Elm and PureScript. There are also efforts to create new languages that either extend or compile to more traditional languages, such as Hy and Coconut for Python. Even Apple's new language for iOS development, Swift, has multiple concepts from functional programming integrated into its core.

However, this article is not about using a new language or learning a whole new technology. It is about benefiting from functional techniques without having to change our whole stack. By just applying some principles to our everyday PHP, we can greatly improve the quality of our life and our code.

Declarative programming

Functional programming is also sometimes called declarative programming, in contrast to imperative programming. These styles are called programming paradigms. Object-oriented programming is also a paradigm, but it is one that is strongly tied to imperative programming. Instead of explaining the difference at length, let's demonstrate with an example.
First, an imperative version using PHP:

```php
<?php
function getPrices(array $products) {
    // let's assume the $products parameter is an array of products.
    $prices = [];

    foreach($products as $p) {
        if($p->stock > 0) {
            $prices[] = $p->price;
        }
    }
    return $prices;
}
```

Now let's see how you can do the same with SQL, which, among other things, is a declarative language:

```sql
SELECT price FROM products WHERE stock > 0;
```

Notice the difference? In the first example, you tell the computer what to do step by step, taking care of storing intermediary results yourself. In the second example, you only describe what you want, and it is then the role of the database engine to return the results.

In a way, functional programming looks a lot more like SQL than like the PHP code we just saw.

Functions

Functional programming, as its name suggests, revolves around functions. In order to apply functional techniques effectively, a language must support functions as first-class citizens, also called first-class functions. This means that functions are treated like any other value: they can be created, passed as parameters to other functions, and used as return values. Luckily, PHP is such a language. You can create functions at will, pass them around as parameters, and even return them.

Another fundamental concept is the idea of a pure function or, in other words, a function that only uses its input to produce a result. This means that you cannot use any kind of external or internal state to perform your computation. Another way to look at this is from the angle of dependencies: all of the dependencies of your functions need to be clearly declared in the signature. This helps a lot when someone tries to understand how and what your function is doing.

Higher-order functions

PHP functions can take functions as parameters and return functions as return values. A function that does either of those is called a higher-order function. It is as simple as that.
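Here is a small illustration of both directions. It uses a hypothetical twice() helper of our own, which is not from the book or any library. The helper takes a function as a parameter and returns a new function, so it is higher-order on both counts. The addThree() function it wraps is also pure, because its result depends only on its argument:

```php
<?php

// Hypothetical helper (not from the book or any library): takes a callable
// and returns a new callable that applies it twice in a row.
function twice(callable $f): callable
{
    return function ($x) use ($f) {
        return $f($f($x));
    };
}

// A small pure function: its result depends only on its argument.
function addThree(int $x): int
{
    return $x + 3;
}

// twice() receives a function and returns a function.
$addSix = twice('addThree');
echo $addSix(10), PHP_EOL; // 16
```

Because twice() accepts any callable, the same helper works with built-ins too. For example, twice('strrev') gives back the original string.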
There are a few higher-order functions that are commonly used in any functional code base.

Map

The map operation, array_map in PHP, is a higher-order function that applies a given callback to all the elements of a collection. The return value is a collection in the same order. A simple example is as follows:

```php
<?php
function square(int $x): int
{
    return $x * $x;
}

$squared = array_map('square', [1, 2, 3, 4]);
/* $squared contains [1, 4, 9, 16] */
```

Filter

The filter operation, array_filter in PHP, is a higher-order function that keeps only certain elements of a collection based on a Boolean predicate. The return value is a collection that only contains the elements for which the predicate returns true. A simple example is as follows:

```php
<?php
function odd(int $a): bool
{
    return $a % 2 === 1;
}

$filtered = array_filter([1, 2, 3, 4, 5, 6], 'odd');
/* $filtered contains [0 => 1, 2 => 3, 4 => 5]: array_filter keeps the original keys */
```

Fold or reduce

Folding refers to a process where you reduce a collection to a single return value using a combining function. Depending on the language, this operation can have multiple names: fold, reduce, accumulate, aggregate, or compress. As with other array-related functions, the PHP version is the array_reduce function.

You may be familiar with the array_sum function, which calculates the sum of all the values in an array. This is in fact a fold and can easily be written using the array_reduce function:

```php
<?php
function sum(int $carry, int $i): int
{
    return $carry + $i;
}

$summed = array_reduce([1, 2, 3, 4], 'sum', 0);
/* $summed contains 10 */
```

You don't necessarily need to use the elements to produce a value. You could, for example, implement a naive replacement for the in_array function using a fold:

```php
<?php
function in_array2(string $needle, array $haystack): bool
{
    $search = function(bool $contains, string $i) use ($needle) : bool {
        return $needle == $i ? true : $contains;
    };
    return array_reduce($haystack, $search, false);
}

var_dump(in_array2('two', ['one', 'two', 'three']));
// bool(true)
```

Recursion

In the academic sense, recursion is the idea of dividing a problem into smaller instances of the same problem. For example, if you need to scan a directory recursively, you first scan the starting directory and then scan its children and grandchildren. Most programming languages support recursion by allowing a function to call itself, and this is what is usually meant by recursion.

Let's see how we can scan a directory using recursion:

```php
<?php
function searchDirectory($dir, $accumulator = []) {
    foreach (scandir($dir) as $path) {
        // Ignore hidden files, current directory and parent directory
        if(strpos($path, '.') === 0) {
            continue;
        }

        $fullPath = $dir.DIRECTORY_SEPARATOR.$path;

        if(is_dir($fullPath)) {
            // Recurse with the full path so nested directories resolve correctly
            $accumulator = searchDirectory($fullPath, $accumulator);
        } else {
            $accumulator[] = $fullPath;
        }
    }
    return $accumulator;
}
```

We start by using the scandir function to obtain all files and directories. Then, if we encounter a child directory, we call the function on it again. Otherwise, we simply add the file to the accumulator. This function is recursive because it calls itself.

You could write this using only loops and other control structures. However, you don't know in advance how deep your folder hierarchy is, so the code would probably be a lot messier and harder to understand.

Trampolines

Each time you call a function, information gets added to memory. This can be an issue when doing recursion, as you only have a limited amount of memory available. Memory usage keeps growing until the last recursive call, and a stack overflow can happen.

The only way we can avoid stack growth is to return a value instead of calling a new function. This value can hold the information that is needed to perform a new function call, which will continue the computation. This also means that we need some cooperation from the caller of the function.
This helpful caller is called a trampoline, and here is how it works:

1. The trampoline calls our f function.
2. Instead of making a recursive call, the f function returns the next call, encapsulated inside a data structure with all the arguments.
3. The trampoline extracts the information and performs a new call to the f function.
4. Steps 2 and 3 repeat until the f function returns a real value.
5. The trampoline receives the value and returns it to the real caller.

If you want to use trampolines in your own project, I invite you to install the following library, which offers some helpers compared to our crude implementation:

```
composer require functional-php/trampoline
```

Here is an example taken from the documentation:

```php
<?php
use FunctionalPHP\Trampoline as t;

function factorial($n, $acc = 1) {
    return $n <= 1 ? $acc : t\bounce('factorial', $n - 1, $n * $acc);
};
```

Composing functions

Previously, we discussed the idea of building blocks and small pure functions. So far, however, we haven't even hinted at how those can be used to build something bigger. What good is a building block if you cannot use it? The answer partly lies in function composition.

As is often the case in functional programming, the concept is borrowed from mathematics. If you have two functions f and g, you can create a third function by composing them. The usual notation in mathematics is (f ∘ g)(x), which is equivalent to calling them one after the other as f(g(x)).

You can compose any two given functions easily in PHP using a wrapper function. Say you want to display a title in all caps, with only safe HTML characters:

```php
<?php
function safe_title2(string $s)
{
    return strtoupper(htmlspecialchars($s));
}
```

Functional libraries for PHP often come with a helper that can easily create new functions out of multiple subparts.
For example, using Lars Strojny's Functional PHP library, you can write the following:

```php
<?php
use function Functional\compose;

// compose() applies the functions from left to right:
// htmlspecialchars first, then strtoupper, then trim
$titles4 = array_map(compose('htmlspecialchars', 'strtoupper', 'trim'), $titles);
```

Partial application

You might want to set some parameters of a function but leave others unassigned for later. For example, we might want to create a function that returns an excerpt of a blog post.

The dedicated term for setting such a value is "to bind a parameter" or "to bind an argument". The process itself is called partial application, and the new function is said to be partially applied.

The Functional PHP library also comes with helpers to partially apply a function:

```php
<?php
use function Functional\partial_right;
use function Functional\partial_left;
use function Functional\partial_any;
use const Functional\…;

$excerpt = partial_right('substr', 0, 5);
echo $excerpt('Lorem ipsum dolor si amet.');
// Lorem

$fixed_string = partial_left('substr', 'Lorem ipsum dolor si amet.');
echo $fixed_string(6, 5);
// ipsum

$start_placeholder = partial_any('substr', 'Lorem ipsum dolor si amet.', …(), 5);
echo $start_placeholder(12);
// dolor
```

Currying

Currying is often used as a synonym for partial application. Both concepts allow us to bind some parameters of a function, but the core ideas are a bit different. The idea behind currying is to transform a function that takes multiple arguments into a sequence of functions that each take one argument. As this might be a bit hard to grasp, let's try to curry the substr function. The result is called a curried function.
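Here is a hand-written sketch of what a curried substr could look like. It is our own illustration, and the name curriedSubstr is hypothetical. It takes one argument per call, in the same order as substr, and only the last call actually runs substr():

```php
<?php

// Hypothetical hand-curried version of substr($string, $start, $length):
// every call takes exactly one argument and returns the next function,
// until the last argument triggers the real substr() call.
function curriedSubstr(string $string): callable
{
    return function (int $start) use ($string): callable {
        return function (int $length) use ($string, $start): string {
            return substr($string, $start, $length);
        };
    };
}

$fromLorem = curriedSubstr('Lorem ipsum dolor si amet.');
$fromIndex6 = $fromLorem(6);
echo $fromIndex6(5), PHP_EOL; // ipsum

// The three calls can also be chained in a single expression
echo curriedSubstr('Lorem ipsum dolor si amet.')(12)(5), PHP_EOL; // dolor
```

Compare this with the partial application helpers above, which can bind any number of arguments at once. A curried function always takes exactly one argument per step.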
Again, a helper to create such functions is available in the Functional PHP library:

```php
<?php
use function Functional\curry;

function add($a, $b, $c, $d) {
    return $a + $b + $c + $d;
}

$curryedAdd = curry('add');
$add10 = $curryedAdd(10);
$add15 = $add10(5);
$add42 = $add15(27);
$add42(10); // -> 52
```

Benefits of functional programming

As we just saw, the functional world is moving, adoption by the enterprise world is growing, and even new imperative languages are taking inspiration from functional languages. But why is that?

Reduce the cognitive burden on developers

You've probably often read or heard that a programmer should not be interrupted, because even a small interruption can lead to literally tens of minutes being lost. This is partly due to the cognitive burden or, in other words, the amount of information you have to keep in memory in order to understand the problem or function at hand.

Functional programming forces you to clearly state the dependencies of your functions and to avoid using any kind of external data. This helps a lot in writing self-contained code that can be readily understood, and thus greatly reduces the cognitive burden.

Software with fewer bugs

We just saw that functional programming reduces the cognitive burden and makes your code easier to reason about. This is already a huge win when it comes to bugs. You spend less time understanding how the code works, so you can focus on what it should do and spot issues quickly.

But all the benefits we've just seen have another advantage: they make testing a lot easier too! If you have a pure function and you test it with a given set of values, you can be certain that it will always return exactly the same thing in production.

Easier refactoring

Refactoring is never easy. However, since the only inputs of a pure function are its parameters and its sole output is the returned value, things are simpler.
If your refactored function continues to return the same output for a given input, you can be confident that your software will continue to work. You cannot forget to set some state somewhere in an object, because your functions are side-effect free.

Enforcing good practices

This article and the related book are proof that functional programming is more about the way we do things than about a particular language. You can use functional techniques in nearly any language that has functions. Your language still needs to have certain properties, but not that many. I like to talk about having a functional mindset.

If that is so, why do companies move to functional languages? Because those languages enforce the best practices that we will learn in this book. In PHP, you will always have to remember to use functional techniques. In Haskell, you cannot do anything else, because the language forces you to write pure functions.

Summary

This small article is by no means a complete introduction to functional programming; that is what the Functional PHP book is for. I do hope, however, that I have convinced you that it is a set of techniques worth learning. We have only scratched the surface here, and all topics are covered in more depth in the various chapters.

You will also learn about more advanced topics, like the following:

Functors, applicatives, and monads
Type systems
Pattern matching
Functional reactive programming
Property-based testing
Parallel execution of functional code

There is also a whole chapter about using functional programming in conjunction with various frameworks like Symfony, Drupal, Laravel, and WordPress.

Resources for Article:

Further resources on this subject:

Understanding PHP basics [article]
Developing Middleware [article]
Continuous Integration [article]
Packt
30 May 2016
13 min read

Splunk's Input Methods and Data Feeds

This article by Ashish Kumar Yadav is taken from the book Advanced Splunk. The book helps you get acquainted with a great data science tool named Splunk. The big data world is an ever-expanding field, and it is easy to get lost in the enormous amount of machine data at your disposal. The Advanced Splunk book will provide you with the resources and the path you need to make your way through that machine data. While the book emphasizes Splunk, it also discusses its close association with the Python language and tools like R and Tableau that are needed for better analytics and visualization.

Splunk supports numerous ways to ingest data on its server. Any human-readable machine data from various sources can be uploaded using data input methods such as files, directories, TCP/UDP, and scripts, indexed on the Splunk Enterprise server, and analyzed to derive insights.

Data sources

Uploading data to Splunk is one of the most important parts of analyzing and visualizing data. If data is not properly parsed, timestamped, or broken into events, it can be difficult to analyze and get proper insight from it. Splunk can be used to analyze and visualize data from various domains, such as IT security, networking, mobile devices, telecom infrastructure, media and entertainment devices, storage devices, and many more. Machine-generated data from different sources can come in different formats and types, and hence it is very important to parse the data in the best format to get the required insight from it. Splunk supports machine-generated data of various types and structures, and the following screenshot shows the common types of data that come with inbuilt support in Splunk Enterprise.
The most important point about these sources is that if the data source is in the following list, the preconfigured settings and configurations already stored in Splunk Enterprise are applied. This helps get the data parsed into the most suitable event and timestamp formats, enabling faster searching, analytics, and better visualization. The following screenshot lists common data sources supported by Splunk Enterprise:

Structured data

Machine-generated data is generally structured, and in some cases it can be semistructured. Some of the types of structured data are Extensible Markup Language (XML), JavaScript Object Notation (JSON), comma-separated values (CSV), tab-separated values (TSV), and pipe-separated values (PSV). Any format of structured data can be uploaded to Splunk. However, if the data is in any of the preceding formats, predefined settings and configuration can be applied directly by choosing the respective source type while uploading the data or by configuring it in the inputs.conf file. The preconfigured settings for these structured formats are very generic. Often, machine logs are customized structured logs; in that case, additional settings will be required to parse the data. For example, there are various types of XML. We have listed two types here. In the first type, there is the <note> tag at the start and </note> at the end, and in between there are parameters and their values. In the second type, there are two levels of hierarchy: the <Library> tag along with the <book> tag. Between the <book> and </book> tags, we have parameters and their values.
The first type is as follows:

```xml
<note>
  <to>Jack</to>
  <from>Micheal</from>
  <heading>Test XML Format</heading>
  <body>This is one of the formats of XML!</body>
</note>
```

The second type is shown in the following code snippet:

```xml
<Library>
  <book category="Technical">
    <title lang="en">Splunk Basic</title>
    <author>Jack Thomas</author>
    <year>2007</year>
    <price>520.00</price>
  </book>
  <book category="Story">
    <title lang="en">Jungle Book</title>
    <author>Rudyard Kipling</author>
    <year>1984</year>
    <price>50.50</price>
  </book>
</Library>
```

Similarly, machines can generate many other kinds of customized XML. To parse different types of structured data, Splunk Enterprise comes with inbuilt settings and configuration defined for the source the data comes from. Say, for example, that the data received from a web server's logs is also structured, and it can be in JSON, CSV, or simple text format. Depending on the specific source, Splunk tries to make the user's job easier by providing the best settings and configuration for many common sources of data. Some of the most common sources of data are web servers, databases, operating systems, network security, and various other applications and services.

Web and cloud services

The most commonly used web servers are Apache and Microsoft IIS. Linux-based web services are typically hosted on Apache servers, and Windows-based web services on IIS. The logs generated by Linux web servers are simple plain text files, whereas the log files of Microsoft IIS can be in the W3C extended log file format or stored in a database in the ODBC log file format. Cloud services such as Amazon AWS, S3, and Microsoft Azure can be connected directly to Splunk Enterprise and configured according to the data they forward. The Splunk app store has many technology add-ons that can be used to create data inputs to send data from cloud services to Splunk Enterprise.
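To make the two-level XML layout from this section concrete, the following Python sketch uses the standard library's `xml.etree.ElementTree` (nothing Splunk-specific) to flatten the `<Library>` sample into one record per `<book>`. This is roughly the kind of event boundary and field extraction that custom settings have to describe for hierarchical XML; the helper name `books_to_events` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

LIBRARY_XML = """<Library>
  <book category="Technical">
    <title lang="en">Splunk Basic</title>
    <author>Jack Thomas</author>
    <year>2007</year>
    <price>520.00</price>
  </book>
  <book category="Story">
    <title lang="en">Jungle Book</title>
    <author>Rudyard Kipling</author>
    <year>1984</year>
    <price>50.50</price>
  </book>
</Library>"""


def books_to_events(xml_text: str) -> List[Dict[str, str]]:
    """Turn each <book> element into one flat event (field name -> value)."""
    root = ET.fromstring(xml_text)
    events = []
    for book in root.iter("book"):
        # Attributes of <book> (e.g. category) become fields as well.
        event = dict(book.attrib)
        # Each child element becomes a field named after its tag.
        for child in book:
            event[child.tag] = (child.text or "").strip()
        events.append(event)
    return events


for event in books_to_events(LIBRARY_XML):
    print(event)
```

The flat `<note>` layout would need no such step, since the whole document already is a single event; the extra hierarchy level is what makes customized settings necessary.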
So, when uploading log files from web services such as Apache, Splunk provides a preconfigured source type that parses the data into the best format for visualization. Suppose a user wants to upload Apache error logs to the Splunk server; the user then chooses apache_error from the Web category of Source type, as shown in the following screenshot:

On choosing this option, the following set of configurations is applied to the data to be uploaded:

- The event break is configured on the regular expression pattern ^\[, so a new event starts at every line that begins (^) with [.
- The timestamp is identified in the [%A %B %d %T %Y] format, where:
  - %A is the day of the week; for example, Monday
  - %B is the month; for example, January
  - %d is the day of the month; for example, 1
  - %T is the time, in %H:%M:%S format
  - %Y is the year; for example, 2016
- Various other settings are applied too, such as maxDist, which controls how much the logs may vary from the source type definition, as well as category, description, and others.

Any new settings required for our needs can be added using the New Settings option available in the section below Settings. After making the changes, the settings can either be saved as a new source type or used to update the existing source type.

IT operations and network security

Splunk Enterprise has many applications on the Splunk app store that specifically target IT operations and network security. Splunk is a widely accepted tool for intrusion detection, network and information security, fraud and theft detection, user behavior analytics, and compliance. Splunk Enterprise provides inbuilt support for the Cisco Adaptive Security Appliance (ASA) firewall, Cisco SYSLOG, Call Detail Records (CDR) logs, and one of the most popular intrusion detection applications, Snort.
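The apache_error behavior described earlier in this section (start a new event at each line beginning with [, then read the bracketed timestamp) can be approximated in a few lines of Python. This is an illustrative sketch, not how Splunk implements it: the sample log lines are made up in the classic Apache 2.2 error-log layout, and we use Python's `%a`/`%b` directives, which match abbreviated names such as Wed and Oct, with the time spelled out as `%H:%M:%S`.

```python
import re
from datetime import datetime
from typing import List

# Hypothetical sample; the indented second line is a continuation
# that belongs to the first event.
SAMPLE = """[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] first error
    continuation line of the first error
[Wed Oct 11 14:33:10 2000] [notice] second event"""

# A new event starts at every line that begins (^) with "[".
EVENT_BREAK = re.compile(r"^\[")
# Python spelling of the bracketed timestamp format.
TIME_FORMAT = "[%a %b %d %H:%M:%S %Y]"


def break_events(text: str) -> List[str]:
    """Group raw lines into events using the line-start pattern."""
    events: List[str] = []
    for line in text.splitlines():
        if EVENT_BREAK.match(line) or not events:
            events.append(line)
        else:
            # Not an event start: append to the previous event.
            events[-1] += "\n" + line
    return events


def event_time(event: str) -> datetime:
    """Parse the first bracketed token of the event as its timestamp."""
    stamp = event[: event.index("]") + 1]
    return datetime.strptime(stamp, TIME_FORMAT)


for ev in break_events(SAMPLE):
    print(event_time(ev), "|", ev.splitlines()[0])
```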
The Splunk app store has many technology add-ons to get data from various security devices such as firewalls, routers, DMZ devices, and others. The app store also has Splunk applications that show graphical insights and analytics over the data uploaded from various IT and security devices.

Databases

Splunk Enterprise has inbuilt support for databases such as MySQL, Oracle Syslog, and IBM DB2. Apart from this, there are technology add-ons on the Splunk app store to fetch data from Oracle and MySQL databases. These technology add-ons can be used to fetch, parse, and upload data from the respective database to the Splunk Enterprise server.

Various types of data can be available from one source; let's take MySQL as an example. There can be error log data, query logging data, MySQL server health and status log data, or MySQL data stored in the form of databases and tables. In other words, a huge variety of data can be generated by the same source, and Splunk provides support for all of these types. It has inbuilt configuration for MySQL error logs, MySQL slow queries, and MySQL database logs, already defined for easier input configuration of data generated from these sources.

Application and operating system data

The Splunk input source type has inbuilt configuration available for Linux dmesg, syslog, security logs, and various other logs available from the Linux operating system. Apart from Linux, Splunk also provides configuration settings for data input of logs from Windows and iOS systems. It also provides default settings for Log4j-based logging for Java, PHP, and .NET enterprise applications. Splunk also supports data from many other applications, such as Ruby on Rails, Catalina, WebSphere, and others.
Splunk Enterprise provides predefined configuration for various applications, databases, OSes, and cloud and virtual environments to enrich the respective data with better parsing and event breaking, thus deriving better insight from the available data. For application sources whose settings are not available in Splunk Enterprise, apps or add-ons may be available on the app store instead.

Data input methods

Splunk Enterprise supports data input through numerous methods. Data can be sent to Splunk via files and directories, TCP, UDP, scripts, or universal forwarders.

Files and directories

Splunk Enterprise provides an easy interface to upload data via files and directories. Files can be uploaded manually from the Splunk web interface, or Splunk can be configured to monitor a file for changes in content and upload the new data whenever it is written to the file. Splunk can also be configured to upload multiple files, either by uploading all the files in one shot or by monitoring a directory for new files, in which case the data gets indexed on Splunk whenever it arrives in the directory. Data in any format from any source can be uploaded to Splunk as long as it is human-readable, that is, no proprietary tools are needed to read it. Splunk Enterprise even supports uploading compressed archives, such as .zip and .tar.gz files, containing multiple log files.

Network sources

Splunk supports both TCP and UDP to get data from network sources. It can monitor any network port for incoming data and then index it. Generally, for data from network sources, it is recommended that you use a universal forwarder to send data to Splunk, as the universal forwarder buffers the data in case of any issues on the Splunk server, to avoid data loss.

Windows data

Splunk Enterprise provides direct configuration to access data from a Windows system.
It supports both local and remote collection of various types of data from a Windows system. Splunk has predefined input methods and settings to parse the event log, performance monitoring reports, registry information, hosts, networks, and print monitoring of a local as well as a remote Windows system.

So, data from different sources in different formats can be sent to Splunk using various input methods, according to the requirements and the suitability of the data and source. New data inputs can also be created using Splunk apps or technology add-ons available on the Splunk app store.

Adding data to Splunk: new interfaces

Splunk Enterprise introduced new interfaces to accept data that are suited to resource-constrained, lightweight Internet of Things devices. Splunk Enterprise version 6.3 supports HTTP Event Collector and REST and JSON APIs for data collection. HTTP Event Collector is a very useful interface that can be used to send data from your existing application to the Splunk Enterprise server without using any forwarder. HTTP client libraries are available in .NET, Java, Python, and almost every other programming language, so forwarding data from an existing application, whatever its language, becomes easy.

Let's take an example: say you are the developer of an Android application, and you want to know which features users use, which screens are pain points or cause problems, and what the usage patterns of your application are. In the code of your Android application, you can use REST APIs to forward logging data to the Splunk Enterprise server. The only important point to note here is that the data needs to be sent in a JSON payload envelope. The advantage of using HTTP Event Collector is that data can be sent to Splunk without any third-party tools or configuration, and we can easily derive insights, analytics, and visualizations from it.
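As a hedged sketch of that JSON envelope, the Python code below builds (but does not send) an HTTP Event Collector request using only the standard library. It assumes the commonly documented HEC conventions: the `/services/collector/event` endpoint on port 8088, an `Authorization: Splunk <token>` header, and batching by concatenating JSON event objects in one body. The host name, token, source type, and event fields are placeholders of our own.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Placeholders: use your Splunk host and a token created in Splunk Web.
HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"


def build_hec_request(events: List[Dict[str, Any]],
                      sourcetype: str = "mobile_app") -> urllib.request.Request:
    """Wrap each event in an HEC JSON envelope and batch them in one body."""
    envelopes = [json.dumps({"event": e, "sourcetype": sourcetype}) for e in events]
    # Batching: JSON objects are concatenated, not wrapped in a JSON array.
    body = "".join(envelopes).encode("utf-8")
    return urllib.request.Request(
        HEC_URL,
        data=body,
        headers={
            # Token-based authentication: no user credentials in the app.
            "Authorization": f"Splunk {HEC_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_hec_request([
    {"screen": "checkout", "action": "tap_pay"},
    {"screen": "checkout", "error": "timeout"},
])
# To actually send it: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes the envelope easy to inspect or test before pointing it at a real Splunk instance.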
HTTP Event Collector and configuration

HTTP Event Collector can be used once you configure it from the Splunk Web console; the event data sent over HTTP can then be indexed in Splunk using the REST API.

HTTP Event Collector

HTTP Event Collector (EC) provides an API with an endpoint that can be used to send log data from applications into Splunk Enterprise. Splunk HTTP Event Collector supports both HTTP and HTTPS, the latter for secure connections. The following features of HTTP Event Collector make adding data to Splunk Enterprise easier:

- It is very lightweight in terms of memory and resource usage, and thus can be used on resource-constrained, lightweight devices as well.
- Events can be sent directly from anywhere, such as web servers, mobile devices, and IoT devices, without any need to configure or install forwarders.
- It is a token-based JSON API that doesn't require you to save user credentials in the code or in the application settings. Authentication is handled by the tokens used in the API.
- It is easy to configure EC from the Splunk Web console: enable HTTP EC and define the token. After this, you are ready to accept data on Splunk Enterprise.
- It supports HTTPS, so data can be sent securely.
- It supports GZIP compression and batch processing.
- HTTP EC is highly scalable, as it can be used in a distributed environment as well as behind a load balancer to crunch and index millions of events per second.

Summary

In this article, we walked through various data input methods along with the various data sources supported by Splunk. We also looked at HTTP Event Collector, a new feature added in Splunk 6.3 for data collection via REST that encourages the use of Splunk for IoT. The data sources and input methods for Splunk are unlike those of any generic tool, and HTTP Event Collector is an added advantage compared to other data analytics tools.

Gebin George
06 Mar 2018
4 min read

Implement Long-short Term Memory (LSTM) with TensorFlow

This article is an excerpt from the book Deep Learning Essentials, written by Wei Di, Anurag Bhardwaj, and Jianing Wei. This book will help you get started with the essentials of deep learning and neural network modeling.

In today's tutorial, we will look at an example of using LSTM in TensorFlow to perform sentiment classification. The input to the LSTM will be a sentence, or sequence of words. The output of the LSTM will be a binary value indicating a positive sentiment with 1 and a negative sentiment with 0. We will use a many-to-one LSTM architecture for this problem, since it maps multiple inputs onto a single output. Figure LSTM: Basic cell architecture shows this architecture in more detail. As shown there, the input is a sequence of word tokens (in this case, a sequence of three words). Each word token is input at a new time step and is fed to the hidden state for the corresponding time step. For example, the word Book is input at time step t and is fed to the hidden state ht.

Figure: Sentiment analysis

To implement this model in TensorFlow, we first need to define a few variables, as follows:

```python
batch_size = 4
lstm_units = 16
num_classes = 2
max_sequence_length = 4
embedding_dimension = 64
num_iterations = 1000
```

batch_size dictates how many sequences of tokens we can input in one batch for training. lstm_units is the number of hidden units in the LSTM cell (the size of its hidden state). max_sequence_length represents the maximum possible length of a given sequence. Once these are defined, we proceed to initialize the TensorFlow-specific data structures for the input data, as follows:

```python
import tensorflow as tf  # TensorFlow 1.x API (placeholders, tf.contrib)

labels = tf.placeholder(tf.float32, [batch_size, num_classes])
raw_data = tf.placeholder(tf.int32, [batch_size, max_sequence_length])
```

Given that we are working with word tokens, we would like to represent them using a good feature representation technique.
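A common choice is a word embedding, which the following paragraph relies on. At its core, an embedding lookup such as `tf.nn.embedding_lookup` is row indexing into a table. The dependency-free Python sketch below (a toy four-word vocabulary with an embedding dimension of 3, both our own assumptions) shows how a `[batch_size, max_sequence_length]` array of token IDs becomes a three-dimensional tensor.

```python
from typing import List

# Hypothetical toy table: vocabulary of 4 tokens, embedding dimension 3.
word_vectors: List[List[float]] = [
    [0.0, 0.0, 0.0],  # id 0 (e.g. padding)
    [0.1, 0.2, 0.3],  # id 1
    [0.4, 0.5, 0.6],  # id 2
    [0.7, 0.8, 0.9],  # id 3
]


def embedding_lookup(table: List[List[float]],
                     ids: List[List[int]]) -> List[List[List[float]]]:
    """Replace every token id with its row in the table:
    [batch, max_sequence_length] -> [batch, max_sequence_length, dim]."""
    return [[table[i] for i in seq] for seq in ids]


raw_data = [[1, 2, 3, 0], [3, 3, 1, 0]]  # batch of 2, sequences of length 4
data = embedding_lookup(word_vectors, raw_data)
print(len(data), len(data[0]), len(data[0][0]))  # 2 4 3
```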
Let us assume the word embedding representation takes a word token and projects it onto an embedding space of dimension embedding_dimension. The two-dimensional input data containing raw word tokens is thereby transformed into a three-dimensional word tensor, with the added dimension representing the word embedding. We also use pre-computed word embeddings, stored in a word_vectors data structure. We initialize the data structures as follows:

```python
# Initial placeholder value; it is replaced by the embedding lookup below.
data = tf.Variable(tf.zeros([batch_size, max_sequence_length, embedding_dimension]), dtype=tf.float32)
# [batch_size, max_sequence_length] -> [batch_size, max_sequence_length, embedding_dimension]
data = tf.nn.embedding_lookup(word_vectors, raw_data)
```

Now that the input data is ready, we look at defining the LSTM model. We create a basic LSTM cell with lstm_units hidden units. To regularize the network before the final classification, we wrap the cell in a dropout wrapper that keeps 80% of its outputs. To perform a full temporal pass of the data through the network, we unroll the LSTM using TensorFlow's dynamic_rnn routine. We also initialize a random weight matrix and a bias vector with the constant value 0.1, as follows:

```python
weight = tf.Variable(tf.truncated_normal([lstm_units, num_classes]))
bias = tf.Variable(tf.constant(0.1, shape=[num_classes]))
lstm_cell = tf.contrib.rnn.BasicLSTMCell(lstm_units)
wrapped_lstm_cell = tf.contrib.rnn.DropoutWrapper(cell=lstm_cell, output_keep_prob=0.8)
# output: [batch_size, max_sequence_length, lstm_units]
output, state = tf.nn.dynamic_rnn(wrapped_lstm_cell, data, dtype=tf.float32)
```

Once the dynamically unrolled RNN has produced its output, we transpose it to time-major order, take the output of the last time step, multiply it by the weight matrix, and add the bias vector to compute the final prediction:

```python
# [batch, time, units] -> [time, batch, units]
output = tf.transpose(output, [1, 0, 2])
# Output of the last time step for every sequence: [batch, units]
last = tf.gather(output, int(output.get_shape()[0]) - 1)
prediction = tf.matmul(last, weight) + bias
```

Since the initial prediction needs to be refined, we define a cross-entropy objective and minimize it with the Adam optimizer:

```python
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=labels))
optimizer = tf.train.AdamOptimizer().minimize(loss)
```

After this sequence of steps, we have an end-to-end LSTM graph for sentiment classification of sentences of up to max_sequence_length tokens. Training it means running optimizer repeatedly in a session (for example, num_iterations times), feeding batches of token IDs to raw_data and one-hot sentiment labels to labels.

To summarize, we saw how to implement an LSTM network using TensorFlow. If you are interested in learning more, check out the book Deep Learning Essentials, which will help you take your first steps in training efficient deep learning models and applying them in various practical scenarios.
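The transpose-then-gather step in this tutorial simply picks the LSTM output at the last time step for every sequence in the batch. The dependency-free Python sketch below uses nested lists as stand-ins for tensors, with toy numbers of our own, to show that transposing a `[batch, time, units]` array to `[time, batch, units]` and taking the last index gives the same result as reading the last step of each sequence directly.

```python
from typing import List

Tensor3 = List[List[List[float]]]


def transpose_102(x: Tensor3) -> Tensor3:
    """Swap the first two axes: [batch][time][units] -> [time][batch][units]."""
    batch, time = len(x), len(x[0])
    return [[x[b][t] for b in range(batch)] for t in range(time)]


def last_step(x: Tensor3) -> List[List[float]]:
    """Many-to-one: keep only each sequence's output at the final time step."""
    return [seq[-1] for seq in x]


# Toy output: batch_size=2, max_sequence_length=3, lstm_units=2.
output = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    [[0.7, 0.8], [0.9, 1.0], [1.1, 1.2]],
]

time_major = transpose_102(output)
last = time_major[len(time_major) - 1]  # like tf.gather(output, T - 1)
print(last)  # [[0.5, 0.6], [1.1, 1.2]]
```

In TensorFlow 1.x the same selection can also be written as a slice, `output[:, -1, :]`, without the transpose; the tutorial's version is equivalent.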

Packt
25 Mar 2011
10 min read

Flash Game Development: Creation of a Complete Tetris Game

Tetris features shapes called tetrominoes: geometric shapes composed of four square blocks connected orthogonally, which fall from the top of the playing field. Once a tetromino touches the ground, it lands and cannot be moved anymore, becoming part of the ground itself, and a new tetromino falls from the top of the game field, usually a vertical rectangle of 10x20 tiles. The player can move the falling tetromino horizontally and rotate it by 90 degrees to create a horizontal line of blocks. When a line is completed, it disappears and any block above the deleted line falls down. If the stacked tetrominoes reach the top of the game field, it's game over.

Defining game design

This time I won't talk about the game design itself, since Tetris is a well-known game and, as you read this article, you should be used to dealing with game design. That said, there is something really important about this game you need to know before you start reading this article: you won't draw anything in the Flash IDE. That is, you won't manually draw tetrominoes, the game field, or any other graphic assets. Everything will be generated on the fly using AS3 drawing methods. Tetris is the best game for learning how to draw with AS3, as it features blocks, blocks, and only blocks. Moreover, although the game won't include new programming features, its principles make Tetris the hardest game of the entire book. Survive Tetris, and you will have the skills to create the next games while focusing more on new features and techniques than on programming logic.

Importing classes and declaring first variables

The first thing we need to do, as usual, is set up the project and define the main class and function, as well as prepare the game field. Create a new file (File | New), then from the New Document window select ActionScript 3.0.
Set its properties as follows: width 400 px, height 480 px, background color #333333 (a dark gray), and frame rate 30 (which doesn't matter much since there are no animations, but you can add an animated background on your own). Also, define the Document Class as Main and save the file as tetris.fla. Without closing tetris.fla, create a new file and from the New Document window select ActionScript 3.0 Class. Save this file as Main.as in the same path where you saved tetris.fla. Then write:

```actionscript
package {
    import flash.display.Sprite;
    import flash.utils.Timer;
    import flash.events.TimerEvent;
    import flash.events.KeyboardEvent;
    public class Main extends Sprite {
        private const TS:uint=24;
        private var fieldArray:Array;
        private var fieldSprite:Sprite;
        public function Main() {
            // tetris!!
        }
    }
}
```

We already know we have to interact with the keyboard to move, drop, and rotate tetrominoes, and we have to deal with timers to manage the falling delay, so I already imported all the needed libraries. Then, there are some declarations to make:

```actionscript
private const TS:uint=24;
```

TS is the size, in pixels, of the tiles representing the game field. It's a constant, as it won't change its value during the game, and its value is 24. With 20 rows of tiles, the height of the whole game field will be 24x20 = 480 pixels, as tall as our movie.

```actionscript
private var fieldArray:Array;
```

fieldArray is the array that will numerically represent the game field.

```actionscript
private var fieldSprite:Sprite;
```

fieldSprite is the DisplayObject that will graphically render the game field. Let's use it to add some graphics.

Drawing game field background

Nobody wants to see an empty black field, so we are going to add some graphics. As mentioned, during the making of this game we won't use any drawn MovieClip, so every graphic asset will be generated by pure ActionScript.

The idea: Draw a set of squares to represent the game field.
The development: Add this line to the Main function:

```actionscript
public function Main() {
    generateField();
}
```

Then write the generateField function this way:

```actionscript
private function generateField():void {
    fieldArray = new Array();
    fieldSprite=new Sprite();
    addChild(fieldSprite);
    fieldSprite.graphics.lineStyle(0,0x000000);
    for (var i:uint=0; i<20; i++) {
        fieldArray[i]=new Array();
        for (var j:uint=0; j<10; j++) {
            fieldArray[i][j]=0;
            fieldSprite.graphics.beginFill(0x444444);
            fieldSprite.graphics.drawRect(TS*j,TS*i,TS,TS);
            fieldSprite.graphics.endFill();
        }
    }
}
```

Test the movie and you will see the 20x10 game field rendered on the stage in a lighter gray. I could have used constants to define values like 20 and 10, but I am leaving that to you at the end of the article.

Let's see what happened:

```actionscript
fieldArray = new Array();
fieldSprite=new Sprite();
addChild(fieldSprite);
```

These lines just construct the fieldArray array and the fieldSprite DisplayObject, then add the sprite to the stage, as you have already seen a million times.

```actionscript
fieldSprite.graphics.lineStyle(0,0x000000);
```

This line introduces a whole new world: the Graphics class. This class contains a set of methods that allow you to draw vector shapes on Sprites. The lineStyle method sets the line style that you will use for your drawings. It accepts a long list of arguments, but at the moment we'll focus on the first two. The first argument is the thickness of the line, in points. I set it to 0 because I wanted it as thin as a hairline, but valid values range from 0 to 255. The second argument is the hexadecimal color value of the line, in this case black. Hexadecimal uses sixteen distinct symbols to represent numbers from 0 to 15. Numbers from zero to nine are represented with 0-9, just like the decimal numeral system, while values from ten to fifteen are represented by the letters A-F. That's the way colors are represented in most common paint software and on the web. You can write hexadecimal numbers by preceding them with 0x.
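In a color such as 0x444444, each pair of hex digits encodes one channel (red, green, blue). The short sketch below shows how such a value breaks down. It is written in Python rather than ActionScript purely to illustrate the arithmetic, which uses the same shift and mask operators in AS3; `hex_to_rgb` is our own helper, not part of the Graphics API.

```python
from typing import Tuple


def hex_to_rgb(color: int) -> Tuple[int, int, int]:
    """Split a 0xRRGGBB color into its red, green and blue channels (0-255)."""
    return (color >> 16) & 0xFF, (color >> 8) & 0xFF, color & 0xFF


print(0x444444)              # 4473924
print(hex_to_rgb(0x444444))  # (68, 68, 68)
print(hex_to_rgb(0xFFA500))  # (255, 165, 0)
```

Equal channels, as in 0x444444, always give a shade of gray; the larger the value, the lighter the gray.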
Also notice that the lineStyle method, like all Graphics class methods, isn't applied directly on the DisplayObject itself but as a method of its graphics property.

```actionscript
for (var i:uint=0; i<20; i++) { ... }
```

The remaining lines are the classic pair of nested for loops, initializing the fieldArray array in the same way you already initialized all other array-based games, and drawing the 200 (20x10) rectangles that will form the game field.

```actionscript
fieldSprite.graphics.beginFill(0x444444);
```

The beginFill method is similar to lineStyle, as it sets the fill color that you will use for your drawings. It accepts two arguments: the color of the fill (a dark gray in this case) and the opacity (alpha). Since I did not specify the alpha, it takes the default value of 1 (full opacity).

```actionscript
fieldSprite.graphics.drawRect(TS*j,TS*i,TS,TS);
```

With a line and a fill style, we are ready to draw some squares with the drawRect method, which draws a rectangle. The four arguments represent, respectively, the x and y position relative to the registration point of the parent DisplayObject (fieldSprite, which happens to be at 0,0 in this case), and the width and the height of the rectangle. All values are in pixels.

```actionscript
fieldSprite.graphics.endFill();
```

The endFill method applies a fill to everything you drew after calling the beginFill method. This way, we draw a square with a side of TS pixels at each for iteration. At the end of both loops, we'll have 200 squares on the stage, forming the game field.

Drawing a better game field background

Tetris game field backgrounds are often represented as a checkerboard, so let's try to obtain the same result.

The idea: Once we have defined two different colors, we will paint even squares with one color and odd squares with the other color.
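One way to decide which squares are "even" and which are "odd" is the parity of the row index plus the column index; the generateField function of this game uses the equivalent expression (j%2+i%2)%2. The Python sketch below (a language swap purely for illustration; `cell_color` is our own name) checks that this expression really produces a checkerboard on the 20x10 field, with every square differing from its horizontal and vertical neighbors.

```python
from typing import List

ROWS, COLS = 20, 10  # field size used in generateField
colors: List[int] = [0x444444, 0x555555]


def cell_color(i: int, j: int) -> int:
    """Color of the square in row i, column j, using (j%2+i%2)%2."""
    return colors[(j % 2 + i % 2) % 2]


# (j%2 + i%2) % 2 equals (i + j) % 2, the parity of row plus column,
# so horizontally and vertically adjacent squares always differ.
for i in range(ROWS):
    for j in range(COLS):
        assert cell_color(i, j) == colors[(i + j) % 2]
        if j + 1 < COLS:
            assert cell_color(i, j) != cell_color(i, j + 1)
        if i + 1 < ROWS:
            assert cell_color(i, j) != cell_color(i + 1, j)

# Top-left 4x4 corner as color indexes: prints 0101, 1010, 0101, 1010.
for i in range(4):
    print("".join(str((j % 2 + i % 2) % 2) for j in range(4)))
```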
The development: We have to modify the way the generateField function renders the background:

```actionscript
private function generateField():void {
    var colors:Array=new Array(0x444444,0x555555);
    fieldArray = new Array();
    fieldSprite=new Sprite();
    addChild(fieldSprite);
    fieldSprite.graphics.lineStyle(0,0x000000);
    for (var i:uint=0; i<20; i++) {
        fieldArray[i]=new Array();
        for (var j:uint=0; j<10; j++) {
            fieldArray[i][j]=0;
            fieldSprite.graphics.beginFill(colors[(j%2+i%2)%2]);
            fieldSprite.graphics.drawRect(TS*j,TS*i,TS,TS);
            fieldSprite.graphics.endFill();
        }
    }
}
```

We can define an array of colors and play with the modulo operator to fill the squares with alternate colors and make the game field look like a chessboard grid. The core of the script lies in this line:

```actionscript
fieldSprite.graphics.beginFill(colors[(j%2+i%2)%2]);
```

which uses modulo to draw a checkerboard. Test the movie and you will see that the game field now looks better.

Creating the tetrominoes

The concept behind the creation of representable tetrominoes is the hardest part of making this game. Unlike the previous games you made, such as Snake, which featured actors of the same width and height (in Snake the head is the same size as the tail), in Tetris every tetromino has its own width and height. Moreover, every tetromino except the square one changes its width and height when the player rotates it. How can we manage a tile-based game with tiles of different width and height?

The idea: Since tetrominoes are made of four squares connected orthogonally (that is, forming right angles), we can split tetrominoes into a set of tiles and include them in an array. The easiest way is to include each tetromino in a 4x4 array: although most of them would fit in smaller arrays, it's good to have a standard array.
Something like this:

Every tetromino has its own name, based on the letter of the alphabet it resembles, and its own color, according to The Tetris Company (TTC), the company that currently owns the trademark of the game Tetris. Just for your information, TTC is known for taking legal action against Tetris clones whose names are similar to "Tetris", so if you are going to create and market a Tetris clone, you should call it something like "Crazy Bricks" rather than "Tetriz". Anyway, following the previous picture, from left to right and from top to bottom, the "official" names and colors of the tetrominoes are:

- I: cyan (0x00FFFF)
- T: purple (0xAA00FF)
- L: orange (0xFFA500)
- J: blue (0x0000FF)
- Z: red (0xFF0000)
- S: green (0x00FF00)
- O: yellow (0xFFFF00)

The development: First, add two new class-level variables:

```actionscript
private const TS:uint=24;
private var fieldArray:Array;
private var fieldSprite:Sprite;
private var tetrominoes:Array = new Array();
private var colors:Array=new Array();
```

The tetrominoes array is the four-dimensional array containing all the tetromino information, while the colors array will store their colors. Now add a new function call to the Main function:

```actionscript
public function Main() {
    generateField();
    initTetrominoes();
}
```

The initTetrominoes function will initialize the tetromino-related arrays.
private function initTetrominoes():void { // I tetrominoes[0]=[[[0,0,0,0],[1,1,1,1],[0,0,0,0],[0,0,0,0]], [[0,1,0,0],[0,1,0,0],[0,1,0,0],[0,1,0,0]]]; colors[0]=0x00FFFF; // T tetrominoes[1]=[[[0,0,0,0],[1,1,1,0],[0,1,0,0],[0,0,0,0]], [[0,1,0,0],[1,1,0,0],[0,1,0,0],[0,0,0,0]], [[0,1,0,0],[1,1,1,0],[0,0,0,0],[0,0,0,0]], [[0,1,0,0],[0,1,1,0],[0,1,0,0],[0,0,0,0]]]; colors[1]=0x767676; // L tetrominoes[2]=[[[0,0,0,0],[1,1,1,0],[1,0,0,0],[0,0,0,0]], [[1,1,0,0],[0,1,0,0],[0,1,0,0],[0,0,0,0]], [[0,0,1,0],[1,1,1,0],[0,0,0,0],[0,0,0,0]], [[0,1,0,0],[0,1,0,0],[0,1,1,0],[0,0,0,0]]]; colors[2]=0xFFA500; // J tetrominoes[3]=[[[1,0,0,0],[1,1,1,0],[0,0,0,0],[0,0,0,0]], [[0,1,1,0],[0,1,0,0],[0,1,0,0],[0,0,0,0]], [[0,0,0,0],[1,1,1,0],[0,0,1,0],[0,0,0,0]], [[0,1,0,0],[0,1,0,0],[1,1,0,0],[0,0,0,0]]]; colors[3]=0x0000FF; // Z tetrominoes[4]=[[[0,0,0,0],[1,1,0,0],[0,1,1,0],[0,0,0,0]], [[0,0,1,0],[0,1,1,0],[0,1,0,0],[0,0,0,0]]]; colors[4]=0xFF0000; // S tetrominoes[5]=[[[0,0,0,0],[0,1,1,0],[1,1,0,0],[0,0,0,0]], [[0,1,0,0],[0,1,1,0],[0,0,1,0],[0,0,0,0]]]; colors[5]=0x00FF00; // O tetrominoes[6]=[[[0,1,1,0],[0,1,1,0],[0,0,0,0],[0,0,0,0]]]; colors[6]=0xFFFF00; } colors array is easy to understand: it's just an array with the hexadecimal value of each tetromino color. tetrominoes is a four-dimensional array. It's the first time you see such a complex array, but don't worry. It's no more difficult than the two-dimensional arrays you've been dealing with since the creation of Minesweeper. Tetrominoes are coded into the array this way: tetrominoes[n] contains the arrays with all the information about the n-th tetromino. These arrays represent the various rotations, the four rows and the four columns. tetrominoes[n][m] contains the arrays with all the information about the n-th tetromino in the m-th rotation. These arrays represent the four rows and the four columns. tetrominoes[n][m][o] contains the array with the four elements of the n-th tetromino in the m-th rotation in the o-th row. 
tetrominoes[n][m][o][p] is the p-th element of the array representing the o-th row in the m-th rotation of the n-th tetromino. Such an element is 0 if it's an empty space, or 1 if it's part of the tetromino. There isn't much more to explain, as it's just a series of data entries. Let's add our first tetromino to the field.
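To make the four indices concrete, here is a small sketch. It is written in JavaScript rather than the article's ActionScript so that it runs on its own. It copies only the I and T pieces from initTetrominoes(). The helper names occupiedCells and nextRotation are hypothetical and are not part of the game's code.

```javascript
// Two pieces copied from initTetrominoes(), indexed as tetrominoes[n][m][o][p]:
// n = piece, m = rotation, o = row, p = column; 1 marks a filled cell.
const tetrominoes = [
  // I: two rotations
  [[[0,0,0,0],[1,1,1,1],[0,0,0,0],[0,0,0,0]],
   [[0,1,0,0],[0,1,0,0],[0,1,0,0],[0,1,0,0]]],
  // T: four rotations
  [[[0,0,0,0],[1,1,1,0],[0,1,0,0],[0,0,0,0]],
   [[0,1,0,0],[1,1,0,0],[0,1,0,0],[0,0,0,0]],
   [[0,1,0,0],[1,1,1,0],[0,0,0,0],[0,0,0,0]],
   [[0,1,0,0],[0,1,1,0],[0,1,0,0],[0,0,0,0]]],
];

// Hypothetical helper: lists the [row, column] pairs filled by piece n in rotation m.
function occupiedCells(pieces, n, m) {
  const grid = pieces[n][m]; // 4x4 grid, indexed as grid[o][p]
  const cells = [];
  for (let o = 0; o < grid.length; o++) {
    for (let p = 0; p < grid[o].length; p++) {
      if (grid[o][p] === 1) cells.push([o, p]);
    }
  }
  return cells;
}

// Hypothetical helper: the next rotation index. It wraps around because
// pieces have different numbers of rotations (I has 2, T has 4).
function nextRotation(pieces, n, m) {
  return (m + 1) % pieces[n].length;
}

// The vertical I occupies column 1 in rows 0 to 3.
console.log(occupiedCells(tetrominoes, 0, 1));
```

Using the modulo of pieces[n].length keeps the rotation logic identical for every piece, including O with its single rotation.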

Packt
19 Jul 2016
7 min read

Managing EAP in Domain Mode

This article by Francesco Marchioni, author of the book Mastering JBoss Enterprise Application Platform 7, dives deep into application server management using the domain mode and its main components. It also discusses how to shift to advanced configurations that resemble real-world projects. The main topics covered are:
Domain mode breakdown
Handy domain properties
Electing the domain controller

Domain mode breakdown
Managing the application server in domain mode means, in a nutshell, controlling multiple servers from a single, centralized point of control. The servers that are part of the domain can span multiple machines (or even the cloud), and they can be grouped with similar servers of the domain to share a common configuration. To make sense of this, we will break down the domain components into two main categories:
Physical components: these are the domain elements that can be identified with a Java process running on the operating system.
Logical components: these are the domain elements that can span several physical components.

Domain physical components
When you start the application server through the domain.sh script, you will be able to identify the following processes:
Host controller: each domain installation contains a host controller. This is a Java process that is in charge of starting and stopping the servers that are defined within the host.xml file. The host controller is only aware of the items that are specific to the local physical installation, such as the domain controller host and port, and the JVM settings or system properties of the servers.
Domain controller: one host controller of the domain (and only one) is configured to act as the domain controller. This basically means two things: keeping the domain configuration (in the domain.xml file) and assisting the host controllers in managing the servers of the domain.
Servers: each host controller can contain any number of servers, which are the actual server instances. These server instances cannot be started autonomously; the host controller starts and stops individual servers when the domain controller commands it.
If you start the default domain configuration on a Linux machine, you will see the following processes in your operating system:
As you can see, the process controller is identified by the [Process Controller] label, while the domain controller corresponds to the [Host Controller] label. Each server shows up in the process table with the name defined in the host.xml file. You can use common operating system commands such as grep to further restrict the search to a specific process.

Domain logical components
A domain configuration with only physical elements in it would not add much to a set of standalone servers. The following components abstract the domain definition, making it dynamic and flexible:
Server group: a server group is a collection of servers. Server groups are defined in the domain.xml file, so they don't have any reference to an actual host controller installation. You can use a server group to share configuration and deployments across a group of servers.
Profile: a profile is an EAP configuration. A domain can hold as many profiles as you need. Out of the box, the following configurations are provided:
default: this configuration matches the standalone.xml configuration (in standalone mode), so it does not include JMS, IIOP, or HA.
full: this configuration matches the standalone-full.xml configuration (in standalone mode), so it adds JMS and OpenJDK IIOP to the default server.
ha: this configuration matches the standalone-ha.xml configuration (in standalone mode), so it enhances the default configuration with clustering (HA).
full-ha: this configuration matches the standalone-full-ha.xml configuration (in standalone mode), so it includes JMS, IIOP, and HA.

Handy domain properties
So far, we have learned about the default configuration files used by JBoss EAP and where they are located. These settings can, however, be changed by means of command-line options and system properties. The following options customize the domain configuration file names:
--domain-config: the domain configuration file (default: domain.xml)
--host-config: the host configuration file (default: host.xml)
The following system properties adjust the domain directory structure:
jboss.domain.base.dir: the base directory for domain content
jboss.domain.config.dir: the base configuration directory
jboss.domain.data.dir: the directory used for persistent data file storage
jboss.domain.log.dir: the directory containing the host-controller.log and process-controller.log files
jboss.domain.temp.dir: the directory used for temporary file storage
jboss.domain.deployment.dir: the directory used to store deployed content
jboss.domain.servers.dir: the directory containing the managed server instances
For example, you can start EAP 7 in domain mode with the domain configuration file mydomain.xml and the host file myhost.xml, based on the base directory /home/jboss/eap7domain, using the following command:
$ ./domain.sh --domain-config=mydomain.xml --host-config=myhost.xml -Djboss.domain.base.dir=/home/jboss/eap7domain

Electing the domain controller
Before creating your first domain, we will look in more detail at how one or more host controllers connect to a single domain controller, and at how a host controller is elected to be the domain controller. The physical topology of the domain is stored in the host.xml file.
Within this file, the first line holds the host controller name, which makes each host controller unique:
<host name="master">
One of the host controllers will be configured to act as the domain controller. This is done in the domain-controller section with the following block, which states that the domain controller is the host controller itself (hence, local):
<domain-controller>
    <local/>
</domain-controller>
All other host controllers connect to the domain controller using a configuration like the following example. It uses the jboss.domain.master.address and jboss.domain.master.port properties to specify the domain controller's address and port:
<domain-controller>
    <remote protocol="remote" host="${jboss.domain.master.address}" port="${jboss.domain.master.port:9999}" security-realm="ManagementRealm"/>
</domain-controller>
Communication between the host controllers and the domain controller happens behind the scenes through a native management port, which is also defined in the host.xml file:
<management-interfaces>
    <native-interface security-realm="ManagementRealm">
        <socket interface="management" port="${jboss.management.native.port:9999}"/>
    </native-interface>
    <http-interface security-realm="ManagementRealm" http-upgrade-enabled="true">
        <socket interface="management" port="${jboss.management.http.port:9990}"/>
    </http-interface>
</management-interfaces>
The other highlighted attribute is the management HTTP port, which administrators can use to reach the domain controller. This port is especially relevant if the host controller is the domain controller.
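The ${name:default} values above are expressions. If the property has been set, for example with -D on the domain.sh command line, its value is used; otherwise the value after the colon applies. The following JavaScript sketch models only that basic rule. resolveExpression is a hypothetical helper, not part of EAP. EAP's real expression resolver supports more forms than this, such as environment variables.

```javascript
// Simplified model of "${name}" and "${name:default}" resolution.
// props stands in for the system properties passed with -D.
function resolveExpression(value, props) {
  return value.replace(/\$\{([^}:]+)(?::([^}]*))?\}/g, (match, name, fallback) => {
    if (Object.prototype.hasOwnProperty.call(props, name)) {
      return String(props[name]); // an explicitly set property wins
    }
    if (fallback !== undefined) {
      return fallback; // otherwise use the default after the colon
    }
    throw new Error(`Unresolved property: ${name}`); // no value and no default
  });
}

const nativePort = '${jboss.management.native.port:9999}';

// First host controller: nothing set, so the default applies.
console.log(resolveExpression(nativePort, {}));
// A second host controller on the same machine, started with
// -Djboss.management.native.port=19999 (19999 is only an example value).
console.log(resolveExpression(nativePort, { 'jboss.management.native.port': '19999' }));
```

An expression without a default, such as ${jboss.domain.master.address}, has nothing to fall back on. The sketch therefore raises an error when that property is missing, rather than guessing an address.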
Both sockets use the management interface. It is defined in the interfaces section of the host.xml file and exposes the domain controller on a network-available address:
<interfaces>
    <interface name="management">
        <inet-address value="${jboss.bind.address.management:127.0.0.1}"/>
    </interface>
    <interface name="public">
        <inet-address value="${jboss.bind.address:127.0.0.1}"/>
    </interface>
</interfaces>
If you want to run multiple host controllers on the same machine, you need to provide either a unique jboss.management.native.port or a different jboss.bind.address.management for each host controller.

Summary
In this article, we covered some essentials of the domain mode: its breakdown, handy domain properties, and electing the domain controller.
Packt
01 Mar 2016
21 min read

Responsive Visualizations Using D3.js and Bootstrap

In this article by Christoph Körner, the author of the book Learning Responsive Data Visualization, we will design and implement a responsive data visualization based on real data, using Bootstrap and Media Queries. We will cover the following topics:
Absolute and relative units in the browser
Drawing charts with percentage values
Adapting charts using JavaScript event listeners
Learning to adapt the resolution of the data
Using Bootstrap's Media Queries
Understanding how to use Media Queries in CSS, LESS, and JavaScript
Learning how to use Bootstrap's grid system

First, we will discuss the most important absolute and relative units that are available in modern browsers. You will learn the difference between absolute pixels, relative percentages, em, rem, and many more. In the next section, we will take a look at what is really needed for a chart to be responsive. Adapting the width to the parent element is one of the requirements, and you will learn about two different ways to implement this. After this section, you will know when to use percentage values or JavaScript event listeners. We will also take a look at adapting the data resolution, which is another important property of responsive visualizations. In the next section, we will explore Media Queries and understand how we can use them to make viewport-dependent responsive charts. We will take advantage of Bootstrap's definitions of Media Queries for the most common device resolutions and integrate them into our responsive chart using CSS or LESS. Finally, we will also see how to use Media Queries in JavaScript. In the last section, we will take a look at Bootstrap's grid system and learn how to seamlessly integrate it with the charts. This will not only give us great flexibility but also make it easier to combine multiple charts into one big dashboard application.
Units and lengths in the browser
Creating a responsive design, be it a website or graphics, depends strongly on the units and lengths that a browser can interpret. We can easily create an element that fills the entire width of a container by using percentage values, which are relative to the parent container. Achieving the same result with absolute values can be very tricky. Thus, mastering responsive graphics also means knowing all the absolute and relative units that are available in the browser.

Units for Absolute Lengths
In web design and development, the most convenient and popular approach is to define and measure lengths and dimensions in absolute units, usually pixels. The reason for this is that designers and developers often want to specify the exact dimensions of an object. The pixel unit, px, was introduced as a visual unit based on a physical measurement: the apparent size of a pixel on a device viewed from approximately one arm's length away. However, all modern browsers also allow lengths to be defined in physical units. The following list shows the most common absolute units and their relations to each other:
cm: centimeters (1cm = 96px/2.54)
mm: millimeters (1mm = 1/10th of 1cm)
in: inches (1in = 2.54cm = 96px)
pt: points (1pt = 1/72nd of 1in)
px: pixels (1px = 1/96th of 1in)
More information on the origin and meaning of the pixel unit can be found in the CSS3 specification, available at http://www.w3.org/TR/css3-values/#viewport-relative-lengths.

Units for Relative Lengths
In addition to absolute lengths, relative lengths expressed as a percentage of the width or height of a parent element have also been a common technique for styling the dynamic elements of web pages. Traditionally, the % unit has been the unit of choice for this purpose.
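The absolute-unit relations listed above, plus the parent-relative % unit, can be checked with a small JavaScript conversion helper. This is a sketch: toPx is a hypothetical function, and it handles only the units named in this section, not the full CSS length grammar.

```javascript
// CSS absolute units expressed in CSS pixels, from 1in = 2.54cm = 96px.
const PX_PER_UNIT = {
  in: 96,
  cm: 96 / 2.54,
  mm: 96 / 25.4,
  pt: 96 / 72,
  px: 1,
};

// Hypothetical helper: converts a length such as "2.54cm", "12pt" or "50%"
// to CSS pixels. A percentage needs the parent's length (in px) as reference.
function toPx(length, parentPx) {
  const match = /^(-?\d*\.?\d+)(in|cm|mm|pt|px|%)$/.exec(length.trim());
  if (!match) {
    throw new Error(`Unsupported length: ${length}`);
  }
  const value = parseFloat(match[1]);
  const unit = match[2];
  if (unit === '%') {
    if (parentPx === undefined) {
      throw new Error('A percentage needs a parent length');
    }
    return (value / 100) * parentPx;
  }
  return value * PX_PER_UNIT[unit];
}

console.log(toPx('1in'));      // 96
console.log(toPx('50%', 300)); // 150
```

Of these units, only % cannot be converted without knowing the parent. That is exactly why it suits elements that must follow their container's size.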
However, with CSS3, a couple of additional relative units have found their way into the browsers, for example, to define a length relative to the font size of the element. Here is a list of the relative length units that have been specified in the CSS3 specifications and will soon be available in modern browsers:
%: a percentage of the width/height of the containing element
em: a multiple of the font size of the element
rem: a multiple of the font size of the root element
vw: 1% of the viewport's width
vh: 1% of the viewport's height
vmin: 1% of the viewport's smaller dimension (either vw or vh)
vmax: 1% of the viewport's larger dimension (either vw or vh)
I am aware that, as web developers, we cannot really take advantage of a technology that will only be supported soon. However, I want to point out one unit that will play an important role for future web developers: the rem unit. The rem unit defines the length of an element based on the font size of the root node (the html node in this case), rather than on the font size of the current element, as em does. The rem unit is very powerful when we use it to define the lengths and spacing of a layout, because the layout can then adapt when the user increases the browser's font size (for example, for readability). For this reason, Bootstrap 4 will replace the absolute pixel units in its Media Queries with relative units. As the following figure shows, rem units are already supported in all major browsers. I recommend that you start replacing all the absolute pixel units in your layout and spacing with rem units:
Cross-browser compatibility of rem units
However, percentage units are not dead; we can still use them where they are appropriate. We will use them later to draw SVG elements with dimensions based on their parent elements' dimensions.

Units for Resolution
To round off the section on relative and absolute units, I want to mention that we can also express resolutions in different units.
These resolution units can be used in Media Queries together with the min-resolution or max-resolution feature:
dpi: dots per inch
dpcm: dots per centimeter
dppx: dots per px unit

Mathematical Expressions
We often have to deal with rational numbers or expressions in CSS. Just imagine defining a 3-column grid with a width of 33% per column, or needing to compute a simple expression in the CSS file. CSS3 provides a simple solution for this:
calc(exp): this computes the mathematical expression exp, which can consist of lengths, values, and the operators +, -, /, and *.
Note that the + and - operators must be surrounded by whitespace. Otherwise, they will be interpreted as the sign of the second number rather than as an operator. The other two operators, * and /, don't require whitespace, but I encourage you to add it for consistency. We can use these expressions as in the following snippets:
.col-4 { width: calc(100%/3); }
.col-sp-2 { width: calc(50% - 2em); }
The preceding examples look great. However, as the following figure shows, we need to keep the browser compatibility limitations in mind:
Cross-browser compatibility of the calc() expression

Responsive charts
Now that we know some basics about absolute and relative units, we can start to define, design, and implement responsive charts. A responsive chart automatically adapts its look and feel to the resolution of the user's device. Thus, responsive charts need to adapt the following properties:
The dimensions (width and height)
The resolution of data points
The interactions and interaction areas
Adapting the dimensions is the most obvious of these. The chart should always scale and adapt to the width of its parent element. In the previous section, you learned about relative and absolute lengths, so one might think that simply using relative values for the chart's dimensions would be enough.
However, there are multiple ways to achieve this, each with advantages and disadvantages; in this section, we will discuss three of them. Adapting the resolution of the data is a little less obvious and is often neglected. The resolution of data points (the number of data points per pixel) should adapt, so that we see more points on a device with a higher resolution and fewer points on a low-resolution screen. In this section, we will see that this can only be achieved using JavaScript event listeners and by redrawing/updating the whole chart manually. Adapting interactions and interaction areas matters not just for different screen resolutions but also for different devices. We interact differently with a TV than with a computer, and we use different input devices on a desktop and on a mobile phone. Therefore, the chart should allow interactions and interaction areas that are appropriate for a given device and screen resolution.

Using Relative Lengths in SVG
The first and most obvious way to adapt the dimensions of a chart to its parent container is to use relative values for lengths and coordinates. We define the chart once with relative values, and the browser recomputes all the values whenever the dimensions of the parent container change. No manual redrawing of the chart is required.
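Stripped of D3, the core of this technique is a scale whose output is a percentage rather than a pixel value. The sketch below uses a hand-rolled linear scale; linearPercentScale is hypothetical, not D3's API. It uses the same inverted [100 - 2*padding, padding] output range as the y scale in the following example. The padding and maximum values are made-up examples.

```javascript
// Hypothetical helper: a linear scale mapping [d0, d1] onto [r0, r1].
function linearPercentScale(domain, range) {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

const padding = 5; // example padding, in percent of the SVG's height
const maxY = 40;   // example maximum data value

// Inverted range: larger values end up closer to the top of the SVG.
const yScale = linearPercentScale([0, maxY], [100 - 2 * padding, padding]);

// SVG attributes as percentage strings; the browser resolves them against
// the SVG element's current size, so no redraw is needed on resize.
function barAttributes(value) {
  return {
    y: yScale(value) + '%',
    height: (yScale(0) - yScale(value)) + '%',
  };
}

console.log(barAttributes(20)); // { y: '47.5%', height: '42.5%' }
```

All layout math happens once, in percent. Resizing the container changes only what the browser maps 100% to.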
First, we will add some CSS styles to scale the SVG element to the full width of the parent container:
.chart {
  height: 16rem;
  position: relative;
}
.chart > svg {
  width: 100%;
  height: 100%;
}
Next, we modify all our scales to work on a range of [0, 100] and subtract a padding from both sides:
var xScale = d3.scale.ordinal()
  .domain(flatData.map(xKey))
  .rangeBands([padding, 100 - 2*padding]);
var yScale = d3.scale.linear()
  .domain([0, d3.max(flatData, yKey)])
  .range([100 - 2*padding, padding]);
Finally, we can draw the chart as before, simply adding a percentage sign (%) at the end of each attribute value to indicate the use of percentage units:
$$bars
  // i (series index) and data (current series) come from the enclosing scope
  .attr('x', function(d) { return (xScale(d.x) + i*barWidth) + '%'; })
  .attr('y', function(d) { return yScale(d.y) + '%'; })
  .attr('height', function(d) { return (yScale(0) - yScale(d.y)) + '%'; })
  .attr('width', barWidth + '%')
  .attr('fill', colors(data.key));
Observe that we only slightly modified the code that plots the bars, so that it uses percentage values as coordinates for the attributes. However, the effect of this small change is enormous. In the following figure, we can see the resulting chart in a browser window:
Bar chart using relative lengths
If we now increase the size of the browser window, the bar chart scales nicely to the full width of the parent container, as we can see in the following figure:
Scaled bar chart using relative lengths
If you are not impressed yet, you should be. In my opinion, this is awesome, because it leaves all the hard work of recomputing the SVG element dimensions to the browser. We don't have to take care of them, and these native computations give us maximal performance. The previous example shows how to use percentage values to create a simple bar chart. However, we haven't yet explained why we didn't add any axes or labels to the chart.
Well, even though we can exploit the browser's native rescaling, we need to face the limitations of this technique. Relative values are only allowed in standard attributes, such as width, height, x, y, cx, and cy, but not in SVG paths or transform functions.

Conclusion about using Relative Lengths
While this sounds like an excellent solution (and indeed it is wonderful for certain use cases), it has two major drawbacks:
Percentage values are not accepted in SVG transform attributes or in the d attribute of path elements, only in standard attributes
Because the browser recomputes all the values automatically, we cannot adapt the resolution of the chart's data
The first point is the biggest drawback: we can only position elements using standard attributes such as width, height, x, y, cx, cy, and so on. However, we can still draw a bar chart that seamlessly adapts to its parent element without using JavaScript event listeners. The second drawback matters much less than the first one, and it can be circumvented with additional JavaScript event listeners, but I am sure you get the point.

Using the JavaScript Resize event
The last option is to use JavaScript event handlers and redraw the chart manually when the dimensions of the parent container change. With this technique, we can always measure the width of the parent container (in absolute units) and use this length to update and redraw the chart accordingly. This gives us great flexibility over the data resolution, and we can also adapt the chart to a different aspect ratio when needed.

The Native Resize event
In theory, this solution sounds brilliant. We simply watch the parent container, or even the SVG container itself (if it uses a width of 100%), for resize events, and then redraw the chart when the dimensions of the element change.
However, there is no native resize event on the div or svg element; modern browsers only support the resize event on the window element. Hence, it triggers only when the dimensions of the browser window change. This also means that we need to clean up listeners once we remove a chart from the page. Although this is a limitation, in most cases we can still use the window resize event to adapt the chart to its parent container; we just have to keep this in mind. We will always use the parent container's absolute dimensions for drawing and redrawing the chart, so we need to define the following inside a redraw function:
var width = chart.clientWidth;
var height = width / aspectRatio;
Now, we can add a resize event listener to the window element and call the redraw function whenever the window dimensions change:
window.addEventListener('resize', function(event) {
  redraw();
});
The benefit of this solution is that we can do anything we want in the redraw function, for example, modify the aspect ratio, adapt the axis labels, or change the number of displayed elements. The following figure shows a resized version of the previous chart. This time, the axis ticks adapt nicely and no longer overlap. Moreover, the axis ticks now take up the full available space:
Resized chart with adapted axis ticks

Adapting the Resolution of the Data
However, there is another problem that these manual redraws solve nicely: the problem of data resolution. How much data should be displayed in a small chart, and how much in a bigger chart?
Small chart with high data resolution
I think you agree that the preceding figure displays too much data for the size of the graphic. This is bad and makes the chart useless. Instead, we should adapt the resolution of the data to the small viewport in the redrawing process.
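Since only window fires resize events, and the listeners must be removed together with the chart, it helps to keep the registration and its cleanup in one place. The sketch below shows one way to do this. attachResize is a hypothetical helper. It accepts any EventTarget, so it can also be exercised outside a browser.

```javascript
// Hypothetical helper: draws once, redraws on every 'resize' event, and
// returns a function that unregisters the listener again.
function attachResize(target, redraw) {
  const handler = () => redraw();
  target.addEventListener('resize', handler);
  redraw(); // initial draw at the current size
  return function detach() {
    target.removeEventListener('resize', handler);
  };
}

// Browser usage (chart and aspectRatio as in the text above):
// const detach = attachResize(window, () => {
//   const width = chart.clientWidth;
//   const height = width / aspectRatio;
//   // ...redraw the chart with width and height...
// });
// Later, when the chart is removed from the page:
// detach();
```

Returning the cleanup function keeps a reference to the exact handler that was registered, which removeEventListener requires.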
Let's implement a function that returns only every i-th element of an array:
function adaptResolution(data, resolution) {
  // Round the step up to a whole number; if no step is given, keep every element
  resolution = resolution ? Math.ceil(resolution) : 1;
  return data.filter(function(d, i) {
    return i % resolution === 0;
  });
}
Great. Now let's define a width-dependent data resolution and filter the data accordingly:
var pixelsPerData = 20;
var resolution = pixelsPerData * (flatData.length) / width;
In the previous code, we defined the minimum number of pixels that one data point should occupy. We can now reduce the number of values accordingly by calling the following:
var flatDataRes = adaptResolution(flatData, resolution);
The following image shows a small chart with a low number of values, which is perfectly readable even though it is very small:
Small chart with a proper data resolution
The next figure shows the same chart, based on the same data, drawn in a bigger container. We immediately observe that the data resolution also adapts accordingly, and again, the chart looks nice:
Big chart with a proper data resolution

Conclusion of using Resize events
This is the most flexible solution, and therefore, in many situations, it is the solution of choice. However, you need to be aware that this solution also has drawbacks:
There is no easy way to listen for resize events of the parent container
We need to add event listeners
We need to make sure that event listeners are removed properly
We need to manually redraw the chart

Using Bootstrap's Media Queries
Bootstrap is an awesome library that gets you started quickly with new projects. It includes not only a huge number of useful HTML components but also normalized and standardized CSS styles. One particular style is the implementation of Media Queries for four typical device types (five types in Bootstrap 4). In this section, we will take a look at how to make use of these Media Queries in our styles and scripts.
The great thing about Bootstrap is that it successfully standardizes typical device dimensions for web developers. Thus, beginners can simply use them without wondering, over and over, which pixel width is the most common one for tablets.

Media Queries in CSS
The quickest way to use Bootstrap's Media Queries is to simply copy them from the compiled source code. Here are the queries:
/* Extra small devices (phones, etc. less than 768px) */
/* No media query since this is the default in Bootstrap */
/* Small devices (tablets, etc.) */
@media (min-width: 768px) { ... }
/* Medium devices (desktops, 992px and up) */
@media (min-width: 992px) { ... }
/* Large devices (large desktops, 1200px and up) */
@media (min-width: 1200px) { ... }
We can easily add these queries to our CSS styles and define certain properties and styles for our visualizations, such as four predefined widths, aspect ratios, spacing, and so on. This adapts the chart's appearance to the user's device type. Bootstrap 4 is currently in alpha; however, I think you can already start using its predefined device types in your CSS. I argue strongly for Bootstrap 4 because of its shift towards em units instead of pixels:
// Extra small devices (portrait phones, etc.)
// No media query since this is the default in Bootstrap
// Small devices (landscape phones, etc.)
@media (min-width: 34em) { ... }
// Medium devices (tablets, etc.)
@media (min-width: 48em) { ... }
// Large devices (desktops, etc.)
@media (min-width: 62em) { ... }
// Extra large devices (large desktops, etc.)
@media (min-width: 75em) { ... }
Once again, the huge benefit of this is that the layout can adapt when the user increases the font size of the browser, for example, to enhance readability.

Media Queries in LESS/SASS
In Bootstrap 3, you can include Media Query mixins in your LESS file, which then gets compiled to plain CSS.
To use these mixins, you have to create a LESS file instead of a CSS file and import Bootstrap's variables.less file. In this file, Bootstrap defines all its dimensions, colors, and other variables. Let's create a style.less file and import variables.less:
// style.less
@import "bower_components/bootstrap/less/variables.less";
Perfect, that's all. Now, we can go ahead and start using Bootstrap's device types in our LESS file:
/* Extra small devices (phones, etc. less than 768px) */
/* No media query since this is the default in Bootstrap */
/* Small devices (tablets, etc.) */
@media (min-width: @screen-sm-min) { ... }
/* Medium devices (desktops, etc.) */
@media (min-width: @screen-md-min) { ... }
/* Large devices (large desktops, etc.) */
@media (min-width: @screen-lg-min) { ... }
Finally, we need to use a LESS compiler to transform our style.less file into plain CSS. To achieve this, we run the following command from the terminal:
lessc style.less style.css
As we can see, the command requires the LESS compiler, lessc, to be installed. If it's not yet installed on your system, go ahead and install it using the following command:
npm install -g less
If you are new to LESS, I recommend that you read through the LESS documentation at http://lesscss.org/. Once you have checked out LESS, you can also look at the very similar SASS format, which is favored by Bootstrap 4. You can find the SASS documentation at http://sass-lang.com/. We can use the Bootstrap 4 Media Queries in a SASS file with the following mixins:
@include media-breakpoint-up(xs) { ... }
@include media-breakpoint-up(sm) { ... }
@include media-breakpoint-up(md) { ... }
@include media-breakpoint-up(lg) { ... }
@include media-breakpoint-up(xl) { ... }
In my opinion, including Bootstrap's LESS/SASS mixins in the styles of your visualization is the cleanest solution. You always compile your CSS from the latest Bootstrap source, and you don't have to copy CSS into your project.
Media Queries in JavaScript
Another great way to adapt your visualization to the user's device is to use Bootstrap's Media Queries directly in JavaScript. The native window.matchMedia(mediaQuery) function gives you the same control over your JavaScript as Media Queries give you over CSS. Here is a little example of how to use it:
if (window.matchMedia("(min-width: 1200px)").matches) {
  /* the viewport is at least 1200 pixels wide */
} else {
  /* the viewport is less than 1200 pixels wide */
}
As the preceding code shows, this function is quite easy to use and adds almost infinite customization possibilities to our visualization. More information about the matchMedia function can be found on the Mozilla website at https://developer.mozilla.org/de/docs/Web/API/Window/matchMedia. However, apart from using the matchMedia function directly, we could also use a wrapper around the native API call. I can really recommend the enquire.js library by Nick Williams, which allows you to declare event listeners for viewport changes. It can be installed via the bower package manager by running the following command from the terminal:
bower install enquire
Then, we need to add enquire.js to the website and use it as in the following snippet:
enquire.register("screen and (min-width:1200px)", {
  // triggers when the media query matches
  match : function() {
    /* the viewport is at least 1200 pixels wide */
  },
  // optional; triggers when the media query changes from matched to unmatched
  unmatch : function() {
    /* the viewport is less than 1200 pixels wide */
  },
});
As the preceding code shows, we can now add match and unmatch listeners in almost the same way as we listen for resize events, just much more flexibly. More information about enquire.js can be found on the project's GitHub page at https://github.com/WickyNilliams/enquire.js.
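As an illustration of combining the two ideas, the sketch below maps a viewport width to Bootstrap 3's device types using the breakpoints from the CSS Media Queries shown earlier (768px, 992px, and 1200px). It then performs the same check with window.matchMedia. The names BREAKPOINTS, deviceType, and currentDeviceType are hypothetical. The win parameter lets the function be tried without a browser.

```javascript
// Bootstrap 3 breakpoints (min-width, in px), largest first.
const BREAKPOINTS = [
  { name: 'lg', minWidth: 1200 },
  { name: 'md', minWidth: 992 },
  { name: 'sm', minWidth: 768 },
];

// Hypothetical helper: device type for a given viewport width in px.
function deviceType(width) {
  for (const bp of BREAKPOINTS) {
    if (width >= bp.minWidth) return bp.name;
  }
  return 'xs'; // no media query: the default in Bootstrap
}

// Hypothetical helper: the same decision made with matchMedia.
// Pass window in a browser; any object with a compatible matchMedia works.
function currentDeviceType(win) {
  for (const bp of BREAKPOINTS) {
    if (win.matchMedia(`(min-width: ${bp.minWidth}px)`).matches) return bp.name;
  }
  return 'xs';
}

console.log(deviceType(1024)); // md
```

Checking the breakpoints from largest to smallest mirrors how the mobile-first min-width queries override each other in CSS.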
If we want to use the Bootstrap device types, we could easily implement them (as needed) with enquire.js and trigger events for each device type. However, I prefer to stay flexible and use the bare wrapper.

Using Bootstrap's Grid System
Another great and quick way to make your charts responsive and let them play nicely with Bootstrap is to integrate them into Bootstrap's grid system. The best and cleanest integration, however, separates concerns and keeps the visualization as general and adaptive as possible. Let's take our bar chart example with the custom resize events and integrate it into a simple grid layout. As usual, you can find the full source code of the example in the code examples:
<div class="container">
  <div class="row">
    <div class="col-md-8">
      <div class="chart" data-url="…" …>
      </div>
    </div>
    <div class="col-md-4">
      <h2>My Dashboard</h2>
      <p>This is a simple dashboard</p>
    </div>
  </div>
  <div class="row">
    <div class="col-md-4">
      <div class="chart" data-url="…" …>
      </div>
    </div>
    <div class="col-md-4">
      <div class="chart" data-url="…" …>
      </div>
    </div>
    <div class="col-md-4">
      <div class="chart" data-url="…" …>
      </div>
    </div>
  </div>
  <div class="row">
    <div class="col-md-6">
      <div class="chart" data-url="…" …>
      </div>
    </div>
    <div class="col-md-6">
      <div class="chart" data-url="…" …>
      </div>
    </div>
  </div>
</div>
Because the charts use their parent containers' width, we can simply add them as div elements in the columns of the grid. This is the preferred integration, where two components play together nicely but do not depend on each other. The following figure shows a screenshot of the simple dashboard that we just built. The visualizations already fit nicely into our grid layout, which makes them easy to compose:
A simple dashboard using Bootstrap's grid layout

Summary
In this article, you learned the essentials about absolute and relative units to define lengths in a browser.
We remember that the em and rem units play an important role because they allow a layout to adapt when a user increases the font size of the website. Then, you learned how to use relative units and JavaScript resize events to adapt the chart size and the data resolution to the current container size. We looked into Media Queries in CSS, LESS, and JavaScript. Finally, we saw how to integrate charts with Bootstrap's grid system and implemented a simple Google Analytics-like dashboard with multiple charts.

Sugandha Lahoti
12 Dec 2018
5 min read

Deep Learning Indaba presents the state of Natural Language Processing in 2018

The ‘Strengthening African Machine Learning’ conference organized by Deep Learning Indaba at Stellenbosch, South Africa, is ongoing right now. This 6-day conference will celebrate and strengthen machine learning in Africa through state-of-the-art teaching, networking, policy debate, and support programmes. Yesterday, three conference organizers, Sebastian Ruder, Herman Kamper, and Stephan Gouws, asked tech experts for their views on the state of Natural Language Processing, specifically these four questions:

What do you think are the three biggest open problems in Natural Language Processing at the moment?
What would you say is the most influential work in Natural Language Processing in the last decade, if you had to pick just one?
What, if anything, has led the field in the wrong direction?
What advice would you give a postgraduate student in Natural Language Processing starting their project now?

The tech experts interviewed included the likes of Yoshua Bengio, Hal Daumé III, Barbara Plank, Miguel Ballesteros, Anders Søgaard, Lea Frermann, Michael Roth, Annie Louis, Chris Dyer, Felix Hill, Kevin Knight and more.

https://twitter.com/seb_ruder/status/1072431709243744256

Biggest open problems in Natural Language Processing at the moment
Although each expert talked about a variety of open issues in Natural Language Processing, the following key themes recurred.

No ‘real’ natural language understanding
Many experts argued that natural language understanding is central, and that it is also important for natural language generation. They agreed that most of our current Natural Language Processing models do not have a “real” understanding. Open questions include how to build models that incorporate common sense, and what (biases, structure) should be built explicitly into these models. Dialogue systems and chatbots were mentioned in several responses.
Maletšabisa Molapo, a Research Scientist at IBM Research and one of the experts, answered, “Perhaps this may be achieved by general NLP Models, as per the recent announcement from Salesforce Research, that there is a need for NLP architectures that can perform well across different NLP tasks (machine translation, summarization, question answering, text classification, etc.)”

NLP for low-resource scenarios
Another open problem is using NLP in low-resource scenarios. This includes generalization beyond the training data, learning from small amounts of data, and techniques such as domain transfer, transfer learning, and multi-task learning. It also covers different forms of supervision: semi-supervised, weakly-supervised, “Wiki-ly” supervised, distantly-supervised, lightly-supervised, minimally-supervised, and unsupervised learning. According to Karen Livescu, Associate Professor at the Toyota Technological Institute at Chicago, “Dealing with low-data settings (low-resource languages, dialects (including social media text "dialects"), domains, etc.). This is not a completely "open" problem in that there are already a lot of promising ideas out there; but we still don't have a universal solution to this universal problem.”

Reasoning about large or multiple contexts
Experts believed that NLP has problems dealing with large contexts. These large-context documents can be either text or spoken documents, and current models still lack a way to incorporate common sense when reasoning over them. According to Isabelle Augenstein, tenure-track assistant professor at the University of Copenhagen, “Our current models are mostly based on recurrent neural networks, which cannot represent longer contexts well. One recent encouraging work in this direction I like is the NarrativeQA dataset for answering questions about books.
The stream of work on graph-inspired RNNs is potentially promising, though has only seen modest improvements and has not been widely adopted due to them being much less straight-forward to train than a vanilla RNN.”

Defining problems, building diverse datasets and evaluation procedures
“Perhaps the biggest problem is to properly define the problems themselves. And by properly defining a problem, I mean building datasets and evaluation procedures that are appropriate to measure our progress towards concrete goals. Things would be easier if we could reduce everything to Kaggle style competitions!” - Mikel Artetxe.

Experts believe that current NLP datasets need to be re-evaluated. A new generation of evaluation datasets and tasks is required to show whether NLP techniques generalize across the true variability of human language. More diverse datasets are also needed. “Datasets and models for deep learning innovation for African Languages are needed for many NLP tasks beyond just translation to and from English,” said Molapo.

Advice to a postgraduate student in NLP starting their project
Do not limit yourself to reading NLP papers. Read a lot of machine learning, deep learning, reinforcement learning papers. A PhD is a great time in one’s life to go for a big goal, and even small steps towards that will be valued. — Yoshua Bengio

Learn how to tune your models, learn how to make strong baselines, and learn how to build baselines that test particular hypotheses. Don’t take any single paper too seriously, wait for its conclusions to show up more than once. — George Dahl

I believe scientific pursuit is meant to be full of failures. If every idea works out, it’s either because you’re not ambitious enough, you’re subconsciously cheating yourself, or you’re a genius, the last of which I heard happens only once every century or so. So, don’t despair! — Kyunghyun Cho

Understand psychology and the core problems of semantic cognition. Understand machine learning.
Go to NeurIPS. Don’t worry about ACL. Submit something terrible (or even good, if possible) to a workshop as soon as you can. You can’t learn how to do these things without going through the process. — Felix Hill

Make sure to go through the complete list of all expert responses for better insights.

Packt
25 Feb 2016
38 min read

Making a Web Server in Node.js

In this article, we will cover the following topics: setting up a router, serving static files, caching content in memory for immediate delivery, optimizing performance with streaming, and securing against filesystem hacking exploits.

One of the great qualities of Node is its simplicity. Unlike PHP or ASP, there is no separation between the web server and code, nor do we have to customize large configuration files to get the behavior we want. With Node, we can create the web server, customize it, and deliver content, all at the code level. This article demonstrates how to create a web server with Node and feed content through it, while implementing security and performance enhancements to cater for various situations.

If we don't have Node installed yet, we can head to http://nodejs.org and hit the INSTALL button on the homepage. This will download the relevant file to install Node on our operating system.

Setting up a router
In order to deliver web content, we need to make a Uniform Resource Identifier (URI) available. This recipe walks us through the creation of an HTTP server that exposes routes to the user.

Getting ready
First, let's create our server file. If our main purpose is to expose server functionality, it's common practice to name the file server.js (because the npm start command runs node server.js by default). We could put this new server.js file in a new folder.

It's also a good idea to install and use supervisor. We use npm (the module downloading and publishing command-line application that ships with Node) to install it. On the command line, we write the following command:

sudo npm -g install supervisor

Essentially, sudo grants administrative privileges on Linux and Mac OS X systems. If we are using Node on Windows, we can drop the sudo part in any of our commands. The supervisor module will conveniently restart our server automatically when we save our changes.
To kick things off, we can start our server.js file with the supervisor module by executing the following command:

supervisor server.js

For more on possible arguments and the configuration of supervisor, check out https://github.com/isaacs/node-supervisor.

How to do it...
In order to create the server, we need the http module. So let's load it and use the http.createServer method as follows:

var http = require('http');
http.createServer(function (request, response) {
  response.writeHead(200, {'Content-Type': 'text/html'});
  response.end('Woohoo!');
}).listen(8080);

Now, if we save our file and access localhost:8080 in a web browser or using curl, our browser (or curl) will exclaim Woohoo! But the same will occur at localhost:8080/foo. Indeed, any path will render the same behavior. So let's build in some routing. We can use the path module to extract the basename of the path (its final part) and reverse any URI encoding from the client with decodeURI as follows:

var http = require('http');
var path = require('path');
http.createServer(function (request, response) {
  var lookup = path.basename(decodeURI(request.url));

We now need a way to define our routes. One option is to use an array of objects as follows:

var pages = [
  {route: '', output: 'Woohoo!'},
  {route: 'about', output: 'A simple routing with Node example'},
  {route: 'another page', output: function() {return "Here's " + this.route;}},
];

Our pages array should be placed above the http.createServer call. Within our server, we need to loop through our array and see if the lookup variable matches any of our routes. If it does, we can supply the output.
We'll also implement some 404 error handling as follows:

http.createServer(function (request, response) {
  var lookup = path.basename(decodeURI(request.url));
  pages.forEach(function(page) {
    if (page.route === lookup) {
      response.writeHead(200, {'Content-Type': 'text/html'});
      response.end(typeof page.output === 'function'
        ? page.output() : page.output);
    }
  });
  if (!response.finished) {
    response.writeHead(404);
    response.end('Page Not Found!');
  }
}).listen(8080);

How it works...
The callback function we provide to http.createServer gives us all the functionality we need to interact with our server through the request and response objects. We use request to obtain the requested URL and then we acquire its basename with path. We also use decodeURI; without it, the another page route would fail, because our code would try to match another%20page against our pages array and find no match.

Once we have our basename, we can match it in any way we want. We could send it in a database query to retrieve content, use regular expressions to perform partial matches, or match it to a filename and load its contents.

We could have used a switch statement to handle routing, but our pages array has several advantages: it's easier to read, easier to extend, and can be seamlessly converted to JSON. We loop through our pages array using forEach. Node is built on Google's V8 engine, which provides us with a number of ECMAScript 5 (ES5) features. At the time of writing, these features can't be used in all browsers as they're not yet universally implemented, but using them in Node is no problem! The forEach function is an ES5 feature; the ES3 way is to use the less convenient for loop.

While looping through each object, we check its route property. If we get a match, we write the 200 OK status and content-type headers, and then we end the response with the object's output property.
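The forEach loop visits every route even after a match, and the 404 check has to inspect response.finished afterwards (newer Node versions document that property as deprecated in favor of response.writableEnded). As a hedged alternative sketch, ES2015's Array.prototype.find, available in current Node versions, returns the first matching route or undefined, which makes the 404 case an explicit branch. createRouter and start are hypothetical names; the handler is returned as a plain function so it can be exercised without opening a port.

```javascript
const http = require('http');
const path = require('path');

// The recipe's route table.
const pages = [
  { route: '', output: 'Woohoo!' },
  { route: 'about', output: 'A simple routing with Node example' },
  { route: 'another page', output: function () { return "Here's " + this.route; } }
];

// Hypothetical helper: builds a request handler for a route table.
// find() stops at the first match and returns undefined when nothing matches,
// so the 404 branch no longer depends on response.finished.
function createRouter(routes) {
  return function (request, response) {
    const lookup = path.basename(decodeURI(request.url));
    const page = routes.find((p) => p.route === lookup);
    if (!page) {
      response.writeHead(404);
      response.end('Page Not Found!');
      return;
    }
    response.writeHead(200, { 'Content-Type': 'text/html' });
    // Calling page.output() as a method keeps `this` bound to the page object.
    response.end(typeof page.output === 'function' ? page.output() : page.output);
  };
}

// Starts the server; not called automatically, so the handler can be tested alone.
function start(port) {
  return http.createServer(createRouter(pages)).listen(port);
}
```

Calling start(8080) gives the same behavior as the recipe's server for single-level paths.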
The response.end method accepts an optional parameter, which it writes just before finishing the response. In response.end, we have used a ternary operator (?:) to conditionally call page.output as a function or simply pass it as a string. Notice that the another page route contains a function instead of a string. The function has access to its parent object through the this variable, and allows for greater flexibility in assembling the output we want to provide.

In the event that there is no match in our forEach loop, response.end would never be called, and the client would keep waiting for a response until it times out. To avoid this, we check the response.finished property and, if it's false, we write a 404 header and end the response. The response.finished flag is affected by the forEach callback, yet it's not nested within the callback. Callback functions are mostly used for asynchronous operations, so on the surface this looks like a potential race condition; however, the forEach loop does not operate asynchronously; it blocks until all iterations are complete.

There's more...
There are many ways to extend and alter this example. There are also some great non-core modules available that do the legwork for us.

Simple multilevel routing
Our routing so far only deals with a single-level path. A multilevel path (for example, /about/node) will simply return a 404 error message.
We can alter our object to reflect a subdirectory-like structure, remove path, and use request.url for our routes instead of path.basename as follows:

var http = require('http');
var pages = [
  {route: '/', output: 'Woohoo!'},
  {route: '/about/this', output: 'Multilevel routing with Node'},
  {route: '/about/node', output: 'Evented I/O for V8 JavaScript.'},
  {route: '/another page', output: function () {return "Here's " + this.route;}}
];
http.createServer(function (request, response) {
  var lookup = decodeURI(request.url);
  // ...the forEach match and the 404 check stay the same as before

When serving static files, request.url must be cleaned prior to fetching a given file. Check out the Securing against filesystem hacking exploits recipe in this article.

Multilevel routing could be taken further; we could build and then traverse a more complex object as follows:

{route: 'about', childRoutes: [
  {route: 'node', output: 'Evented I/O for V8 JavaScript'},
  {route: 'this', output: 'Complex Multilevel Example'}
]}

After the third or fourth level, this object would become a leviathan to look at. We could instead create a helper function to define our routes that essentially pieces our object together for us. Alternatively, we could use one of the excellent non-core routing modules provided by the open source Node community. Excellent solutions already exist that provide helper methods to handle the increasing complexity of scalable multilevel routing.

Parsing the querystring module
Two other useful core modules are url and querystring. The url.parse method accepts two parameters: first the URL string (in our case, request.url) and second a Boolean parameter named parseQueryString. If parseQueryString is set to true, url.parse lazy loads the querystring module (saving us the need to require it) to parse the query into an object.
This makes it easy for us to interact with the query portion of a URL as shown in the following code:

var http = require('http');
var url = require('url');
var pages = [
  {id: '1', route: '', output: 'Woohoo!'},
  {id: '2', route: 'about', output: 'A simple routing with Node example'},
  {id: '3', route: 'another page', output: function () {
    return "Here's " + this.route; }
  },
];
http.createServer(function (request, response) {
  var id = url.parse(decodeURI(request.url), true).query.id;
  if (id) {
    pages.forEach(function (page) {
      if (page.id === id) {
        response.writeHead(200, {'Content-Type': 'text/html'});
        response.end(typeof page.output === 'function'
          ? page.output() : page.output);
      }
    });
  }
  if (!response.finished) {
    response.writeHead(404);
    response.end('Page Not Found');
  }
}).listen(8080);

With the added id properties, we can access our object data by, for instance, visiting localhost:8080?id=2.

The routing modules
There's an up-to-date list of various routing modules for Node at https://github.com/joyent/node/wiki/modules#wiki-web-frameworks-routers. These community-made routers cater to various scenarios. It's important to research the activity and maturity of a module before taking it into a production environment. NodeZoo (http://nodezoo.com) is an excellent tool to research the state of a Node module.

See also
The Serving static files and Securing against filesystem hacking exploits recipes discussed in this article

Serving static files
If we have information stored on disk that we want to serve as web content, we can use the fs (filesystem) module to load our content and pass it through the http.createServer callback. This is a basic conceptual starting point for serving static files; as we will learn in the following recipes, there are much more efficient solutions.

Getting ready
We'll need some files to serve.
Let's create a directory named content, containing the following three files: index.html, styles.css, and script.js.

Add the following code to the HTML file index.html:

<html>
  <head>
    <title>Yay Node!</title>
    <link rel=stylesheet href=styles.css type=text/css>
    <script src=script.js type=text/javascript></script>
  </head>
  <body>
    <span id=yay>Yay!</span>
  </body>
</html>

Add the following code to the script.js JavaScript file:

window.onload = function() { alert('Yay Node!'); };

And finally, add the following code to the CSS file styles.css:

#yay {font-size:5em;background:blue;color:yellow;padding:0.5em}

How to do it...
As in the previous recipe, we'll be using the core modules http and path. We'll also need to access the filesystem, so we'll require fs as well. With the help of the following code, let's create the server and use the fs module to check if a file exists:

var http = require('http');
var path = require('path');
var fs = require('fs');
http.createServer(function (request, response) {
  var lookup = path.basename(decodeURI(request.url)) || 'index.html';
  var f = 'content/' + lookup;
  fs.exists(f, function (exists) {
    console.log(exists ? lookup + " is there"
      : lookup + " doesn't exist");
  });
}).listen(8080);

If we haven't already done so, we can start our server.js file by running the following command:

supervisor server.js

Try loading localhost:8080/foo. The console will say foo doesn't exist, because it doesn't. The localhost:8080/script.js URL will tell us that script.js is there, because it is.

Before we can serve a file, we should tell the client its content type through the Content-Type header, which we can determine from the file extension. So let's make a quick map using an object as follows:

var mimeTypes = {
  '.js' : 'text/javascript',
  '.html': 'text/html',
  '.css' : 'text/css'
};

We could extend our mimeTypes map later to support more types.
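One way such an extension might look is sketched below, under the assumption that we also want a safe default. The extra extensions are illustrative additions rather than part of the recipe, and contentTypeFor is a hypothetical helper. Extensions are compared in lower case, and unknown ones fall back to application/octet-stream, a generic binary type, instead of looking up undefined.

```javascript
const path = require('path');

// The recipe's three types plus a few illustrative additions.
const mimeTypes = {
  '.js': 'text/javascript',
  '.html': 'text/html',
  '.css': 'text/css',
  '.json': 'application/json',
  '.png': 'image/png',
  '.jpg': 'image/jpeg',
  '.svg': 'image/svg+xml'
};

// Hypothetical helper: maps a filename to a Content-Type value.
function contentTypeFor(filename) {
  // path.extname returns '' for names without an extension.
  const ext = path.extname(filename).toLowerCase();
  return mimeTypes[ext] || 'application/octet-stream';
}
```

Inside the server, contentTypeFor(lookup) would then take the place of the direct mimeTypes[path.extname(lookup)] lookup.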
Modern browsers may be able to interpret certain MIME types (like text/javascript) without the server sending a Content-Type header, but older browsers or less common MIME types rely on the correct Content-Type header being sent from the server. Remember to place mimeTypes outside of the server callback, since we don't want to initialize the same object on every client request.

If the requested file exists, we can convert our file extension into a Content-Type header by feeding path.extname into mimeTypes and then passing the retrieved content type to response.writeHead. If the requested file doesn't exist, we'll write out a 404 error and end the response as follows:

//requires variables, mimeType object...
http.createServer(function (request, response) {
  var lookup = path.basename(decodeURI(request.url)) || 'index.html';
  var f = 'content/' + lookup;
  fs.exists(f, function (exists) {
    if (exists) {
      var headers = {'Content-type': mimeTypes[path.extname(lookup)]};
      response.writeHead(200, headers);
      response.end();
      return;
    }
    response.writeHead(404); //no such file found!
    response.end();
  });
}).listen(8080);

At the moment, there is still no content sent to the client. We have to get this content from our file, so we wrap the response handling in an fs.readFile method callback as follows:

//http.createServer, inside fs.exists:
if (exists) {
  fs.readFile(f, function(err, data) {
    var headers = {'Content-type': mimeTypes[path.extname(lookup)]};
    response.writeHead(200, headers);
    response.end(data);
  });
  return;
}

Before we finish, let's apply some error handling to our fs.readFile callback as follows:

//requires variables, mimeType object...
//http.createServer, path exists, inside if(exists):
  fs.readFile(f, function(err, data) {
    if (err) { response.writeHead(500); response.end('Server Error!'); return; }
    var headers = {'Content-type': mimeTypes[path.extname(lookup)]};
    response.writeHead(200, headers);
    response.end(data);
  });
  return;
}

Notice that return stays outside of the fs.readFile callback. We are returning from the fs.exists callback to prevent further code execution (for example, sending the 404 error). Placing a return statement in an if statement is similar to using an else branch. However, the pattern of a return statement inside the if block is encouraged instead of if else, as it eliminates a level of nesting. Nesting can be particularly prevalent in Node, because it performs a lot of asynchronous tasks, which tend to use callback functions.

So, now we can navigate to localhost:8080, which will serve our index.html file. The index.html file makes calls to our script.js and styles.css files, which our server also delivers with appropriate MIME types. We can see the result in the following screenshot:

This recipe serves to illustrate the fundamentals of serving static files. Remember, this is not an efficient solution! In a real-world situation, we don't want to make an I/O call every time a request hits the server; this is very costly, especially with larger files. In the following recipes, we'll learn better ways of serving static files.

How it works...
Our script creates a server and declares a variable called lookup. We assign a value to lookup using the double pipe || (OR) operator. This defines a default route if path.basename is empty. Then we pass lookup to a new variable that we named f in order to prepend our content directory to the intended filename. Next, we run f through the fs.exists method and check the exists parameter in our callback to see if the file is there. If the file does exist, we read it asynchronously using fs.readFile.
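The exists-then-read sequence checks for the file in one step and reads it in another, so the file could disappear in between; fs.exists has also since been deprecated in Node's documentation. A hedged alternative, sketched below, is to call fs.readFile directly and treat an ENOENT error as a 404. The serveFile name is hypothetical, and mapping every other read error to a 500 is our own assumption rather than part of the recipe.

```javascript
const fs = require('fs');
const path = require('path');

const mimeTypes = {
  '.js': 'text/javascript',
  '.html': 'text/html',
  '.css': 'text/css'
};

// Hypothetical helper: serves file `f` with a single fs.readFile call.
// A missing file arrives as an error with code 'ENOENT', so no separate
// existence check is needed.
function serveFile(f, response) {
  fs.readFile(f, function (err, data) {
    if (err) {
      const status = err.code === 'ENOENT' ? 404 : 500;
      response.writeHead(status);
      response.end(status === 404 ? 'Page Not Found' : 'Server Error!');
      return;
    }
    response.writeHead(200, { 'Content-Type': mimeTypes[path.extname(f)] });
    response.end(data);
  });
}
```

Inside http.createServer, the whole fs.exists block would then shrink to serveFile(f, response).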
If there is a problem accessing the file, we write a 500 server error, end the response, and return from the fs.readFile callback. We can test the error-handling functionality by removing read permissions from index.html as follows:

chmod -r index.html

Doing so will cause the server to respond with the 500 server error status code. To set things right again, run the following command:

chmod +r index.html

chmod is specific to Unix-like systems. If we are using Windows, there's no need to set file permissions in this case.

As long as we can access the file, we grab the content type using our handy mimeTypes mapping object, write the headers, end the response with data loaded from the file, and finally return from the function. If the requested file does not exist, we bypass all this logic, write a 404 error message, and end the response.

There's more...
The favicon file is something to watch out for. We will explore it in this section.

The favicon gotcha
When using a browser to test our server, sometimes an unexpected server hit can be observed. This is the browser requesting the default favicon.ico icon file that servers can provide. Apart from the initial confusion of seeing additional hits, this is usually not a problem. If the favicon request does begin to interfere, we can handle it as follows:

if (request.url === '/favicon.ico') {
  console.log('Not found: ' + f);
  response.end();
  return;
}

If we wanted to be more polite to the client, we could also inform it of a 404 error by using response.writeHead(404) before issuing response.end.

See also
The Caching content in memory for immediate delivery recipe
The Optimizing performance with streaming recipe
The Securing against filesystem hacking exploits recipe

Caching content in memory for immediate delivery
Directly accessing storage on each client request is not ideal.
For this task, we will explore how to enhance server efficiency by accessing the disk only on the first request, caching the data from the file on that first request, and serving all further requests out of process memory.

Getting ready
We are going to improve upon the code from the previous task, so we'll be working with server.js and the content directory containing index.html, styles.css, and script.js.

How to do it...
Let's begin by looking at the following script from the previous recipe, Serving static files:

var http = require('http');
var path = require('path');
var fs = require('fs');

var mimeTypes = {
  '.js' : 'text/javascript',
  '.html': 'text/html',
  '.css' : 'text/css'
};

http.createServer(function (request, response) {
  var lookup = path.basename(decodeURI(request.url)) || 'index.html';
  var f = 'content/' + lookup;
  fs.exists(f, function (exists) {
    if (exists) {
      fs.readFile(f, function (err, data) {
        if (err) {
          response.writeHead(500); response.end('Server Error!');
          return;
        }
        var headers = {'Content-type': mimeTypes[path.extname(lookup)]};
        response.writeHead(200, headers);
        response.end(data);
      });
      return;
    }
    response.writeHead(404); //no such file found!
    response.end('Page Not Found');
  });
}).listen(8080);

We need to modify this code to read the file only once, load its contents into memory, and respond to all subsequent requests for that file from memory. To keep things simple and preserve maintainability, we'll extract our cache handling and content delivery into a separate function.
So above http.createServer, and below mimeTypes, we'll add the following:

var cache = {};
function cacheAndDeliver(f, cb) {
  if (!cache[f]) {
    fs.readFile(f, function(err, data) {
      if (!err) {
        cache[f] = {content: data};
      }
      cb(err, data);
    });
    return;
  }
  console.log('loading ' + f + ' from cache');
  cb(null, cache[f].content);
}
//http.createServer

A new cache object and a new function called cacheAndDeliver have been added to store our files in memory. Our function takes the same parameters as fs.readFile, so we can replace fs.readFile in the http.createServer callback while leaving the rest of the code intact, as follows:

//...inside http.createServer:
fs.exists(f, function (exists) {
  if (exists) {
    cacheAndDeliver(f, function(err, data) {
      if (err) {
        response.writeHead(500);
        response.end('Server Error!');
        return;
      }
      var headers = {'Content-type': mimeTypes[path.extname(f)]};
      response.writeHead(200, headers);
      response.end(data);
    });
    return;
  }
//rest of path exists code (404 handling)...

When we execute our server.js file and access localhost:8080 twice in a row, the second request causes the console to display the following output:

loading content/index.html from cache
loading content/styles.css from cache
loading content/script.js from cache

How it works...
We defined a function called cacheAndDeliver, which, like fs.readFile, takes a filename and a callback as parameters. This is great because we can pass the exact same callback of fs.readFile to cacheAndDeliver, padding the server out with caching logic without adding any visual complexity to the inside of the http.createServer callback. As it stands, the value of abstracting our caching logic into an external function is arguable, but the more we build on the server's caching abilities, the more useful this abstraction becomes.
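Because cacheAndDeliver shares fs.readFile's (filename, callback) signature, the caching can also be written as a wrapper that takes any reader with that signature and returns a cached version of it. The sketch below is a hedged variation rather than the recipe's code: withCache is a hypothetical name, and passing the reader in makes it easy to check, without touching the disk, that a repeated request never reaches the reader.

```javascript
const fs = require('fs');

// Hypothetical wrapper: takes any reader with fs.readFile's (filename, callback)
// signature and returns a function with the same signature that caches results.
function withCache(readFile) {
  const cache = {};
  return function cacheAndDeliver(f, cb) {
    if (cache[f]) {
      // Cache hit: the underlying reader is not called.
      cb(null, cache[f].content);
      return;
    }
    readFile(f, function (err, data) {
      // Only successful reads are cached, so a failed read is retried next time.
      if (!err) {
        cache[f] = { content: data };
      }
      cb(err, data);
    });
  };
}

// A drop-in replacement for the recipe's cacheAndDeliver:
const cacheAndDeliver = withCache(fs.readFile);
```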
Our cacheAndDeliver function checks to see if the requested content is already cached. If not, we call fs.readFile and load the data from disk. Once we have this data, we may as well hold onto it, so it's placed into the cache object, keyed by its file path (the f variable). The next time anyone requests the file, cacheAndDeliver will see that we have the file stored in the cache object and will issue an alternative callback containing the cached data. Notice that we fill the cache[f] property with another new object containing a content property. This makes it easier to extend the caching functionality in the future, as we would just have to place extra properties into our cache[f] object and supply logic that interfaces with these properties accordingly.

There's more...
If we were to modify the files we are serving, the changes wouldn't be reflected until we restart the server. We can do something about that.

Reflecting content changes
To detect whether a requested file has changed since we last cached it, we must know when the file was cached and when it was last modified. To record when the file was last cached, let's extend the cache[f] object as follows:

cache[f] = {content: data,
            timestamp: Date.now() // store a Unix time stamp
           };

That was easy! Now let's find out when the file was last updated. The fs.stat method passes an object as the second parameter of its callback. This object contains the same useful information as the GNU (GNU's Not Unix!) coreutils stat command-line utility. It supplies three time-related properties: last accessed (atime), last modified (mtime), and last changed (ctime). The difference between mtime and ctime is that ctime will reflect any alterations to the file, including its metadata, whereas mtime will only reflect alterations to the content of the file. Consequently, if we changed the permissions of a file, ctime would be updated but mtime would stay the same.
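To observe this difference on a Unix-like system, the following sketch creates a scratch file, backdates its modification time so the two timestamps are easy to tell apart, and then changes only its permissions. It uses the synchronous fs.statSync and the millisecond fields mtimeMs and ctimeMs available in current Node versions; permissionChangeTimes is a hypothetical name, and because chmod behaves differently on Windows, the output may not match there. It shows that the permission change leaves mtime untouched while ctime carries the time of the latest metadata change.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical demo (Unix-like systems): change only a file's permissions
// and compare its timestamps before and after.
function permissionChangeTimes() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'stat-demo-'));
  const file = path.join(dir, 'demo.txt');
  fs.writeFileSync(file, 'hello');

  // Backdate atime/mtime so mtime and ctime are easy to tell apart.
  // (utimes itself is a metadata change, so ctime becomes "now".)
  const past = new Date('2000-01-01T00:00:00Z');
  fs.utimesSync(file, past, past);

  const before = fs.statSync(file);
  fs.chmodSync(file, 0o600); // permissions only; the content is untouched
  const after = fs.statSync(file);

  fs.rmSync(dir, { recursive: true, force: true });
  return { before, after };
}

const { before, after } = permissionChangeTimes();
console.log('mtime:', before.mtime.toISOString(), '->', after.mtime.toISOString());
console.log('ctime:', before.ctime.toISOString(), '->', after.ctime.toISOString());
```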
We want to pay attention to permission changes as they happen, so let's use the ctime property as shown in the following code:

//requires and mimeType object....
var cache = {};
function cacheAndDeliver(f, cb) {
  fs.stat(f, function (err, stats) {
    if (err) { console.log('Oh no!, Error', err); return cb(err); }
    var lastChanged = Date.parse(stats.ctime),
      isUpdated = (cache[f]) && lastChanged > cache[f].timestamp;
    if (!cache[f] || isUpdated) {
      fs.readFile(f, function (err, data) {
        console.log('loading ' + f + ' from file');
        //rest of cacheAndDeliver
  }); //end of fs.stat
}

If we're using Node on Windows, we may have to substitute ctime with mtime, since ctime requires at least Node version 0.10.12 on Windows.

The contents of cacheAndDeliver have been wrapped in an fs.stat callback, two variables have been added, and the if(!cache[f]) statement has been modified. We parse the ctime property of the second parameter, dubbed stats, using Date.parse to convert it to milliseconds since midnight, January 1st, 1970 (the Unix epoch), and assign it to our lastChanged variable. Then we check whether the requested file's last changed time is greater than when we cached the file (provided the file is indeed cached) and assign the result to our isUpdated variable. After that, it's merely a case of adding the isUpdated Boolean to the conditional if(!cache[f]) statement via the || (or) operator. If the file is newer than our cached version (or if it isn't yet cached), we load the file from disk into the cache object.

See also
The Optimizing performance with streaming recipe discussed in this article

Optimizing performance with streaming
Caching content certainly improves upon reading a file from disk for every request. However, with fs.readFile, we are reading the whole file into memory before sending it out in a response object.
For better performance, we can stream a file from disk and pipe it directly to the response object, sending data straight to the network socket a piece at a time.

Getting ready

We are building on our code from the last example, so let's get server.js, index.html, styles.css, and script.js ready.

How to do it...

We will be using fs.createReadStream to initialize a stream, which can be piped to the response object. In this case, implementing fs.createReadStream within our cacheAndDeliver function isn't ideal because the event listeners of fs.createReadStream will need to interface with the request and response objects, which, for the sake of simplicity, would preferably be dealt with in the http.createServer callback. For brevity's sake, we will discard our cacheAndDeliver function and implement basic caching within the server callback as follows:

    //...snip... requires, mime types, createServer, lookup and f vars...
    fs.exists(f, function (exists) {
      if (exists) {
        var headers = {'Content-type': mimeTypes[path.extname(f)]};
        if (cache[f]) {
          response.writeHead(200, headers);
          response.end(cache[f].content);
          return;
        }
    //...snip... rest of server code...

Later on, we will fill cache[f].content while we are interfacing with the readStream object. The following code shows how we use fs.createReadStream:

    var s = fs.createReadStream(f);

The preceding code will return a readStream object that streams the file pointed at by the variable f. The readStream object emits events that we need to listen to. We can listen with addListener or use the shorthand on method as follows:

    var s = fs.createReadStream(f).on('open', function () {
      //do stuff when the readStream opens
    });

Because createReadStream returns the readStream object, we can latch our event listener straight onto it using method chaining with dot notation. Each stream is only going to open once; we don't need to keep listening to it.
Therefore, we can use the once method instead of on to automatically stop listening after the first event occurrence as follows:

    var s = fs.createReadStream(f).once('open', function () {
      //do stuff when the readStream opens
    });

Before we fill out the open event callback, let's implement some error handling as follows:

    var s = fs.createReadStream(f).once('open', function () {
      //do stuff when the readStream opens
    }).once('error', function (e) {
      console.log(e);
      response.writeHead(500);
      response.end('Server Error!');
    });

The key to this whole endeavor is the stream.pipe method. This is what enables us to take our file straight from disk and stream it directly to the network socket via our response object as follows:

    var s = fs.createReadStream(f).once('open', function () {
      response.writeHead(200, headers);
      this.pipe(response);
    }).once('error', function (e) {
      console.log(e);
      response.writeHead(500);
      response.end('Server Error!');
    });

But what about ending the response? Conveniently, stream.pipe detects when the stream has ended and calls response.end for us. There's one other event we need to listen to, for caching purposes. Within our fs.exists callback, underneath the createReadStream code block, we write the following code:

    fs.stat(f, function (err, stats) {
      if (err) { return console.log(err); }
      var bufferOffset = 0;
      // Buffer.alloc replaces the deprecated new Buffer(size) constructor
      cache[f] = {content: Buffer.alloc(stats.size)};
      s.on('data', function (chunk) {
        chunk.copy(cache[f].content, bufferOffset);
        bufferOffset += chunk.length;
      });
    }); //end of fs.stat

We've used the data event to capture the buffer as it's being streamed and copied it into a buffer that we supplied to cache[f].content, using fs.stat to obtain the file size for the file's cache buffer. For this case, we're using the classic mode data event instead of the readable event coupled with stream.read() (see http://nodejs.org/api/stream.html#stream_readable_read_size_1) because it best suits our aim, which is to grab data from the stream as soon as possible.
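The pipe-plus-cache pattern can be tried out in isolation. The following is a minimal sketch, not the recipe's exact code: streamAndCache is a hypothetical helper that streams a file into any writable stream (an http.ServerResponse in the recipe), copies each chunk into a preallocated Buffer, and reports how many data events fired. It attaches the data listener before calling pipe, inside a single fs.stat callback; in the recipe the listener is attached in a separate fs.stat callback, so in principle an early chunk could be emitted before it is in place.

```javascript
const fs = require('fs');

// Hypothetical helper (not from the recipe): streams file `f` into the
// writable `destination` while copying every chunk into a cache buffer.
// `options` is passed straight to fs.createReadStream (e.g. highWaterMark).
// onDone(err, content, chunkCount) is called once the file has been read.
function streamAndCache(f, destination, options, onDone) {
  fs.stat(f, function (err, stats) {
    if (err) return onDone(err);

    // Buffer.alloc is the current replacement for the deprecated new Buffer(size).
    const content = Buffer.alloc(stats.size);
    let bufferOffset = 0;
    let chunkCount = 0;

    const s = fs.createReadStream(f, options);

    // Registered before pipe() is called, so it sees every chunk.
    s.on('data', function (chunk) {
      chunk.copy(content, bufferOffset); // copy into the cache at the running offset
      bufferOffset += chunk.length;
      chunkCount += 1;
    });
    s.once('error', onDone);
    s.once('end', function () {
      onDone(null, content, chunkCount);
    });

    // pipe() forwards chunks to `destination`, handles backpressure,
    // and calls destination.end() when the file has been fully read.
    s.pipe(destination);
  });
}
```

In the recipe's server callback, destination would be the response object and onDone would store content in cache[f]. Storing the buffer only once the whole file has been read also avoids handing a half-filled cache entry to a concurrent request, which can happen when cache[f] is assigned before the stream finishes.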
How it works...

Instead of the client waiting for the server to load the entire file from disk before sending it, we use a stream to load the file in small, ordered pieces and promptly send them to the client. With larger files, this is especially useful, as there is minimal delay between the file being requested and the client starting to receive it.

We did this by using fs.createReadStream to start streaming our file from disk. The fs.createReadStream method creates a readStream object, which inherits from the EventEmitter class. The EventEmitter class handles the evented part well, so we use listeners instead of callbacks to control the flow of stream logic.

We then added an open event listener using the once method, since we want to stop listening to the open event once it has been triggered. We respond to the open event by writing the headers and using the stream.pipe method to shuffle the incoming data straight to the client. If the client becomes overwhelmed with processing, stream.pipe applies backpressure, which means that the incoming stream is paused until the backlog of data is handled.

While the response is being piped to the client, the content cache is simultaneously being filled. To achieve this, we had to create an instance of the Buffer class for our cache[f].content property. A Buffer must be supplied with a size (or an array or string), which in our case is the size of the file. To get the size, we used the asynchronous fs.stat method and captured the size property in the callback. The data event passes a Buffer as its only callback parameter. The default highWaterMark (chunk size) of a file read stream is 64 KB; any file smaller than the highWaterMark value will only trigger one data event, because the whole file fits into the first chunk of data.
But for files that are larger than the highWaterMark value, we have to fill our cache[f].content property one piece at a time.

Changing the default readStream buffer size

We can change the buffer size of our readStream object by passing an options object with a highWaterMark property as the second parameter of fs.createReadStream. For instance, to double the buffer, you could use fs.createReadStream(f, {highWaterMark: 128 * 1024});. (Older Node releases called this option bufferSize.)

We cannot simply concatenate each chunk with cache[f].content, because that coerces the binary data into a string, and converting it back later does not reliably restore the original bytes (for example, a multi-byte character split across two chunks is corrupted). Instead, we have to copy all the little binary buffer chunks into our binary cache[f].content buffer. We created a bufferOffset variable to assist us with this. Each time we add another chunk to our cache[f].content buffer, we update bufferOffset by adding the length of the chunk buffer to it. When we call the Buffer.copy method on the chunk buffer, we pass bufferOffset as the second parameter (the target start position), so our cache[f].content buffer is filled correctly.

Moreover, working with the Buffer class brings performance benefits with larger files, because the bulk of a Buffer's memory is allocated outside the V8 heap, where V8's garbage collector, which tends to fragment large amounts of data, does not have to manage it. That fragmentation would otherwise slow down Node's ability to process the data.

There's more...

While streaming has solved the problem of waiting for files to be loaded into memory before delivering them, we are nevertheless still loading files into memory via our cache object. With larger files or a large number of files, this could have significant ramifications.

Protecting against process memory overruns

Streaming allows for intelligent and minimal use of memory when processing large items. But even with well-written code, some apps may require significant memory. There is a limited amount of heap memory. At the time of writing, V8's default heap limit was 1400 MB on 64-bit systems and 700 MB on 32-bit systems.
This can be altered by running node with --max-old-space-size=N, where N is the number of megabytes. The actual maximum it can be set to depends on the OS, on whether we're running a 32-bit or 64-bit architecture (a 32-bit build may peak at around 2 GB), and of course on the amount of physical RAM available. The --max-old-space-size flag doesn't apply to buffers, since it applies to the V8 heap (memory allocated for JavaScript objects and primitives), and buffers are allocated outside of the V8 heap.

If we absolutely had to be memory intensive, we could run our server on a large cloud platform, divide up the logic, and start new instances of node using the child_process module, or better still the higher-level cluster module.

There are other, more advanced ways to increase the usable memory, including editing and recompiling the V8 code base. The http://blog.caustik.com/2012/04/11/escape-the-1-4gb-v8-heap-limit-in-node-js link has some tips along these lines.

In this case, high memory usage isn't necessarily required, and we can optimize our code to significantly reduce the potential for memory overruns. There is less benefit to caching larger files, because the slight speed improvement relative to the total download time is negligible, while the cost of caching them is significant relative to our available process memory. We can also improve cache efficiency by implementing an expiration time on cache objects, which can then be used to clean the cache, removing files in low demand and prioritizing high-demand files for faster delivery. Let's rearrange our cache object slightly as follows:

    var cache = {
      store: {},
      maxSize: 26214400, //(bytes) 25mb
    };

For a clearer mental model, we're making a distinction between the cache object as a functioning entity and the cache object as a store (which is a part of the broader cache entity). Our first goal is to only cache files under a certain size; we've defined cache.maxSize for this purpose.
All we have to do now is insert an if condition within the fs.stat callback as follows:

    fs.stat(f, function (err, stats) {
      if (!err && stats.size < cache.maxSize) {
        var bufferOffset = 0;
        cache.store[f] = {content: Buffer.alloc(stats.size),
          timestamp: Date.now() };
        s.on('data', function (data) {
          data.copy(cache.store[f].content, bufferOffset);
          bufferOffset += data.length;
        });
      }
    });

Notice that we also slipped a new timestamp property into our cache.store[f] object. This is for our second goal: cleaning the cache. Let's extend cache as follows:

    var cache = {
      store: {},
      maxSize: 26214400, //(bytes) 25mb
      maxAge: 5400 * 1000, //(ms) 1 and a half hours
      clean: function (now) {
        var that = this;
        Object.keys(this.store).forEach(function (file) {
          if (now > that.store[file].timestamp + that.maxAge) {
            delete that.store[file];
          }
        });
      }
    };

So, in addition to maxSize, we've created a maxAge property and added a clean method. We call cache.clean at the bottom of the server with the help of the following code:

    //all of our code prior
      cache.clean(Date.now());
    }).listen(8080); //end of the http.createServer

The cache.clean method loops through the cache.store object and checks whether each entry has exceeded its specified lifetime. If it has, we remove it from the store.

One further improvement and then we're done. The cache.clean method is called on each request. This means the cache.store object is going to be looped through on every server hit, which is neither necessary nor efficient. It would be better if we cleaned the cache, say, every two hours or so.
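As a self-contained sketch of the cache at this stage, here is the same store, maxSize, maxAge, and clean logic, with one hypothetical addition: an add method that applies the size check in one place (the recipe does this inline in the fs.stat callback). Passing now in as a parameter, as the recipe already does for clean, keeps the logic easy to test, and the arrow function inside forEach inherits this from clean, which removes the need for the var that = this workaround.

```javascript
// Sketch of the size-limited, expiring cache described above.
const cache = {
  store: {},
  maxSize: 26214400,   // (bytes) 25 MB = 25 * 1024 * 1024
  maxAge: 5400 * 1000, // (ms) one and a half hours

  // Hypothetical helper: caches `content` (a Buffer) only if it is smaller
  // than maxSize, stamping it with `now`. Returns true if it was cached.
  add(file, content, now) {
    if (content.length >= this.maxSize) return false;
    this.store[file] = { content: content, timestamp: now };
    return true;
  },

  // Deletes every entry whose age exceeds maxAge.
  clean(now) {
    // The arrow function keeps `this` bound to the cache object.
    Object.keys(this.store).forEach((file) => {
      if (now > this.store[file].timestamp + this.maxAge) {
        delete this.store[file];
      }
    });
  },
};

// Usage: one file cached at t=0, another at t=5000 s; clean just after
// the first one has passed its 1.5-hour maxAge.
cache.add('/index.html', Buffer.from('<h1>hi</h1>'), 0);
cache.add('/subcontent/styles.css', Buffer.from('body {}'), 5000 * 1000);
cache.clean(5400 * 1000 + 1);
console.log(Object.keys(cache.store)); // [ '/subcontent/styles.css' ]
```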
We'll add two more properties to cache: cleanAfter, to specify the time between cache cleans, and cleanedAt, to record when the cache was last cleaned, as follows:

    var cache = {
      store: {},
      maxSize: 26214400, //(bytes) 25mb
      maxAge: 5400 * 1000, //(ms) 1 and a half hours
      cleanAfter: 7200 * 1000, //(ms) two hours
      cleanedAt: 0, //to be set dynamically
      clean: function (now) {
        if (now - this.cleanAfter > this.cleanedAt) {
          this.cleanedAt = now;
          var that = this; // declared with var so no implicit global is created
          Object.keys(this.store).forEach(function (file) {
            if (now > that.store[file].timestamp + that.maxAge) {
              delete that.store[file];
            }
          });
        }
      }
    };

So we wrap the body of our cache.clean method in an if statement, which allows a loop through cache.store only if it has been longer than two hours (or whatever cleanAfter is set to) since the last clean.

See also

The Securing against filesystem hacking exploits recipe discussed in this article

Securing against filesystem hacking exploits

For a Node app to be insecure, there must be something an attacker can interact with for exploitation purposes. Due to Node's minimalist approach, the onus is on the programmer to ensure that their implementation doesn't expose security flaws. This recipe will help identify some security risk anti-patterns that could occur when working with the filesystem.

Getting ready

We'll be working with the same content directory as we did in the previous recipes. But we'll start a new insecure_server.js file (there's a clue in the name!) from scratch to demonstrate mistaken techniques.

How to do it...

Our previous static file recipes tend to use path.basename to acquire a route, but this ignores intermediate paths. If we accessed localhost:8080/foo/bar/styles.css, our code would take styles.css as the basename property and deliver content/styles.css to us. How about we make a subdirectory in our content folder?
Call it subcontent and move our script.js and styles.css files into it. We'd have to alter our script and link tags in index.html as follows:

    <link rel=stylesheet type=text/css href=subcontent/styles.css>
    <script src=subcontent/script.js type=text/javascript></script>

We can use the url module to grab the entire pathname property. So let's include the url module in our new insecure_server.js file, create our HTTP server, and use pathname to get the whole requested path as follows:

    var http = require('http');
    var url = require('url');
    var fs = require('fs');

    http.createServer(function (request, response) {
      var lookup = url.parse(decodeURI(request.url)).pathname;
      lookup = (lookup === "/") ? '/index.html' : lookup;
      var f = 'content' + lookup;
      console.log(f);
      fs.readFile(f, function (err, data) {
        response.end(data);
      });
    }).listen(8080);

If we navigate to localhost:8080, everything works great! We've gone multilevel, hooray! For demonstration purposes, a few things have been stripped out from the previous recipes (such as fs.exists); but even with them, this code presents the same security hazards if we type the following:

    curl localhost:8080/../insecure_server.js

Now we have our server's code. An attacker could also access /etc/passwd with a few attempts at guessing its relative path as follows:

    curl localhost:8080/../../../../../../../etc/passwd

If we're using Windows, we can download and install curl from http://curl.haxx.se/download.html. In order to test these attacks, we have to use curl or an equivalent tool, because modern browsers will filter these sorts of requests. Note that recent versions of curl also collapse ../ sequences themselves unless given the --path-as-is option.

As a solution, what if we added a unique suffix to each file we wanted to serve and made it mandatory for the suffix to exist before the server coughs it up? That way, an attacker couldn't request /etc/passwd or our insecure_server.js file, because they wouldn't have the unique suffix.
To try this, let's copy the content folder and call it content-pseudosafe, and rename our files to index.html-serve, script.js-serve, and styles.css-serve. Let's create a new server file and name it pseudosafe_server.js. Now all we have to do is make the -serve suffix mandatory as follows:

    //requires section ...snip...
    http.createServer(function (request, response) {
      var lookup = url.parse(decodeURI(request.url)).pathname;
      lookup = (lookup === "/") ? '/index.html-serve'
        : lookup + '-serve';
      var f = 'content-pseudosafe' + lookup;
    //...snip... rest of the server code...

For feedback purposes, we'll also include some 404 handling with the help of fs.exists as follows:

    //requires, create server etc
    fs.exists(f, function (exists) {
      if (!exists) {
        response.writeHead(404);
        response.end('Page Not Found!');
        return;
      }
    //read file etc

So, let's start our pseudosafe_server.js file and try out the same exploit by executing the following command:

    curl -i localhost:8080/../insecure_server.js

We've used the -i argument so that curl will output the headers. The result? A 404, because the file it's actually looking for is ../insecure_server.js-serve, which doesn't exist. So what's wrong with this method? Well, it's inconvenient and prone to error. But more importantly, an attacker can still work around it! Try this by typing the following:

    curl localhost:8080/../insecure_server.js%00/index.html

And voilà! There's our server code again. The solution to our problem is path.normalize, which cleans up our pathname before it gets to fs.readFile, as shown in the following code:

    var path = require('path'); // alongside the other requires
    http.createServer(function (request, response) {
      var lookup = url.parse(decodeURI(request.url)).pathname;
      lookup = path.normalize(lookup);
      lookup = (lookup === "/") ? '/index.html' : lookup;
      var f = 'content' + lookup;
      //...snip... rest of the server code...
    }).listen(8080);

Prior recipes haven't used path.normalize, and yet they're still relatively safe.
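Beyond path.normalize, a stricter approach is to resolve the requested path against the content directory and confirm that the result is still inside it. The following is a minimal sketch under those assumptions: safeLookup is a hypothetical helper, the WHATWG URL class stands in for the legacy url.parse, and decodeURIComponent is used instead of decodeURI so that encoded slashes (%2f) are decoded before normalization. It returns null for anything it refuses to serve.

```javascript
const path = require('path');

// Hypothetical helper: maps a request URL onto a file inside `root`,
// or returns null if the request should be refused.
function safeLookup(root, requestUrl) {
  let pathname;
  try {
    // The WHATWG URL parser also resolves ../ segments, including %2e%2e.
    pathname = decodeURIComponent(new URL(requestUrl, 'http://localhost').pathname);
  } catch (e) {
    return null; // malformed input, e.g. broken percent-encoding like %E0%A4%A
  }

  if (pathname.includes('\0')) return null; // poison null byte
  if (pathname === '/') pathname = '/index.html';

  // Normalize as a URL path (forward slashes), then resolve under root.
  const rootDir = path.resolve(root);
  const resolved = path.resolve(rootDir, '.' + path.posix.normalize(pathname));

  // Final guard: the result must still live inside the content directory.
  return resolved.startsWith(rootDir + path.sep) ? resolved : null;
}
```

In a server, a null result would become a 404 (or 400) response, and a non-null result would be handed to fs.readFile or fs.createReadStream. The startsWith check is deliberately redundant with the normalization: it keeps the guarantee even if the decoding steps above are changed later.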
The path.basename method gives us the last part of the path, thus removing any preceding double-dot paths (../) that would take an attacker higher up the directory hierarchy than should be allowed.

How it works...

Here we have two filesystem exploitation techniques: the relative directory traversal and poison null byte attacks. These attacks can take different forms, such as in a POST request or from an external file. They can have different effects: if we were writing to files instead of reading them, an attacker could potentially start making changes to our server. The key to security in all cases is to validate and clean any data that comes from the user.

In insecure_server.js, we pass whatever the user requests to our fs.readFile method. This is foolish because it allows an attacker to take advantage of the relative path functionality in our operating system by using ../, thus gaining access to areas that should be off limits. By adding the -serve suffix, we didn't solve the problem; we put a plaster on it, which can be circumvented by the poison null byte.

The key to this attack is the %00 value, which is the URL hex code for the null byte. To our JavaScript code, the null byte is just another character, so the requested path still appears to end in /index.html-serve. But when that path is passed through fs.readFile to the operating system, the kernel treats the null byte as the end of the string and never sees the /index.html-serve part. So our code sees index.html, but the read operation sees ../insecure_server.js. This is known as null byte poisoning.

To protect ourselves, we could use a regex statement to remove the ../ parts of the path. We could also check for the null byte and respond with a 400 Bad Request status. But path.normalize already resolves the relative ../ parts for us, and current versions of Node's fs module reject any path that contains a null byte.

There's more...

Let's further delve into how we can protect our servers when it comes to serving static files.

Whitelisting

If security were an extreme priority, we could adopt a strict whitelisting approach.
In this approach, we would create a manual route for each file we are willing to deliver. Anything not on our whitelist would return a 404 error. We can place a whitelist array above http.createServer as follows:

    var whitelist = [
      '/index.html',
      '/subcontent/styles.css',
      '/subcontent/script.js'
    ];

And inside our http.createServer callback, we'll put an if statement to check whether the requested path is in the whitelist array, as follows:

    if (whitelist.indexOf(lookup) === -1) {
      response.writeHead(404);
      response.end('Page Not Found!');
      return;
    }

And that's it! We can test this by placing a file named non-whitelisted.html in our content directory and then executing the following command:

    curl -i localhost:8080/non-whitelisted.html

This will return a 404 error because non-whitelisted.html isn't on the whitelist.

Node static

Node's modules wiki page (https://github.com/joyent/node/wiki/modules#wiki-web-frameworks-static) has a list of static file server modules available for different purposes. It's a good idea to ensure that a project is mature and active before relying upon it to serve your content. The node-static module is a well-developed module with built-in caching. It's also compliant with the RFC 2616 HTTP specification, which defines how files should be delivered over HTTP. The node-static module implements all the essentials discussed in this article and more. For the next example, we'll need the node-static module.
You could install it by executing the following command:

    npm install node-static

The following piece of code is slightly adapted from the node-static module's GitHub page at https://github.com/cloudhead/node-static:

    var static = require('node-static');
    var fileServer = new static.Server('./content');

    require('http').createServer(function (request, response) {
      request.addListener('end', function () {
        fileServer.serve(request, response);
      }).resume(); // consume the request body so that 'end' fires
    }).listen(8080);

The preceding code will interface with the node-static module to handle server-side and client-side caching, use streams to deliver content, and filter out relative requests and null bytes, among other things.

Summary

To learn more about Node.js and creating web servers, the following books published by Packt Publishing (https://www.packtpub.com/) are recommended:

Node Cookbook Second Edition (https://www.packtpub.com/web-development/node-cookbook-second-edition)
Node.js Design Patterns (https://www.packtpub.com/web-development/nodejs-design-patterns)
Node Web Development Second Edition (https://www.packtpub.com/web-development/node-web-development-second-edition)
Greg Beaumont
16 Oct 2023
9 min read

Configuring OpenAI and Azure OpenAI in Power BI

This article is an excerpt from the book, Power BI Machine Learning and OpenAI, by Greg Beaumont.

Introduction

In this article, we delve into the exciting world of Power BI integration with OpenAI and Azure OpenAI. Data-driven decision-making is at the core of modern business, and harnessing the capabilities of AI models for generating text adds an invaluable dimension to your insights. Whether you're new to OpenAI or exploring the power of Azure OpenAI, we'll guide you through the technical requirements, API key setup, resource management, and dataflow optimization needed to seamlessly infuse AI-generated content into your Power BI projects. Let's embark on a journey to supercharge your data analytics capabilities and stay ahead in the ever-evolving world of data science.

Technical requirements

For this article, you'll need the following:

• An account with OpenAI (the original, non-Azure service): https://openai.com/.
• Optional: Azure OpenAI as part of your Azure subscription: https://azure.microsoft.com/en-us/products/cognitive-services/openai-service. The book treats this as optional, since it was not available to everyone at the time of publication.
• FAA Wildlife Strike data files from either the FAA website or the Packt GitHub site.
• A Power BI Pro license.
• One of the following Power BI licensing options for access to Power BI dataflows:
  • Power BI Premium
  • Power BI Premium Per User

Configuring OpenAI and Azure OpenAI for use in your Power BI solution

Prior to proceeding with the configuration of OpenAI and Azure OpenAI, it is important to note that OpenAI is still a nascent technology at the time of writing this book.
In the future, the integration of OpenAI with Power BI may become less technical, as advancements in the technology continue to be made. However, the use cases that will be demonstrated in this chapter will remain applicable. As such, the instructions provided in this chapter will showcase how this integration can be used to enhance your data analytics capabilities in the context of Power BI.

Configuring OpenAI

You can create an account in OpenAI (if you do not have one already) from this link: https://chat.openai.com/auth/login. At the time of writing, new accounts are granted trial credits to begin using OpenAI. If you run out of trial credits, or if the trial is no longer offered after this book has been written, you may need to pay for the use of OpenAI. Pricing details can be found at this link: https://openai.com/pricing.

Once you have an OpenAI account, you will need to create an API key that will be used to authenticate your API calls. An API key can be easily created at this link: https://platform.openai.com/account/api-keys. Clicking on Create new secret key will allow you to create a new key for API calls that you make later in this chapter. This book will use abc123xyz as an example key for the sample code. Be sure to use the actual key from OpenAI, and not the key name.

Once you have an account and an API key, you are ready to go with OpenAI for this book!

Configuring Microsoft Azure OpenAI

OpenAI is also available as a service in Microsoft Azure. By using the Microsoft Azure OpenAI Service, users can leverage large-scale AI models with the benefits of Azure, such as role-based access security, private networks, and comprehensive security tools that integrate with other Microsoft tools in Azure. Billing and governance can be centralized for large organizations to help ensure the responsible use of AI.

For the purposes of this book, Azure OpenAI is optional as an alternative to the original OpenAI.
Azure OpenAI may not be available to everyone since it is a new technology with high demand. All of the content for the workshop can be done with either OpenAI or Azure OpenAI.

Instructions for setting up Azure OpenAI can be found at this link: https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/create-resource/.

Once you've created a resource, you can also deploy a model per the instructions at that link. As noted in Chapter 12, you will be using the text-davinci-003 model for the workshop associated with this chapter. OpenAI is evolving rapidly, and you may be able to choose different models at the time you are reading this book. Take note of the following values when walking through these steps; they will be needed later in this chapter:

• Resource name: Note the name of your Azure OpenAI resource in your subscription. This book will use PBI_OpenAI_project for the examples in this chapter.
• Deployment name: This is the name of the resource for the text-davinci-003 model deployment. This book will use davinci-PBIML for names of deployments in examples of code.

Next, you'll need to create a key for your Azure OpenAI API calls. From your Azure OpenAI resource, named PBI_OpenAI_project for this book, go to Resource management | Keys and endpoint, and your keys will be on that page.
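To make the role of each noted value concrete, here is a minimal JavaScript sketch (the book itself uses custom M scripts in Power Query) that builds, but does not send, a completions request for either service. buildCompletionRequest is hypothetical. The endpoint paths, the api-key and Authorization headers, and the default apiVersion value are assumptions based on the public OpenAI and Azure OpenAI REST APIs; check the current documentation before relying on them. The Azure endpoint should be copied from the Keys and endpoint page rather than assembled from the resource name.

```javascript
// Hypothetical helper: shows where the API key, Azure endpoint, and
// deployment name end up in a completions request. It only builds the
// request; sending it (for example with fetch) is left to the caller.
function buildCompletionRequest({
  provider,                    // 'azure'; anything else is treated as OpenAI
  apiKey,                      // e.g. the book's placeholder 'abc123xyz'
  prompt,
  endpoint,                    // Azure only: copied from Keys and endpoint
  deploymentName,              // Azure only: e.g. 'davinci-PBIML'
  model = 'text-davinci-003',  // OpenAI only: model named in the request body
  apiVersion = '2022-12-01',   // Azure only: assumed API version, check the docs
}) {
  const body = { prompt: prompt, max_tokens: 100 };

  if (provider === 'azure') {
    const base = endpoint.replace(/\/+$/, ''); // drop any trailing slash
    return {
      url: `${base}/openai/deployments/${deploymentName}/completions?api-version=${apiVersion}`,
      // Azure OpenAI authenticates with an api-key header; the model is
      // chosen by the deployment in the URL, not by the body.
      headers: { 'Content-Type': 'application/json', 'api-key': apiKey },
      body: JSON.stringify(body),
    };
  }

  return {
    url: 'https://api.openai.com/v1/completions',
    // OpenAI authenticates with a bearer token and takes the model in the body.
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify({ model: model, ...body }),
  };
}

const req = buildCompletionRequest({
  provider: 'azure',
  apiKey: 'abc123xyz',
  prompt: 'Describe the aircraft operator in one sentence.',
  endpoint: 'https://your-resource-name.openai.azure.com/',
  deploymentName: 'davinci-PBIML',
});
console.log(req.url);
// https://your-resource-name.openai.azure.com/openai/deployments/davinci-PBIML/completions?api-version=2022-12-01
```

The same three pieces (endpoint or URL, key header, and JSON body) are what a custom M script in a Power BI dataflow would need to supply.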
This book will use abc123xyz as an example key for the sample code.

Once you have either OpenAI or Azure OpenAI set up and ready to go, you can add some new generative text capabilities to your project using FAA Wildlife Strike data!

Preparing a Power BI dataflow for OpenAI and Azure OpenAI

In Chapter 12, you decided to use OpenAI for two use cases with your FAA Wildlife Strike database project:

• Generating descriptions of airplane models and the operator of the aircraft, for each incident
• Summarizing the free text remarks provided in the report for each incident

Since OpenAI is still new at the time of writing this book, Power BI does not yet have connectors built into the product. But you can still call OpenAI and Azure OpenAI APIs from both Power Query and Power BI dataflows using custom M scripts. Let's get started!

First, you will create a new dataflow for use with OpenAI and Cognitive Services in Power BI:

1. From your Power BI workspace, on the ribbon, select New | Dataflow.
2. Select Define new tables | Link tables from other dataflows.
3. Sign in and click Next.
4. Expand your workspace.
5. Expand the Strike Reports dataflow and check Strike Reports Curated New.
6. Click Transform Data.
7. Create a group named Sources and move Strike Reports Curated New into that group.
8. Right-click Strike Reports Curated New and unselect Enable load.

Next, you will create a version of the query that will be used with OpenAI and Cognitive Services:

1. Right-click on Strike Reports Curated New and select Reference.
2. Rename the new query Strike Reports Curated New OpenAI.
3. Create a group named OpenAI and move Strike Reports Curated New OpenAI into the group.

In Chapter 12, you decided to use the FAA Wildlife Strike Operator, Aircraft, Species, and Remarks database columns as part of your OpenAI prompts. Filtering out blank and unknown values from Strike Reports Curated New OpenAI will help produce better results for your testing. Note that you may need to select Load more...
if the values all come up empty or UNKNOWN:

1. For the Operator column, filter out the UNKNOWN, UNKNOWN COMMERCIAL, BUSINESS, and PRIVATELY OWNED values.
2. For the Aircraft column, filter out UNKNOWN.
3. For the Species column, filter out Unknown bird, Unknown bird – large, Unknown bird – medium, Unknown bird – small, and Unknown bird or bat.
4. For the Remarks column, filter out (blank).

Finally – this step is optional – you can filter the number of rows for testing purposes. Both OpenAI and Azure OpenAI can run up a bill, so limiting the number of calls for this workshop makes sense. For the example in this book, the Strike Reports Curated New OpenAI table will be filtered to events happening in or after December 2022, which can be filtered using the Incident Date column.

Now you are ready to add OpenAI and Cognitive Services content to your data!

Conclusion

In conclusion, configuring OpenAI and Azure OpenAI for integration with Power BI offers valuable enhancements to your data analytics capabilities. While OpenAI is still an evolving technology, the instructions provided in this article remain relevant and applicable. Whether you choose OpenAI or Azure OpenAI, both options empower you to leverage AI models effectively within Power BI.

Setting up these services involves creating API keys, resources, and deployments, as outlined in the article. Additionally, preparing your Power BI dataflow for OpenAI and Azure OpenAI is a crucial step. You can filter and optimize your data to improve the quality of AI-generated content.

As AI continues to advance, the potential for enhancing data analytics with OpenAI grows, and these configurations provide a strong foundation for leveraging generative text capabilities in your projects.

Author Bio

Greg Beaumont is a Data Architect at Microsoft; Greg is an expert in solving complex problems and creating value for customers.
With a focus on the healthcare industry, Greg works closely with customers to plan enterprise analytics strategies, evaluate new tools and products, conduct training sessions and hackathons, and architect solutions that improve the quality of care and reduce costs. With years of experience in data architecture and a passion for innovation, Greg has a unique ability to identify and solve complex challenges. He is a trusted advisor to his customers and is always seeking new ways to drive progress and help organizations thrive. For more than 15 years, Greg has worked with healthcare customers who strive to improve patient outcomes and find opportunities for efficiencies. He is a veteran of the Microsoft data speaker network and has worked with hundreds of customers on their data management and analytics strategies.

Anubhav Singh
07 Sep 2023
6 min read

Introduction to Gen AI Studio

In this article, we’ll explore the basics of Generative AI Studio and how to run a language model within this suite, with a practical example.

Generative AI Studio is the all-encompassing offering of generative AI-based services on Google Cloud. It includes models of different types, allowing users to generate text, image, or audio content. In Generative AI Studio (Gen AI Studio), users can rapidly prototype and test different types of prompts for the different types of models to figure out which parameters and settings work best for their use cases. They can then easily move the tested configurations into the code bases of their solutions.

Model Garden, on the other hand, provides a collection of foundation and customized generative AI models that can be used directly as models in code or as APIs. The foundation models are based on models trained by Google itself, whereas the fine-tuned/task-specific models include models developed and trained by third parties.

Gen AI Studio
Packaged within Vertex AI, Generative AI Studio on Google Cloud Platform provides low-code solutions for developing and testing invocations of Google’s AI models, which can then be used within customers’ solutions. As of August 2023, the following solutions are part of Generative AI Studio:
Language: Models used to generate text-based responses. The models may answer questions, perform classification, recognize sentiment, or handle anything else that involves text understanding.
Vision: Models used to generate images/visual content in different drawing styles.
Speech: Models that perform either speech-to-text conversion or text-to-speech conversion.
Let’s explore each of these in detail. The language models in Gen AI Studio are based on the PaLM 2 for Text models and currently come in the form of either “text-bison” or “chat-bison”.
The first type of model (“text-bison”) is the base model, which can perform any kind of text understanding and generation task. “Chat-bison” models, on the other hand, focus on providing a conversational interface to the model, so they are better suited to tasks that require a conversation between the user and the model. Another form of the PaLM 2 models, “code-bison”, powers the Codey product suite; it deals with programming languages instead of human languages.
Let’s take a look at how we can use a language model in Gen AI Studio. Follow the steps below:
1. First, head over to https://console.cloud.google.com/vertex-ai/generative in your browser with a billing-enabled Google Cloud account. You will see the Generative AI Studio dashboard.
2. Next, click “Open” in the card titled “Language”.
3. Then, click “Text Prompt” to open the prompt builder interface. The interface should look similar to the image below; however, as an actively developed product, it may change in several ways in the future.
4. Now, let us write a prompt. For our example, we’ll instruct the model to fact check whatever is passed to it. Here’s a sample prompt:
    You're a Fact Checker Bot. Whatever the user says, fact check it and say any of the following:
    1. "This is a fact" if the statement by the user is a true fact.
    2. "This is not a fact" if the user's statement is not classifiable as a fact.
    3. "This is a myth" if the user's statement is a false fact.
    User:
5. Now, write the user’s part as well and hit the Submit button. The last line of the prompt would now be:
    User: I am eating an apple.
6. Observe the response. Then, change the user’s part to “I am an apple” and “I am a human”, and observe the response in each case. The following output table is expected:
Once we’re satisfied with the model responses to our prompt, we can shift the model invocation to code.
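Before moving to the Colab steps, the prompt from steps 4 and 5 and the verdicts observed in step 6 can also be handled in plain Python. The following is a minimal sketch under stated assumptions: `build_fact_check_prompt`, `parse_verdict`, and `run_fact_checks` are hypothetical helpers (not part of Gen AI Studio or the Vertex AI SDK), and `parse_verdict` assumes the model replies with one of the three phrases defined in the prompt, which the model is not guaranteed to do.

```python
from typing import Callable, Dict, Iterable, Optional

# The three verdict phrases defined in the Fact Checker Bot prompt.
VERDICTS = ("This is not a fact", "This is a myth", "This is a fact")

# Prompt from step 4; the user's statement fills the final "User:" line (step 5).
PROMPT_TEMPLATE = """You're a Fact Checker Bot. Whatever the user says, fact check it and say any of the following:
1. "This is a fact" if the statement by the user is a true fact.
2. "This is not a fact" if the user's statement is not classifiable as a fact.
3. "This is a myth" if the user's statement is a false fact.
User: {statement}"""


def build_fact_check_prompt(statement: str) -> str:
    """Return the full prompt with the user's statement on the last line."""
    return PROMPT_TEMPLATE.format(statement=statement.strip())


def parse_verdict(reply: str) -> Optional[str]:
    """Return the first verdict phrase found in the model's reply
    (case-insensitive), or None if the reply contains none of them."""
    lowered = reply.lower()
    for verdict in VERDICTS:
        if verdict.lower() in lowered:
            return verdict
    return None


def run_fact_checks(
    statements: Iterable[str], predict: Callable[[str], str]
) -> Dict[str, Optional[str]]:
    """Send each statement through `predict` (any prompt -> reply function)
    and collect the parsed verdict for each one, keyed by statement."""
    return {s: parse_verdict(predict(build_fact_check_prompt(s))) for s in statements}
```

Keeping `predict` as a plain callable means the helpers do not depend on any SDK. With the Vertex AI client set up as in the Colab steps, it could be `lambda prompt: model.predict(prompt, **parameters).text`, with the three statements from step 6 as input.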
In our example, we’ll do it on Google Colaboratory. Follow the steps below:
1. Open Google Colaboratory by visiting https://colab.research.google.com/
2. In the first cell, install the required libraries for using Gen AI Studio models:
    %%capture
    !pip install "shapely<2.0.0"
    !pip install google-cloud-aiplatform --upgrade
3. Next, authenticate the Colab notebook so that it can access the Google Cloud resources available to the currently logged-in user:
    from google.colab import auth as google_auth
    google_auth.authenticate_user()
4. Then, import the required libraries:
    import vertexai
    from vertexai.language_models import TextGenerationModel
5. Now, initialize the Vertex AI client to work with the project. Be sure to replace the PROJECT_ID value with your own project’s ID on Google Cloud:
    PROJECT_ID = "your-project-id"  # replace with your Google Cloud project ID
    vertexai.init(project=PROJECT_ID, location="us-central1")
6. Let us now set the configuration that the model will use when answering our prompts, and initialize the model client:
    # Generation settings passed to every predict() call
    parameters = {
        "candidate_count": 1,
        "max_output_tokens": 256,
        "temperature": 0,
        "top_p": 0.8,
        "top_k": 40,
    }
    model = TextGenerationModel.from_pretrained("text-bison@001")
7. Now, we can call the model and observe the response by printing it:
    response = model.predict(
        """You're a Fact Checker Bot. Whatever the user says, fact check it and say any of the following:
    1. "This is a fact" if the statement by the user is a true fact.
    2. "This is not a fact" if the user's statement is not classifiable as a fact.
    3. "This is a myth" if the user's statement is a false fact.
    User: I am a human""",
        **parameters
    )
    print(f"Response from Model: {response.text}")
You can similarly work with the other models available in Gen AI Studio.
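The `top_k`, `top_p`, and `temperature` entries in the `parameters` dictionary from step 6 control how the model picks each output token. Google’s Vertex AI documentation describes candidates being narrowed by top-K, then filtered by top-P, with the final token chosen by temperature sampling, where a temperature of 0 always selects the most probable token. The sketch below is a simplified, hypothetical illustration of that generic technique over a toy probability table (the function names and probabilities are made up). It is not how the Vertex AI service is actually implemented.

```python
import random
from typing import Dict, Optional


def filter_top_k_top_p(probs: Dict[str, float], top_k: int, top_p: float) -> Dict[str, float]:
    """Keep the top_k most likely tokens, then the shortest prefix of them whose
    cumulative probability reaches top_p, and renormalize the survivors."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[: max(top_k, 1)]
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


def sample_next_token(
    probs: Dict[str, float],
    top_k: int = 40,
    top_p: float = 0.8,
    temperature: float = 0.0,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick one token: top-K filtering, then top-P, then temperature sampling."""
    candidates = filter_top_k_top_p(probs, top_k, top_p)
    if temperature == 0:
        # Temperature 0 is greedy: always the most probable remaining token.
        return max(candidates, key=candidates.get)
    rng = rng or random.Random()
    # Dividing logits by T is equivalent to raising probabilities to the power 1/T.
    tokens = list(candidates)
    weights = [candidates[t] ** (1.0 / temperature) for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy = {"fact": 0.5, "myth": 0.3, "not": 0.15, "maybe": 0.05}  # made-up numbers
    print(filter_top_k_top_p(toy, top_k=40, top_p=0.8))
    print(sample_next_token(toy))  # temperature 0, so greedy selection
```

In this sketch, a temperature of 0 (as in step 6) makes `top_k` and `top_p` irrelevant to the result, because greedy selection always keeps the single most likely token. Raising the temperature is what lets the filtered candidates matter.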
The notebook GenAIStudio&ModelGarden.ipynb provides one example each of Language, Vision, and Speech model usage.

Author Bio
Anubhav Singh, co-founder of Dynopii and a Google Developer Expert in Google Cloud, has been a developer since the pre-Bootstrap era and has extensive experience as a freelancer and AI startup founder. He authored "Hands-on Python Deep Learning for Web" and "Mobile Deep Learning with TensorFlow Lite, ML Kit, and Flutter." He co-organizes the TFUG Kolkata community and formerly led the team at GDG Cloud Kolkata. Anubhav is often found discussing system architecture, machine learning, and web technologies.