Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Events
Videos
Audiobooks
Packt Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds

How-To Tutorials

7018 Articles
article-image-exchange-server-2010-windows-powershell-troubleshooting-mailboxes
Packt
22 Jul 2011
7 min read
Save for later

Exchange Server 2010 Windows PowerShell: Troubleshooting Mailboxes

Packt
22 Jul 2011
7 min read
Microsoft Exchange 2010 PowerShell Cookbook Manage and maintain your Microsoft Exchange 2010 environment with the Exchange Management Shell and Windows PowerShell 2.0 using this book and eBook The reader will benefit from referring two previous articles: Managing Mailboxes and Reporting on Mailbox. Performing some basic steps To work with the code samples in this article, follow these steps to launch the Exchange Management Shell: Log onto a workstation or server with the Exchange Management Tools installed. Open the Exchange Management Shell by clicking on Start | All Programs | Exchange Server 2010. Click on the Exchange Management Shell shortcut. Checking mailbox logon statistics If you have worked with Exchange 2000 or 2003, you probably remember that you could easily view several mailbox-related details for each mailbox under the Logons node of the Exchange System Manager. These details included the user-name, last access time, and more. When viewing mailboxes in the Exchange Management Console in Exchange 2010, these details are not displayed. In this recipe, we will take a look at how we can gather some of this information the Get-LogonStatistics cmdlet. How to do it... The following command will provide a logon statistics report for all mailboxes in the organization: Get-MailboxServer | Get-LogonStatistics | Select UserName,ApplicationId,ClientVersion,LastAccessTime How it works... The Get-LogonStatistics cmdlet c an be useful for doing some basic checks on client logons, but the information returned from the previous command can be a little confusing and might seem inaccurate. For example, the ClientVersion property returned for each logon will always be reported as the same version number for end-user logons. This is due to the fact that client connections go through the Client Access role in Exchange 2010. Whether or not this will be fixed in future versions is unknown. The ApplicationId property will indicate whether clients are connected via RPC or through Outlook Web App. Keep in mind that, depending on the client, multiple connections could be reported. Client's applications initiate multiple connections, so you will likely notice that this cmdlet will return anywhere from three to five records for each user connected to a mailbox. You will also see connections where the username is reported as the name of one or more databases or a system mailbox. These are generated by transport servers and mailbox assistant agents. There's more... There are a couple of other ways you can run this cmdlet. First, you can generate a report for an individual user. Instead of selecting individual properties, you can pipe the command to Format-List with a wildcard to display all of them: Get-LogonStatistics -Identity testuser | Format-List * You can also retrieve the logon statistics for a particular database using the -Database parameter: Get-LogonStatistics -Database DB1 When users access their mailbox through Outlook Web App you may find that logon statistics for these sessions are missing or not what you would expect when running the Get-LogonStatistics cmdlet. This is because OWA users are not continuously connected to the Exchange server and the OWA client only connects to the server as needed to perform a task. Setting storage quotas for mailboxes One thing that has been around for several versions of Exchange is the concept of storage quotas. Using quotas, we can control the size of each mailbox to ensure that our mailbox databases don't grow out of control. In addition to setting storage quotas at the database level, we can also configure storage quotas on a per-mailbox basis. In this recipe, we will take a look at how to configure mailbox storage quotas from the Exchange Management Shell. How to do it... Use the following command syntax to set custom limits on mailbox: Set-Mailbox -Identity testuser ` -IssueWarningQuota 1gb ` -ProhibitSendQuota 1.5gb ` -ProhibitSendReceiveQuota 2gb ` -UseDatabaseQuotaDefaults $false How it works... The Set-Mailbox cmdlet is used to configure the quota warning and send and receive limits for each mailbox. In this example, we are setting the -IssueWarningQuota parameter to one gigabyte. When the user's mailbox exceeds this size, they will receive a warning message from the system that they are approaching their quota limit. The -ProhibitSendQuota is set to 1.5 gigabytes, and when the total mailbox size exceeds this limit, the user will no longer be able to send messages, although new incoming e-mail messages will still be received. We've set the -ProhibitSendReceiveQuota parameter value to two gigabytes. Once this mailbox reaches this size, the user will no longer be able to send or receive mail. It's important to point out here that we have disabled the option to inherit the storage quota limits from the database by setting the -UseDatabaseQuotaDefaults to $false. If this setting were set to $true, the custom mailbox quota settings would not be used. There's more... By default, mailboxes are configured to inherit their storage quota limits from their parent database. In most cases, this is ideal since you can centrally control the settings for each mailbox in a particular database. However, it is unlikely that having single quota limit for the entire organization will be sufficient. For example, you will probably have a group of managers, VIP users, or executives that require a larger amount of space for their mailboxes. Even though you could create a separate database for these users with higher quota values, this might not make sense in your environment, and instead, you may want to override the database quota defaults with a custom setting on an individual basis. Let's say that all users with their Title set to Manager should have a custom quota setting. We can use the following commands to make this change in bulk: Get-User -RecipientTypeDetails UserMailbox ` -Filter {Title -eq 'Manager'} | Set-Mailbox -IssueWarningQuota 2gb ` -ProhibitSendQuota 2.5gb ` -ProhibitSendReceiveQuota 3gb ` -UseDatabaseQuotaDefaults $false What we are doing here is searching Active Directory with the Get-User cmdlet and filtering the results so that only mailbox-enabled users with their title set to Manager are returned. This command is piped further to get the Set-Mailbox cmdlet which configures the mailbox quota values and disables the option to use the database quota defaults. Finding inactive mailboxes If you support a large Exchange environment, it's likely that users come and go frequently. In this case, it's quite possible over time that you will end up with multiple unused mailboxes. In this recipe, you will learn a couple of techniques used when searching for inactive mailboxes with the Exchange Management Shell. How to do it... The following command will retrieve a list of mailboxes that have not been logged on to in over 90 days: $mailboxes = Get-Mailbox -ResultSize Unlimited $mailboxes | ?{ (Get-MailboxStatistics $_).LastLogonTime -and ` (Get-MailboxStatistics $_).LastLogonTime -le ` (Get-Date).AddDays(-90) } How it works... You can see here that we're retrieving all of the mailboxes in the organization using the Get-Mailbox cmdlet and storing the results in the $mailboxes variable. We then pipe this collection to the Where-Object cmdlet (using the ? alias) and use the Get-MailboxStatistics cmdlet to build a filter. This first part of this filter indicates that we only want to retrieve mailboxes that have a value set for the LastLogonTime property. If this value is $null, it indicates that these mailboxes have never been used, and have probably been recently created, which means that they will probably soon become active mailboxes. The second part of the filter compares the value for the LastLogonTime. If that value is less than or equal to the date 90 days ago then we have a match and the mailbox will be returned. There's more... Finding unused mailboxes in your environment might be as simple as searching for disabled user accounts in Active Directory that are mailbox-enabled. If that is the case, you can use the following one-liner to discover these mailboxes: Get-User -ResultSize Unlimited -RecipientTypeDetails UserMailbox | ?{$_.UserAccountControl -match 'AccountDisabled'} This command uses the Get-User cmdlet to search through all of the mailbox-enabled users in Active Directory. Next, we filter the results even further by piping those results to the Where-Object cmdlet to find any mailboxes where the UserAccountControl property contains the AccountDisabled value, indicating that the associated Active Directory user account has been disabled.
Read more
  • 0
  • 0
  • 9413

article-image-creating-dynamic-reports-databases-using-jasperreports-35-2
Packt
05 Oct 2009
10 min read
Save for later

Creating Dynamic Reports from Databases Using JasperReports 3.5

Packt
05 Oct 2009
10 min read
Datasource definition A datasource is what JasperReports uses to obtain data for generating a report. Data can be obtained from databases, XML files, arrays of objects, collections of objects, and XML files. In this article, we will focus on using databases as a datasource. Database for our reports We will use a MySQL database to obtain data for our reports. The database is a subset of public domain data that can be downloaded from http://dl.flightstats.us. The original download is 1.3 GB, so we deleted most of the tables and a lot of data to trim the download size considerably. MySQL dump of the modified database can be found as part of code download at http://www.packtpub.com/files/code/8082_Code.zip. The flightstats database contains the following tables: aircraft aircraft_models aircraft_types aircraft_engines aircraft_engine_types The database structure can be seen in the following diagram: The flightstats database uses the default MyISAM storage engine for the MySQL RDBMS, which does not support referential integrity (foreign keys). That is why we don't see any arrows in the diagram indicating dependencies between the tables. Let's create a report that will show the most powerful aircraft in the database. Let's say, those with horsepower of 1000 or above. The report will show the aircraft tail number and serial number, the aircraft model, and the aircraft's engine model. The following query will give us the required results: SELECT a.tail_num, a.aircraft_serial, am.model as aircraft_model, ae.model AS engine_model FROM aircraft a, aircraft_models am, aircraft_engines ae WHERE a.aircraft_engine_code in (select aircraft_engine_code from aircraft_engines where horsepower >= 1000) AND am.aircraft_model_code = a.aircraft_model_code AND ae.aircraft_engine_code = a.aircraft_engine_code The above query retrieves the following data from the database: Generating database reports There are two ways to generate database reports—either by embedding SQL queries into the JRXML report template or by passing data from the database to the compiled report through a datasource. We will discuss both of these techniques. We will first create the report by embedding the query into the JRXML template. Then, we will generate the same report by passing it through a datasource containing the database data. Embedding SQL queries into a report template JasperReports allows us to embed database queries into a report template. This can be achieved by using the <queryString> element of the JRXML file. The following example demonstrates this technique: <?xml version="1.0" encoding="UTF-8" ?> <jasperReport xsi_schemaLocation="http://jasperreports.sourceforge.net/jasperreports http://jasperreports.sourceforge.net/xsd/jasperreport.xsd" name="DbReport"> <queryString> <![CDATA[select a.tail_num, a.aircraft_serial, am.model as aircraft_model, ae.model as engine_model from aircraft a, aircraft_models am, aircraft_engines ae where a.aircraft_engine_code in ( select aircraft_engine_code from aircraft_engines where horsepower >= 1000) and am.aircraft_model_code = a.aircraft_model_code and ae.aircraft_engine_code = a.aircraft_engine_code]]> </queryString> <field name="tail_num" class="java.lang.String" /> <field name="aircraft_serial" class="java.lang.String" /> <field name="aircraft_model" class="java.lang.String" /> <field name="engine_model" class="java.lang.String" /> <pageHeader> <band height="30"> <staticText> <reportElement x="0" y="0" width="69" height="24" /> <textElement verticalAlignment="Bottom" /> <text> <![CDATA[Tail Number: ]]> </text> </staticText> <staticText> <reportElement x="140" y="0" width="79" height="24" /> <text> <![CDATA[Serial Number: ]]> </text> </staticText> <staticText> <reportElement x="280" y="0" width="69" height="24" /> <text> <![CDATA[Model: ]]> </text> </staticText> <staticText> <reportElement x="420" y="0" width="69" height="24" /> <text> <![CDATA[Engine: ]]> </text> </staticText> </band> </pageHeader> <detail> <band height="30"> <textField> <reportElement x="0" y="0" width="69" height="24" /> <textFieldExpression class="java.lang.String"> <![CDATA[$F{tail_num}]]> </textFieldExpression> </textField> <textField> <reportElement x="140" y="0" width="69" height="24" /> <textFieldExpression class="java.lang.String"> <![CDATA[$F{aircraft_serial}]]> </textFieldExpression> </textField> <textField> <reportElement x="280" y="0" width="69" height="24" /> <textFieldExpression class="java.lang.String"> <![CDATA[$F{aircraft_model}]]> </textFieldExpression> </textField> <textField> <reportElement x="420" y="0" width="69" height="24" /> <textFieldExpression class="java.lang.String"> <![CDATA[$F{engine_model}]]> </textFieldExpression> </textField> </band> </detail> </jasperReport> The <queryString> element is used to embed a database query into the report template. In the given code example, the <queryString> element contains the query wrapped in a CDATA block for execution. The <queryString> element has no attributes or subelements other than the CDATA block containing the query. Text wrapped inside an XML CDATA block is ignored by the XML parser. As seen in the given example, our query contains the > character, which would invalidate the XML block if it wasn't inside a CDATA block. A CDATA block is optional if the data inside it does not break the XML structure. However, for consistency and maintainability, we chose to use it wherever it is allowed in the example. The <field> element defines fields that are populated at runtime when the report is filled. Field names must match the column names or alias of the corresponding columns in the SQL query. The class attribute of the <field> element is optional; its default value is java.lang.String. Even though all of our fields are strings, we still added the class attribute for clarity. In the last example, the syntax to obtain the value of a report field is $F{field_name}, where field_name is the name of the field as defined. The next element that we'll discuss is the <textField> element. Text fields are used to display dynamic textual data in reports. In this case, we are using them to display the value of the fields. Like all the subelements of <band>, text fields must contain a <reportElement> subelement indicating the text field's height, width, and x, y coordinates within the band. The data that is displayed in text fields is defined by the <textFieldExpression> subelement of <textField>. The <textFieldExpresson> element has a single subelement, which is the report expression that will be displayed by the text field and wrapped in an XML CDATA block. In this example, each text field is displaying the value of a field. Therefore, the expression inside the <textFieldExpression> element uses the field syntax $F{field_name}, as explained before. Compiling a report containing a query is no different from compiling a report without a query. It can be done programmatically or by using the custom JasperReports jrc ANT task. Generating the report As we have mentioned previously, in JasperReports terminology, the action of generating a report from a binary report template is called filling the report. To fill a report containing an embedded database query, we must pass a database connection object to the report. The following example illustrates this process: package net.ensode.jasperbook; import java.sql.Connection; import java.sql.DriverManager; import java.sql.SQLException; import java.util.HashMap; import net.sf.jasperreports.engine.JRException; import net.sf.jasperreports.engine.JasperFillManager; public class DbReportFill { Connection connection; public void generateReport() { try { Class.forName("com.mysql.jdbc.Driver"); connection = DriverManager.getConnection("jdbc:mysql: //localhost:3306/flightstats?user=user&password=secret"); System.out.println("Filling report..."); JasperFillManager.fillReportToFile("reports/DbReport. jasper", new HashMap(), connection); System.out.println("Done!"); connection.close(); } catch (JRException e) { e.printStackTrace(); } catch (ClassNotFoundException e) { e.printStackTrace(); } catch (SQLException e) { e.printStackTrace(); } } public static void main(String[] args) { new DbReportFill().generateReport(); } } As seen in this example, a database connection is passed to the report in the form of a java.sql.Connection object as the last parameter of the static JasperFillManager.fillReportToFile() method. The first two parameters are as follows: a string (used to indicate the location of the binary report template or jasper file) and an instance of a class implementing the java.util.Map interface (used for passing additional parameters to the report). As we don't need to pass any additional parameters for this report, we used an empty HashMap. There are six overloaded versions of the JasperFillManager.fillReportToFile() method, three of which take a connection object as a parameter. For simplicity, our examples open and close database connections every time they are executed. It is usually a better idea to use a connection pool, as connection pools increase the performance considerably. Most Java EE application servers come with connection pooling functionality, and the commons-dbcp component of Apache Commons includes utility classes for adding connection pooling capabilities to the applications that do not make use of an application server. After executing the above example, a new report, or JRPRINT file is saved to disk. We can view it by using the JasperViewer utility included with JasperReports. In this example, we created the report and immediately saved it to disk. The JasperFillManager class also contains methods to send a report to an output stream or to store it in memory in the form of a JasperPrint object. Storing the compiled report in a JasperPrint object allows us to manipulate the report in our code further. We could, for example, export it to PDF or another format. The method used to store a report into a JasperPrint object is JasperFillManager.fillReport(). The method used for sending the report to an output stream is JasperFillManager.fillReportToStream(). These two methods accept the same parameters as JasperFillManager.fillReportToFile() and are trivial to use once we are familiar with this method. Refer to the JasperReports API for details. In the next example, we will fill our report and immediately export it to PDF by taking advantage of the net.sf.jasperreports.engine.JasperRunManager.runReportToPdfStream() method. package net.ensode.jasperbook; import java.io.IOException; import java.io.InputStream; import java.io.PrintWriter; import java.io.StringWriter; import java.sql.Connection; import java.sql.DriverManager; import java.util.HashMap; import javax.servlet.ServletException; import javax.servlet.ServletOutputStream; import javax.servlet.http.HttpServlet; import javax.servlet.http.HttpServletRequest; import javax.servlet.http.HttpServletResponse; import net.sf.jasperreports.engine.JasperRunManager; public class DbReportServlet extends HttpServlet { protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { Connection connection; response.setContentType("application/pdf"); ServletOutputStream servletOutputStream = response .getOutputStream(); InputStream reportStream = getServletConfig() .getServletContext().getResourceAsStream( "/reports/DbReport.jasper"); try { Class.forName("com.mysql.jdbc.Driver"); connection = DriverManager.getConnection("jdbc:mysql: //localhost:3306/flightstats?user=dbUser&password=secret"); JasperRunManager.runReportToPdfStream(reportStream, servletOutputStream, new HashMap(), connection); connection.close(); servletOutputStream.flush(); servletOutputStream.close(); } catch (Exception e) { // display stack trace in the browser StringWriter stringWriter = new StringWriter(); PrintWriter printWriter = new PrintWriter(stringWriter); e.printStackTrace(printWriter); response.setContentType("text/plain"); response.getOutputStream().print(stringWriter.toString()); } } } The only difference between static and dynamic reports is that for dynamic reports we pass a connection to the report for generating a database report. After deploying this servlet and pointing the browser to its URL, we should see a screen similar to the following screenshot:  
Read more
  • 0
  • 0
  • 9406

article-image-securing-your-network-using-firewalld
Packt
23 Jun 2015
13 min read
Save for later

Securing Your Network using firewalld

Packt
23 Jun 2015
13 min read
In this article by Andrew Mallett, author of the book Learning RHEL Networking, we see on how to secure our network using the firewall daemon, that is, firewalld. The default user interface for netfilter, the kernel-based firewall, on RHEL7 is firewalld. Administrators now have a choice to use firewalld or iptables to manage firewalls. Underlying either process, we can still implement the kernel-based netfilter firewall. The frontend command to this new interface is firewall-cmd. The main benefit this offers is the ability to refresh the netfilter setting when the firewall is running. This is not possible with the iptables interface; additionally, we are able to use zone management. This enables us to have different firewall configurations, which depends on the network we are connected to. In this article, we will be cover the following topics: The firewall status Routing The zone management The source management Firewall rules using services Firewall rules using ports Masquerading and the network address translation Using rich rules Implementing direct rules Reverting to iptables (For more resources related to this topic, see here.) The firewall status The firewall service can provide protection for your RHEL system and services from other hosts on the local network or Internet. Although firewalling is often maintained on the border routers to your network, additional protection can be provided by host-based firewalls, such as the netfilter firewall on the Linux kernel. The netfilter firewall on RHEL 7 can be implemented via the iptables or firewalld service, with the latter being the default. The status of the firewalld service can be interrogated in a normal manner using the systemctl command. This will provide a verbose output if the service is running. This will include the PID (process ID) of firewalld along with recent log messages. The following is a command from RHEL7.1: # systemctl status firewalld If you just need a quick check with a less verbose output, make use of the firewall-cmd command. This is the main administrative tool used to manage firewalld. If firewalld was not active, the output would show as not running. Routing Although not strictly necessary for a firewall, you may need to implement routing on your RHEL7 system. Often, this will be associated with multi-homed systems with more than one network interface card; however, this is not a requirement of network routing, which allows packets to be forwarded to the correct destination network. Network routing is enabled in procfs in the /proc/sys/net/ipv4/ip_forward file. If this file contains a value of 0, then routing is disabled; if it has a value of 1, routing is enabled. This can be set using the echo command as follows: # echo 1 > /proc/sys/net/ipv4/ip_forward However, this is then turned on until the next reboot when the routing will revert to the configured setting. To make this setting permanent traditionally, the /etc/sysctl.conf file has been used. It's now recommended to add you own configurations to /etc/sysctl.d/. Here is an example of this: # echo "net.ipv4.ip_forward = 1" > /etc/sysctl.d/ipforward.conf This will create a file and set its directive. To make this setting effective prior to the next reboot, we can make use of the sysctl command, as shown in the following command: # sysctl -p /etc/sysctl.d/ipforward.conf Zone management A new feature you will find in firewalld that is more aimed at mobile systems—such as laptops—is the inclusion of zones. However, these zones can be equally used on a multihomed system, which associates different NICs with appropriate zones. Using zones in either mobile or multihomed systems, firewall rules can be assigned to zones and these rules will be associated with NICs included in that zone. If an interface is not assigned explicitly to a zone, then it will become a part of the default zone. To interrogate the default zone on your system, we can use the firewall-cmd command, as shown in the following command line: # firewall-cmd --get-default-zone Should you need to list all the configured zones on your system, the following command can be used: # firewall-cmd --get-zones Perhaps more usefully, we can display zones with interfaces assigned to them; if no assignments have been made, then all the interfaces will be in the public zone. The --get-active-zones option will help us with this, as shown in the following command: # firewall-cmd --get-active-zones Should we require a more verbose output, we can list all the zone names, associated rules, and interfaces. The following command demonstrates how this can be achieved: # firewall-cmd --list-all-zones If you need to utilize zones, you can choose the default zone and assign interfaces to specific zones as well. Firstly, assign a new default zone as follows: # firewall-cmd --set-default-zone=work Here, we redirect the default zone to the work zone. In this way, all NICs that have not been explicitly assigned will participate in the work zone. The preceding command should report back with success. We can also explicitly assign a zone to an interface as follows: # firewall-cmd --zone=public --change-interface=eno16777736 The change made through this command will be temporary until the next reboot; to make it permanent, we will add the --permanent option: # firewall-cmd --zone=public --change-interface=eno16777736 --permanent Making a setting permanent will persist the configuration within the zone file located in the /etc/firewalld/zones/ directory. In our case, the file is /etc/firewalld/zones/public.xml. After having implemented the permanent change as detailed here, we can list the contents of the XML file with the cat command. We can either interrogate an individual NIC to view the zone it's associated with or list all interfaces within a zone; the following commands illustrate this: # firewall-cmd --get-zone-of-interface=eno16777736 # firewall-cmd --zone=public --list-all You can use tab completion to assist with options and arguments with firewall-cmd. If the supplied zones are not ample or perhaps the names do not work for your naming schemes, it's possible to create your own zones and add interfaces and rules. After adding your zone, you can reload the configuration to allow it to be used immediately as follows: # firewall-cmd --permanent --new-zone=packt # firewall-cmd --reload The --reload option can reload the configuration that allows current connections to continue uninterrupted; whereas the --complete-reload option will stop all connections during the process. Source management The problem that you may encounter using interfaces assigned to your zones is that it does not differentiate between network addresses. Often, this is not an issue as only one network address is bound to the NIC; however, if you have more than one address bound to the NIC, you may want to implement the firewalld source. Like interfaces, sources can be assigned to zones. In the following command, we will add a network range to the trusted zone and another range, perhaps on the same NIC to the public zone: # firewall-cmd --permanent --zone=trusted --add-source=192.168.1.0/24 # firewall-cmd --permanent --zone=public --add-source=172.17.0.0/16 Similar to interfaces, binding a source to a zone will activate that zone and will be listed with the --get-active-zones option. Firewall rules using services When we think of firewalls, we think of allowing or denial of access to ports. The use of service XML files can ease the port management with one service, perhaps listing multiple ports. The other point to take note of is that firewalld daemon's default policy is to deny access, so any access needed has to be explicitly granted to a port associated with a service. To list services that have been allowed on the default zone, we can simply use the --list-services option, as shown in the following example: # firewall-cmd --list-services Similarly, we can gain access to services allowed in a specific zone by including the --zone= option. This can be seen in the following example: # firewall-cmd --zone=home --list-services As you start enabling services, you can easily allow a predefined service through a zone. Predefined services are listed as XML files in the /usr/lib/firewalld/services directory. RHEL 7 is representative of a more mature Linux distribution; as such, it recognizes that the need to separate the /usr directory from the root filesystem is depreciated and the /lib, /bin, and /sbin directories are soft-linked to their respective directories after /usr/. Hence, /lib is now the same as /usr/lib. While defining your own services, you may create XML files within the /etc/firewalld/services directory. The squid proxy server does not have its own service file, and if we choose to allow this as a service rather than just opening the required port the file would look similar to the /etc/firewalld/services/squid.xml, as follows: <?xml version="1.0" encoding="utf-8"?> <service> <short>Squid</short> <description>Squid Web Proxy</description> <port protocol="tcp" port="3128"/> </service> Assuming that we are using SELinux in the Enforcing mode, we will need to set the correct context for the new file using the following commands: # cd /etc/firewalld/services # restorecon squid.xml The permissions on this file should be 640 and it will be set using the following command: # chmod 640 /etc/firewalld/services/squid.xml Having defined the new service now or using pre-existing services, we can add them to a zone. If we are using the default zone, this is achieved simply with the following commands. Note that we reload the configuration at the start to identify the new squid service as follows: # firewall-cmd --reload # firewall-cmd --permanent --add-service=squid # firewall-cmd --reload Similarly, to update a specified zone other than the default zone, we will use the following commands: # firewall-cmd --permanent --add-service=squid --zone=work # firewall-cmd --reload Should we later need to remove this service from the work zone, we can use the following command: # firewall-cmd --permanent --remove-service=squid --zone=work # firewall-cmd --reload Firewall rules using ports In the previous example, where the squid service only required a single port, we could easily add a port rule to allow access to a service. Although the process is simple, in some organizations, the preference will still be to create the service file that documents the need of the port in the description field. If we need to add a port, we have similar options in --add-port and --remove-port. The following command shows how to add the squid TCP port 3128 to the work zone without the need to define the service file: # firewall-cmd --permanent --add-port=3128/tcp --zone=work # firewall-cmd --reload Masquerading and Network Address Translation If your firewalld server is your network router running RHEL 7, you may wish to provide access to the Internet to your internal hosts on a private network. If this is the case, we can enable masquerading. This is also known as NAT (Network Address Translation), where the server's public IP address is used by internal clients. To establish this, we can make use of the built-in internal and external zones and configure masquerading on the external zone. The internal NIC should be assigned to the internal zone and the external NIC should be assigned to the external zone. To establish masquerading on the external zone, we can use the following command: # firewall-cmd --zone=external --add-masquerade Masquerading is removed using the --remove-masquerade option. We may also query the status of masquerading in a zone using the --query-masquerade option. Using rich rules The firewalld rich language allows an administrator to easily configure more complex firewall rules without having knowledge of the iptables syntax. This can include logging and examination of the source address. To add a rule to allow NTP connection on the default zone, but logging the connection at no more than 1 per minute, use the following command: # firewall-cmd --permanent --add-rich-rule='rule service name="ntp" audit limit value="1/m" accept' # firewall-cmd --reload Similarly, we can add a rule that only allows access to the squid service from one subnet only: # firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.166.0.0/24" service name="squid" accept' # firewall-cmd --reload The Fedora project maintains the documentation for rich rules in firewalld and these can be accessed at https://fedoraproject.org/wiki/Features/FirewalldRichLanguage should you need more detailed examples. Implementing direct rules If you have a prior experience with iptables and want to combine you knowledge of iptables with the features in firewalld, direct rules are here to help with this migration. Firstly, if we want to implement a rule on the INPUT chain, we can check the current settings with the following command: # firewall-cmd --direct --get-rules ipv4 filter INPUT If you have not added any rules, the output will be empty. We will add a new rule and use a priority of 0. This means that it will be listed at the top of the chain; however, this means little when no other rules are in place. We do need to verify that rules are added in the correct order to process if other rules are implemented: # firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 0 -p tcp --dport 3128 -j ACCEPT # firewall-cmd --reload Reverting to iptables Additionally, there is nothing stopping you from using the iptables service if this is what you are most familiar with. Firstly, we can install iptables with the following command: # yum install iptables-service We can mask the firewalld service to more effectively disable the service, preventing it from being started without first unmasking this service: # systemctl mask firewalld We can enable iptables with the following commands: # systemctl enable iptables # systemctl enable ip6tables # systemctl start iptables # systemctl start ip6tables Permanent rules are added as they always have been, via the /etc/sysconfig directory and the iptables and ip6tables files. The firewalld project is maintained by Fedora and is the new administrative service and interface for the netfilter firewall on the Linux Kernel. As administrators, we can choose to use this default service or switch back to iptables; however, firewalld is able to provide us with the ability to reload configuration without dropping connections and mechanisms to migrate from iptables. We have seen how we can use zones to segregate network interfaces and sources if we need to share address ranges on a single NIC. Neither the NIC nor the source is bound to the zone. We can then add rules to a zone to control access to our resources. These rules are based on services or ports. If more complexity is required, we have the option of using rich or direct rules. Rich rules are written in the rich language from firewalld, whereas direct rules are written in the iptables syntax. Summary In this article, you learned on how to secure your network using firewalld. Resources for Article: Further resources on this subject: Installation of Oracle VM VirtualBox on Linux [article] Managing public and private groups [article] Target Exploitation [article]
Read more
  • 0
  • 0
  • 9404

article-image-supercharge-your-business-applications-with-azure-openai
Aroh Shukla
10 Oct 2023
8 min read
Save for later

Supercharge Your Business Applications with Azure OpenAI

Aroh Shukla
10 Oct 2023
8 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!IntroductionThe rapid advancement of technology, particularly in the domain of extensive language models like ChatGPT, is making waves across industries. These models leverage vast data resources and cloud computing power, gaining popularity not just among tech enthusiasts but also mainstream consumers. As a result, there's a growing demand for such experiences in daily tools, both from employees and customers who increasingly expect AI integration. Moreover, this technology promises transformative impacts. This article explores how organizations can harness Azure OpenAI with low-code platforms to leverage these advancements, opening doors to innovative applications and solutions.Introduction to the Power PlatformPower Platform is a Microsoft Low Code platform that spans Microsoft 365, Azure, Dynamics 365, and standalone apps.  a) Power Apps: Rapid low-code app development for businesses with a simple interface, scalable data platform, and cross-device compatibility.b) Power Automate: Automates workflows between apps, from simple tasks to enterprise-grade processes, accessible to users of all technical levels.c) Power BI: Delivers data insights through visualizations, scaling across organizations with built-in governance and security.d) Power Virtual Agents: Creates chatbots with a no-code interface, streamlining integration with other systems through Power Automate.e) Power Pages: Enterprise-grade, low-code SaaS for creating and hosting external-facing websites, offering customization and user-friendly design.Introduction to Azure OpenAIThe collaboration between Microsoft's Azure cloud platform and OpenAI, known as Azure OpenAI Open, presents an exciting opportunity for developers seeking to harness the capabilities of cutting-edge AI models and services. This collaboration facilitates the creation of innovative applications across a spectrum of domains, ranging from natural language processing to AI-powered solutions, all seamlessly integrated within the Azure ecosystem.The rapid pace of technological advancement, characterized by the proliferation of extensive language models like ChatGPT, has significantly altered the landscape of AI. These models, by tapping into vast data resources and leveraging the substantial computing power available in today's cloud infrastructure, have gained widespread popularity. Notably, this technological revolution has transcended the boundaries of tech enthusiasts and has become an integral part of mainstream consumer experiences.As a result, organizations can anticipate a growing demand for AI-driven functionalities within their daily tools and services. Employees seek to enhance their productivity with these advanced capabilities, while customers increasingly expect seamless integration of AI to improve the quality and efficiency of the services they receive.Beyond meeting these rising expectations, the collaboration between Azure and OpenAI offers the potential for transformative impacts. This partnership enables developers to create applications that can deliver tangible and meaningful outcomes, revolutionizing the way businesses operate and interact with their audiences. Azure OpenAI Prerequisites These prerequisites enable you to leverage Azure OpenAI's capabilities for your projects.1. Azure Account: Sign up for an Azure account free or paid: https://azure.microsoft.com/2. Azure Subscription: Acquire an active Azure subscription. 3. Azure OpenAI  Request Form: Follow this form https://customervoice.microsoft.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR7en2Ais5pxKtso_Pz4b1_xUOFA5Qk1UWDRBMjg0WFhPMkIzTzhKQ1dWNyQlQCN0PWcu . 4. API Endpoint: Know your API endpoint for integration. 5. Azure SDKs: Familiarize with Azure SDKs for seamless development: https://docs.microsoft.com/azure/developer/python/azure-sdk-overview  Step to Supercharge your business applications with Azure OpenAIStep 1: Azure OpenAI Instance and keys 1.      Create a new Azure OpenAI Instance. Select region that is available for Azure OpenAI and name the instance accordingly. 2.      Once Azure OpenAI instanced is provisioned, select he Explore button 3.      Under Management, select Deployment and select Create new deployment  4.      Select gpt-35-turbo and give a deployment name a meaningful name.  5.     Under playground select deployment and select View code.   6.      In this sample code, copy the endpoint and key in a notepad file that you will use in the next steps. Step 2: Create a Power Apps Canvas app1.      Create a Power Apps Canvas App with Tablet format.2.      Add textboxes for questions, a button that will trigger Power Automate, and gallery for output from Power Automate via Azure OpenAI service: 3.      Connect Flow by Selecting the Power Automate icon, selecting Add flow and selecting Create new flow   4.      Select a blank flow Step 3: Power Automate1.      In Power Automate, name the flow, next step search for HTTP action. Take note HTTP action is a Premium connector.   2.      Next, configure the HTTP action with this step.a.      Method: POSTb.      URL: You copied this at Step 1.6c.      Endpoint:  You copied this at Step 1.6d.      Body: Follow the Microsoft Azure OpenAI documentation https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/chatgpt?pivots=programming-language-chat-completionse.      Select Add dynamic content, double click on Ask in PowerApps and new parameter HTTP_Body has been createdf.       Use Power Apps Parameter. 3.      In the Next Step, search for Compose, use Body as a parameter, and save the flow. 4.      Run the flow  5.      Copy the outputs of the Compose action to Notepad. You will use it in the next steps later.  6.      In the Next Step, search for Parse JSON, in Add dynamic content locate Body, and drag to Content  7.      Select Generate from sample  8.      Paste the content that you did in previous Step 3.5.  9.      In the next step, search for Response action, in Add dynamic content select Choices  5.      Save the flow.  Step 4: Write up the Canvas app with Power Automate1.      Set variables at button control  2.      Use Gallery Control to display output from Power Automate via Azure OpenAI. Step 5: Test the App1.      Ask a question in the Power Apps user interface as shown below 2.      After a few seconds, we get a response that comes from Azure OpenAI. ConclusionYou've successfully configured a comprehensive integration involving three key services to enhance your application's capabilities:1. Power Apps: This service serves as the user-facing visual interface, providing an intuitive and user-friendly experience for end-users. With Power Apps, you can design and create interactive applications tailored to your specific needs, empowering users to interact with your system effortlessly.2. Power Automate: For establishing seamless connectivity between your application and Azure OpenAI, Power Automate comes into play. It acts as the bridge that enables data and process flow between different services. With Power Automate, you can automate workflows, manage data, and trigger actions, ensuring a smooth interaction between your application and Azure OpenAI's services.3. Azure OpenAI: Leveraging the capabilities of Azure OpenAI, particularly the ChatGPT 3.5 Turbo service, opens up a world of advanced natural language processing and AI capabilities. This service enables your application to understand and respond to user inputs, making it more intelligent and interactive. Whether it's for chatbots, text generation, or language understanding, Azure OpenAI's powerful tools can significantly enhance your application's functionality.By integrating these three services seamlessly, you've created a robust ecosystem for your application. Users can interact with a visually appealing and user-friendly interface powered by Power Apps, while Power Automate ensures that data and processes flow smoothly behind the scenes. Azure OpenAI, with its advanced language capabilities, adds a layer of intelligence to your application, making it more responsive and capable of understanding and generating human-like text.This integration not only improves user experiences but also opens up possibilities for more advanced and dynamic applications that can understand, process, and respond to natural language inputs effectively.Source CodeYou can rebuild the entire solution at my GitHub Repo at  https://github.com/aarohbits/AzureOpenAIWithPowerPlatform/blob/main/01%20AzureOpenAIPowerPlatform/Readme.md Author BioAroh Shukla is a Microsoft Most Valuable Professional (MVP) Alumni and a Microsoft Certified Trainer (MCT) with expertise in Power Platform and Azure. He assists customers from various industries in their digital transformation endeavors, crafting tailored solutions using advanced Microsoft technologies. He is not only dedicated to serving students and professionals on their Microsoft technology journeys but also excels as a community leader with strong interpersonal skills, active listening, and a genuine commitment to community progress.He possesses a deep understanding of the Microsoft cloud platform and remains up-to-date with the latest innovations. His exceptional communication abilities enable him to effectively convey complex technical concepts with clarity and conciseness, making him a valuable resource in the tech community.
Read more
  • 0
  • 0
  • 9400

article-image-high-availability-scenarios
Packt
26 Nov 2014
14 min read
Save for later

High Availability Scenarios

Packt
26 Nov 2014
14 min read
"Live Migration between hosts in a Hyper-V cluster is very straightforward and requires no specific configuration, apart from type and amount of simultaneous Live Migrations. If you add multiple clusters and standalone Hyper-V hosts into the mix, I strongly advise you to configure Kerberos Constrained Delegation for all hosts and clusters involved." Hans Vredevoort – MVP Hyper-V This article written by Benedict Berger, the author of Hyper-V Best Practices, will guide you through the installation of Hyper-V clusters and their best practice configuration. After installing the first Hyper-V host, it may be necessary to add another layer of availability to your virtualization services. With Failover Clusters, you get independence from hardware failures and are protected from planned or unplanned service outages. This article includes prerequirements and implementation of Failover Clusters. (For more resources related to this topic, see here.) Preparing for High Availability Like every project, a High Availability (HA) scenario starts with a planning phase. Virtualization projects are often turning up the question for additional availability for the first time in an environment. In traditional data centers with physical server systems and local storage systems, an outage of a hardware component will only affect one server hosting one service. The source of the outage can be localized very fast and the affected parts can be replaced in a short amount of time. Server virtualization comes with great benefits, such as improved operating efficiency and reduced hardware dependencies. However, a single component failure can impact a lot of virtualized systems at once. By adding redundant systems, these single points of failure can be avoided. Planning a HA environment The most important factor in the decision whether you need a HA environment is your business requirements. You need to find out how often and how long an IT-related production service can be interrupted unplanned, or planned, without causing a serious problem to your business. Those requirements are defined in a central IT strategy of a business as well as in process definitions that are IT-driven. They include Service Level Agreements of critical business services run in the various departments of your company. If those definitions do not exist or are unavailable, talk to the process owners to find out the level of availability needed. High Availability is structured in different classes, measured by the total uptime in a defined timespan, that is 99.999 percent in a year. Every nine in this figure adds a huge amount of complexity and money needed to ensure this availability, so take time to find out the real availability needed by your services and resist the temptation to plan running every service on multi-redundant, geo-spread cluster systems, as it may not fit in the budget. Be sure to plan for additional capacity in a HA environment, so you can lose hardware components without the need to sacrifice application performance. Overview of the Failover Cluster A Hyper-V Failover Cluster consists of two or more Hyper-V Server compute nodes. Technically, it's possible to use a Failover Cluster with just one computing node; however, it will not provide any availability advantages over a standalone host and is typically only used for migration scenarios. A Failover Cluster is hosting roles such as Hyper-V virtual machines on its computing nodes. If one node fails due to a hardware problem, it will not answer any more to cluster heartbeat communication, even though the service interruption is almost instantly detected. The virtual machines running on the particular node are powered off immediately due to the hardware failure on their computing node. The remaining cluster nodes then immediately take over these VMs in an unplanned failover process and start them on their respective own hardware. The virtual machines will be the backup running after a successful boot of their operating systems and applications in just a few minutes. Hyper-V Failover Clusters work under the condition that all compute nodes have access to a shared storage instance, holding the virtual machine configuration data and its virtual hard disks. In case of a planned failover, that is, for patching compute nodes, it's possible to move running virtual machines from one cluster node to another without interrupting the VM. All cluster nodes can run virtual machines at the same time, as long as there is enough failover capacity running all services when a node goes down. Even though a Hyper-V cluster is still called a Failover Cluster—utilizing the Windows Server Failover-Clustering feature—it is indeed capable of running an Active/Active Cluster. To ensure that all these capabilities of a Failover Cluster are indeed working, it demands an accurate planning and implementation process. Failover Cluster prerequirements To successfully implement a Hyper-V Failover Cluster, we need suitable hardware, software, permissions, and network and storage infrastructure as outlined in the following sections. Hardware The hardware used in a Failover Cluster environment needs to be validated against the Windows Server Catalogue. Microsoft will only support Hyper-V clusters when all components are certified for Windows Server 2012 R2. The servers used to run our HA virtual machines should ideally consist of identical hardware models with identical components. It is possible, and supported, to run servers in the same cluster with different hardware components, that is, different size of RAM; however, due to a higher level of complexity, this is not best practice. Special planning considerations are needed to address the CPU requirements of a cluster. To ensure maximum compatibility, all CPUs in a cluster should be exactly the same model. While it's possible from a technical point of view to mix even CPUs from Intel and AMD in the same cluster through to different architecture, you will lose core cluster capabilities such as Live Migration. Choosing a single vendor for your CPUs is not enough, even when using different CPU models your cluster nodes may be using a different set of CPU instruction set extensions. With different instructions sets, Live Migrations won't work either. There is a compatibility mode that disables most of the instruction set on all CPUs on all cluster nodes; however, this leaves you with a negative impact on performance and should be avoided. A better approach to this problem would be creating another cluster from the legacy CPUs running smaller or non-production workloads without affecting your high-performance production workloads. If you want to extend your cluster after some time, you will find yourself with the problem of not having the exact same hardware available to purchase. Choose the current revision of the model or product line you are already using in your cluster and manually compare the CPU instruction sets at http://ark.intel.com/ and http://products.amd.com/, respectively. Choose the current CPU model that best fits the original CPU features of your cluster and have this design validated by your hardware partner. Ensure that your servers are equipped with compatible CPUs, the same amount of RAM, and the same network cards and storage controllers. The network design Mixing different vendors of network cards in a single server is fine and best practice for availability, but make sure all your Hyper-V hosts are using an identical hardware setup. A network adapter should only be used exclusively for LAN traffic or storage traffic. Do not mix these two types of communication in any basic scenario. There are some more advanced scenarios involving converged networking that can enable mixed traffic, but in most cases, this is not a good idea. A Hyper-V Failover Cluster requires multiple layers of communication between its nodes and storage systems. Hyper-V networking and storage options have changed dramatically through the different releases of Hyper-V. With Windows Server 2012 R2, the network design options are endless. In this article, we will work with a typically seen basic set of network designs. We have at least six Network Interface Cards (NICs) available in our servers with a bandwidth of 1 Gb/s. If you have more than five interface cards available per server, use NIC Teaming to ensure the availability of the network or even use converged networking. Converged networking will also be your choice if you have less than five network adapters available. The First NIC will be exclusively used for Host Communication to our Hyper-V host and will not be involved in the VM network traffic or cluster communication at any time. It will ensure Active Directory and management traffic to our Management OS. The second NIC will ensure Live Migration of virtual machines between our cluster nodes. The third NIC will be used for VM traffic. Our virtual machines will be connected to the various production and lab networks through this NIC. The fourth NIC will be used for internal cluster communication. The first four NICs can either be teamed through Windows Server NIC Teaming or can be abstracted from the physical hardware through to Windows Server network virtualization and converged fabric design. The fifth NIC will be reserved for storage communication. As advised, we will be isolating storage and production LAN communication from each other. If you do not use iSCSI or SMB3 storage communication, this NIC will not be necessary. If you use Fibre Channel SAN technology, use a FC-HBA instead. If you leverage Direct Attached Storage (DAS), use the appropriate connector for storage communication. The sixth NIC will also be used for storage communication as a redundancy. The redundancy will be established via MPIO and not via NIC Teaming. There is no need for a dedicated heartbeat network as in older revisions of Windows Server with Hyper-V. All cluster networks will automatically be used for sending heartbeat signals throughout the other cluster members. If you don't have 1 Gb/s interfaces available, or if you use 10 GbE adapters, it’s best practice to implement a converged networking solution. Storage design All cluster nodes must have access to the virtual machines residing on a centrally shared storage medium. This could be a classic setup with a SAN, a NAS, or a more modern concept with Windows Scale Out File Servers hosting Virtual Machine Files SMB3 Fileshares. In this article, we will use a NetApp SAN system that's capable of providing a classic SAN approach with LUNs mapped to our Hosts as well as utilizing SMB3 Fileshares, but any other Windows Server 2012 R2 validated SAN will fulfill the requirements. In our first setup, we will utilize Cluster Shared Volumes (CSVs) to store several virtual machines on the same storage volume. It's not good these days to create a single volume per virtual machine due to a massive management overhead. It's a good rule of thumb to create one CSV per cluster node; in larger environments with more than eight hosts, a CSV per two to four cluster nodes. To utilize CSVs, follow these steps: Ensure that all components (SAN, Firmware, HBAs, and so on) are validated for Windows Server 2012 R2 and are up to date. Connect your SAN physically to all your Hyper-V hosts via iSCSI or Fibre Channel connections. Create two LUNs on your SAN for hosting virtual machines. Activate Hyper-V performance options for these LUNs if possible (that is, on a NetApp, by setting the LUN type to Hyper-V). Size the LUNs for enough capacity to host all your virtual hard disks. Label the LUNs CSV01 and CSV02 with appropriate LUN IDs. Create another small LUN with 1 GB in size and label it Quorum. Make the LUNs available to all Hyper-V hosts in this specified cluster by mapping it on the storage device. Do not make these LUNs available to any other hosts or cluster. Prepare storage DSMs and drivers (that is, MPIO) for Hyper-V host installation. Refresh disk configuration on hosts, install drivers and DSMs, and format volumes as NTFS (quick). Install Microsoft Multipath IO when using redundant storage paths: Install-WindowsFeature -Name Multipath-IO –Computername ElanityHV01, ElanityHV02 In this example, I added the MPIO feature to two Hyper-V hosts with the computer names ElanityHV01 and ElanityHV02. SANs typically are equipped with two storage controllers for redundancy reasons. Make sure to disperse your workloads over both controllers for optimal availability and performance. If you leverage file servers providing SMB3 shares, the preceding steps do not apply to you. Perform the following steps instead: Create a storage space with the desired disk-types, use storage tiering if possible. Create a new SMB3 Fileshare for applications. Customize the Permissions to include all Hyper-V servers from the planned clusters as well as the Hyper-V cluster object itself with full control. Server and software requirements To create a Failover Cluster, you need to install a second Hyper-V host. Use the same unattended file but change the IP address and the hostname. Join both Hyper-V hosts to your Active Directory domain if you have not done this until yet. Hyper-V can be clustered without leveraging Active Directory but it's lacking several key components, such as Live Migration, and shouldn't be done on purpose. The availability to successfully boot up a domain-joined Hyper-V cluster without the need to have any Active Directory domain controller present during boot time is the major benefit from the Active Directory independency of Failover Clusters. Ensure that you create a Hyper-V virtual switch as shown earlier with the same name on both hosts, to ensure cluster compatibility and that both nodes are installed with all updates. If you have System Center 2012 R2 in place, use the System Center Virtual Machine Manager to create a Hyper-V cluster. Implementing Failover Clusters After preparing our Hyper-V hosts, we will now create a Failover Cluster using PowerShell. I'm assuming your hosts are installed, storage and network connections are prepared, and the Hyper-V role is already active utilizing up-to-date drivers and firmware on your hardware. First, we need to ensure that Servername, Date, and Time of our Hosts are correct. Time and Timezone configurations should occur via Group Policy. For automatic network configuration later on, it's important to rename the network connections from default to their designated roles using PowerShell, as seen in the following commands: Rename-NetAdapter -Name "Ethernet" -NewName "Host" Rename-NetAdapter -Name "Ethernet 2" -NewName "LiveMig" Rename-NetAdapter -Name "Ethernet 3" -NewName "VMs" Rename-NetAdapter -Name "Ethernet 4" -NewName "Cluster" Rename-NetAdapter -Name "Ethernet 5" -NewName "Storage" The Network Connections window should look like the following screenshot: Hyper-V host Network Connections Next, IP configuration of the network adapters. If you are not using DHCP for your servers, manually set the IP configuration (different subnets) of the specified network cards. Here is a great blog post on how to automate this step: http://bit.ly/Upa5bJ Next, we need to activate the necessary Failover Clustering features on both of our Hyper-V hosts: Install-WindowsFeature -Name Failover-Clustering-IncludeManagementTools –Computername ElanityHV01, ElanityHV02 Before actually creating the cluster, we are launching a cluster validation cmdlet via PowerShell: Test-Cluster ElanityHV01, ElanityHV02 Test-Cluster cmdlet Open the generated .mht file for more details, as shown in the following screenshot: Cluster validation As you can see, there are some warnings that should be investigated. However, as long as there are no errors, the configuration is ready for clustering and fully supported by Microsoft. However, check out Warnings to be sure you won't run into problems in the long run. After you have fixed potential errors and warnings listed in the Cluster Validation Report, you can finally create the cluster as follows: New-Cluster-Name CN=ElanityClu1,OU=Servers,DC=cloud,DC=local-Node ElanityHV01, ElanityHV02-StaticAddress 192.168.1.49 This will create a new cluster named ElanityClu1 consisting of the nodes ElanityHV01 and ElanityHV02 and using the cluster IP address 192.168.1.49. This cmdlet will create the cluster and the corresponding Active Directory Object in the specified OU. Moving the cluster object to a different OU later on is no problem at all; even renaming is possible when done the right way. After creating the cluster, when you open the Failover Cluster Management console, you should be able to connect to your cluster: Failover Cluster Manager You will see that all your cluster nodes and Cluster Core Resources are online. Rerun the Validation Report and copy the generated .mht files to a secure location if you need them for support queries. Keep in mind that you have to rerun this wizard if any hardware or configuration changes occurring to the cluster components, including any of its nodes. The initial cluster setup is now complete and we can continue with post creation tasks. Summary With the knowledge from this article, you are now able to design and implement Hyper-V Failover Clusters as well as guest clusters. You are aware of the basic concepts of High Availability and the storage and networking options necessary to achieve this. You have seen real-world proven configurations to ensure a stable operating environment. Resources for Article: Further resources on this subject: Planning Desktop Virtualization [Article] Backups in the VMware View Infrastructure [Article] Virtual Machine Design/a> [Article]
Read more
  • 0
  • 0
  • 9389

article-image-scientific-computing-apis-python
Packt
20 Aug 2015
23 min read
Save for later

Scientific Computing APIs for Python

Packt
20 Aug 2015
23 min read
In this article, by Hemant Kumar Mehta author of the book Mastering Python Scientific Computing we will have comprehensive discussion of features and capabilities of various scientific computing APIs and toolkits in Python. Besides the basics, we will also discuss some example programs for each of the APIs. As symbolic computing is relatively different area of computerized mathematics, we have kept a special sub section within the SymPy section to discuss basics of computerized algebra system. In this article, we will cover following topics: Scientific numerical computing using NumPy and SciPy Symbolic Computing using SymPy (For more resources related to this topic, see here.) Numerical Scientific Computing in Python The scientific computing mainly demands for facility of performing calculations on algebraic equations, matrices, differentiations, integrations, differential equations, statistics, equation solvers and much more. By default Python doesn't come with these functionalities. However, development of NumPy and SciPy has enabled us to perform these operations and much more advanced functionalities beyond these operations. NumPy and SciPy are very powerful Python packages that enable the users to efficiently perform the desired operations for all types of scientific applications. NumPy package NumPy is the basic Python package for the scientific computing. It provides facility of multi-dimensional arrays and basic mathematical operations such as linear algebra. Python provides several data structure to store the user data, while the most popular data structures are lists and dictionaries. The list objects may store any type of Python object as an element. These elements can be processed using loops or iterators. The dictionary objects store the data in key, value format. The ndarrays data structure The ndaarays are also similar to the list but highly flexible and efficient. The ndarrays is an array object to represent multidimensional array of fixed-size items. This array should be homogeneous. It has an associated object of dtype to define the data type of elements in the array. This object defines type of the data (integer, float, or Python object), size of data in bytes, byte ordering (big-endian or little-endian). Moreover, if the type of data is record or sub-array then it also contains details about them. The actual array can be constructed using any one of the array, zeros or empty methods. Another important aspect of ndarrays is that the size of arrays can be dynamically modified. Moreover, if the user needs to remove some elements from the arrays then it can be done using the module for masked arrays. In a number of situations, scientific computing demands deletion/removal of some incorrect or erroneous data. The numpy.ma module provides the facility of masked array to easily remove selected elements from arrays. A masked array is nothing but the normal ndarrays with a mask. Mask is another associated array with true or false values. If for a particular position mask has true value then the corresponding element in the main array is valid and if the mask is false then the corresponding element in the main array is invalid or masked. In such case while performing any computation on such ndarrays the masked elements will not be considered. File handling Another important aspect of scientific computing is storing the data into files and NumPy supports reading and writing on both text as well as binary files. Mostly, text files are good way for reading, writing and data exchange as they are inherently portable and most of the platforms by default have capabilities to manipulate them. However, for some of the applications sometimes it is better to use binary files or the desired data for such application can only be stored in binary files. Sometimes the size of data and nature of data like image, sound etc. requires them to store in binary files. In comparison to text files binary files are harder to manage as they have specific formats. Moreover, the size of binary files are comparatively very small and the read/ write operations are very fast then the read/ write text files. This fast read/ write is most suitable for the application working on large datasets. The only drawback of binary files manipulated with NumPy is that they are accessible only through NumPy. Python has text file manipulation functions such as open, readlines and writelines functions. However, it is not performance efficient to use these functions for scientific data manipulation. These default Python functions are very slow in reading and writing the data in file. NumPy has high performance alternative that load the data into ndarrays before actual computation.  In NumPy, text files can be accessed using numpy.loadtxt and numpy.savetxt functions.  The loadtxt function can be used to load the data from text files to the ndarrays. NumPy also has a separate functions to manipulate the data in binary files. The function for reading and writing are numpy.load and numpy.save respectively. Sample NumPy programs The NumPy array can be created from a list or tuple using the array, this method can transform sequences of sequences into two dimensional array. import numpy as np x = np.array([4,432,21], int) print x                            #Output [  4 432  21] x2d = np.array( ((100,200,300), (111,222,333), (123,456,789)) ) print x2d Output: [  4 432  21] [[100 200 300] [111 222 333] [123 456 789]] Basic matrix arithmetic operation can be easily performed on two dimensional arrays as used in the following program.  Basically these operations are actually applied on elements hence the operand arrays must be of equal size, if the size is not matching then performing these operations will cause a runtime error. Consider the following example for arithmetic operations on one dimensional array. import numpy as np x = np.array([4,5,6]) y = np.array([1,2,3]) print x + y                      # output [5 7 9] print x * y                      # output [ 4 10 18] print x - y                       # output [3 3 3]    print x / y                       # output [4 2 2] print x % y                    # output [0 1 0] There is a separate subclass named as matrix to perform matrix operations. Let us understand matrix operation by following example which demonstrates the difference between array based multiplication and matrix multiplication. The NumPy matrices are 2-dimensional and arrays can be of any dimension. import numpy as np x1 = np.array( ((1,2,3), (1,2,3), (1,2,3)) ) x2 = np.array( ((1,2,3), (1,2,3), (1,2,3)) ) print "First 2-D Array: x1" print x1 print "Second 2-D Array: x2" print x2 print "Array Multiplication" print x1*x2   mx1 = np.matrix( ((1,2,3), (1,2,3), (1,2,3)) ) mx2 = np.matrix( ((1,2,3), (1,2,3), (1,2,3)) ) print "Matrix Multiplication" print mx1*mx2   Output: First 2-D Array: x1 [[1 2 3]  [1 2 3]  [1 2 3]] Second 2-D Array: x2 [[1 2 3]  [1 2 3]  [1 2 3]] Array Multiplication [[1 4 9]  [1 4 9]  [1 4 9]] Matrix Multiplication [[ 6 12 18]  [ 6 12 18]  [ 6 12 18]] Following is a simple program to demonstrate simple statistical functions given in NumPy: import numpy as np x = np.random.randn(10)   # Creates an array of 10 random elements print x mean = x.mean() print mean std = x.std() print std var = x.var() print var First Sample Output: [ 0.08291261  0.89369115  0.641396   -0.97868652  0.46692439 -0.13954144  -0.29892453  0.96177167  0.09975071  0.35832954] 0.208762357623 0.559388806817 0.312915837192 Second Sample Output: [ 1.28239629  0.07953693 -0.88112438 -2.37757502  1.31752476  1.50047537   0.19905071 -0.48867481  0.26767073  2.660184  ] 0.355946458357 1.35007701045 1.82270793415 The above programs are some simple examples of NumPy. SciPy package SciPy extends Python and NumPy support by providing advanced mathematical functions such as differentiation, integration, differential equations, optimization, interpolation, advanced statistical functions, equation solvers etc. SciPy is written on top of the NumPy array framework. SciPy has utilized the arrays and the basic operations on the arrays provided in NumPy and extended it to cover most of the mathematical aspects regularly required by scientists and engineers for their applications. In this article we will cover examples of some basic functionality. Optimization package The optimization package in SciPy provides facility to solve univariate and multivariate minimization problems. It provides solutions to minimization problems using a number of algorithms and methods. The minimization problem has wide range of application in science and commercial domains. Generally, we perform linear regression, search for function's minimum and maximum values, finding the root of a function, and linear programming for such cases. All these functionalities are supported by the optimization package.  Interpolation package A number of interpolation methods and algorithms are provided in this package as built-in functions. It provides facility to perform univariate and multivariate interpolation, one dimensional and two dimensional Splines etc. We use univariate interpolation when data is dependent of one variable and if data is around more than one variable then we use multivariate interpolation. Besides these functionalities it also provides additional functionality for Lagrange and Taylor polynomial interpolators. Integration and differential equations in SciPy Integration is an important mathematical tool for scientific computations. The SciPy integrations sub-package provides functionalities to perform numerical integration. SciPy provides a range of functions to perform integration on equations and data. It also has ordinary differential equation integrator. It provides various functions to perform numerical integrations using a number of methods from mathematics using numerical analysis. Stats module SciPy Stats module contains a functions for most of the probability distributions and wide range or statistical functions. Supported probability distributions include various continuous distribution, multivariate distributions and discrete distributions. The statistical functions range from simple means to the most of the complex statistical concepts, including skewness, kurtosis chi-square test to name a few. Clustering package and Spatial Algorithms in SciPy Clustering analysis is a popular data mining technique having wide range of application in scientific and commercial applications. In Science domain biology, particle physics, astronomy, life science, bioinformatics are few subjects widely using clustering analysis for problem solution. Clustering analysis is being used extensively in computer science for computerized fraud detection, security analysis, image processing etc. The clustering package provides functionality for K-mean clustering, vector quantization, hierarchical and agglomerative clustering functions. The spatial class has functions to analyze distance between data points using triangulations, Voronoi diagrams, and convex hulls of a set of points. It also has KDTree implementations for performing nearest-neighbor lookup functionality. Image processing in SciPy           SciPy provides support for performing various image processing operations including basic reading and writing of image files, displaying images, simple image manipulations operations such as cropping, flipping, rotating etc. It has also support for image filtering functions such as mathematical morphing, smoothing, denoising and sharpening of images. It also supports various other operations such as image segmentation by labeling pixels corresponding to different objects, Classification, Feature extraction for example edge detection etc. Sample SciPy programs In the subsequent subsections we will discuss some example programs using SciPy modules and packages. We start with a simple program performing standard statistical computations. After this, we will discuss a program performing finding a minimal solution using optimizations. At last we will discuss image processing programs. Statistics using SciPy The stats module of SciPy has functions to perform simple statistical operations and various probability distributions. The following program demonstrates simple statistical calculations using SciPy stats.describe function. This single function operates on an array and returns number of elements, minimum value, maximum value, mean, variance, skewness and kurtosis. import scipy as sp import scipy.stats as st s = sp.randn(10) n, min_max, mean, var, skew, kurt = st.describe(s) print("Number of elements: {0:d}".format(n)) print("Minimum: {0:3.5f} Maximum: {1:2.5f}".format(min_max[0], min_max[1])) print("Mean: {0:3.5f}".format(mean)) print("Variance: {0:3.5f}".format(var)) print("Skewness : {0:3.5f}".format(skew)) print("Kurtosis: {0:3.5f}".format(kurt)) Output: Number of elements: 10 Minimum: -2.00080 Maximum: 0.91390 Mean: -0.55638 Variance: 0.93120 Skewness : 0.16958 Kurtosis: -1.15542 Optimization in SciPY Generally, in mathematical optimization a non convex function called Rosenbrock function is used to test the performance of the optimization algorithm. The following program is demonstrating the minimization problem on this function. The Rosenbrock function of N variable is given by following equation and it has minimum value 0 at xi =1. The program for the above function is: import numpy as np from scipy.optimize import minimize   # Definition of Rosenbrock function def rosenbrock(x):      return sum(100.0*(x[1:]-x[:-1]**2.0)**2.0 + (1-x[:-1])**2.0)   x0 = np.array([1, 0.7, 0.8, 2.9, 1.1]) res = minimize(rosenbrock, x0, method = 'nelder-mead', options = {'xtol': 1e-8, 'disp': True})   print(res.x) Output is: Optimization terminated successfully.          Current function value: 0.000000          Iterations: 516          Function evaluations: 827 [ 1.  1.  1.  1.  1.] The last line is the output of print(res.x) where all the elements of array are 1. Image processing using SciPy Following two programs are developed to demonstrate the image processing functionality of SciPy. First of these program is simply displaying the standard test image widely used in the field of image processing called Lena. The second program is applying geometric transformation on this image. It performs image cropping and rotation by 45 %. The following program is displaying Lena image using matplotlib API. The imshow method renders the ndarrays into an image and the show method displays the image. from scipy import misc l = misc.lena() misc.imsave('lena.png', l) import matplotlib.pyplot as plt plt.gray() plt.imshow(l) plt.show() Output: The output of the above program is the following screen shot: The following program is performing geometric transformation. This program is displaying transformed images and along with the original image as a four axis array. import scipy from scipy import ndimage import matplotlib.pyplot as plt import numpy as np   lena = scipy.misc.lena() lx, ly = lena.shape crop_lena = lena[lx/4:-lx/4, ly/4:-ly/4] crop_eyes_lena = lena[lx/2:-lx/2.2, ly/2.1:-ly/3.2] rotate_lena = ndimage.rotate(lena, 45)   # Four axes, returned as a 2-d array f, axarr = plt.subplots(2, 2) axarr[0, 0].imshow(lena, cmap=plt.cm.gray) axarr[0, 0].axis('off') axarr[0, 0].set_title('Original Lena Image') axarr[0, 1].imshow(crop_lena, cmap=plt.cm.gray) axarr[0, 1].axis('off') axarr[0, 1].set_title('Cropped Lena') axarr[1, 0].imshow(crop_eyes_lena, cmap=plt.cm.gray) axarr[1, 0].axis('off') axarr[1, 0].set_title('Lena Cropped Eyes') axarr[1, 1].imshow(rotate_lena, cmap=plt.cm.gray) axarr[1, 1].axis('off') axarr[1, 1].set_title('45 Degree Rotated Lena')   plt.show() Output: The SciPy and NumPy are core of Python's support for scientific computing as they provide solid functionality of numerical computing. Symbolic computations using SymPy Computerized computations performed over the mathematical symbols without evaluating or changing their meaning is called as symbolic computations. Generally the symbolic computing is also called as computerized algebra and such computerized system are called computer algebra system. The following subsection has a brief and good introduction to SymPy. Computer Algebra System (CAS) Let us discuss the concept of CAS. CAS is a software or toolkit to perform computations on mathematical expressions using computers instead of doing it manually. In the beginning, using computers for these applications was named as computer algebra and now this concept is called as symbolic computing. CAS systems may be grouped into two types. First is the general purpose CAS and the second type is the CAS specific to particular problem. The general purpose systems are applicable to most of the area of algebraic mathematics while the specialized CAS are the systems designed for the specific area such as group theory or number theory. Most of the time, we prefer the general purpose CAS to manipulate the mathematical expressions for scientific applications. Features of a general purpose CAS Various desired features of a general purpose computer algebra system for scientific applications are as: A user interface to manipulate mathematical expressions. An interface for programming  and debugging Such systems requires simplification of various mathematical expressions hence, a simplifier is a most essential component of such computerized algebra system. The general purpose CAS system must support exhaustive set of functions to perform various mathematical operations required by any algebraic computations Most of the applications perform extensive computations an efficient memory management is highly essential. The system must provide support to perform mathematical computations on high precision numbers and large quantities. A brief idea of SymPy SymPy is an open source and Python based implementation of computerized algebra system (CAS). The philosophy behind the SymPy development is to design and develop a CAS having all the desired features yet its code as simple as possible so that it will be highly and easily extensible. It is written completely in Python and do not requires any external library. The basic idea about using SymPy is the creation and manipulation of expressions. Using SymPy, the user represents mathematical expressions in Python language using SymPy classes and objects. These expressions are composed of numbers, symbols, operators, functions etc. The functions are the modules to perform a mathematical functionality such as logarithms, trigonometry etc. The development of SymPy was started by Ondřej Čertíkin August 2006. Since then, it has been grown considerably with the contributions more than hundreds of the contributors. This library now consists of 26 different integrated modules. These modules have capability to perform computations required for basic symbolic arithmetic, calculus, algebra, discrete mathematics, quantum physics, plotting and printing with the option to export the output of the computations to LaTeX and other formats. The capabilities of SymPy can be divided into two categories as core capability and advanced capabilities as SymPy library is divided into core module with several advanced optional modules. The various supported functionality by various modules are as follows: Core capabilities The core capability module supports basic functionalities required by any mathematical algebra operations to be performed. These operations include basic arithmetic like multiplications, addition, subtraction and division, exponential etc. It also supports simplification of expressions to simplify complex expressions. It provides the functionality of expansion of series and symbols. Core module also supports functions to perform operations related to trigonometry, hyperbola, exponential, roots of equations, polynomials, factorials and gamma functions, logarithms etc. and a number of special functions for B-Splines, spherical harmonics, tensor functions, orthogonal polynomials etc. There is strong support also given for pattern matching operations in the core module. Core capabilities of the SymPy also include the functionalities to support substitutions required by algebraic operations. It not only supports the high precision arithmetic operations over integers, rational and gloating point numbers but also non-commutative variables and symbols required in polynomial operations. Polynomials Various functions to perform polynomial operations belong to the polynomial module. These functions includes basic polynomial operations such as division, greatest common divisor (GCD) least common multiplier (LCM), square-free factorization, representation of  polynomials with symbolic coefficients, some special operations like computation of resultant, deriving trigonometric identities, Partial fraction decomposition, facilities for Gröbner basis over polynomial rings and fields. Calculus Various functionalities supporting different operations required by basic and advanced calculus are provided in this module. It supports functionalities required by limits, there is a limit function for this. It also supports differentiation and integrations and series expansion, differential equations and calculus of finite differences. SymPy is also having special support for definite integrals and integral transforms. In differential it supports numerical differential, composition of derivatives and fractional derivatives.  Solving equations Solver is the name of the SymPy module providing equations solving functionality. This module supports solving capabilities for complex polynomials, roots of polynomials and solving system of polynomial equations. There is a function to solve the algebraic equations. It not only provides support for solutions for differential equations including ordinary differential equations, some forms of partial differential equations, initial and boundary values problems etc. but also supports solution of difference equations. In mathematics, difference equation is also called recurrence relations, that is an equation that recursively defines a sequence or multidimensional array of values. Discrete math Discrete mathematics includes those mathematical structures which are discrete in nature rather than the continuous mathematics like calculus. It deals with the integers, graphs, statements from logic theory etc. This module has full support for binomial coefficient, products, summations etc. This module also supports various functions from number theory including residual theory, Euler's Totient, partition and a number of functions dealing with prime numbers and their factorizations. SymPy also supports creation and manipulations of logic expressions using symbolic and Boolean values. Matrices SymPy has a strong support for various operations related to the matrices and determinants. Matrix belongs to linear algebra category of mathematics. It supports creation of matrix, basic matrix operations like multiplication, addition, matrix of zeros and ones, creation of random matrix and performing operations on matrix elements. It also supports special functions line computation of Hessian matrix for a function, Gram-Schmidt process on set of vectors, Computation of Wronskian for matrix of functions etc. It has also full support for Eigenvalues/eigenvectors, matrix inversion, solution of matrix and determinants.  For computing determinants of the matrix, it also supports Bareis' fraction-free algorithm and berkowitz algorithms besides the other methods. For matrices it also supports nullspace calculation, cofactor expansion tools, derivative calculation for matrix elements, calculation of dual of matrix etc. Geometry SymPy is also having module that supports various operations associated with the two-dimensional (2-D) geometry. It supports creation of various 2-D entity or objects such as point, line, circle, ellipse, polygon, triangle, ray, segment etc. It also allows us to perform query on these entities such as area of some of the suitable objects line ellipse/ circle or triangle, intersection points of lines etc. It also supports other queries line tangency determination, finding similarity and intersection of entities. Plotting There is a very good module that allows us to draw two-dimensional and three-dimensional plots. At present, the plots are rendered using the matplotlib package. It also supports other packages such as TextBackend, Pyglet, textplot etc.  It has a very good interactive interface facility of customizations and plotting of various geometric entities. The plotting module has the following functions: Plotting 2-D line plots Plotting of 2-D parametric plots. Plotting of 2-D implicit and region plots. Plotting of 3-D plots of functions involving two variables. Plotting of 3-D line and surface plots etc. Physics There is a module to solve the problem from Physics domain. It supports functionality for mechanics including classical and quantum mechanics, high energy physics. It has functions to support Pauli Algebra, quantum harmonic oscillators in 1-D and 3-D. It is also having functionality for optics. There is a separate module that integrates unit systems into SymPy. This will allow users to select the specific unit system for performing his/ her computations and conversion between the units. The unit systems are composed of units and constant for computations. Statistics The statistics module introduced in SymPy to support the various concepts of statistics required in mathematical computations. Apart from supporting various continuous and discrete statistical distributions, it also supports functionality related to the symbolic probability. Generally, these distributions support functions for random number generations in SymPy. Printing SymPy is having a module for provide full support for Pretty-Printing. Pretty-print is the idea of conversions of various stylistic formatting into the text files such as source code, text files and markup files or similar content. This module produces the desired output by printing using ASCII and or Unicode characters. It supports various printers such as LATEX and MathML printer. It is also capable of producing source code in various programming languages such as c, Python or FORTRAN. It is also capable of producing contents using markup languages like HTML/ XML. SymPy modules The following list has formal names of the modules discussed in above paragraphs: Assumptions: assumption engine Concrete: symbolic products and summations Core: basic class structure: Basic, Add, Mul, Pow etc. functions: elementary and special functions galgebra: geometric algebra geometry: geometric entities integrals: symbolic integrator interactive: interactive sessions (e.g. IPython) logic: boolean algebra, theorem proving matrices: linear algebra, matrices mpmath: fast arbitrary precision numerical math ntheory: number theoretical functions parsing: Mathematica and Maxima parsers physics: physical units, quantum stuff plotting: 2D and 3D plots using Pyglet polys: polynomial algebra, factorization printing: pretty-printing, code generation series: symbolic limits and truncated series simplify: rewrite expressions in other forms solvers: algebraic, recurrence, differential statistics: standard probability distributions utilities: test framework, compatibility stuf There are numerous symbolic computing systems available in various mathematical toolkits. There are some proprietary software such as Maple/ Mathematica and there are some open source alternatives also such as Singular/ AXIOM. However, these products have their own scripting language, difficult to extend their functionality and having slow development cycle. Whereas SymPy is highly extensible, designed and developed in Python language and open source API that supports speedy development life cycle. Simple exemplary programs These are some very simple examples to get idea about the capacity of SymPy. These are less than ten lines of SymPy source codes which covers topics ranging from basis symbol manipulations to limits, differentiations and integrations. We can test the execution of these programs on SymPy live running SymPy online on Google App Engine available on http://live.sympy.org/. Basic symbol manipulation The following code is defines three symbols, an expression on these symbols and finally prints the expression. import sympy a = sympy.Symbol('a') b = sympy.Symbol('b') c = sympy.Symbol('c') e = ( a * b * b + 2 * b * a * b) + (a * a + c * c) print e Output: a**2 + 3*a*b**2 + c**2     (here ** represents power operation). Expression expansion in SymPy The following program demonstrates the concept of expression expansion. It defines two symbols and a simple expression on these symbols and finally prints the expression and its expanded form. import sympy a = sympy.Symbol('a') b = sympy.Symbol('b') e = (a + b) ** 4 print e print e.expand() Output: (a + b)**4 a**4 + 4*a**3*b + 6*a**2*b**2 + 4*a*b**3 + b**4 Simplification of expression or formula The SymPy has facility to simplify the mathematical expressions. The following program is having two expressions to simplify and displays the output after simplifications of the expressions. import sympy x = sympy.Symbol('x') a = 1/x + (x*exp(x) - 1)/x simplify(a) simplify((x ** 3 +  x ** 2 - x - 1)/(x ** 2 + 2 * x + 1)) Output: ex x – 1 Simple integrations The following program is calculates the integration of two simple functions. import sympy from sympy import integrate x = sympy.Symbol('x') integrate(x ** 3 + 2 * x ** 2 + x, x) integrate(x / (x ** 2 + 2 * x), x) Output: x**4/4+2*x**3/3+x**2/2 log(x + 2) Summary In this article, we have discussed the concepts, features and selective sample programs of various scientific computing APIs and toolkits. The article started with a discussion of NumPy and SciPy. After covering NymPy, we have discussed concepts associated with symbolic computing and SymPy. In the remaining article we have discussed the Interactive computing and data analysis & visualization alog with their APIs or toolkits. IPython is the python toolkit for interactive computing. We have also discussed the data analysis package Pandas and the data visualization API names Matplotlib. Resources for Article: Further resources on this subject: Optimization in Python [article] How to do Machine Learning with Python [article] Bayesian Network Fundamentals [article]
Read more
  • 0
  • 0
  • 9389
Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at €18.99/month. Cancel anytime
article-image-common-qlikview-script-errors
Packt
22 Nov 2013
3 min read
Save for later

Common QlikView script errors

Packt
22 Nov 2013
3 min read
(For more resources related to this topic, see here.) QlikView error messages displayed during the running of the script, during reload, or just after the script is run are key to understanding what errors are contained in your code. After an error is detected and the error dialog appears, review the error, and click on OK or Cancel on the Script Error dialog box. If you have the debugger open, click on Close, then click on Cancel on the Sheet Properties dialog. Re-enter the Script Editor and examine your script to fix the error. Errors can come up as a result of syntax, formula or expression errors, join errors, circular logic, or any number of issues in your script. The following are a few common error messages you will encounter when developing your QlikView script. The first one, illustrated in the following screenshot, is the syntax error we received when running the code that missed a comma after Sales. This is a common syntax error. It's a little bit cryptic, but the error is contained in the code snippet that is displayed. The error dialog does not exactly tell you that it expected a comma in a certain place, but with practice, you will realize the error quickly. The next error is a circular reference error. This error will be handled automatically by QlikView. You can choose to accept QlikView's fix of loosening one of the tables in the circular reference (view the data model in Table Viewer for more information on which table is loosened, or view the Document Properties dialog, Tables tab to find out which table is marked Loosely Coupled). Alternatively, you can choose another table to be loosely coupled in the Document Properties, Tables tab, or you can go back into the script and fix the circular reference with one of the methods. The following screenshot is a warning/error dialog displayed when you have a circular reference in a script: Another common issue is an unknown statement error that can be caused by an error in writing your script—missed commas, colons, semicolons, brackets, quotation marks, or an improperly written formula. In the case illustrated in the following screenshot, the error has encountered an unknown statement—namely, the Customers line that QlikView is attempting to interpret as Customers Load *…. The fix for this error is to add a colon after Customers in the following way: Customers: There are instances when a load script will fail silently. Attempting to store a QVD or CSV to a file that is locked by another user viewing it is one such error. Another example is when you have two fields with the same name in your load statement. The debugger can help you find the script lines in which the silent error is present. Summary In this article we learned about QlikView error messages displayed during the script execution. Resources for Article: Further resources on this subject: Meet QlikView [Article] Introducing QlikView elements [Article] Linking Section Access to multiple dimensions [Article]
Read more
  • 0
  • 0
  • 9388

article-image-setting-microsoft-bot-framework-dev-environment
Packt
30 Dec 2016
8 min read
Save for later

Setting up Microsoft Bot Framework Dev Environment

Packt
30 Dec 2016
8 min read
In this article by Kishore Gaddam, author of the book Building Bots with Microsoft Bot Framework, we introduced what is Microsoft Bot Framework and how it helps in the development of bots. (For more resources related to this topic, see here.) Since past several decades, the corporate, government, and business world has experienced several waves of IT architecture foundations, moving from mainframes, to minicomputers, to distributed PCs, to the Internet, to social/mobile and now the Cloud/Internet of Thuings (IoT) Stack. We call this the Sixth wave of Corporate IT, and like its predecessors, Cloud and IoT technologies are causing significant disruption and displacement, even while it drives new levels of productivity. Each architecture focused on key business processes and supported killer technology applications to drive new levels of value. Very soon we will be looking at an enormous networked interconnection of everyday machines to one another, as well as to humans. Machine-to-machine-to-human connectivity will have a profound impact on the consumer and corporate IT experience. As these machines become social and talkto us, we have enormous opportunity to greatly enhance their value proposition through improved product quality, customer experience, and lowered cost of operations. A heightened consumer expectation for more personal and real-time interactions is driving business to holistically embrace the next wave of technology innovation like Cloud, IoT, and Bots to boost business performance. In this age of billions of connected devices, there is a need for such a technology where our apps could talk back, like bots? Bots that have specific purposes and talk to any device or any app or to anyone, Bots that live in cloud, Bots that we can talk to you via any communication channel such as email, text, voice, chat, and others. Bots can go where no apps have gone before when it comes to machine-to-machine-to-human connectivity. And to make this happen we will need a whole new platform. A Platform for Conversations. Conservation as a Service (CaaS) Messaging apps in general are becoming a second home screen for many people, acting as their entry point into the internet. And where the youngins are, the brands will follow. Companies are coming to messaging apps as bots and apps, to offer everything from customer service to online shopping and banking. Conversations are shaping up be the next major human-computer interface. Thanks to advances in natural language processing and machine learning, the tech is finally getting fast and accurate enough to be viable. Imagine a platform where language is the new UI layer. When we talk about conversation as a platform, there are 3 parts: There are people talking to people – Skype translator as an ex where people can communicate across cross languages Then there is the presence or being able to enhance a conversation by the ability to be present and interact remotely Then there is personal assistance and the bots Think of Bots as the new mechanism that you can converse with. Instead of looking through multiple mobile apps or pages and pages of websites, you can call on any application as a bot within the conversational canvas. Bots are the new apps and digital assistants are the meta apps. This way intelligence is infused into all our interactions. This leads us to Microsoft Bot Framework, which is a comprehensive offering from Microsoft to build and deploy high quality bots for your users to interact using Conversation as a Platform (CaaP). This is a framework that lets you build and connect intelligent bots. The idea is that they interact naturally wherever your users are talking, like Skype, Slack, Facebook Messenger, Text/SMS, and others. Basically any kind of channel that you use today as a human being to talk to other people, well, you will be able to use them to talk to bots all using natural language. Microsoft Bot Framework is a Microsoft operated CaaP service and an open source SDK. Bot Framework is one of the many tools Microsoft is offering to for building a complete Bot. Other tools include Language Understanding Intelligent Service (LUIS), Speech APIs, Microsoft Azure, Cortana Intelligence Suit and many more. Your Bot The Microsoft Bot Builder SDK is one of three main components of the Microsoft Bot Framework. First you have to build your bot. Your bot lives in the cloud and you host it yourself. You write it just like a web service component using Node.js or C#, like a ASP.NET WebAPI component. Microsoft Bot builder SDK is open source and so you will have more languages and web stack get supported over time. Your bot will have its own logic, but you also need a conversation logic using dialogs to model a conversation. The Bot builder SDK gives you facilities for this and there are many types of dialogs that are included from simple Yes/No questions to full natural language understanding or LUIS, which is one of the API's provided in Microsoft Cognitive Services: Bot Connector Bot Connector is hosted and operated by Microsoft. Think of it as a central router between your bots and many channels to communicate with your bots. Apart from routing messages, it would be managing state within the conversation. The Bot Connector is an easy way to create a single back-end and then publish to a bunch of different platforms called channels. Bot Directory Bot Directory is where user will be able to find bots. It's like app store for mobile apps. The Bot Directory is a public directory of all reviewed bots registered through the developer portal. Users will be able to discover, try, and add bots to their favorite conversation experiences from the Bot Directory. Anyone can access it and anyone can submit Bots to the directory. As you begin your development with Microsoft Bot Framework, you might be wondering how to best get started. Bots can be built in C#, however, Microsoft's Bot Framework can also be used to build bots using Node.js. For developing any bots, we need to first setup the development environment and have the right tools installed for successfully developing and deploying a bot. Let's see how we can setup a development environment using Visual Studio. Setting up development environment Let's first look at the Prerequisites required to set up the development environment: Prerequisites To use the Microsoft Bot Framework Connector, you must have: A Microsoft Account (Hotmail, Live, or Outlook) to log into the Bot Framework developer portal, which you will use to register your Bot. An Azure subscription (Free trial: https://azure.microsoft.com/en-us/). This Azure subscription is essential for having an Azure-accessible REST endpoint exposing a callback for the Connector service. Developer accounts on one or more communication services (such as Skype, Slack, Facebook) where your Bot will communicate. In addition, you may wish to have an Azure App Insights account so you can capture telemetry from your Bot. There are additionally different ways to go about building a Bot; from scratch, coded directly to the Bot Connector REST API, the Bot Builder SDK's for Node.js and .NET, and the Bot Connector .NET template which is what this quick start guide demonstrates. Setting up Bot Framework Connector SDK .NET This is a step-by-step guide to setting up dev environment to develop a Bot in C# using the Bot Framework Connector SDK .NET template: Install prerequisite software Visual Studio 2015 (latest update) - you can download the community version here for free: www.visualstudio.com Important: Please update all Visual Studio extensions to their latest versions to do so navigate to Tools | Extensions and Updates | Updates Download and install the Bot Application template: Download the file from the direct download link at http://aka.ms/bf-bc-vstemplate Save the zip file to your Visual Studio 2015 templates directory which is traditionally in %USERPROFILE%DocumentsVisual Studio 2015TemplatesProjectTemplatesVisual C# Open Visual Studio. Create a new C# project using the new Bot Application template. The template is a fully functional Echo Bot that takes the user's text utterance as input and returns it as output. In order to run however: The bot has to be registered with Bot Connector The AppId and AppPassword from the Bot Framework registration page have to be recorded in the project's web.config The project needs to be published to the web Emulator Use the Bot Framework Emulator to test your Bot application. The Bot Framework provides a channel emulator that lets you test calls to your Bot as if it were being called by the Bot Framework cloud service. To install the Bot Framework Emulator, download it from https://download.botframework.com/bf-v3/tools/emulator/publish.html. One installed, you're ready to test. First, start your Bot in Visual Studio using a browser as the application host. The following screenshot uses Microsoft Edge: Summary In this article, we introduced what is Microsoft Bot Framework and how it helps in the development of bots. Also, we have seen how to setup development environment, Emulator and the tools needed for programming. This article is based on the thought that programming knowledge and experience grow best when they grow together.  Resources for Article: Further resources on this subject: Talking to Bot using Browser [article] Webhooks in Slack [article] Creating our first bot, WebBot [article]
Read more
  • 0
  • 0
  • 9384

article-image-pfsense-essentials
Packt
06 Sep 2016
60 min read
Save for later

pfSense Essentials

Packt
06 Sep 2016
60 min read
In this article by David Zientara, the author of the book Mastering pfSense, While high-speed Internet connectivity is becoming more and more common, many in the online world—especially those with residential connections or small office/home office (SOHO) setups—lack the hardware to fully take advantage of those speeds. Fiber optic technology brings with it the promise of a gigabit speed or greater, and the technology surrounding traditional copper networks is also yielding improvements. Yet many people are using consumer-grade routers that offer, at best, mediocre performance. (For more resources related to this topic, see here.) pfSense, an open source router/firewall solution is a far better alternative that is available to you. You have likely already downloaded, installed, and configured pfSense, possibly in a residential or SOHO environment. As an intermediate-level pfSense user, you do not need to be sold on the benefits of pfSense. Nevertheless, you may be looking to deploy pfSense in a different environment (for example, a corporate network), or you may just be looking to enhance your knowledge of pfSense. This chapter is designed to review the process of getting your pfSense system up and running. It will guide you through the process of choosing the right hardware for your deployment, but it will not provide a detailed treatment of installation and initial configuration. The emphasis will be on troubleshooting, as well as some of the newer configuration options. Finally, the article will provide a brief treatment of how to upgrade, back up, and restore pfSense. This article will cover the following topics: A brief overview of the pfSense project pfSense deployment scenarios Minimum specifications and hardware sizing guidelines An introduction to Virtual local area networks (VLANs) and Domain Name System (DNS) The best practices for installation and configuration Basic configuration from both the console and the pfSense web GUI Upgrading, backing up, and restoring pfSense pfSense project overview The origins of pfSense can be traced to the OpenBSD packet filter known as PF, which was incorporated into FreeBSD in 2001. As PF is limited to a command-line interface, several projects have been launched in order to provide a graphical interface for PF. m0n0wall, which was released in 2003, was the earliest attempt at such a project. pfSense began as a fork of the m0n0wall project. Version 1.0 of pfSense was released on October 4, 2006. Version 2.0 was released on September 17, 2011. Version 2.1 was released on September 15, 2013, and Version 2.2 was released on January 23, 2015. As of writing this, Version 2.2.6 (released on December 21, 2015) is the latest version. Version 2.3 is expected to be released soon, and will be a watershed release in many respects. The web GUI has had a major facelift, and support for some legacy technologies is being phased out. Support for Point-to-Point Tunnelling Protocol (PPTP) will be discontinued, as will support for Wireless Encryption Protocol (WEP). The current version of pfSense incorporates such functions as traffic shaping, the ability to act as a Virtual Private Network (VPN) client or server, IPv6 support, and through packages, intrusion detection and prevention, the ability to act as a proxy server, spam and virus blocking, and much more. Possible deployment scenarios Once you have decided to add a pfSense system to your network, you need to consider how it is going to be deployed on your network. pfSense is suitable for a variety of networks, from small to large ones, and can be employed in a variety of deployment scenarios. In this article, we will cover the following possible uses for pfSense: Perimeter firewall Router Switch Wireless router/wireless access point The most common way to add pfSense to your network is to use it as a perimeter firewall. In this scenario, your Internet connection is connected to one port on the pfSense system, and your local network is connected to another port on the system. The port connected to the Internet is known as the WAN (wide area network) interface, and the port connected to the local network is known as the LAN (local area network) interface. Diagram showing a deployment in which pfSense is the perimeter firewall. If pfSense is your perimeter firewall, you may choose to set it up as a dedicated firewall, or you might want to have it perform the double duty of a firewall and a router. You may also choose to have more than two interfaces in your pfSense system (known as optional interfaces). In order to act as a perimeter firewall, however, a pfSense system requires at least two interfaces: a WAN interface (to connect to outside networks), and a LAN interface (to connect to the local network). In more complex network setups, your pfSense system may have to exchange routing information with other routers on the network. There are two types of protocols for exchanging such information: distance vector protocols obtain their routing information by exchanging information with neighboring routers; Routers that use link-state protocols to build a map of the network in order to calculate the shortest path to another router, with each router calculating distances independently. pfSense is capable of running both types of protocols. Packages are available for distance vector protocols such as RIP and RIPv2, and link-state protocols such as Border Gateway Protocol (BGP). Another common deployment scenario is to set up pfSense as a router. In a home or SOHO environment, firewall and router functions are often performed by the same device. In mid-sized to large networks, however, the router is a device separate from that of the perimeter firewall. In larger networks, which have several network segments, pfSense can be used to connect these segments. In corporate-type environments, these are often used in conjunction, which allows a single network interface card (NIC) to operate in multiple broadcast domains via 802.1q tagging. VLANs are often used with the ever-popular router on a stick configuration, in which the router has a single physical connection to a switch, with the single Ethernet interface divided into multiple VLANs, and the router forwarding packets between the VLANs. One of the advantages of this setup is that it only requires a single port, and, as a result, it allows us to use pfSense with systems on when adding another NIC would be cumbersome or even impossible: for example, a laptop or certain thin clients. In most cases, where pfSense is deployed as a router on mid-sized and large networks, it would be used to connect different LAN segments; however, it could also be used as a WAN router. In this case, pfSense's function would be to provide a private WAN connection to the end user. Another possible deployment scenario is to use pfSense as a switch. If you have multiple interfaces on your pfSense system and bridge them together, pfSense can function as a switch. This is a far less common scenario, however, for several reasons: Using pfSense as a switch is generally not cost-effective. You can purchase a 5-port Ethernet switch for less than what it would cost to purchase the hardware for a pfSense system. Buying a commercially available switch will also save you money in the long run, as they likely would consume far less power than whatever computer you would be using to run pfSense. Commercially available switches will likely outperform pfSense, as pfSense will process all packets that pass between ports, while a typical Ethernet switch will handle it locally with dedicated hardware made specifically for passing data between ports quickly. While you can disable filtering entirely in pfSense if you know what you're doing, you will still be limited by the speed of the bus on which your network cards reside, whether it is PCI, PCI-X, or PCI Express (PCI-e). There is also the administrative overhead of using pfSense as a switch. Simple switches are designed to be plug-and-play, and setting up these switches is as easy as plugging in your Ethernet cables and the power cord. Managed switches typically enable you to configure settings at the console and/or through a web interface, but in many cases, configuration is only necessary if you want to modify the operation of the switch. If you use pfSense as a switch, however, some configuration will be required. If none of this intimidates you, then feel free to use pfSense as a switch. While you're not likely to achieve the performance level or cost savings of using a commercially available switch, you will likely learn a great deal about pfSense and networking in the process. Moreover, advances in hardware could make using pfSense as a switch viable at some point in the future. Advances in low-power consumption computers are one factor that could make this possible. Yet another possibility is using pfSense as a wireless router/access point. A sizable proportion of modern networks incorporate some type of wireless connectivity. Connecting to networks wireless is not only easier, but in some cases, running Ethernet cable is not a realistic option. With pfSense, you can add wireless networking capabilities to your system by adding a wireless network card, provided that the network card is supported by FreeBSD. Generally, however, using pfSense as a wireless router or access point is not the best option. Support for wireless network cards in FreeBSD leaves something to be desired. Support for the IEEE's 802.11b and g standards is OK, but support for 802.11n and 802.11ac is not very good. A more likely solution is to buy a wireless router (even if it is one of the aforementioned consumer-grade units), set it up to act solely as an access point, connect it to the LAN port of your pfSense system, and let pfSense act as a Dynamic Host Configuration Protocol (DHCP) server. A typical router will work fine as a dedicated wireless access point, and they are more likely to support the latest wireless networking standards than pfSense. Another possibility is to buy a dedicated wireless access point. These are generally inexpensive and some have such features as multiple SSIDs, which allow you to set up multiple wireless networks (for example, you could have a separate guest network which is completely isolated from other local networks). Using pfSense as a router, in combination with a commercial wireless access point, is likely the least troublesome option. Hardware requirements and sizing guidelines Once you have decided where to deploy pfSense on your network, you should have a clearer idea of what your hardware requirements are. As a minimum, you will need a CPU, motherboard, memory (RAM), some form of disk storage, and at least two network interfaces (unless you are opting for a router on a stick setup, in which case you only need one network interface). You may also need one or more optional interfaces. Minimum specifications The starting point for our discussion on hardware requirements is the pfSense minimum specifications. As of January 2016, the minimum hardware requirements are as follows (these specifications are from the official pfSense site, pfsense.org): CPU – 500 MHz (1 GHz recommended) RAM – 256 MB (1 GB recommended) There are two architectures currently supported by pfSense: i386 (32-bit) and amd64 (64-bit). There are three separate images provided for these architectures: CD, CD on a USB memstick, and embedded. There is also an image for the Netgate RCC-VE 2440 system. A pfSense installation requires at least 1 GB of disk space. If you are installing to an embedded device, you can access the console either by a serial or VGA port. A step-by-step installation guide for the pfSense Live CD can be found on the official pfSense website at: https://doc.pfsense.org/index.php/PfSense_IO_installation_step_by_step. Version 2.3 eliminated the Live CD, which allowed you to try out pfSense without installing it onto other media. If you really want to use the Live CD, however, you could use a pre-2.3 image (version 2.2.6 or earlier). You can always upgrade to the latest version of pfSense after installation. Installation onto either a hard disk drive (HDD) or an SSD is the most common option for a full install of pfSense, whereas embedded installs typically use CF, SD, or USB media. A full install of the current version of pfSense will fit onto a 1 GB drive but will leave little room for installation of packages or for log files. Any activity that requires caching, such as running a proxy server, will also require additional disk space. The last installation option in the table is installation onto an embedded system. For the embedded version, pfSense uses NanoBSD, a tool for installing FreeBSD onto embedded systems. Such an install is ideal for a dedicated appliance (for example, a VPN server), and is geared toward fewer file writes. However, embedded installs cannot run some of the more interesting packages. Hardware sizing guidelines The minimum hardware requirements are general guidelines, and you may want to exceed these minimums based on different factors. It may be useful to consider these factors when determining what CPU, memory, and storage device to use. For the CPU, requirements increase for faster Internet connections. Guidelines for the CPU and network cards can be found at the official pfSense site at http://pfsense.org/hardware/#requirements. The following general guidelines apply: The minimum hardware specifications (Intel/AMD CPU of 500 MHz or greater) are valid up to 20 Mbps. CPU requirements begun to increase at speeds greater than 20 Mbps. Connections of 100 Mbps or faster will require PCI-e network adapters to keep up with the increased network throughput. If you intend to use pfSense to bridge interfaces—for example, if you want to bridge a wireless and wired network, or if you want to use pfSense as a switch—then the PCI bus speed should be considered. The PCI bus can easily become a bottleneck. Therefore, in such scenarios, using PCI-e hardware is the better option, as it offers up to 31.51 GB/s (for PCI-e v. 4.0 on a 16-lane slot) versus 533 MB/s for the fastest conventional PCI buses. If you plan on using pfSense as a VPN server, then you should take into account the effect VPN usage will have on the CPU. Each VPN connection requires the CPU to encrypt traffic, and the more connections there are, the more the CPU will be taxed. Generally, the most cost-effective solution is to use a more powerful CPU. But there are ways to reduce the CPU load from VPN traffic. Soekris has the vpn14x1 product range; these cards offload the CPU of the computing intensive tasks of encryption and compression. AES-NI acceleration of IPsec also significantly reduces the CPU requirements. If you have hundreds of simultaneous captive portal users, you will require slightly more CPU power than you would otherwise. Captive portal usage does not put as much of a load on the CPU as VPN usage, but if you anticipate having a lot of captive portal users, you will want to take this into consideration. If you're not a power user, 256 MB of RAM might be enough for your pfSense system. This, however, would leave little room for the state table (where, as mentioned earlier, active connections are tracked). Each state requires about 1 KB of memory, which is less memory than some consumer-grade routers require, but you still want to be mindful of RAM if you anticipate having a lot of simultaneous connections. The other components of pfSense require 32 to 48 MB of RAM, and possibly more, depending on which features you are using, so you have to subtract that from the available memory in calculating the maximum state table size. RAM Maximum Connections (States) 256 MB ~22,000 connections 512 MB ~46,000 connections 1 GB ~93,000 connections 2 GB ~190,000 connections Installing packages can also increase your RAM requirements; Snort and ntop are two such examples. You should also probably not install packages if you have limited disk space. Proxy servers in particular use up a fair amount of disk space, which is something you should probably consider if you plan on installing a proxy server such as Squid. The amount of disk space, as well as the form of storage you utilize, will likely be dictated by what packages you install, and what forms of logging you will have enabled. Some packages are more taxing on storage than others. Some packages require more disk space than others. Proxies such as Squid store web pages; anti-spam programs such as pfBlocker download lists of blocked IP addresses, and therefore require additional disk space. Proxies also tend to perform a great deal of read and write operations; therefore, if you are going to install a proxy, disk I/O performance is something you should likely take into consideration. You may be tempted to opt for the cheapest NICs. However, inexpensive NICs often have complex drivers that offload most of the processing to the CPU. They can saturate your CPU with interrupt handling, thus causing missed packets. Cheaper network cards typically have smaller buffers (often no more than 300 KB), and when the buffers become full, packets are dropped. In addition, many of them do not support Ethernet frames that are larger than the maximum transmission unit (MTU) of 1500 bytes. NICs that do not support larger frames cannot send or receive jumbo frames (frames with an MTU larger than 1500 bytes), and therefore they cannot take advantage of the performance improvement that using jumbo frames would bring. In addition, such NICs will often have problems with VLAN traffic, since a VLAN tag increases the size of the Ethernet header beyond the traditional size limit. The pfSense project recommends NICs based on Intel chipsets, and there are several reasons why such NICs are considered reliable. They tend to have adequately sized buffers, and do not have problems processing larger frames. Moreover, the drivers tend to be well-written and work well with Unix-based operating systems. For a typical pfSense setup, you will need two network interfaces: one for the WAN and one for the LAN. Each additional subnet (for example, for a guest network) will require an additional interface, as will each additional WAN interface. It should be noted that you don't need an additional card for each interface added; you can buy a multiport network card (most such cards have either 2 or 4 ports). You don't need to buy new NICs for your pfSense system; in fact, it is often economical to buy used NICs, and except in rare cases, the performance level will be the same. If you want to incorporate wireless connectivity into your network, you may consider adding a wireless card to your pfSense system. As mentioned earlier, however, the likely better option is to use pfSense in conjunction with a separate wireless access point. If you do decide to add a wireless card to your system and configure it for use as an access point, you will want to check the FreeBSD hardware compatibility list before making a purchase. Using a laptop You might be wondering if using an old laptop as a pfSense router is a good idea. In many respects, laptops are good candidates for being repurposed into routers. They are small, energy efficient, and when the AC power shuts off, they run on battery power, so they have a built-in uninterruptable power supply (UPS). Moreover, many old laptops can be purchased relatively cheaply at thrift shops and online. There is, however, one critical limitation to using a laptop as a router: in almost all cases, they only have one Ethernet port. Moreover, there is often no realistic way to add another NIC: as there are no expansion slots that will take another NIC (some, however, do have PCMCIA slots that will take a second NIC). There are gigabit USB-to-Ethernet adapters (for USB 3.0), but this is not much of a solution. Such adapters do not have the reliability of traditional NICs. Most laptops do not have Intel NICs either; high-end business laptops are usually the exception to this rule. There is a way to use a laptop with a single Ethernet port as a pfSense router, and that is to configure pfSense using VLANs. As mentioned earlier, VLANs, or virtual LANs, allow us to use a single NIC to serve multiple subnets. Thus, we can set up two VLANs on our single port: virtual LAN #1, which we will use for the WAN interface, and virtual LAN #2, which we will use for the LAN interface. The one disadvantage of this setup is that you must use a managed switch to make this work. Managed switches are switches that can usually be configured and managed as groups, they often have both a command-line and web interface for management, and they often have a wide range of capabilities, such as VLANs. Since unmanaged switches forward traffic to all other ports, they are unsuitable for this setup. You can, however, connect an unmanaged switch to the managed switch to add ports. Keep in mind that managed switches are expensive (more expensive than dual and quad port network cards), and if there are multiple VLANs on a single link, this link can easily become overloaded. In scenarios where you can add a network card, this is usually the better option. If you have an existing laptop, however, a managed switch with VLANs is a workable solution. Introduction to VLANs and DNS Two of the areas in which pfSense excels is in incorporating functionality to implement VLANs and DNS servers. First, let's consider why we would want to implement these. Introduction to VLANs The standard way to partition your network is to use a router to pass traffic between networks, and configure a separate switch (or switches) for each network. In this scenario, there is a one-to-one relationship between the number of network interfaces and the number of physical ports. This works well in many network deployments, especially in small networks. As the network gets larger, however, there are issues with this type of configuration. As the number of users on the network increases, we are faced with a choice of either having more users on each subnet, or increasing the number of subnets (and therefore the number of network interfaces on the router). Both solutions also create new problems: Each subnet makes up a separate broadcast domain. Increasing the number of users on a subnet increases the amount of broadcast traffic, which can bog down our network. Each user on a subnet can use a packet sniffer to sniff network traffic, which creates a security problem. Segmenting the network by adding subnets tends to be costly, as each new subnet requires a separate switch. VLANs offer us a way out of this dilemma with relatively little downside. VLANs allow us to divide traffic on a single network interface (for example, LAN) into several separate networks, by adding a special tag to frames entering the network. This tag, known as an 802.1q tag, identifies which VLAN to which the device belongs. Dividing network traffic in such a way offers several advantages: As each VLAN constitutes a separate broadcast domain, broadcast domains are now smaller, and thus there is less network traffic. Users on one VLAN cannot sniff traffic from another VLAN, even if they are on the same physical interface, thus improving security. Using VLANs requires us to have a managed switch on the interface on which VLANs exist. This is somewhat more expensive than an unmanaged switch, but the cost differential between a managed and unmanaged switch is less than it might be if we had to buy additional switches for new subnets. As a result, VLANs are often the most efficient way of making our networks more scalable. Even if your network is small, it might be advantageous to at least consider implementing a VLAN, as you will likely want to make future growth as seamless as possible. Introduction to DNS The DNS provides a means of converting an easy-to-remember domain name with a numerical (IP) address. It thus provides us with a phone book for the Internet as well as providing a structure that is both hierarchical (there is the root node, which covers all domain names, top-level domains like .com and .net, domain names and subdomain names) and decentralized (the Internet is divided into different zones, and a name server is authoritative for a specific zone). In a home or SOHO environment, we might not need to implement our own DNS server. In these scenarios, we could use our ISP's DNS servers to resolve Internet hostnames. For local hostnames, we could rely on NetBIOS under Windows, the Berkeley Internet Name Domain service (BIND) under Linux (using a configuration that does not require us to run name servers), or osx under Mac OS X. Another option for mapping hostnames to IP addresses on the local network would be to use HOSTS.TXT. This is a text file, which contains a list of hostnames and corresponding IP addresses. But there are certain factors that may prompt us to set up our own DNS server for our networks: We may have chosen to utilize HOSTS.TXT for name resolution, but maintaining the HOSTS.TXT file on each of the hosts on our network may prove to be too difficult. If we have roaming clients, it may even be impossible. If your network is hosting resources that are available externally (for example, an FTP server or a website), and you are constantly making changes to the IP addresses of these resources, you will likely find it much easier to update your own data rather than submit forms to third parties and wait for them to implement the changes. Although your DNS server will only be authoritative for your domains, it can cache DNS data from the rest of the Internet. On your local network, this cached data can be retrieved much faster than DNS data from a remote DNS server. Thus, maintaining your own DNS server should result in faster name resolution. If you anticipate ever having to implement a public DNS server, a private DNS server can be a good learning experience, and if you make mistakes in implementing a private DNS server, the consequences are not as far-reaching as they would be with a public one. Implementing a DNS server with pfSense is relatively easy. By using the DNS resolver, we can have pfSense answer DNS queries from local clients, and we can also have pfSense utilize any currently available DNS servers. We can also use third-party packages such as dns-server (which is a pfSense version of TinyDNS) to add DNS server functionality. The best practices for installation and configuration Once you have chosen your hardware and which version you are going to install, you can download pfSense. Browse to the Downloads section of pfsense.org and select the appropriate computer architecture (32-bit, 64-bit, or Netgate ADI), the appropriate platform (Live CD, memstick, or embedded), and you should be presented with a list of mirrors. Choose the closest one for the best performance. You will also want to download the MD5 checksum file in order to verify the integrity of the downloaded image. Windows has several utilities for displaying MD5 hashes for a file. Under BSD and Linux, generating the MD5 hash is as easy as typing the following command: md5 pfSense-LiveCD-2.2.6-RELEASE-amd64.iso This command would generate the MD5 checksum for the 64-bit Live CD version for pfSense 2.2.6. Compare the resulting hash with the contents of the .md5 file downloaded from the pfSense website. If you are doing a full install from the Live CD or memory stick, then you just need to write the ISO to the target media, boot from either the CD or memory stick, perform some basic configuration, and then invoke the installer. The embedded install is done from a compact flash (CF) card and console data can be sent to either a serial port or the VGA port, depending on which embedded configuration you chose. If you use the serial port version, you will need to connect the embedded system to another computer with a null modem cable. Troubleshooting installation In most cases, you should be able to invoke the pfSense installer and begin installing pfSense onto the system. In some cases, however, pfSense may not boot from the target media, or the system may hang during the boot process. If pfSense is not booting at all, you may want to check to make sure the system is set up to boot from the target media. This can be done by changing the boot sequence in the BIOS settings (which can be accessed during system boot, usually by hitting the Delete key). Most computers also have a means of choosing the boot device on a one-time basis during the boot sequence. Check your motherboard's manual on how to do this. If the system is already set up to boot from the target media, then you may want to verify the integrity of the pfSense image again, or repeat the process of writing the images to the target media. The initial pfSense boot menu when booting from a CD or USB flash drive. If the system hangs during the boot process, there are several options you can try. The first menu that appears, as pfSense boots, has several options. The last two options are Kernel and Configure Boot Options. Kernel allows you to select which kernel to boot from among the available kernels. If you have a reason to suspect that the FreeBSD kernel being used is not compatible with your hardware, you might want to switch to the older version. Configure Boot Options launches a menu (shown in the preceding screenshot) with several useful options. A description of these options can be found at: http://www.freebsd.org/doc/handbook/book.html. Toggling [A]CPI Support to off can help in some cases, as ACPI's hardware discovery and configuration capabilities may cause the pfSense boot process to hang. If turning this off doesn't work, you could try booting in Safe [M]ode, and if all else fails, you can toggle [V]erbose mode to On, which will give you detailed messages while booting. The two options after boot are [R]ecovery, and [I]nstaller. The [R]ecovery mode provides a shell prompt and helps recover from a crash by retrieving config.xml from a crashed hard drive. [I]nstaller allows you to install pfSense onto a hard drive or other media, and gets invoked by default after the timeout period. The installer provides you with the option to either do a quick install or a custom install. In most cases, the quick install option can be used. Invoking the custom install option is only recommended if you want to install pfSense on a drive other than the first drive on the target system, or if you want to install multiple operating systems on the system. It is not likely that either of these situations will apply, unless you are installing pfSense for evaluation purposes (and in such cases, you would probably have an easier time running pfSense on a virtual machine). If you were unable to install pfSense on to the target media, you may have to troubleshoot your system and/or installation media. If you are attempting to install from the CD, your optical drive may be malfunctioning, or the CD may be faulty. You may want to start with a known good bootable disc and see if the system will boot off of it. If it can, then your pfSense disc may be at fault; burning the disc again may solve the problem. If, however, your system cannot boot off the known good disc, then the optical drive itself, or the cables connecting the optical drive to the motherboard, may be at fault. In some cases, however, none of the aforementioned possibilities hold true, and it is possible that the FreeBSD boot loader will not work on the target system. If so, then you could opt to install pfSense on a different system. Another possibility is to install pfSense onto a hard drive on a separate system, then transfer the hard drive into the target system. In order to do this, go through the installation process on another system as you would normally until you get to the Assign Interfaces prompt. When the installer asks if you want to assign VLANS, type n. Type exit at the Assign Interfaces prompt to skip the interface assignment. Proceed through the rest of the installation; then power down the system and transfer the hard drive to the target system. Assuming that the pfSense hard drive is in the boot sequence, the system should boot pfSense and detect the system's hardware correctly. Then you should be able to assign network interfaces. The rest of the configuration can then proceed as usual. If you have not encountered any of these problems, the software should be installed on the target system, and you should get a dialog box telling you to remove the CD from the optical drive tray and press Enter. The system will now reboot, and you will be booting into your new pfSense install for the first time. pfSense configuration Configuration takes place in two phases. Some configuration must be done at the console, including interface configuration and interface IP address assignment. Some configuration steps, such as VLAN and DHCP setup, can be done both at the console and within the web GUI. Configuration from the console On boot, you should eventually see a menu identical to the one seen on the CD version, with the boot multi or single user options and other options. After a timeout period, the boot process will continue and you will get an options menu that is also identical to the CD version, except option 99 (installation option) will not be there. You should select 1 from the menu to begin interface assignment. This is where the network cards installed in the system are given their roles as WAN, LAN, and optional interfaces (OPT1, OPT2, and so on). If you select this option, you will be presented with a list of network interfaces. This list provides four pieces of information: pfSense's device name for the interface (fxp0, em1, and so on) The MAC address of the interface The link state of the interface (up if a link is detected; down otherwise) The manufacturer and model of the interface (Intel PRO 1000, for example) As you are probably aware, generally speaking, no two network cards have the same MAC address, so each of the interfaces in your system should have a unique MAC address. To begin the configuration, press 1 and Enter for the Assign Interfaces option. After that, a prompt will show up for VLAN configuration. Otherwise, type n and press Enter. Keep in mind that you can always configure VLANs later on. The interfaces must be configured, and you will be prompted for the WAN interface first. If you only configure one interface, it will be assigned to the WAN, and you will subsequently be able to login to pfSense through this port. This is not what you would normally want, as the WAN port is typically accessible from the other side of the firewall. Once at least one other interface is configured, you will no longer be able to login to pfSense from the WAN port. Unless you are using VLANs, you will have to set up at least two network interfaces. In pfSense, network interfaces are assigned rather cryptic device names (for example, fxp0, em1 and so on) and it is not always easy to know which ports correspond to particular device names. One way of solving this problem is to use the automatic interface assignment feature. To do this, unplug all network cables from the system and then type a and press Enter to begin auto-detection. The WAN interface is the first interface to be detected, so plug a cable into the port you intend to be the WAN interface. The process is repeated with each successive interface. The LAN interface is configured next, then each of the optional interfaces (OPT1, OPT2). If auto-detection does not work, or you do not want to use it, you can always choose manual configuration. You can always reassign network interfaces later on, so even if you make a mistake on this step, the mistake can be easily fixed. Once you have finished configuration, type y at the Do you want to proceed? prompt, or type n and enter to re-assign the interfaces. Option two on the menu is Set interface(s) IP address, and you will likely want to complete this step as well. When you invoke this option, you will be prompted to specify which interface's IP address is to be set. If you select WAN interface, you will be asked if you want to configure the IP address via DHCP. In most scenarios, this is probably the option you want to choose, especially if pfSense is acting as a firewall. In that case, the WAN interface will receive an IP address from your ISP's DHCP server. For all other interfaces (or if you choose not to use DHCP on the WAN interface), you will be prompted to enter the interface's IPv4 address. The next prompt will ask you for the subnet bit count. In most cases, you'll want to enter 8 if you are using a Class A private address, 16 for Class B, and 24 for Class C, but if you are using classless subnetting (for example, to divide a Class C network into two separate networks), then you will want to set the bit count accordingly. You will also be prompted for the IPv4 gateway address (any interface with a gateway set is a WAN, and pfSense supports multiple WANs); if you are not configuring the WAN interface(s), you can just hit Enter here. Next, you will be prompted to provide the address, subnet bit count, and gateway address for IPv6; if you want your network to fully utilize IPv6 addresses, you should enter them here. We have now configured as much as we need to from the console (actually, we have done more than we have to, since we really only have to configure the WAN interface from the console). The remainder of the configuration can be done from the pfSense web GUI. Configuration from the web GUI The pfSense web GUI can only be accessed from another PC. If the WAN was the only interface assigned during the initial setup, then you will be able to access pfSense through the WAN IP address. Once one of the local interfaces is configured (typically the LAN interface), pfSense can no longer be accessed through the WAN interface. You will, however, be able to access pfSense from the local side of the firewall (typically through the LAN interface). In either case, you can access the web GUI by connecting another computer to the pfSense system, either directly (with a crossover cable) or indirectly (through a switch), and then typing either the WAN or LAN IP address into the connected computer's web browser. The login screen should look similar to the following screenshot: The pfSense 2.3 web GUI login screen. When you initially log in to pfSense, the default username/password combination will be admin/pfsense respectively. On your first login, Setup Wizard will begin automatically. Click on the Next button to begin configuration. The first screen provides a link for information about a pfSense Gold subscription. You can click on the link to sign up, or click on the Next button. On the next screen, you will be prompted to enter the hostname of the router as well as the domain. Hostnames can contain letters, numbers and hyphens, but must begin with a letter. If you have a domain, you can enter it in the appropriate field. In the Primary DNS Server and Secondary DNS Server fields, you can enter your DNS servers. If you are using DHCP for your WAN, you can probably leave these fields blank, as they will usually be assigned automatically by your ISP. If you have alternate DNS servers you wish to use, you can enter them here. I have entered 8.8.8.8 and 8.8.4.4 as the primary and secondary DNS servers (these are two DNS servers run by Google that conveniently have easy to remember IP addresses). You can keep the Override DNS checkbox checked unless you have reason to use DNS servers other than the ones assigned by your ISP. Click on Next when finished. The next screen will prompt you for the Network Time Protocol (NTP) server as well as the local time zone. You can keep the default value for the server hostname for now. For the Timezone field, you should select the zone which matches your location and click on Next. The next screen of the wizard is the WAN configuration page. You will be prompted to select the WAN type. You can select either DHCP (the default type) or Static. If your pfSense system is behind another firewall and it is not going to receive an IP address from an upstream DHCP server, then you probably should choose Static. If pfSense is going to be a perimeter firewall, however, then DHCP is likely the correct setting, since your ISP will probably dynamically assign an IP address (this is not always the case, as you may have an IP address statically assigned to you by your ISP, but it is the more likely scenario). If you are not sure which WAN type to use, you will need to obtain this information from your ISP (the other choices are PPPoE, PPTP, and Static. PPPoE stands for Point-to-Point over Ethernet and PPTP stands for Point-to-Point Tunneling Protocol). The MAC address field allows you to enter a MAC address that is different from the actual MAC address of the WAN interface. This can be useful if your ISP will not recognize an interface with a different MAC address than the device that was previously connected, or if you want to acquire a different IP address (changing the MAC address will cause the upstream DHCP server to assign a different address). If you use this option, make sure the portion of the address reserved for the Organizationally Unique Identifier (OUI) is a valid OUI – in other words, an OUI assigned to a network card manufacturer. (The OUI portion of the address is the first three bytes of a MAC-48 address and the first five bytes of an EUI-48 address.) The next few fields can usually be left blank. Maximum Transmission Unit (MTU) allows you to change the MTU size if necessary. DHCP hostname allows you to send a hostname to your ISP when making a DHCP request, which is useful if your ISP requires this. Besides DHCP and Static, you can select PPTP or PPPoE as your WAN type. If you choose PPPoE, then there will be a field for a PPPoE Username, PPPoE Password, and PPPoE Server Name. The PPPoE dial-on-demand checkbox allows you to connect to your ISP only when a user requests data that requires an Internet connection. PPPoE Idle timeout specifies how long the connection will be kept open after transmitting data when this option is invoked. The Block RFC1918 Private Networks checkbox, if checked, will block registered private networks (as defined by RFC 1918) from connecting to the WAN interface. The Block Bogon Networks option blocks traffic from reserved and/or unassigned IP addresses. For the WAN interface, you should check both options unless you have special reasons for not invoking these options. Click the Next button when you are done. The next screen provides fields in which you can change LAN IP address and subnet mask, but only if you configured the LAN interface previously. You can keep the default, or change it to another value within the private address blocks. You may want to choose an address range other than the very common 192.168.1.x in order to avoid a conflict. Be aware that if you change the LAN IP address value, you will also need to adjust your PC's IP address, or release and renew its DHCP lease when finished with the network interface. You will also have to change the pfSense IP address in your browser to reflect the change. The final screen of the pfSense Setup Wizard allows you to change the admin password, which you should probably do. Enter the password, enter it again for confirmation in the next edit box, and click on Next. On the following screen, there will be a Reload button; click on Reload. This will reload pfSense with the new changes. Once you have completed the wizard, you should have network connectivity. Although there are other means of making changes to pfSense's configuration, if you want to repeat the wizard, you can do so by navigating to System | Setup Wizard. Completion of the wizard will take you to the pfSense dashboard. The pfSense dashboard, redesigned for version 2.3. Configuring additional interfaces By now, both the WAN and LAN interface configuration should be complete. Although additional interface configuration can be done at the console, it can also be done in the web GUI. To add optional interfaces, navigate to Interfaces | assign theInterface assignments tab will show a list of assigned interfaces, and at the bottom of the table, there will be an Available network ports: entry option. There will be a corresponding drop-down box with a list of unassigned network ports. These will have device names such as fxp0, em1, and so on. To assign an unused port, select the port you want to assign from the drop-down box, and click on the + button to the right. The page will reload, and the new interface will be the last entry in the table. The name of the interface will be OPTx, where x equals the number of optional interfaces. By clicking on interface name, you can configure the interface. Nearly all the settings here are similar to the settings that were available on the WAN and LAN configuration pages in the pfSense Setup Wizard. Some of the options under the General Configuration section, that are not available in the setup wizard, are MSS (Maximum Segment Size), and Speed and duplex. Normally, MSS should remain unchanged, although you can change this setting if your Internet connection requires it. If you click on the Advanced button under Speed and duplex, a drop-down box will appear in which you can explicitly set the speed and duplex for the interface. Since virtually all modern network hardware has the capability of automatically selecting the correct speed and duplex, you will probably want to leave this unchanged. If you have selected DHCP as the configuration type, then there are several options in addition to the ones available in the setup wizard. Alias IPv4 address allows you to enter a fixed IP address for the DHCP client. The Reject Leases from field allows you to specify the IP address or subnet of an upstream DHCP server to be ignored. Clicking on the Advanced checkbox in the DHCP client configuration causes several additional options to appear in this section of the page. The first is Protocol Timing, which allows you to control DHCP protocol timings when requesting a lease. You can also choose several presets (FreeBSD, pfSense, Clear, or Saved Cfg) using the radio buttons on the right. The next option in this section is Lease Requirements and Requests. Here you can specify send, request, and require options when requesting a DHCP lease. These options are useful if your ISP requires these options. The last section is Option Modifiers, where you can add DHCP option modifiers, which are applied to an obtained DHCP lease. There is a second checkbox at the top of this section called Config File Override. Checking this box allows you to enter a DHCP client configuration file. If you use this option, you must specify the full absolute path of the file. Starting with pfSense version 2.2.5, there is support for IPv6 with DHCP (DHCP6). If you are running 2.2.5 or above, there will be a section on the page called DHCP6 client configuration. The first setting is Use IPv4 connectivity as parent interface. This allows you to request an IPv6 address over IPv4. The second is Request only an IPv6 prefix. This is useful if your ISP supports Stateless Address Auto Configuration(SLAAC). In this case, instead of the usual procedure in which the DHCP server assigns an IP address to the client, the DHCP server only sends a prefix, and the host may generate its own IP address and test the uniqueness of a generated address in the intended addressing scope. By default, the IPv6 prefix is 64 bits, but you can change that by altering the DHCPv6 Prefix Delegation size in the corresponding drop-down box. The last setting is the Send IPv6 prefix hint, which allows you to request the specified prefix size from your ISP. The advanced DHCP6 client configuration section of the interface configuration page. This section appears if DHCP6 is selected as the IPv6 configuration type. Checking the Advanced checkbox in the heading of this section displays the advanced DHCP 6 options. If you check the Information Only checkbox on the left, pfSense will send requests for stateless DHCPv6 information. You can specify send and request options, just as you can for IPv4. There is also a Script field where you can enter the absolute path to a script that will be invoked on certain conditions. The next options are for the Identity Association Statement checkboxes. The Non-Temporary Address Allocation checkbox results in normal, that is, not temporary, IPv6 addresses to be allocated for the interface. The Prefix Delegation checkbox causes a set of IPv6 prefixes to be allocated from the DHCP server. The next set of options, Authentication Statement, allow you to specify authentication parameters to the DHCP server. The Authname parameter allows you to specify a string, which in turn specifies a set of parameters. The remaining parameters are of limited usefulness in configuring a DHCP6 client, because each has only one allowed value, and leaving them blank will result in only the allowed value being used. If you are curious as to what these values are, here they are: Parameter Allowed value Description Protocol Delayed The DHCPv6 delayed authentication protocol Algorithm hmac-md5, HMAC-MD5, hmacmd5, or HMACMD5 The HMAC-MD5 authentication algorithm rdm Monocounter The replay protection method; only monocounter is available Finally, Key info Statement allows you to enter a secret key. The required fields are key id, which identifies the key, and secret, which provides the shared secret. key name and realm are arbitrary strings and may be omitted. expire may be used to specify an expiration time for the key, but if it is omitted, the key will never expire. The last section on the page is identical to the interface configuration page in the Setup Wizard, and contains the Block Private Networks and Block Bogon Networks checkboxes. Normally, these are checked for WAN interfaces, but not for other interfaces. General setup options You can find several configuration options under System | General Setup. Most of these are identical to settings that can be configured in the Setup Wizard (Hostname, Domain, DNS servers, Timezone, and NTP server). There are two additional settings available. The Language drop-down box allows you to select the web configurator language. Under the Web Configurator section, there is a Theme drop-down box that allows you to select the theme. The default theme of pfSense is perfectly adequate, but you can select another one here. pfSense 2.3 also adds new options to control the look and feel of the web interface; these settings are also found in the Web Configurator section of the General Settings page. The Top Navigation drop-down box allows you to choose whether the top navigation scrolls with the page, or remains anchored at the top as you scroll. The Dashboard Columns option allows you to select the number of columns on the dashboard page (the default is 2). The next set of options is Associated Panels Show/Hide. These options control the appearance of certain panels on the Dashboard and System Logs page. The options are: Available Widgets: Checking this box causes the Available Widgets panel to appear on the Dashboard. Prior to version 2.3, the Available Widgets panel was always visible on the Dashboard. Log Filter: Checking this box causes the Advanced Log Filter panel to appear on the System Logs page. Advanced Log Filter allows you to filter the system logs by time, process, PID and message. Manage Log: Checking this box causes the Manage General Log panel to appear on the System Logs page. The Manage General Log panel allows you to control the display of the logs, how big the log file may be, and the formatting of the log file, among other things. The last option on this page, Left Column Labels, allows you to select/toggle the first item in a group by clicking on the left column if checked. Click on Save at the bottom of the page to save any changes. Advanced setup options Under System | Advanced, there are a number of options that you will probably want to configure before completing the initial setup. There are six separate tabs here, all with multiple options, and we won't cover all of them here, but we will cover the more common ones. The first setting allows you to choose between HTTP and HTTPS for the web configurator. If you plan on making the pfSense web GUI accessible from the WAN side, you will definitely want to choose HTTPS in order to encrypt access to the web GUI. Even if the web GUI will only be accessible over local networks, you probably will want to choose HTTPS. Modern web browsers will complain about the SSL certificate the first time you access the web GUI, but most of them will allow you to create an exception. The next setting, SSL certificate, allows you to choose a certificate from a drop-down list of available certificates. You can choose web Configurator default, or you can add another certificate (by navigating to System | Cert Manager and adding one), and use it instead. The next important setting, also in the Web Configurator section, is the Disable web Configurator anti-lockout rule. If left unchecked, access to the web GUI is always allowed on the LAN (or WAN if the LAN interface has not been assigned), regardless of any user-defined firewall rules. If you check this option and you don't have a user-defined rule to allow access to pfSense, you will lock yourself out of the web GUI. If you are locked out of the web GUI because of firewall rules, there are several options. The easiest option is probably to restore a previous configuration from the console. You can also reset pfSense to factory defaults, but if you don't mind typing in shell commands, there are less drastic options. One is to add an allow all rule on the WAN interface by typing the following command at the console shell prompt (type 8 at the console menu to invoke the shell): pfSsh.php playback enableallowallwan Once you issue this command, you will be able to access the web GUI through the WAN interface. To do so, either connect the WAN port to a network running DHCP (if the WAN uses DHCP), or connect the WAN port to another computer with an IP on the same network (if the WAN has a static IP). Be sure to delete the WAN allow all rule before deploying the system. Another possibility is to temporarily disable the firewall rules with the following shell command: pfctl –d Once you have regained access, you can re-enable the firewall rules with this command: pfctl -e In any case, you want to make sure your firewall rules are configured correctly before invoking the anti-lockout option. You can reset pfSense to factory defaults by selecting 4 from the console menu. If you need to go back to a previous configuration, you can do that by selecting 15 from the console menu; this option will allow you to select from automatically-saved restore points. The next section is Secure Shell; checking the Enable Secure Shell checkbox makes the console accessible via a Secure Shell (SSH) connection. This makes life easier for admins, but it also creates a security concern. Therefore, it is a good idea to change the default SSH port (the default is 22), which you can do in this section. You can add another layer of security by checking the Disable password login for the Secure Shell checkbox. If you invoke this option, you must create authorized SSH keys for each user that requires SSH access. The process for generating SSH keys is different depending on your OS. Under Linux, it is fairly simple. First, enter the following at the command prompt: ssh-keygen –t rsa You will receive the following prompt: Enter file in which to save the key (/home/user/.ssh/id-rsa): The directory in parenthesis will be a subdirectory of your home directory. You can change the directory or press Enter. The next prompt asks you for a passphrase: Enter passphrase (empty for no passphrase): You can enter a passphrase here or just press Enter. You will be prompted to enter the passphrase again, and then the public/private key pair will be generated. The public key will now be saved in a file called id-rsa.pub. Entering SSH keys for a user in the user manager. The next step is adding the newly generated public key to the admin account in pfSense. Open the file in the text editor of your choice and in the web GUI, select the public key and copy it to the clipboard. Then navigate to System | User Manager and click on the Edit user icon for the appropriate user. Scroll down to the Keys section and paste the key into the Authorized SSH keys box. Then click on Save at the bottom of the page. You should now be able to SSH into the admin account without entering the password. Type the following at the command line: ssh pfsense_address –ladmin Here pfsense_address is the IP address of the pfSense system. If you specified a passphrase earlier, you will be prompted to enter it in order to unlock the private key. You will not be prompted for the passphrase on subsequent logins. Once you unlock the private key, you should be logged into the console. The last section of the page, Console options, gives you one more layer of security by allowing you to require a password for console login. Check this checkbox if you want to enable this option, although this could result in being locked out if you forget the password. If this happens, you may still be able to restore access by booting from the live CD and doing a pre-flight install, described in a subsequent section. The next tab, Firewall/NAT, contains a number of important settings relating to pfSense's firewall functionality. Firewall Optimization Options allows you to select the optimization algorithm for the state table. The Normal option is designed for average case scenario network usage. High latency, as the name implies, is for connections in which it is expected that there will be a significant delay between a request and response (a satellite connection is a good example). Aggressive and Conservative are inverses of each other. Aggressive is more aggressive than Normal in dropping idle connections, while Conservative will leave idle connections open longer than Normal would. Obviously, the trade-off here is that if we expire idle connections too soon, legitimate connections may be dropped, while keeping them open too long will be costly from a resource (CPU and memory) standpoint. In the Firewall Advanced section, there is a Disable all packet filtering checkbox. Enabling this option disables all firewall functionality, including NAT. This should be used with caution, but may be useful in troubleshooting. The Firewall maximum settings and Firewall maximum table entries options allow you to specify the maximum number of connections and maximum number of table entries respectively to hold in the system state table. If you leave these entries blank, pfSense will assign reasonable defaults based on the amount of memory your system has. Since increasing the maximum number of connections and/or state table entries will leave less memory for everything else, you will want to invoke these options with caution. The static route filtering checkbox, if checked, will result in firewall rules not taking effect for traffic that enters and leaves through the same interface. This can be useful if you have a static route in which traffic enters pfSense through an interface, but the source of the traffic is not the same as the interface on which it enters. This option does not apply to traffic whose source and destination is the same interface – such traffic is intra-network traffic, and firewall rules would not apply to it whether or not this option was invoked. The next section of the page, Bogon Networks, allows you to select the update frequency of the list of addresses reserved, or not yet assigned, by IANA. If someone is trying to access your network from a newly-assigned IP address, but the Bogon networks list has not yet been updated, they may find themselves blocked. If this is happening on a frequent basis, you may want to change the update frequency. The next tab, Networking, contains a number of IPv6 options. The Allow IPv6 checkbox must be checked in order for IPv6 traffic to pass (it is checked by default). The next option, IPv6 over IPv4 Tunneling, allows you to enable the transitional IPv6 over IPv4. There is also an option called Prefer IPv4 even when IPv6 is available, which will cause IPv4 to be used in cases where a hostname resolves both IPv4 and IPv6 addresses. The next tab is called Miscellaneous. The Proxy Port section allows you to specify URL for a remote proxy server, as well as the proxy port as well as a username and password. The following section, Load Balancing, has two settings. The first setting, Use sticky connections, causes successive connections from the same source to be connected to the same server, instead of directing them to the next web server in the pool, which would be the normal behavior. The timeout period for sticky connections may be adjusted in the adjacent Edit box. The default is 0, so the sticky connection expires as soon as the last connection from the source expires. The second setting, Enable default gateway switching, switches from the default gateway to another available one when the default gateway goes down. This is not necessary in most cases, since it is easier to incorporate redundancy into gateways with gateway groups. The Scheduling section has only one option, but it has significance if you use rule scheduling. Checking the Do not kill connections when schedule expires checkbox will cause connections permitted by the rule to survive even after the time period specified by the schedule expires. Otherwise, pfSense will kill all existing connections when a schedule expires. Upgrading, backing up, and restoring pfSense You can usually upgrade pfSense from one version to another, although the means of upgrading may differ depending on what platform you are using. So long as the firmware is moving from an older version to a newer version, pfSense will work unless otherwise noted. Before you make any changes, you should make an up-to-date backup. In the web GUI, you can back up the configuration by navigating to Diagnostics | Backup/Restore. In the Backup Configuration section of the page, set Backup Area to ALL. Then click on Download Configuration and save the file. Before you upgrade pfSense, it is a good idea to have a plan on how to recover in case the upgrade goes wrong. There is always a chance that an upgrade will leave pfSense in an unusable state. In these cases, it is always helpful to have a backup system available. Also, with advance planning, the firewall can be quickly returned to the previous release. There are three methods for upgrading pfSense. The first is to download the upgrade binaries from the official pfSense site. The same options are available as are available for a full install. Just download the appropriate image, write the image to the target media, and boot the system to be upgraded from the target media. For embedded systems, releases prior to 1.2.3 are not upgradable (in such cases, a full install would be the only way to upgrade), but newer NanoBSD-based embedded images do support upgrades. The second method is to upgrade from the console. From the console menu, select 13 (the Upgrade from Console option). pfSense will check the repositories to see if there is an update, and if there is, how much more disk space is required, and also inform you that upgrading will require a reboot. It will also prompt you to confirm that the upgrade should proceed. Type y and Enter, and the upgrade will proceed. pfSense will also automatically reboot 10 seconds after downloading and installing the upgrade. Rebooting may take slightly longer than it would normally, since pfSense must extract the new binaries from a tarball during the boot sequence. Upgrading pfSense from the console. The third method is the easiest way to upgrade your system: from the web GUI. Navigate to Status | Dashboard (this should also be the screen you see when initially logging into the web GUI). The System Information widget should have a section called Version, and this section should provide: The current version of pfSense Whether an update is available If an update is available, there will be a link to the firmware auto update page; click on this link. (Alternatively, you can access this page by navigating to System | Update and clicking on the System Update tab (note that on versions prior to 2.3, this menu option was called Firmware instead of Update.) If there is an update available, this page will let you know. Choosing a firmware branch from the Update Settings tab of the Update option. The Update Settings tab contains options that may be helpful in some situations. The Firmware Branch section has a drop-down box, allowing you to select either the Stable branch or Development branch. The Dashboard check checkbox allows you to disable the dashboard auto-update check. Once you are satisfied with these settings, you can click on the Confirm button on the System Update tab. The updating process will then begin, starting with the backup (if you chose that option). Upgrading can take as little as 15 minutes, especially if you are upgrading from one minor version to another. If you are upgrading in a production environment, you will want to schedule your upgrade for a suitable time (either during the weekend or after normal working hours). The web GUI will keep you informed of the status of the update process and when it is complete. Another means of updating pfSense in the web GUI is to use the manual update feature. To do so, navigate to System | Update and click on the Manual Update tab. Click on the Enable firmware upload button. When you do this, a new section should appear on the page. The Choose file button launches a file dialog box where you can specify the firmware image file. Once you select the file, click on Open to close out the file dialog box. There is a Perform full backup prior to upgrade checkbox you can check if you want to back up the system, and also an Upgrade firmware button that will start the upgrade process. If the update is successful, the System Information widget on the Dashboard should indicate that you are on the current version of pfSense (or the version to which you upgraded, if you invoked the manual update). If something went wrong and pfSense is not functioning properly, and you made a backup prior to updating, you can restore the old version. Available methods of backing up and restoring pfSense are outlined in the next section. Backing up and restoring pfSense The following screenshot shows the options related to backing up and restoring pfSense: Backup and restore options in pfSense 2.3. You can back up and restore the config.xml file from the web GUI by navigating to Diagnostics | Backup/Restore. The first section, Backup configuration, allows you to back up some or all of the configuration data. There is a drop-down box which allows you to select which areas to backup. There are checkbox options such as do not backup package information, and Encrypt this configuration file. The final checkbox, selected by default, allows you to disable the backup of round robin database (RRD) data, real-time traffic data which you likely will not want to save. The Download Configuration as XML button allows you to save config.xml to a local drive. Restoring the configuration is just as easy. In the Restore configuration section of the page, select the area to restore from the drop-down box and browse to the file by clicking on the Choose File button. Specify whether config.xml is encrypted with the corresponding checkbox, and then click the Restore configuration button. Restoring a configuration with Pre-Flight Install You may find it is necessary to restore an old pfSense configuration. Moreover, it is possible that restoring an old configuration from the console or web GUI as described previously in this article is not possible. In these cases, there is one more possible way of restoring an old configuration, and that is with a Pre-Flight Install (PFI), A PFI essentially involves the following: Copying a backup config.xml file into a directory called conf on a DOS/FAT formatted USB drive. Plugging the USB drive into the system whose configuration is to be restored, and then booting off the Live CD. Installing pfSense from the CD onto the target system. Rebooting the system, and allowing pfSense to boot (off the target media, not the CD). The configuration should now be restored. Another option that is useful if you want to retain your configuration while reinstalling pfSense is to choose the menu option Rescue config.xml during the installation process. This allows you to select and load a configuration file from any storage media attached to the system. Summary The goal of this article was to provide an overview of how to get pfSense up and running. Completion of this article should give you an idea of where to deploy your pfSense system as well as what hardware to utilize. You should also know how to troubleshoot the most common installation problems, and how to do basic system configuration and interface setup for both IPv4 and IPv6 networks. You should know how to configure pfSense for remote access. Finally, you should know how to upgrade, backup, and restore pfSense. Resources for Article: Further resources on this subject: Configuring the essential networking services provided by pfSense [article] pfSense: Configuring NAT and Firewall Rules [article] Upgrading a Home Network to a Small Business System Using pfSense [article]
Read more
  • 0
  • 0
  • 9379

article-image-creating-and-consuming-web-services-cakephp-13
Packt
10 Mar 2011
7 min read
Save for later

Creating and Consuming Web Services in CakePHP 1.3

Packt
10 Mar 2011
7 min read
CakePHP 1.3 Application Development Cookbook Over 70 great recipes for developing, maintaining, and deploying web applications     Creating an RSS feed RSS feeds are a form of web services, as they provide a service, over the web, using a known format to expose data. Due to their simplicity, they are a great way to introduce us to the world of web services, particularly as CakePHP offers a built in method to create them. In this recipe, we will produce a feed for our site that can be used by other applications. Getting ready To go through this recipe we need a sample table to work with. Create a table named posts, using the following SQL statement: CREATE TABLE `posts`(posts `id` INT NOT NULL AUTO_INCREMENT, `title` VARCHAR(255) NOT NULL, `body` TEXT NOT NULL, `created` DATETIME NOT NULL, `modified` DATETIME NOT NULL, PRIMARY KEY(`id`) ); Add some sample data, using the following SQL statements: INSERT INTO `posts`(`title`,posts `body`, `created`, `modified`) VALUES ('Understanding Containable', 'Post body', NOW(), NOW()), ('Creating your first test case', 'Post body', NOW(), NOW()), ('Using bake to start an application', 'Post body', NOW(), NOW()), ('Creating your first helper', 'Post body', NOW(), NOW()), ('Adding indexes', 'Post body', NOW(), NOW()); We proceed now to create the required controller. Create the class PostsController in a file named posts_controller.php and place it in your app/controllers folder, with the following contents: <?php class PostsController extends AppController { public function index() { $posts = $this->Post->find('all'); $this->set(compact('posts')); } } ?> Create a folder named posts in your app/views folder, and then create the index view in a file named index.ctp and place it in your app/views/posts folder, with the following contents: <h1>Posts</h1> <?php if (!empty($posts)) { ?> <ul> <?php foreach($posts as $post) { ?> <li><?php echo $this->Html->link( $post['Post']['title'], array( 'action'=>'view', $post['Post']['id'] ) ); ?></li> <?php } ?> </ul> <?php } ?> How to do it... Edit your app/config/routes.php file and add the following statement at the end: Router::parseExtensions('rss'); Edit your app/controllers/posts_controller.php file and add the following property to the PostsController class: public $components = array('RequestHandler'); While still editing PostsController, make the following changes to the index() method: public function index() { $options = array(); if ($this->RequestHandler->isRss()) { $options = array_merge($options, array( 'order' => array('Post.created' => 'desc'), 'limit' => 5 )); } $posts = $this->Post->find('all', $options); $this->set(compact('posts')); } Create a folder named rss in your app/views/posts folder, and inside the rss folder create a file named index.ctp, with the following contents: <?php $this->set('channel', array( 'title' => 'Recent posts', 'link' => $this->Rss->url('/', true), 'description' => 'Latest posts in my site' )); $items = array(); foreach($posts as $post) { $items[] = array( 'title' => $post['Post']['title'], 'link' => array('action'=>'view', $post['Post']['id']), 'description' => array('cdata'=>true, 'value'=>$post['Post'] ['body']), 'pubDate' => $post['Post']['created'] ); } echo $this->Rss->items($items); ?> Edit your app/views/posts/index.ctp file and add the following at the end of the view: <?php echo $this->Html->link('Feed', array('action'=>'index', 'ext'=>'rss')); ?> If you now browse to http://localhost/posts, you should see a listing of posts with a link entitled Feed. Clicking on this link should produce a valid RSS feed, as shown in the following screenshot: If you view the source of the generated response, you can see that the source for the first item within the RSS document is: <item> <title>Understanding Containable</title> <link>http://rss.cookbook7.kramer/posts/view/1</link> <description><![CDATA[Post body]]></description> <pubDate>Fri, 20 Aug 2010 18:55:47 -0300</pubDate> <guid>http://rss.cookbook7.kramer/posts/view/1</guid> </item> How it works... We started by telling CakePHP that our application accepts the rss extension with a call to Router::parseExtensions(), a method that accepts any number of extensions. Using extensions, we can create different versions of the same view. For example, if we wanted to accept both rss and xml as extensions, we would do: Router::parseExtensions('rss', 'xml'); In our recipe, we added rss to the list of valid extensions. That way, if an action is accessed using that extension, for example, by using the URL http://localhost/posts.rss, then CakePHP will identify rss as a valid extension, and will execute the ArticlesController::index() action as it normally would, but using the app/views/posts/rss/index.ctp file to render the view. The process also uses the file app/views/layouts/rss/default.ctp as its layout, or CakePHP's default RSS layout if that file is not present. We then modify how ArticlesController::index() builds the list of posts, and use the RequestHandler component to see if the current request uses the rss extension. If so, we use that knowledge to change the number and order of posts. In the app/views/posts/rss/index.ctp view, we start by setting some view variables. Because a controller view is always rendered before the layout, we can add or change view variables from the view file, and have them available in the layout. CakePHP's default RSS layout uses a $channel view variable to describe the RSS feed. Using that variable, we set our feed's title, link, and description. We proceed to output the actual item files. There are different ways to do so, the first one is making a call to the RssHelper::item() method for each item, and the other one requires only a call to RssHelper::items(), passing it an array of items. We chose the latter method due to its simplicity. While we build the array of items to be included in the feed, we only specify title, link, description, and pubDate. Looking at the generated XML source for the item, we can infer that the RssHelper used our value for the link element as the value for the guid (globally unique identifier) element. Note that the description field is specified slightly differently than the values for the other fields in our item array. This is because our description may contain HTML code, so we want to make sure that the generated document is still a valid XML document. By using the array notation for the description field, a notation that uses the value index to specify the actual value on the field, and by setting cdata to true, we are telling the RssHelper (actually the XmlHelper from which RssHelper descends) that the field should be wrapped in a section that should not be parsed as part of the XML document, denoted between a <![CDATA[ prefix and a ]]> postfix. The final task in this recipe is adding a link to our feed that is shown in the index.ctp view file. While creating this link, we set the special ext URL setting to rss. This sets the extension for the generated link, which ends up being http://localhost/posts.rss.  
Read more
  • 0
  • 0
  • 9377
article-image-getting-started-apache-spark-dataframes
Packt
22 Sep 2015
5 min read
Save for later

Getting Started with Apache Spark DataFrames

Packt
22 Sep 2015
5 min read
 In this article article about Arun Manivannan’s book Scala Data Analysis Cookbook, we will cover the following recipes: Getting Apache Spark ML – a framework for large-scale machine learning Creating a data frame from CSV (For more resources related to this topic, see here.) Getting started with Apache Spark Breeze is the building block of Spark MLLib, the machine learning library for Apache Spark. In this recipe, we'll see how to bring Spark into our project (using SBT) and look at how it works internally. The code for this recipe could be found at https://github.com/arunma/ScalaDataAnalysisCookbook/blob/master/chapter1-spark-csv/build.sbt. How to do it... Pulling Spark ML into our project is just a matter of adding a few dependencies on our build.sbt file: spark-core, spark-sql, and spark-mllib: Under a brand new folder (which will be our project root), we create a new file called build.sbt. Next, let's add to the project dependencies the Spark libraries: organization := "com.packt" name := "chapter1-spark-csv" scalaVersion := "2.10.4" val sparkVersion="1.3.0" libraryDependencies ++= Seq( "org.apache.spark" %% "spark-core" % sparkVersion, "org.apache.spark" %% "spark-sql" % sparkVersion, "org.apache.spark" %% "spark-mllib" % sparkVersion ) resolvers ++= Seq( "Apache HBase" at "https://repository.apache.org/content/repositories/releases", "Typesafe repository" at "http://repo.typesafe.com/typesafe/releases/" ) How it works... Spark has four major higher level tools built on top of the Spark Core: Spark Streaming, Spark ML Lib (Machine Learning), Spark SQL (An SQL interface for accessing data), and GraphX (for graph processing). The Spark Core is the heart of Spark, providing higher level abstractions in various languages for data representation, serialization, scheduling, metrics, and so on. For this recipe, we skipped streaming and GraphX and added the remaining three libraries. There’s more… Apache Spark is a cluster computing platform that claims to run about 100 times faster than Hadoop (that's a mouthful). In our terms, we could consider that as a means to run our complex logic over a massive amount of data at a blazingly high speed. The other good thing about Spark is that the programs we write are much smaller than the typical Map Reduce classes that we write for Hadoop. So, not only do our programs run faster, but it also takes lesser time to write them in the first place. Creating a data frame from CSV In this recipe, we'll look at how to create a new data frame from a Delimiter Separated Values (DSV) file. The code for this recipe could be found athttps://github.com/arunma/ScalaDataAnalysisCookbook/tree/master/chapter1-spark-csv in the DataFrameCSV class. How to do it... CSV support isn't first-class in Spark but is available through an external library from databricks. So, let's go ahead and add that up in build.sbt: After adding the spark-csv dependency, our complete build.sbt looks as follows: organization := "com.packt" name := "chapter1-spark-csv" scalaVersion := "2.10.4" val sparkVersion="1.3.0" libraryDependencies ++= Seq( "org.apache.spark" %% "spark-core" % sparkVersion, "org.apache.spark" %% "spark-sql" % sparkVersion, "org.apache.spark" %% "spark-mllib" % sparkVersion, "com.databricks" %% "spark-csv" % "1.0.3" ) resolvers ++= Seq( "Apache HBase" at"https://repository.apache.org/content/repositories/releases", "Typesafe repository" at "http://repo.typesafe.com/typesafe/releases/" ) fork := true Before we create the actual data frame, there are three steps that we ought to do: create the Spark configuration, create the Spark context, and create the SQL context. SparkConf holds all of the information for running this Spark cluster. For this recipe, we are running locally, and we intend to use only two cores in the machine—local[2]: val conf = new SparkConf().setAppName("csvDataFrame").setMaster("local[2]") For this recipe, we'll be running Spark on standalone mode. Now let's load our pipe-separated file: org.apache.spark.sql.DataFrame val students=sqlContext.csvFile(filePath="StudentData.csv", useHeader=true, delimiter='|') How it works... The csvFile function of sqlContext accepts the full filePath of the file to be loaded. If the CSV has a header, then the useHeader flag will read the first row as column names. The delimiter flag, as expected, defaults to a comma, but you can override the character as needed. Instead of using the csvFile function, you can also use the load function available in the SQL context. The load function accepts the format of the file (in our case, it is CSV) and options as a map. We can specify the same parameters that we specified earlier using Map, like this: val options=Map("header"->"true", "path"->"ModifiedStudent.csv") val newStudents=sqlContext.load("com.databricks.spark.csv",options) Summary In this article, you learned in detail Apache Spark ML, a framework for large-scale machine learning. Then we saw the creation of a data frame from CSV with the help of example code. Resources for Article: Further resources on this subject: Integrating Scala, Groovy, and Flex Development with Apache Maven[article] Ridge Regression[article] Reactive Data Streams [article]
Read more
  • 0
  • 0
  • 9371

article-image-web-scraping-python
Packt
17 Feb 2010
5 min read
Save for later

Web Scraping with Python

Packt
17 Feb 2010
5 min read
To perform this task, usually three basic steps are followed: Explore the website to find out where the desired information is located in the HTML DOM tree Download as many web pages as needed Parse downloaded web pages and extract the information from the places found in the exploration step The exploration step is performed manually with the aid of some tools that make it easier to locate the information and reduce the development time in next steps. The download and parsing steps are usually performed in an iterative cycle since they are interrelated. This is because the next page to download may depend on a link or similar in the current page, so not every web page can be downloaded without previously looking into the earlier one. This article will show an example covering the three steps mentioned and how this could be done using python with some development. The code that will be displayed is guaranteed to work at the time of writing, however it should be taken into account that it may stop working in future if the presentation format changes. The reason is that web scraping depends on the DOM tree to be stable enough, that is to say, as happens with regular expressions, it will work fine for slight changes in the information being parsed. However, when the presentation format is completely changed, the web scraping scripts have to be modified to match the new DOM tree. Explore Let's say you are a fan of Pack Publishing article network and that you want to keep a list of the titles of all the articles that have been published until now and the link to them. First of all, you will need to connect to the main article network page (http://www.packtpub.com/article-network) and start exploring the web page to have an idea about where the information that you want to extract is located. Many ways are available to perform this task such as view the source code directly in your browser or download it and inspect it with your favorite editor. However, HTML pages often contain auto-generated code and are not as readable as they should be, so using a specialized tool might be quite helpful. In my opinion, the best one for this task is the Firebug add-on for the Firefox browser. With this add-on, instead of looking carefully in the code looking for some string, all you have to do is press the Inspect button, move the pointer to the area in which you are interested and click. After that, the HTML code for the area marked and the location of the tag in the DOM tree will be clearly displayed. For example, the links to the different pages containing all the articles are located inside a right tag, and, in every page, the links to the articles are contained as list items in an unnumbered list. In addition to this, the links URLs, as you probably have noticed while reading other articles, start with http://www.packtpub.com/article/ So, our scraping strategy will be Get the list of links to all pages containing articles Follow all links so as to extract the article information in all pages One small optimization here is that main article network page is the same as the one pointed by the first page link, so we will take this into account to avoid loading the same page twice when we develop the code. Download Before parsing any web page, the contents of that page must be downloaded. As usual, there are many ways to do this: Creating your own HTTP requests using urllib2 standard python library Using a more advanced library that provides the capability to navigate through a website simulating a browser such as  mechanize. In this article mechanize will be covered as it is the easiest choice. mechanize is a library that provides a Browser class that lets the developer to interact with a website in a similar way a real browser would. In particular it provides methods to open pages, follow links, change form data and submit forms. Recalling the scraping strategy in our previous version, the first thing we would like to do is to download the main article network web page. To do that we will create a Browser class instance and then open the main article network page: >>> import mechanize>>> BASE_URL = "http://www.packtpub.com/article-network">>> br = mechanize.Browser()>>> data = br.open(BASE_URL).get_data()>>> links = scrape_links(BASE_URL, data) Where the result of the open method is an HTTP response object, the get_data method returns the contents of the web page. The scrape_links function will be explained later. For now, as pointed out in the introduction section, bear in mind that the downloading and parsing steps are usually performed iteratively since some contents to be downloaded depends on the parsing done in some kind of initial contents such as in this case.
Read more
  • 0
  • 0
  • 9367

article-image-learning-d3js-mapping
Packt
08 May 2015
3 min read
Save for later

Learning D3.js Mapping

Packt
08 May 2015
3 min read
What is D3.js? D3.js (Data-Driven Documents) is a JavaScript library used for data visualization. D3.js is used to display graphical representations of information in a browser using JavaScript. Because they are, essentially, a collection of JavaScript code, graphical elements produced by D3.js can react to changes made on the client or server side. D3.js has seen major adoption as websites include more dynamic charts, graphs, infographics and other forms of visualized data. (For more resources related to this topic, see here.) Why this book? This book by the authors, Thomas Newton and Oscar Villarreal, explores the JavaScript library, D3.js, and its ability to help us create maps and amazing visualizations. You will no longer be confined to third-party tools in order to get a nice looking map. With D3.js, you can build your own maps and customize them as you please. This book will go from the basics of SVG and JavaScript to data trimming and modification with TopoJSON. Using D3.js to glue together these three key ingredients, we will create very attractive maps that will cover many common use cases for maps, such as choropleths, data overlays on maps, and interactivity. Key features Dive into D3.js and apply its powerful data binding ability in order to create stunning visualizations Learn the key concepts of SVG, JavaScript, CSS, and the DOM in order to project images onto the browser Solve a wide range of problems faced while building interactive maps with this solution-based guide Authors Thomas Newton has 20 years of experience in the technical industry, working on everything from low-level system designs and data visualization to software design and architecture. Currently, he is creating data visualizations to solve analytical problems for clients. Oscar Villarreal has been developing interfaces for the past 10 years, and most recently, he has been focusing on data visualization and web applications. In his spare time, he loves to write on his blog, oscarvillarreal.com. In short You will find a few books on D3.js but they require intermediate-level developers who already know how to use D3.js, a few of them will cover a much wider range of D3 usage, whereas this book is focused exclusively on mapping; it fully explores this core task, and takes a solution-based approach. Recommendations and all the wealthy knowledge that authors have shared in the book is based on many years of experience and many projects delivered using D3. What this book covers Learn all the tools you need to create a map using D3 A high-level overview of Scalable Vector Graphics (SVG) presentation with explanation on how it operates and what elements it encompasses Exploring D3.js—producing graphics from data Step-by-step guide to build a map with D3 Get you started with interactivity in your D3 map visualizations Most important aspects of map visualization in detail via the use of TopoJSON Assistance for the long-term maintenance of your D3 code base and different techniques to keep it healthy over the lifespan of your project Summary So far, you may have got an idea of what all be covered in the book. This book is carefully designed to allow the reader to jump between chapters based on what they are planning to get out of the book. Every chapter is full of pragmatic examples that can easily provide the foundation to more complex work. Authors have explained, step by step, how each example works. Resources for Article: Further resources on this subject: Using Canvas and D3 [article] Interacting with your Visualization [article] Simple graphs with d3.js [article]
Read more
  • 0
  • 0
  • 9366
article-image-spotfire-architecture-overview
Packt
21 Oct 2013
6 min read
Save for later

The Spotfire Architecture Overview

Packt
21 Oct 2013
6 min read
(For more resources related to this topic, see here.) The companies of today face innumerable market challenges due to the ferocious competition of a globalized economy. Hence, providing excellent service and having customer loyalty are the priorities for their survival. In order to best achieve both goals and have a competitive edge, companies can resort to the information generated by their digitalized systems, their IT. All the recorded events from Human Resources (HR) to Customer Relationship Management (CRM), Billing, and so on, can be leveraged to better understand the health of a business. The purpose of this article is to present a tool that behaves as a digital event analysis enabler, the TIBCO Spotfire platform. In this article, we will list the main characteristics of this platform, while also presenting its architecture and describing its components. TIBCO Spotfire Spotfire is a visual analytics and business intelligence platform from TIBCO software. It is a part of new breed of tools created to bridge the gap between the massive amount of data that the corporations produce today and the business users who need to interpret this data in order to have the best foundation for the decisions they make. In my opinion, there is no better description of what TIBCO Spotfire delivers than the definition of visual analytics made in the publication named Illuminating the Path: The Research and Development Agenda for Visual Analytics. "Visual analytics is the science of analytical reasoning facilitated by interactive visual interfaces". –Illuminating the Path: The Research and Development Agenda for Visual Analytics, James J. Thomas and Kristin A. Cook, IEEE Computer Society Press The TIBCO Spotfire platform offers the possibility of creating very powerful (yet easy to interpret and interact) data visualizations. From real-time Business Activity Monitoring (BAM) to Big Data, data-based decision making becomes easy – the what, the why, and the how becomes evident. Spotfire definitely allowed TIBCO to establish itself in an area where until recently it had very little experience and no sought-after products. The main features of this platform are: More than 30 different data sources to choose from: Several databases (including Big Data Teradata), web services, files, and legacy applications. Big Data analysis: Spotfire delivers the power of MapReduce to regular users. Database analysis: Data visualizations can be built on top of databases using information links. There is no need to pull the analyzed data into the platform, as a live link is maintained between Spotfire and the database. Visual join: Capability of merging data from several distinct sources into a single visualization. Rule-based visualizations: The platform enables the creation and tailoring of rules, and the filtering of data. These features facilitate emphasizing of outliers and foster management by exception. It is also possible to highlight other important features, such as commonalities and anomalies. Data drill-down: For data visualizations it is possible to create one (or many) details visualization(s). This drill-down can be performed in multiple steps as drill-down of drill-down of drill-down and so on. Real-time integration with other TIBCO tools. Spotfire platform 5.x The platform is composed of several intercommunicating components, each one with its own responsibilities (with clear separation of concerns) enabling a clustered deployment. As this is an introductory article, we will not dive deep into all the components, but we will identify the main ones and the underlying architecture. A depiction the platform's components is shown in the following diagram: The descriptions of each of the components in the preceding diagram are as follows: TIBCO Spotfire Server: The Spotfire server makes a set of services available to the analytics clients (TIBCO Spotfire Professional and TIBCO Spotfire Web Player Server): User services: It is responsible for authentication and authorization Deployment services: It handles the consistent upgrade of Spotfire clients Library services: It manages the repository of analysis files Information services: It persists information links to external data sources The Server component is available for several operating systems, such as Linux, Solaris, and Windows. TIBCO Spotfire Professional: This is a client application (fat client) that focuses on the creation of data visualizations, taking advantage of all of the platform's features. This is the main client application, and because of that, it has enabled all the data manipulation functionalities such as use of data filters, drill down, working online and offline (working offline allows embedding data in the visualizations for use in limited connectivity environments), and exporting visualizations to MS PowerPoint, PDF, and HTML. It is only available for Windows environment. TIBCO Spotfire Web Player Server: This offers users the possibility of accessing and interacting with visualizations created in TIBCO Spotfire Professional. The existence of this web application enables the usage of an Internet browser as a client, allowing for thin client access where no software has to be installed on the user's machine. Please be aware that the visualizations cannot be created or altered this way. They can only be accessed in a read-only mode, where all rules are enabled, as well as data is drill down. Since it is developed in ASP.NET, this server must be deployed in a Microsoft IIS server, and so it is restricted to Microsoft Windows environments. Server Database: This database is accessed by the TIBCO Spotfire Server for storage of server information. It should not be confused with the data stores that the platform can access to fetch data from, and build visualizations. Only two vendor databases are supported for this role: Oracle Database and Microsoft SQL Server. TIBCO Spotfire Web Player Client: These are thin clients to the Web Player Server. Several Internet browsers can be used on various operating systems (Microsoft Internet Explorer on Windows, Mozilla Firefox on Windows and Mac OS, Google Chrome on Windows and Android, and so on). TIBCO has also made available an application for iPad, which is available in iTunes. For more details on the iPad client application, please navigate to: https://itunes.apple.com/en/app/spotfire-analytics/id417436823?mt=8 Summary In this article, we introduced the main attributes of the Spotfire platform in the scope of visual analytics, and we detailed the platform's underlying architecture. Resources for Article: Further resources on this subject: Core Data iOS: Designing a Data Model and Building Data Objects [Article] Database/Data Model Round-Trip Engineering with MySQL [Article] Drilling Back to Source Data in Dynamics GP 2013 using Dashboards [Article]
Read more
  • 0
  • 0
  • 9365

article-image-page-events
Packt
02 Jan 2013
4 min read
Save for later

Page Events

Packt
02 Jan 2013
4 min read
(For more resources related to this topic, see here.) Page initialization events The jQuery Mobile framework provides the page plugin which automatically handles page initialization events. The pagebeforecreate event is fired before the page is created. The pagecreate event is fired after the page is created but before the widgets are initialized. The pageinit event is fired after the complete initialization. This recipe shows you how to use these events. Getting ready Copy the full code of this recipe from the code/08/pageinit sources folder. You can launch this code using the URL http://localhost:8080/08/pageinit/main.html How to do it... Carry out the following steps: Create main.html with three empty <div> tags as follows: <div id="content" data-role="content"> <div id="div1"></div> <div id="div2"></div> <div id="div3"></div> </div> Add the following script to the <head> section to handle the pagebeforecreate event : var str = "<a href='#' data-role='button'>Link</a>"; $("#main").live("pagebeforecreate", function(event) { $("#div1").html("<p>DIV1 :</p>"+str); }); Next, handle the pagecreate event : $("#main").live("pagecreate", function(event) { $("#div1").find("a").attr("data-icon", "star"); }); Finally, handle the pageinit event : $("#main").live("pageinit", function(event) { $("#div2").html("<p>DIV 2 :</p>"+str); $("#div3").html("<p>DIV 3 :</p>"+str); $("#div3").find("a").buttonMarkup({"icon": "star"}); }); How it works... In main.html, add three empty divs to the page content as shown. Add the given script to the page. In the script, str is an HTML string for creating an anchor link with the data-role="button" attribute. Add the callback for the pagebeforecreate event , and set str to the div1 container. Since the page was not yet created, the button in div1 is automatically initialized and enhanced as seen in the following image. Add the callback for the pagecreate event . Select the previous anchor button in div1 using the jQuery find() method, and set its data-icon attribute. Since this change was made after page initialization but before the button was initialized, the star icon is automatically shown for the div1 button as shown in the following screenshot. Finally, add the callback for the pageinit event and add str to both the div2 and div3 containers. At this point, the page and widgets are already initialized and enhanced. Adding an anchor link will now show it only as a native link without any enhancement for div2, as shown in the following screenshot. But, for div3, find the anchor link and manually call the buttonmarkup method on the button plugin, and set its icon to star. Now when you load the page, the link in div3 gets enhanced as follows:     There's more... You can trigger "create" or "refresh" on the plugins to let the jQuery Mobile framework enhance the dynamic changes done to the page or the widgets after initialization. Page initialization events fire only once The page initialization events fire only once. So this is a good place to make any specific initializations or to add your custom controls. Do not use $(document).ready() The $(document).ready() handler only works when the first page is loaded or when the DOM is ready for the first time. If you load a page via Ajax, then the ready() function is not triggered. Whereas, the pageinit event is triggered whenever a page is created or loaded and initialized. So, this is the best place to do post initialization activities in your app. $(document).bind("pageinit", callback() {...});</p>  
Read more
  • 0
  • 0
  • 9361
Modal Close icon
Modal Close icon