Network Enhancers - "Delivering Beyond Boundaries"

Wednesday, August 23, 2017

How to set up an all open-source IT infrastructure from scratch

Courtesy - Bryan



Hypothetical: You need to set up the IT infrastructure (email, file sharing, etc.) for a new company. No restrictions. No legacy application support necessary. How would you do it? What would that ideal IT infrastructure look like?

I decided to sit down and think of my ideal setup — based on quite a few years of being a vice president of engineering at various companies — and document them here. Maybe you’ll find my choices useful; maybe you’ll think I’m crazy. Either way, these are good things to consider for any organization. 
Run services on your own servers 

The first thing I’m going to decide on, right up front, is to self-host as many services as I possibly can. 

Sure, there are noteworthy benefits to paying another company to host and maintain your services for you (primarily that you don’t need someone on staff to perform that function), but the drawbacks far outweigh the good points. 


Having full control over your own data — how it is stored and who it is shared with — is critical to any business (and any individual). Most of the choices I make below would also work as a remotely hosted option. But, where possible, I will focus on them being self-hosted. 

Some of the following functionality can be hosted on a single server, but I recommend breaking out key services to run on dedicated servers, possibly many of them, depending on your particular needs (such as an expectation of large file repositories or a large number of employees). 

Only open-source software 

For security and customization reasons, I will be opting to use only free and open-source software here. There are simply far too many drawbacks to basing a corporate infrastructure on closed-source systems. 

This decision was easy and obvious for anyone who’s worked in IT for more than a few years. 

Kolab for email and calendaring 

For email, calendaring and general groupware functionality (meeting requests and the like) I opt to go with Kolab. It’s open source, and there’s a company behind it that will provide paid support as needed or desired. 

Kolab has a great web interface for all of the key functionality, but it will work just as well with almost any email and calendar clients in existence.

ownCloud or Nextcloud for file sharing/document collaboration

Since we’ll be going all open source, proprietary file sharing (and online file storage) services such as Dropbox and Google Drive are simply not an option. 

There are some features along these lines built into Kolab, but not quite enough. I’d like something a little more powerful and extensible, which means running either ownCloud or Nextcloud.

The two systems are very similar in many respects, which is not surprising, because Nextcloud is a fork of ownCloud and is run by ownCloud’s founder. Realistically, either will meet most file sharing/storage needs quite well. 

However, ownCloud does contain some closed-source components aimed at larger organizations. On the flip side, Nextcloud has made a public commitment to offer all features as 100% free and open-source software. With that in mind, I would opt to go with Nextcloud. 

As an added bonus, Nextcloud handles document collaboration quite well via Collabora Online. Two birds, one stone. 

Matrix for instant messaging 

No. Using Google Hangouts is not a reasonable option for your company’s instant messaging. Neither is Skype. We need something that a) can be hosted in house, b) is open source, and c) is as secure and private as possible. 

I’ve opted to go with Matrix. Not only does it meet all three of those key criteria, but it has two rather interesting features that, while they may not be used, are nice to have around as options: 
  • A decentralized design, meaning that, as the organization grows, new server instances could be added, say, for different parts of the company or different locales. 
  • The ability to bridge Matrix to other services, such as IRC, Slack, etc. This can make it much easier to integrate with external teams or communities.

Again, maybe your organization will never use those two features, but having them around doesn’t hurt.

Bonus points: Matrix handles video chats. Got a big, remote team? If everyone’s on Matrix, there’s no need for company-issued cell phones (or land lines). 

Linux-based OS and software for workstations

Not choosing Microsoft Windows is the first obvious decision here. The cost is too high (both in terms of up-front monetary investment and the recurring costs associated with securing a closed platform). macOS is, for the same reason, off the table. 

Which specific platform I choose, at that point, comes down to the specific needs within the organization. Chances are I would select a Linux-based platform (either a free Linux distribution such as Debian, openSUSE, or Fedora, or a similar system with paid support). Support is the main reason to consider a paid, closed system anyway, so you might as well get all the benefits with none of the drawbacks of a system like Windows.

Save money, increase security. No-brainer. 

For applications, I’d also standardize around LibreOffice for the office suite and one of the several open-source web browsers (such as Firefox).

In short: All open source 

Clearly an all open-source workplace makes the most sense. Save money. Be more secure. More flexible. Those are all good things.

If you’re reading this and you are responsible for making IT decisions within your company, remember all of this when it comes time to renew your Microsoft Exchange license, upgrade Windows, or pay for yet another month/quarter of your video conferencing and file storage system.

Maybe my specific choices here won’t match your needs exactly, but for most of you, there are going to be open-source solutions that will.




Tuesday, August 22, 2017

Ethereum Versus NEM - The Obvious Choice

                               


Lots of new projects and startups are coming into the crypto space. Right now, more than 10 ICOs are running at the same time, and every day we see more and more coming.


It is very important to focus on the planning and procedures of everything related to blockchain technology when we want to start a new project. Winning the money and confidence of thousands of investors is one of the most difficult things to achieve at this early stage, and The DAO incident is still present in the minds of every single member of the ecosystem.


Every entrepreneur in the world knows how difficult it is to run a startup, and one of the most important things to keep in mind is that clients’ confidence is very hard to gain and very easy to lose; this is even more pronounced in our sector.


The blockchain ecosystem has suffered one massive software catastrophe after another, and this is probably one of the reasons for its slow expansion. Because of this, we now need, more than ever, a development framework that lets us build robust applications with the highest level of security possible.


Nowadays, Ethereum is the blockchain system that gives us the most flexibility to build smart-contract-based systems. It has a broad range of development tools made by and for developers, and some of its architecture and security standards have been well defined.


However, Ethereum contracts are still like legal contracts in ancient Greece, where people had some “secure” and standardized contracts, but when they had to write more complex ones, lots of tricks and backdoors appeared.


This sticking point is one of the biggest pains for lots of developers and entrepreneurs right now.

We can compare Ethereum with the Android OS. Android is really useful because it lets developers make mobile apps easily, but one of the trade-offs is security. Lots of phishing apps have been discovered, and lots of money has been lost because of the broad freedom Android gives developers to create mobile apps.


We also spend lots of time thinking about how to promote adoption of blockchain technology. For this to happen, we probably need simpler, more secure applications, so we can promote its use while developing and testing more complex systems.


Here’s where NEM kicks in. NEM was built from scratch and has been tested by many banks and big corporations. Its several full-time developers have extensive enterprise development experience and write and execute thousands of tests before every development cycle and release, which gives NEM a very secure core.


NEM’s approach is to give developers a wide range of combinable functionalities that let them build powerful applications from a closed set of atomic operations, while its REST API opens the network to almost any technological combination. Notably, mixing and matching Namespaces (unique domains), Mosaics (customizable assets), 2.0 multisig contracts, and three forms of messaging allows a wide variety of application frameworks to be built. By assigning meaning to these different functions and combining them in various ways, developers are currently building a wide range of applications, including apps for transmitting financial value, notarization, tracking and logistics, voting, land management, ID management, and more.


NEM's architecture makes it very simple for developers to build blockchain applications on almost any device with the same degree of decentralization and security. For example, the NEM NanoWallet runs on both desktops and smartphones without any problem and can be used as a boilerplate for more complicated applications, making customized apps built with it accessible on Windows, Mac, Linux, iOS, and Android.

Another critical thing for a decentralized application is scalability. Right now, Ethereum can handle at most around 15 transactions per second, while NEM can scale to hundreds of transactions per second, and the upcoming Catapult release has already been tested privately and independently at thousands of transactions per second. Catapult is currently in closed beta and is scheduled to go live in 2017 if all goes well.

But the most important thing is this: NEM has never had a serious security issue. The NEM Foundation and the core developers treat the security and availability of the network as being of utmost importance, and this gives the community, third-party developers, and investors a guarantee that is hard to match.

No one doubts the strong potential of both technologies, but right now NEM seems to be the more stable choice for building new applications that have to support real business models today. Its security and development facilities let blockchain entrepreneurs focus on relevant problems rather than technical difficulties, and its learning curve is far smoother than Ethereum’s.

NEM is great for lots and lots of applications that many people would want, and it is easy to build on. Ethereum is better for very specialized projects but needs someone with a very high level of skill to get them exactly right. In NEM, by contrast, it’s already done for you, so you can't really hurt yourself if you go wrong.


Monday, August 21, 2017

What is Proof-of-Importance (POI) and Why Is It Better, and What Is Vesting?




What is POI?

Proof-of-Importance, proof-of-work, and proof-of-stake all have one thing in common: they are all consensus algorithms which, when applied to a cryptocurrency, decide who adds the next block and so maintain the order of the blockchain. This becomes important when we think about problems such as double spending, where the same money is (fraudulently) spent more than once. For example, some currencies verify each transaction against the blockchain to prevent this.
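To make double spending concrete, here is a minimal sketch of a ledger that tracks unspent outputs (a UTXO-style model, similar in spirit to Bitcoin's) and rejects any second attempt to spend the same output. The class and method names are hypothetical, and the sketch deliberately leaves out signatures and consensus: agreeing on one shared ledger across many untrusting nodes is exactly the job POW, POS and POI do.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLedger:
    """Hypothetical UTXO-style ledger: every output can be spent exactly once."""
    unspent: dict[str, int] = field(default_factory=dict)  # output id -> amount

    def add_output(self, output_id: str, amount: int) -> None:
        self.unspent[output_id] = amount

    def spend(self, output_id: str, new_outputs: dict[str, int]) -> bool:
        """Consume one output and create new ones; reject double spends."""
        if output_id not in self.unspent:
            return False  # already spent (or never existed): double-spend attempt
        if sum(new_outputs.values()) > self.unspent[output_id]:
            return False  # cannot create more value than is being spent
        del self.unspent[output_id]
        self.unspent.update(new_outputs)
        return True


ledger = ToyLedger()
ledger.add_output("coinbase:0", 50)
print(ledger.spend("coinbase:0", {"alice:0": 50}))  # True: first spend is accepted
print(ledger.spend("coinbase:0", {"bob:0": 50}))    # False: the same output again
```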

To understand why NEM's idea of POI is so revolutionary, we first need to understand how POW and POS work.

POW (Proof-of-Work) was the first system to be implemented and is used by cryptocurrencies such as Bitcoin and also Scrypt coins such as Dogecoin.

In order to “earn” these cryptocurrencies, you must use your computer to mine the coin; the more computing power your machine has, the better your chance of earning.

Why make creating a block both expensive and time-consuming? Because creating a block requires a large amount of computational power, attacks on the blockchain become harder to carry out: an attacker would need an unfeasible amount of resources.
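The asymmetry behind proof-of-work (expensive to produce, cheap to check) can be sketched with a hash puzzle. This is a simplified stand-in, not Bitcoin's exact scheme: Bitcoin applies double SHA-256 to a binary block header and compares it against a numeric target, whereas here we just look for a SHA-256 hex digest with a given number of leading zeros. Each extra zero multiplies the expected work by 16, while verification always costs a single hash.

```python
import hashlib


def mine(block_data: str, difficulty: int) -> tuple[int, str]:
    """Try nonces until the SHA-256 digest starts with `difficulty` zero hex digits."""
    target_prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}|{nonce}".encode()).hexdigest()
        if digest.startswith(target_prefix):
            return nonce, digest
        nonce += 1  # expected number of attempts is about 16 ** difficulty


def verify(block_data: str, nonce: int, difficulty: int) -> bool:
    """Checking a solution costs one hash, however long it took to find."""
    digest = hashlib.sha256(f"{block_data}|{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


data = "prev_hash=abc;tx=alice->bob:5"
nonce, digest = mine(data, difficulty=4)
print(nonce, digest)
print(verify(data, nonce, 4))  # True
```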

Note that many cryptocurrencies (including NEM) have some sort of blockchain explorer, which allows anyone to see any transaction as well as the mining of blocks.

Blockchain technology can also be used for file sharing and proof of asset ownership, and many other things!

However, it was not long before people realised the obvious problems.

Mining (the process in which computational power is used to make new blocks) has very little other use.

As technology gets better, people have to spend more money on the latest ASICs (machines built specifically for mining), meaning even more energy is wasted.

It is pointless to try mining with a CPU. You are competing against companies with rooms full of ASICs, and electricity costs mean it is a waste of time. For example, with a decent CPU hash rate of 0.1 kH/s, in one week you would not even make a cent!
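The "not even a cent" claim is easy to check with expected-value arithmetic: your share of the network's hash rate times the blocks mined per week times the block reward. The sketch below assumes Bitcoin's 10-minute block target and the 12.5 BTC reward in force in 2017; the network hash rate (about 7 EH/s) and the BTC price ($4,000) are illustrative assumptions, not measured figures, and transaction fees are ignored.

```python
def expected_weekly_usd(my_hashrate_hs: float, network_hashrate_hs: float,
                        block_reward: float, price_usd: float,
                        block_interval_min: float = 10.0) -> float:
    """Expected value of solo mining for one week, in US dollars."""
    blocks_per_week = 7 * 24 * 60 / block_interval_min  # 1008 for 10-minute blocks
    my_share = my_hashrate_hs / network_hashrate_hs      # chance of winning any one block
    return my_share * blocks_per_week * block_reward * price_usd


earnings = expected_weekly_usd(
    my_hashrate_hs=0.1e3,      # 0.1 kH/s, the CPU figure from the text
    network_hashrate_hs=7e18,  # ASSUMPTION: roughly 7 EH/s for the whole network
    block_reward=12.5,         # BTC per block in 2017
    price_usd=4000.0,          # ASSUMPTION: illustrative BTC price
)
print(f"${earnings:.2e} per week")
```

Under these assumptions the result is on the order of $7e-10 a week, many orders of magnitude below one cent, before paying for electricity.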

Another problem is that because the rich can afford expensive ASICs, they only get richer and richer. In other words, wealth is spread very unevenly, with the top 1% in Bitcoin holding 80% of all Bitcoins (as of 2014). Many of the richest do not actively use their money, meaning that they contribute very little to the community.

This was why the POS (Proof-of-Stake) system was introduced. It was implemented first by the well-known Peercoin cryptocurrency. Instead of conventional mining, it asks participants to prove ownership of their “stake,” or how much Peercoin they possess.





Larger and older sets of coins have a higher probability of signing the next block, and a lot of computing power is saved.
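A minimal sketch of stake-weighted selection, assuming a simplified notion of "coin age" as amount times days held. Real Peercoin minting also involves minimum and maximum ages and a hash-based kernel check, all omitted here; the account names and balances are hypothetical. Simulating many blocks shows the concentration problem described next.

```python
import random


def coin_age(amount: float, days_held: float) -> float:
    """Simplified coin age: amount x days held."""
    return amount * days_held


def pick_signer(stakes: dict[str, tuple[float, float]], rng: random.Random) -> str:
    """Choose the next block signer with probability proportional to coin age."""
    names = list(stakes)
    weights = [coin_age(amount, days) for amount, days in stakes.values()]
    return rng.choices(names, weights=weights, k=1)[0]


stakes = {
    "whale": (100_000, 60),  # large, old balance
    "regular": (1_000, 30),
    "newcomer": (500, 2),
}
rng = random.Random(42)  # fixed seed so the simulation is repeatable
wins = {name: 0 for name in stakes}
for _ in range(10_000):
    wins[pick_signer(stakes, rng)] += 1
print(wins)  # the whale signs almost every block
```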

Again, however, there are problems. Richer users are more likely to sign the next block, and the more blocks they sign, the richer they get. The problem is the same: richer users gain wealth much faster than others.

This is where NEM comes in. Its POI system not only rewards those with a large account balance, but also takes into account how much they transact to others and who they transact with.

This means that the right people, those who actively help the economy and therefore benefit NEM, are rewarded. Each user is given a trust score; the higher it is, the more chance they have of being rewarded.
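The idea that activity can count alongside balance can be illustrated with a deliberately toy score. This is NOT NEM's proof-of-importance formula: NEM's real calculation, described in its technical reference, is considerably more involved and includes a graph-ranking step over the transaction network. The hypothetical `toy_importance` below simply blends each account's share of vested balance with its share of outgoing transfers, using an assumed 50/50 weighting.

```python
def toy_importance(vested: dict[str, float], outgoing: dict[str, float],
                   activity_weight: float = 0.5) -> dict[str, float]:
    """HYPOTHETICAL importance score: a blend of stake share and activity share.

    Not NEM's PoI algorithm; it only shows how transaction activity can let a
    smaller, active account outrank a larger, idle one.
    """
    total_vested = sum(vested.values())
    total_out = sum(outgoing.values()) or 1.0  # avoid dividing by zero if nobody transacts
    return {
        account: (1 - activity_weight) * vested[account] / total_vested
        + activity_weight * outgoing.get(account, 0.0) / total_out
        for account in vested
    }


scores = toy_importance(
    vested={"hoarder": 900_000, "merchant": 100_000},
    outgoing={"hoarder": 0, "merchant": 50_000},  # XEM sent to others recently
)
print(scores)  # the merchant scores higher despite holding 9x less
```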

The good thing is that this will mean much more even wealth distribution; anyone who contributes can gain extra XEM (the currency of the NEM network). NEM is great because it gives similar opportunities to everyone. The main aim is to empower regular people.





This rewarding is done through harvesting, a process in which a node creates new blocks that are added to the blockchain.

To do this, you need a vested balance of at least 10,000 XEM.

But what is vesting?

Vesting

When a person first deposits XEM into an account, none of it will be vested. After 24 hours, 10% of the balance becomes vested. After the next 24 hours, another 10% of the remaining unvested balance is then vested. This cycle carries on as long as the XEM is kept in your wallet. If you make a transaction, both vested and unvested coins will be used so that the ratio of unvested:vested coins remains the same.

The point of this is to build up trust: either you have held your coins for a while or you have a very large amount (for example, if you have 100,000 XEM, it will take only 24 hours for you to have 10,000 vested coins).
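The vesting rule described above has a closed form: if B XEM is deposited and 10% of the remaining unvested balance vests every 24 hours, then after n days the unvested part is B × 0.9^n and the vested part is B × (1 - 0.9^n). The sketch below implements that rule exactly as stated in the text (using exact fractions to avoid rounding surprises), computes how many days it takes to reach the 10,000 vested XEM needed for harvesting, and applies the proportional-spend rule that keeps the vested:unvested ratio unchanged. Function names are hypothetical.

```python
from fractions import Fraction
from typing import Optional

VEST_RATE = Fraction(1, 10)  # 10% of the still-unvested balance vests every 24 hours


def vested_after(balance: int, days: int) -> Fraction:
    """Vested amount after `days` full days, starting from a fully unvested deposit."""
    unvested = Fraction(balance)
    for _ in range(days):
        unvested -= unvested * VEST_RATE
    return balance - unvested


def days_until_vested(balance: int, target: int = 10_000) -> Optional[int]:
    """Days of holding before `target` coins are vested, or None if never reachable."""
    if balance <= target:
        return None  # the vested amount approaches the balance but never reaches it
    unvested, days = Fraction(balance), 0
    while balance - unvested < target:
        unvested -= unvested * VEST_RATE
        days += 1
    return days


def spend(vested: Fraction, unvested: Fraction, amount: Fraction) -> tuple[Fraction, Fraction]:
    """Take `amount` from both pools in proportion, keeping the vested:unvested ratio."""
    total = vested + unvested
    from_vested = amount * vested / total
    return vested - from_vested, unvested - (amount - from_vested)


print(vested_after(100_000, 1))    # 10000, matching the 100,000 XEM example
print(days_until_vested(100_000))  # 1
print(days_until_vested(20_000))   # 7
print(spend(Fraction(10_000), Fraction(90_000), Fraction(1_000)))  # ratio stays 1:9
```

A balance of exactly 10,000 XEM never fully vests under this rule, which is why the helper returns `None` unless the balance is strictly larger than the target.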





Once you have enough vested coins, you can harvest either through local harvesting or through delegated harvesting.

Local harvesting is much easier to set up, but has more disadvantages than delegated harvesting.





To start local harvesting, select harvested blocks from the left-hand menu, then click “Start local harvesting.”

If you are interested in setting up delegated harvesting, please go here







Cloud Native Landscape






What's the difference between XEM, BTC, and ETH?


Ever since the dawn of currency, currency has been controlled by a central entity. This central entity could do whatever it wanted with its currency: weaken it, strengthen it, take it away from you, anything. The money was only valuable because this central entity said it was. The sad part is, we're still using this form of currency today, in the form of your dollars, euros, yen, or anything of that sort.

In 2008, someone using the name Satoshi Nakamoto decided to fix all this and created the original cryptocurrency, Bitcoin. Bitcoin was a great and innovative idea, and it introduced the blockchain: a public ledger of every transaction that has ever occurred, which anyone can verify.
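Why can "anyone verify" such a ledger? Each block stores the hash of the block before it, so changing any past block breaks every later link. The sketch below is a toy hash-linked ledger, not Bitcoin's actual block format (which hashes a binary header containing a Merkle root of the transactions and is additionally protected by proof-of-work); the field names are illustrative.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's contents deterministically (JSON with sorted keys)."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()


def append_block(chain: list[dict], transactions: list[str]) -> None:
    """Link each new block to its predecessor by storing the predecessor's hash."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "prev_hash": prev, "transactions": transactions})


def is_valid(chain: list[dict]) -> bool:
    """Anyone holding a copy can re-check every link in the chain."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))


chain: list[dict] = []
append_block(chain, ["genesis"])
append_block(chain, ["alice->bob:5"])
append_block(chain, ["bob->carol:2"])
print(is_valid(chain))  # True
chain[1]["transactions"] = ["alice->bob:500"]  # tamper with history
print(is_valid(chain))  # False: block 2 no longer points at block 1's hash
```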

Now it's 2016, and there are hundreds of other cryptocurrencies out there, so I'm going to explain to you the pros and cons of the larger ones, and why XEM is really the way to go.
Bitcoin
Bitcoin was the original. As we have seen, the original is not always best - but it still was innovative. It uses a public ledger called a blockchain for security, but that's about the only security measure added.

The ideas in Bitcoin are applied to both Ethereum and NEM, and a simple rundown of all of this can be found in this video, created around the time bitcoin started becoming popular.

The ideas behind bitcoin have been used in every cryptocurrency since, so it’s important to understand how a transaction in Bitcoin works.



As far as all the advantages of Bitcoin, NEM and Ethereum both do whatever Bitcoin can, but better.





Ethereum
If you want to know how a transaction works in Ethereum, look at the infographic about Bitcoin, it works the exact same way.

Ethereum is really big right now because it includes two main features over Bitcoin.

  • Smart Contracts allow you to write applications in the blockchain that usually run as programmed.
  • An 'ASIC-proof' algorithm makes mining profitable for people without expensive hardware.

The ASIC-proof algorithm is still proof-of-work, however, and so it suffers from exactly the same pitfalls as Bitcoin.

The cryptocurrency community really loved smart contracts for a while. The way it was advertised was absolutely brilliant. "World Computer." "Applications that run as programmed with no possibility of downtime." Except, maybe it works a little too well.

Recently the largest smart contract on Ethereum was hacked due to a fatal flaw in Solidity and in how smart contracts work. One explanation of the incident notes that if there is a smart contract vulnerability (which we just saw happen in an audited smart contract), the hacker can legally take off with the funds. This is absolute heaven for a hacker.
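The post does not spell out the flaw, but one well-known class of Solidity bug is reentrancy, the flaw behind The DAO exploit in 2016: a contract sends funds before updating its own records, and the recipient's code calls back into the contract while the first call is still running. The Python model below only simulates that control flow (it is not Solidity and not the code of any real contract); the fix shown is the widely recommended checks-effects-interactions pattern.

```python
from typing import Callable


class VulnerableBank:
    """Toy model of a contract that pays out BEFORE updating its own books."""

    def __init__(self) -> None:
        self.balances: dict[str, int] = {}
        self.reserves = 0

    def deposit(self, who: str, amount: int) -> None:
        self.balances[who] = self.balances.get(who, 0) + amount
        self.reserves += amount

    def withdraw(self, who: str, receive: Callable[[int], None]) -> None:
        amount = self.balances.get(who, 0)
        if amount > 0 and self.reserves >= amount:
            self.reserves -= amount
            receive(amount)         # external call first...
            self.balances[who] = 0  # ...so the balance is zeroed too late


class SafeBank(VulnerableBank):
    """Same contract following checks-effects-interactions: update state, then pay."""

    def withdraw(self, who: str, receive: Callable[[int], None]) -> None:
        amount = self.balances.get(who, 0)
        if amount > 0 and self.reserves >= amount:
            self.balances[who] = 0  # effects first
            self.reserves -= amount
            receive(amount)         # interaction last


def attack(bank: VulnerableBank) -> int:
    """The attacker's receive hook calls withdraw() again before it has finished."""
    stolen = 0

    def on_receive(amount: int) -> None:
        nonlocal stolen
        stolen += amount
        bank.withdraw("attacker", on_receive)  # re-enter

    bank.withdraw("attacker", on_receive)
    return stolen


for bank in (VulnerableBank(), SafeBank()):
    bank.deposit("other_users", 90)
    bank.deposit("attacker", 10)
    print(type(bank).__name__, attack(bank))
# VulnerableBank 100  (the attacker drains everyone's funds)
# SafeBank 10         (the attacker only gets their own deposit back)
```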




NEM

NEM uses proof-of-importance (PoI). This means that, unlike Bitcoin and Ethereum, NEM is environmentally friendly and more secure. Unlike mining Bitcoin and Ethereum, network upkeep does not require huge numbers of electricity-hogging mining machines. A NEM node can be run on a computer as simple and cheap as a Raspberry Pi, which costs only $35 and uses very little electricity. PoI also encourages people to actually use NEM rather than just hoard it. For a more detailed explanation, check out the previous article comparing PoW, PoS, and PoI.

NEM is also superior in security. It uses EigenTrust++ for node reputation, which is not used in any other cryptocurrency, and strengthens the security of the network considerably. It also uses localized spam protection, which shuts down spammers, and only the spammers, when the network is at full capacity. Both are only found in NEM.

NEM was built with a two-tier design in mind as well. If you want a wallet, you don’t need a full node and a copy of the blockchain. Instead, you can connect to any node and have access to all the same features without trusting it. Even a malicious node has no access to your funds; the worst it can do is simply not work. To make sure that nodes keep operating, the developers created a supernode program, which gives people a greater incentive to maintain the network for years to come.

NEM isn’t only better in terms of security, however. It also brings a lot of new or improved features to the table. Unlike in Bitcoin, multi-signature accounts live on the blockchain and do not require trusting a third party. Ethereum does have contracts, but you need to write them yourself, which means that pretty much only developers can do it. As mentioned in the Ethereum section, this can be very, very hard to get right because of the language in which smart contracts are written. NEM makes creating or editing multi-signature contracts as easy as a few clicks.
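The rule at the heart of a multi-signature account is "m of n": a transaction goes through only once enough of the designated cosigners approve it. In NEM the network itself enforces this rule; the sketch below models only the counting logic (no keys or signatures), and the class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MultisigAccount:
    """Hypothetical m-of-n account: a transfer executes once `min_approvals` cosigners sign."""
    cosigners: frozenset[str]
    min_approvals: int
    pending: dict[str, set[str]] = field(default_factory=dict)  # tx id -> approvers

    def approve(self, tx_id: str, cosigner: str) -> bool:
        """Record one cosigner's approval; return True once the tx can execute."""
        if cosigner not in self.cosigners:
            raise PermissionError(f"{cosigner} is not a cosigner")
        approvers = self.pending.setdefault(tx_id, set())
        approvers.add(cosigner)  # a set, so approving twice still counts once
        return len(approvers) >= self.min_approvals


account = MultisigAccount(frozenset({"alice", "bob", "carol"}), min_approvals=2)
print(account.approve("tx-1", "alice"))  # False: 1 of 2
print(account.approve("tx-1", "alice"))  # False: duplicate approval ignored
print(account.approve("tx-1", "bob"))    # True: 2 of 2 reached
```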

Another advanced feature is mosaics. These work similarly to colored coins (custom currencies) in Bitcoin, but are handled completely on chain rather than requiring trust in a third party. Mosaic names are based on namespaces, which work much like domain names on the internet. Once a namespace is created, no one else can claim it, and the owner can create unlimited subdomains.
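The ownership rules for namespaces can be sketched as a first-come, first-served registry of dotted names, where only the owner of a parent may create children, and mosaics are named under a namespace. NEM writes mosaic IDs as `namespace:name` (its own currency is `nem:xem`); everything else below is a hypothetical in-memory model that omits fees, signatures, and NEM's actual on-chain rules.

```python
class NamespaceRegistry:
    """Hypothetical first-come, first-served registry of dotted namespaces."""

    def __init__(self) -> None:
        self.owners: dict[str, str] = {}  # full namespace name -> owner

    def register(self, name: str, owner: str) -> bool:
        """Claim a root ('acme') or a child ('acme.payments') namespace."""
        if name in self.owners:
            return False  # already claimed by someone
        parent, _, _ = name.rpartition(".")
        if parent and self.owners.get(parent) != owner:
            return False  # only the parent's owner may create children
        self.owners[name] = owner
        return True

    def mosaic_id(self, namespace: str, mosaic: str) -> str:
        """Name a custom asset under a namespace, e.g. 'acme.payments:token'."""
        if namespace not in self.owners:
            raise KeyError(f"unknown namespace {namespace!r}")
        return f"{namespace}:{mosaic}"


reg = NamespaceRegistry()
print(reg.register("acme", "alice"))            # True: root claimed
print(reg.register("acme", "bob"))              # False: already taken
print(reg.register("acme.payments", "bob"))     # False: bob does not own "acme"
print(reg.register("acme.payments", "alice"))   # True: owner adds a child
print(reg.mosaic_id("acme.payments", "token"))  # acme.payments:token
```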

A platform is never complete without messaging, however, and NEM includes either encrypted or unencrypted messaging between addresses, completely through the network. There’s even hex messaging for developers.

While Ethereum and Bitcoin reward miners for making blocks, they don’t give incentives for running full nodes and supporting network throughput. NEM has a program called Supernodes that rewards people for running high-powered nodes that serve light wallets with data quickly and securely. These rewards were set aside in the first block of the NEM network and are planned to last for years. Even if the supernode funds do run out, there will still be an incentive to maintain the network: anyone with 10,000 XEM can run a harvesting node and collect transaction fees based on their PoI score. And instead of having to buy powerful, expensive mining equipment that uses large amounts of electricity, NEM harvesters can run a node on a computer as simple as a Raspberry Pi.

We are the first private/public blockchain, which is the same system that was used to create Linux, widely accepted as the most secure OS in the business world. NEM was built by experienced developers for scalability and stability from day one. We are also currently the only platform that has been stress tested by banks and approved for financial use. Other currencies have been tested but haven’t shared any proof, whereas all of our test results are open for anyone to see.

NEM has also tried to make it as easy as possible for third parties to build on the blockchain. On platforms like Bitcoin, most third-party developers are locked into using a centralized service like Coinbase or BitPay to build their ecosystem, which means they rely on those services to build, update, and maintain APIs. On Ethereum, each developer writes their own contract code, which is much more versatile and flexible but, as mentioned before, comes with the risk that the developer must know exactly what they are doing. NEM, on the other hand, offers a full set of rich, easy-to-use JSON/RESTful APIs that work across the entire network with any node and support a large variety of calls, including all transaction types.
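Because every node exposes the same JSON/RESTful API, a client needs nothing more than an HTTP library. The sketch below uses only the Python standard library and assumes a NIS1 node listening on its default API port 7890 at a placeholder address; the endpoint paths shown in the comments (`/chain/height`, `/account/get`) are taken from the NIS1 API and should be checked against the documentation for your node version. Only the URL builder runs here, so the example works without a live node.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

NODE = "http://127.0.0.1:7890"  # ASSUMPTION: a local NIS1 node on its default API port


def build_url(base: str, path: str, params: Optional[dict[str, str]] = None) -> str:
    """Join a node's base URL and an endpoint path, encoding any query parameters."""
    url = base.rstrip("/") + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def get_json(url: str, timeout: float = 10.0) -> dict:
    """Issue a GET request and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Example calls against a running node (endpoint paths assumed from the NIS1 API):
#   get_json(build_url(NODE, "/chain/height"))
#   get_json(build_url(NODE, "/account/get", {"address": "YOUR_ADDRESS"}))
print(build_url(NODE, "/account/get", {"address": "YOUR_ADDRESS"}))
```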

All of this was built from 100% new code, and as such does not hit any of the pitfalls of the other two platforms. However, it can still benefit from the advantages of the other platforms, as it also uses the blockchain.

If you skipped all of that because you don’t like giant walls of text, here’s an easier to digest infographic.












Sunday, August 20, 2017

"C" is the center of Security


After every major breach, there is a wave of information on security technologies that could have prevented it. Conferences have hundreds of companies exhibiting their cool technologies, differentiators and success stories. The research firms publish informative findings, guides, quadrants, waves, etc., and most products and solutions are good if implemented and used well. So, in some ways, we have an abundance of security controls and technologies. 
We often talk about coming together to fight the bad actors. Industry groups where companies share experiences and government-industry partnerships are examples. These initiatives have been as successful as business and competitive pressures allow. 
There is a need for collaboration between security vendors to help clients manage risk. Risk managers don’t create a one-dimensional posture that depends on a single technology; that goes against the basic principles of risk management. So, naturally, collaboration between security technologies is important.
This collaboration, the “C” in Security, is manifesting itself in three models:
  1. The Security Marketplaces: Large platform players have marketplaces (Splunkbase, IBM AppExchange, Cisco, etc.) that enable other security products to publish their “apps”. The idea is that the platforms’ clients can simply download the apps for the functionality they want. 
  2. The Security APIs: Companies have published APIs, enabling relevant integrations with other technologies. These APIs typically provide information and data that can be ingested for enhancements or actions.
  3. The Security Apps: To enhance their functionality and value, companies are creating apps for the marketplaces and integrations with the APIs. This provides an excellent platform for niche companies to join the ecosystem, the Digital business of security.
The good news is that these models are not just driven by the “good to have” need for collaboration, but also have a “must have” commercial model and are embracing the Digital way.
The “C” in Security is here to stay and grow. 


The Three Most Common Cloud Native Development and Deployment Models


1. Kubernetes: (Containers) Full control over infrastructure and maximum portability.

2. CloudFoundry: (Applications) Focus on the applications and let the platform handle the rest.

3. Apache OpenWhisk: (Functions) Auto-Scaled, event-driven applications that respond to a variety of triggers.
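To make the "Functions" model concrete: in Apache OpenWhisk, a Python action is simply a function named `main` that receives the invocation parameters as a dictionary and returns a JSON-serializable dictionary; triggers, scaling and the server are the platform's job. The hello-world action below is a minimal sketch, and the action and parameter names are illustrative.

```python
def main(args: dict) -> dict:
    """OpenWhisk Python action: parameters arrive as a dict, the result is a dict."""
    name = args.get("name", "stranger")
    return {"greeting": f"Hello {name}!"}
```

Saved as `hello.py`, it could be deployed with `wsk action create hello hello.py` and invoked with `wsk action invoke hello --result --param name World`, which should return `{"greeting": "Hello World!"}`.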













Sunday, August 13, 2017

Blockchain is the Future of Healthcare



Everything we know about health care can benefit from blockchain technology. Especially when it comes to medical records, sharing patient information, and making data more interoperable. Right now, there are a lot of intermediaries involved in sharing patient data as very few systems are compatible with one another. The blockchain will effectively create a new model for the exchange of health information. Electronic records are the way forward. However, they need to be made more efficient, secure, and no longer reliant on intermediaries.

Overcoming all of those challenges will not be easy, though. A blockchain-powered healthcare ecosystem would raise the bar as far as interoperability is concerned. Eliminating friction and reducing costs associated with using current systems will create a healthier ecosystem for all parties involved. Now is the time to capitalize on this technology and its momentum.

Additionally, using blockchain will help in generating valuable insights. Being able to see the proverbial bigger picture will benefit both patients and healthcare personnel alike. Moreover, it will help improve the value of care as well. All of this will require a universal blockchain system connecting health care facilities from all over the world. Using private blockchains which only require approved partners to examine and share data is not a viable strategy for the healthcare sector by any means.

“While digital technology has dramatically transformed the management of our financial information and transactions, most of us have not seen such gains when it comes to our health data. As the world watches the adoption of blockchain as a new model of decentralization and security for financial services, we believe the time has arrived when the health sector can finally offer the freedom to own our health data using a smart wallet to manage private and public keys with different levels of permission. This will enable patients to share data on an as needed basis with trusted providers.”

– Dr. Rhea Mehta

There are indeed economic, technical, and behavioral challenges that are faced by the current healthcare models. No one has built a large-scale blockchain ecosystem for the healthcare sector to date. That situation will come to change sooner rather than later, with companies like Bowhead Health working toward bringing blockchain technology to the medical sector as we speak. These unique opportunities need to be taken advantage of with properly developed proofs of concept.



Saturday, August 12, 2017

Open Source Dashboard Software - Business Intelligence


https://www.predictiveanalyticstoday.com/open-source-dashboard-software/


https://logz.io/blog/business-intelligence-tools/







Visualized history of the cryptocurrencies



http://mapofcoins.com/bitcoin


Ultimate Cryptocurrency Guide



https://cryptocoinmastery.com/the-ultimate-cryptocurrency-guide-everything-you-need-to-know/






Tuesday, August 1, 2017

Why the DEVOPS model is here to stay






The DEVOPS model may seem like just another one of those terms which management gets enamored with every few years but it is much more than that. The DEVOPS model will probably be the main way that development is done going into the future. We are sure that not only will it be the main way to develop software but that it will bleed out into other industries as well and become the main way of designing and developing new products and services. Here are a few reasons why.

DEVOPS enables user-centric products
How many times have you been using software, gotten frustrated and asked “Who the hell made this software?” The answer is “a person who will never use the software”. Bad design almost always comes from developers and designers who don’t really know the use case of the software. DEVOPS involves users in the development phase, which allows the software to be user-friendly at its core. This is why DEVOPS is so important: it lets the people who will end up using the software make important contributions to its creation and design.

We cannot expect developers to know everything
There is a huge problem in software design and development which DEVOPS solves. Development can only be done by developers; however, developers are almost never the end users of the software they are creating. Imagine that you want software that helps architects design houses and buildings. Who will you hire to build it? Developers, obviously. Yet you cannot expect developers to know about architecture as well.

This problem is replicated in pretty much every industry out there. You want software that helps the agriculture industry? Developers don’t know anything about agriculture and agriculture experts don’t know anything about software. The DEVOPS model solves this big problem by enabling the users and developers to work together. If there are any fundamental flaws in the way the software operates they can be eliminated at the design phase before the developers waste their time creating software that is of no use to anyone.

It increases success levels
Due to the reasons discussed above, DEVOPS results in software that works better than ever before. There are many horror stories of companies that spent millions developing software and solutions that their employees could never use properly and they ended up taking a huge loss. DEVOPS ensures that horror stories like this won’t happen. Developers are free to focus on the features that will be the most helpful for users because the users are right there to tell them what they need. There is no guess-work involved and thus there is a much, much lower chance that things will end in failure. DEVOPS isn’t just a new way to work – it is the understanding that the mistakes which happened previously happened because of the information and communication gap between users and developers. The sooner we close this gap the better it will be for all of us.



Monday, July 31, 2017

Big Data and Hadoop Interview Questions and Answers













What is BIG DATA?

Big Data refers to data sets so large and complex that they are difficult to capture, store, process, retrieve and analyze with traditional, on-hand database management tools.


What are the three major characteristics of Big Data?

According to IBM, the three characteristics of Big Data are:

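Volume: the sheer amount of data; for example, Facebook generates 500+ terabytes of data per day.

Velocity: the speed at which data arrives and must be processed; for example, analyzing 2 million records each day to identify the reason for losses.

Variety: the range of data types and formats, such as images, audio, video, sensor data, log files, etc.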


What is Hadoop?

Hadoop is a framework that allows distributed processing of large data sets across clusters of commodity hardware (computers) using a simple programming model.


What is the basic difference between traditional RDBMS and Hadoop?

A traditional RDBMS is used in transactional systems to store and process data, whereas Hadoop is used to store and process large amounts of data in a distributed file system.


What are the basic components of Hadoop?

HDFS and MapReduce are the basic components of Hadoop.

HDFS is used to store large data sets and MapReduce is used to process such large data sets.


What is HDFS?

HDFS stands for Hadoop Distributed File System. It is designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware.


What is MapReduce?

MapReduce is a Java-based programming paradigm of the Hadoop framework that provides scalability across the many nodes of a Hadoop cluster.


How does MapReduce work in Hadoop?

MapReduce divides the workload into two different jobs, a Map job and a Reduce job, and the tasks within each job run in parallel across the cluster:

1. The Map job breaks the data sets down into key-value pairs (tuples).

2. The Reduce job then takes the output of the Map job and combines those tuples into a smaller set of tuples.
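The classic word-count example shows the two phases. The sketch below is plain Python that simulates, in a single process, the map step, the shuffle (grouping by key) that Hadoop performs between the phases, and the reduce step. It is not Hadoop’s Java API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) key-value pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all values emitted for the same key, sorted by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single, smaller tuple."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # In Hadoop, map and reduce tasks would run in parallel on different nodes;
    # here every step runs sequentially in one process.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big", "data hadoop"]))
    # {'big': 2, 'data': 2, 'hadoop': 1}
```

The Map output for these two lines has five tuples. The Reduce output has three, which is the "smaller set of tuples" described above.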


What is a Namenode?

The Namenode is the master node of HDFS and holds the file system metadata. It maintains and manages the blocks that are present on the datanodes. Because all of HDFS depends on it, it is the single point of failure in HDFS and should run on a high-availability machine.


What is a datanode?

Datanodes are the slaves, deployed on each worker machine, that provide the actual storage. They are responsible for serving read and write requests from clients.


What is a job tracker?

The job tracker is a daemon, running on a master node, for submitting and tracking MapReduce jobs in Hadoop. It assigns tasks to the different task trackers. A Hadoop cluster has only one job tracker but many task trackers. If the job tracker goes down, all running jobs are halted.


How does the job tracker work?

When a client submits a job, the job tracker initializes the job, divides the work and assigns it to different task trackers, which perform the MapReduce tasks.


What is a task tracker?

The task tracker is also a daemon; it runs on the datanodes. Task trackers manage the execution of individual tasks on the slave nodes.


How does the task tracker work?

The task tracker is mainly responsible for executing the work assigned by the job tracker. While doing so, it continuously communicates with the job tracker by sending heartbeats.


What is a heartbeat?

A heartbeat is a periodic message that a task tracker sends to the job tracker. It shows that the tracker is alive and reports the status of its assigned tasks, and the job tracker uses these reports to decide whether an assigned task has completed. If the job tracker does not receive a heartbeat from a task tracker within a specified time, it assumes that the task tracker has crashed and assigns its tasks to another task tracker in the cluster.
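A toy model makes the timeout rule concrete. The sketch below is a hypothetical simplification, not Hadoop’s JobTracker code. It records the time of each tracker’s last heartbeat, treats any tracker silent for longer than `timeout` as crashed, and hands that tracker’s tasks to the surviving trackers round-robin. The class name, the `timeout` value and the round-robin rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JobTrackerSketch:
    """Toy model of heartbeat-based failure detection; hypothetical, not Hadoop's code."""

    timeout: float  # seconds of silence before a task tracker is presumed crashed
    last_heartbeat: Dict[str, float] = field(default_factory=dict)
    assignments: Dict[str, List[str]] = field(default_factory=dict)
    unassigned: List[str] = field(default_factory=list)

    def heartbeat(self, tracker: str, now: float) -> None:
        """Record that a task tracker reported in at time `now`."""
        self.last_heartbeat[tracker] = now
        self.assignments.setdefault(tracker, [])

    def assign(self, task: str, tracker: str) -> None:
        self.assignments.setdefault(tracker, []).append(task)

    def check(self, now: float) -> List[str]:
        """Mark silent trackers as crashed and hand their tasks to live trackers."""
        dead = [t for t, seen in self.last_heartbeat.items() if now - seen > self.timeout]
        live = [t for t in self.last_heartbeat if t not in dead]
        for tracker in dead:
            del self.last_heartbeat[tracker]
            orphaned = self.assignments.pop(tracker, [])
            for i, task in enumerate(orphaned):
                if live:
                    # Spread the orphaned tasks round-robin over surviving trackers.
                    self.assign(task, live[i % len(live)])
                else:
                    # No tracker is alive: park the task in `unassigned`.
                    self.unassigned.append(task)
        return dead


if __name__ == "__main__":
    jt = JobTrackerSketch(timeout=10.0)
    jt.heartbeat("tt1", now=0.0)
    jt.heartbeat("tt2", now=0.0)
    jt.assign("map-0", "tt1")
    jt.assign("map-1", "tt2")
    jt.heartbeat("tt2", now=8.0)  # tt1 stays silent from here on
    print(jt.check(now=12.0))     # ['tt1']
    print(jt.assignments)         # {'tt2': ['map-1', 'map-0']}
```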


Is the Namenode machine the same as a datanode machine in terms of hardware?

It depends on the cluster you are trying to create. The Hadoop VM can be on the same machine or on another machine. For instance, in a single-node cluster there is only one machine, whereas in a multi-node development or testing environment, the Namenode and datanodes are on different machines.


What is commodity hardware?

Commodity hardware means inexpensive systems that are not of especially high quality or high availability. Hadoop can be installed on any average commodity hardware; we don’t need supercomputers or high-end hardware to work with Hadoop.


Is the Namenode also commodity hardware?

No. The Namenode should never be commodity hardware, because the entire HDFS relies on it. It is the single point of failure in HDFS, so the Namenode has to be a high-availability machine.


What is metadata?

Metadata is the information about the data stored in the datanodes, such as the location of each file, the size of the file and so on.


What is a daemon?

A daemon is a process or service that runs in the background. The term is generally used in UNIX environments. The Windows equivalent of a daemon is a “service”, and the DOS equivalent is a “TSR” (terminate-and-stay-resident program).


Are the Namenode and the job tracker on the same host?

No. In a practical environment, the Namenode and the job tracker run on separate hosts.


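What is a ‘block’ in HDFS?

A block is the unit in which HDFS reads, writes and stores file data. Files are split into blocks of a fixed size: 64 MB by default in Hadoop 1.x, and 128 MB in Hadoop 2.x. A file smaller than one block does not use a full block’s worth of disk space.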
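The block arithmetic is easy to check by hand. The short sketch below works in megabytes for readability and assumes the 64 MB Hadoop 1.x default. A 200 MB file becomes three full 64 MB blocks plus one 8 MB block. The function name is hypothetical.

```python
from typing import List

BLOCK_SIZE_MB = 64  # Hadoop 1.x default (dfs.block.size); 128 MB in Hadoop 2.x


def split_into_blocks(file_size_mb: float, block_size_mb: float = BLOCK_SIZE_MB) -> List[float]:
    """Return the sizes of the HDFS blocks that a file of the given size would occupy."""
    if file_size_mb <= 0:
        return []
    full_blocks = int(file_size_mb // block_size_mb)
    blocks = [float(block_size_mb)] * full_blocks
    remainder = file_size_mb - full_blocks * block_size_mb
    if remainder > 0:
        # The last block holds only the leftover data; it is not padded to 64 MB on disk.
        blocks.append(remainder)
    return blocks


print(split_into_blocks(200))  # [64.0, 64.0, 64.0, 8]
```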


If a datanode is full, how is it identified?

When data is stored on a datanode, the metadata for that data is stored in the Namenode, so the Namenode can identify when a datanode is full.


If the number of datanodes increases, do we need to upgrade the Namenode?

The Namenode is sized based on the size of the cluster when the Hadoop system is installed. Most of the time we do not need to upgrade it, because it does not store the actual data, just the metadata, so such a requirement rarely arises.


On what basis does the Namenode decide which datanode to write to?

Because the Namenode holds the metadata (information) for all the datanodes, it knows which datanodes have free space.
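The sketch below shows how the Namenode’s view of each datanode’s capacity and usage lets it skip full nodes and pick ones with room. It is a deliberately simplified, hypothetical policy that prefers the nodes with the most free space. Real HDFS applies its own rack-aware block placement policy, which this sketch does not model. The class, function and node names are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatanodeReport:
    """What the Namenode knows about one datanode (hypothetical, simplified)."""

    name: str
    capacity_gb: float
    used_gb: float

    @property
    def free_gb(self) -> float:
        return self.capacity_gb - self.used_gb


def choose_datanodes(reports: List[DatanodeReport], block_gb: float, replicas: int = 3) -> List[str]:
    """Pick datanodes for one block: skip full nodes, prefer the emptiest ones.

    Simplified, hypothetical policy; real HDFS uses rack-aware placement instead.
    """
    candidates = [r for r in reports if r.free_gb >= block_gb]  # full nodes drop out here
    candidates.sort(key=lambda r: r.free_gb, reverse=True)
    if len(candidates) < replicas:
        raise RuntimeError("not enough datanodes with free space for all replicas")
    return [r.name for r in candidates[:replicas]]
```

For example, if `dn1` is completely full, it is never chosen, even though it is listed.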


Is the client the end user in HDFS?

No. The client is an application running on your machine that is used to interact with the Namenode (job tracker) or the datanodes (task trackers).


What is a rack?

A rack is a physical collection of datanodes stored together at a single location. A cluster’s datanodes can be spread across several racks, and there can be multiple racks in a single location.


What is Hadoop’s Single Point Of Failure (SPOF)?

If the Namenode fails, the entire Hadoop system goes down. This is called the Hadoop Single Point Of Failure.


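What is a Secondary Namenode?

The Secondary Namenode periodically copies the Namenode’s namespace image (fsimage) and its log of recent changes (the edit log), merges them into a new checkpoint, and writes that checkpoint to disk. This keeps the edit log from growing without bound and shortens Namenode restarts. Despite its name, it is not a hot standby that takes over when the Namenode fails.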
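The checkpoint idea boils down to replaying a change log on top of the last snapshot. In the sketch below, the namespace is modeled as a plain dictionary of path to size, and the edit log supports only "create" and "delete" entries. Both are hypothetical simplifications, since the real fsimage and edits files are binary and record many more operation types.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified namespace: path -> file size in bytes.
Namespace = Dict[str, int]
# Each edit-log entry is (operation, path, size); only "create" and "delete" are modeled.
EditLog = List[Tuple[str, str, int]]


def checkpoint(fsimage: Namespace, edits: EditLog) -> Tuple[Namespace, EditLog]:
    """Replay the edit log on top of the last fsimage; return the new image and an empty log."""
    merged = dict(fsimage)  # work on a copy; the old image is left untouched
    for op, path, size in edits:
        if op == "create":
            merged[path] = size
        elif op == "delete":
            merged.pop(path, None)
        else:
            raise ValueError(f"unsupported edit operation: {op}")
    return merged, []  # after a checkpoint, the edit log can start again from empty


new_image, new_log = checkpoint({"/a": 10}, [("create", "/b", 20), ("delete", "/a", 0)])
print(new_image, new_log)  # {'/b': 20} []
```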


What are the three modes in which Hadoop can be run?

The three modes in which Hadoop can be run are:

1. Standalone (local) mode

2. Pseudo-distributed mode

3. Fully distributed mode


What are the features of standalone (local) mode?

In standalone mode there are no daemons; everything runs in a single JVM. There is no DFS, and the local file system is used instead. Standalone mode is suitable only for running MapReduce programs during development, and it is the least used of the three modes.


What are the features of pseudo-distributed mode?

Pseudo-distributed mode is used both for development and in the QA environment. In pseudo-distributed mode, all the daemons run on the same machine.


Can we call VMs pseudos?

No. VMs are not pseudos, because a VM is something different, whereas pseudo-distributed mode is very specific to Hadoop.


What are the features of fully distributed mode?

Fully distributed mode is used in the production environment, where ‘n’ machines form a Hadoop cluster and the Hadoop daemons run across that cluster. One host runs the Namenode, other hosts run the datanodes, and there are machines on which the task trackers run. Masters and slaves are separate in this mode.


In which directory is Hadoop installed?

Cloudera and Apache have the same directory structure. Hadoop is installed in /usr/lib/hadoop/.


What are the port numbers of the Namenode, job tracker and task tracker?

The default web UI port numbers are 50070 for the Namenode, 50030 for the job tracker and 50060 for the task tracker.
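These ports can be turned into browser URLs, and you can quickly test whether anything is listening on them. In the sketch below, the port table repeats the Hadoop 1.x defaults listed above. The helper names and the one-second timeout are hypothetical choices, and an open port only shows that some process accepts connections there, not that it is the Hadoop daemon.

```python
import socket

# Default Hadoop 1.x web UI ports, as listed above.
DEFAULT_WEB_PORTS = {
    "namenode": 50070,
    "jobtracker": 50030,
    "tasktracker": 50060,
}


def web_ui_url(daemon: str, host: str = "localhost") -> str:
    """Build the browser URL for a daemon's web UI from its default port."""
    return f"http://{host}:{DEFAULT_WEB_PORTS[daemon]}/"


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False
```

For example, `web_ui_url("namenode")` gives `http://localhost:50070/`, and `is_port_open("localhost", 50070)` reports whether that port is accepting connections.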


What is the Hadoop core configuration?

In older Hadoop releases, Hadoop core was configured by two XML files:

1. hadoop-default.xml, which holds the default settings, and

2. hadoop-site.xml, which holds site-specific overrides.

In both files, settings are written as properties, each of which consists of a name and a value.
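Because every setting is a `<property>` element with a `<name>` and a `<value>`, these files are easy to read programmatically. The sketch below uses Python’s standard `xml.etree.ElementTree`. The sample file content, the property `fs.default.name` with the value `hdfs://localhost:9000`, is only an illustrative example, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Illustrative example of a Hadoop-style configuration file.
SAMPLE_CORE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
"""


def parse_hadoop_config(xml_text: str) -> Dict[str, str]:
    """Collect every <property> element as a name -> value entry."""
    root = ET.fromstring(xml_text)
    props: Dict[str, str] = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value", default="")
        if name:  # ignore malformed properties without a name
            props[name.strip()] = value.strip()
    return props


print(parse_hadoop_config(SAMPLE_CORE_SITE))  # {'fs.default.name': 'hdfs://localhost:9000'}
```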


What are the Hadoop configuration files at present?

There are 3 configuration files in Hadoop:

1. core-site.xml

2. hdfs-site.xml

3. mapred-site.xml

These files are located in the hadoop/conf/ subdirectory.


How do you exit the vi editor?

To exit vi, press ESC, type :q and then press Enter. If there are unsaved changes, use :wq to save and quit, or :q! to quit without saving.


What are the three main hdfs-site.xml properties?

The three main hdfs-site.xml properties are:

1. dfs.name.dir, which gives the location where the metadata will be stored and where the DFS is located (on disk or on a remote location).

2. dfs.data.dir, which gives the location where the data is going to be stored.

3. fs.checkpoint.dir, which is used by the Secondary Namenode.
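The sketch below generates an hdfs-site.xml containing these three properties in Hadoop’s `<configuration><property>` layout. The directory paths are hypothetical placeholders, so substitute directories that exist on your own machines. `ET.indent` requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical example paths; choose real directories for your own cluster.
HDFS_SITE_PROPERTIES = {
    "dfs.name.dir": "/var/lib/hadoop/name",
    "dfs.data.dir": "/var/lib/hadoop/data",
    "fs.checkpoint.dir": "/var/lib/hadoop/namesecondary",
}


def build_config_xml(properties: Dict[str, str]) -> str:
    """Render name/value pairs in Hadoop's <configuration><property> layout."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print with two-space indentation (Python 3.9+)
    return ET.tostring(root, encoding="unicode")


print(build_config_xml(HDFS_SITE_PROPERTIES))
```

The printed text can be saved as conf/hdfs-site.xml.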


What is Cloudera and why is it used?

Cloudera is a company that provides a distribution of Hadoop built on Apache Hadoop, and it is used for data processing. On Cloudera’s VM, “cloudera” is also the default user account.


How can I restart the Namenode?

1. Run stop-all.sh and then start-all.sh (note that this restarts all the Hadoop daemons, not just the Namenode), OR

2. Run sudo hdfs, then su - hdfs, then /etc/init.d/ha, and then /etc/init.d/hadoop-namenode start, pressing Enter after each command.


What does the ‘jps’ command do?

jps lists the Java processes running on the machine, so it shows whether your Namenode, datanode, task tracker, job tracker, etc. are running or not.
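Each line of jps output has the form `<pid> <ClassName>`, so a script can check which daemons are missing. The sketch below assumes the Hadoop 1.x daemon class names shown in `EXPECTED_HADOOP1_DAEMONS`; adjust that set for the daemons you expect on a given host. `run_jps` needs a JDK on the PATH and is defined but not called.

```python
import subprocess
from typing import Set

# Assumed Hadoop 1.x daemon class names as jps reports them; adjust per host.
EXPECTED_HADOOP1_DAEMONS = {"NameNode", "DataNode", "SecondaryNameNode", "JobTracker", "TaskTracker"}


def run_jps() -> str:
    """Return the raw output of `jps` (requires a JDK on the PATH)."""
    return subprocess.run(["jps"], capture_output=True, text=True, check=True).stdout


def parse_jps(output: str) -> Set[str]:
    """Extract the class names from lines of the form '<pid> <ClassName>'."""
    names: Set[str] = set()
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            names.add(parts[1])
    return names


def missing_daemons(output: str, expected: Set[str] = EXPECTED_HADOOP1_DAEMONS) -> Set[str]:
    """Daemons that were expected but do not appear in the jps output."""
    return expected - parse_jps(output)
```

On a host with a JDK installed, `missing_daemons(run_jps())` returns the expected daemons that are not running.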


How can we check whether the Namenode is working or not?

To check whether the Namenode is working, use the command /etc/init.d/hadoop-namenode status.


How can we look at the Namenode in the browser?

To view the Namenode in the browser, don’t use localhost:8021; the port for the Namenode web UI is 50070 (for example, http://localhost:50070).


Which files are used by the startup and shutdown commands?

The slaves and masters files are used by the startup and shutdown commands.


What does the slaves file consist of?

The slaves file consists of a list of hosts, one per line, that host datanode and task tracker servers.


What does the masters file consist of?

The masters file contains a list of hosts, one per line, that are to host Secondary Namenode servers.
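Both files are just one host name per line, so reading them is simple. The sketch below assumes that blank lines and `#` comments should be skipped, which matches how the Hadoop 1.x slaves.sh helper filters the host list before looping over it. The function name and the file path in the usage note are hypothetical.

```python
from typing import List


def read_host_list(text: str) -> List[str]:
    """Parse a masters/slaves file: one host per line.

    Assumption: blank lines and '#' comments are ignored.
    """
    hosts: List[str] = []
    for line in text.splitlines():
        host = line.split("#", 1)[0].strip()  # drop any trailing comment
        if host:
            hosts.append(host)
    return hosts


print(read_host_list("dn1\n\n# second rack\ndn2  # spare disk\n"))  # ['dn1', 'dn2']
```

For a real file, pass the file’s contents, for example `read_host_list(open("conf/slaves").read())`.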


What does hadoop-env.sh do?

hadoop-env.sh provides the environment for Hadoop to run. JAVA_HOME is set here.


Can we have multiple entries in the masters file?

Yes, we can have multiple entries in the masters file.


Where is the hadoop-env.sh file present?

The hadoop-env.sh file is present in the conf directory.


In HADOOP_PID_DIR, what does PID stand for?

PID stands for ‘Process ID’.


What does /var/hadoop/pids do?

It stores the PID files of the Hadoop daemons.


What does the hadoop-metrics.properties file do?

hadoop-metrics.properties is used for reporting purposes. It controls metrics reporting for Hadoop, and by default nothing is reported.


What are the network requirements for Hadoop?

The Hadoop core uses Secure Shell (SSH) to launch the server processes on the slave nodes. It requires a password-less SSH connection between the master and all the slaves and secondary machines.


On which port does SSH work?

SSH works on port 22 by default, though this can be configured.


Can you tell us more about SSH?

SSH (Secure Shell) is a protocol for secure communication that works on port 22 by default. When you connect over SSH, you must authenticate, typically with a password unless key-based (password-less) login has been set up.


Why is a password needed for ssh localhost?

A password is required in SSH for security whenever password-less communication has not been set up.


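Do we need to give a password, even if the key is added in SSH?

No. Once your public key has been added to the remote host’s authorized_keys file, SSH authenticates with the key and does not ask for the account password. If the private key is protected by a passphrase, you are asked for that passphrase instead, which is why Hadoop setups usually use a key with an empty passphrase.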


What if a Namenode has no data?

If a Namenode has no data, it is not a Namenode. In practice, a Namenode always holds some data (the metadata).


What happens to the job tracker when the Namenode is down?

When the Namenode is down, your cluster is off, because the Namenode is the single point of failure in HDFS.


What happens to the Namenode when the job tracker is down?

When the job tracker is down, MapReduce jobs cannot run, but the Namenode is still present. So the cluster is accessible as long as the Namenode is working, even if the job tracker is not.


Can you give us some more details about SSH communication between the masters and the slaves?

Between masters and slaves, SSH is configured as password-less, secure communication over which data packets are sent to the slaves in a defined format. SSH is not limited to masters and slaves; it works between any two hosts.


What is formatting of the DFS?

Just as with a Windows disk, the DFS is formatted to set up its structure. It is not done often, because it also formats the Namenode, erasing its metadata.


Does the HDFS client decide the input split, or the Namenode?

No, the client does not decide. The input split is determined by the job’s configuration, where it is already specified.
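The configured values feed a simple formula. Hadoop’s newer `org.apache.hadoop.mapreduce` `FileInputFormat` computes the split size as `max(minSize, min(maxSize, blockSize))`. It then cuts a file into splits of that size, letting the last split be up to 10% larger so that a tiny tail does not get its own split. The Python sketch below mirrors that logic for one file. Sizes are in megabytes for readability, and the function names are hypothetical.

```python
from typing import List, Tuple

SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than the split size


def compute_split_size(block_size: int, min_size: int, max_size: int) -> int:
    # Same formula as FileInputFormat.computeSplitSize in the newer MapReduce API.
    return max(min_size, min(max_size, block_size))


def input_splits(file_length: int, block_size: int,
                 min_size: int = 1, max_size: int = 2**63 - 1) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs for one file, mirroring FileInputFormat.getSplits."""
    split_size = compute_split_size(block_size, min_size, max_size)
    splits: List[Tuple[int, int]] = []
    remaining = file_length
    while remaining / split_size > SPLIT_SLOP:
        splits.append((file_length - remaining, split_size))
        remaining -= split_size
    if remaining != 0:
        splits.append((file_length - remaining, remaining))  # the final, possibly shorter split
    return splits


print(input_splits(200, 64))  # [(0, 64), (64, 64), (128, 64), (192, 8)]
print(input_splits(70, 64))   # [(0, 70)]
```

Because of the slop factor, a 70 MB file with 64 MB blocks produces a single split rather than a 64 MB split plus a 6 MB one.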


In Cloudera there is already a cluster, but if I want to form a cluster on Ubuntu, can I do it?

Yes, you can! There are installation steps for creating a new cluster: you can uninstall your present cluster and install the new one.


Can we create a Hadoop cluster from scratch?

Yes, we can do that too, once we are familiar with the Hadoop environment.



Understanding DevOps and why it’s important


If you’re a developer, you have probably started to hear the term DevOps being thrown around. If you haven’t, well, you will soon be hearing it around the office. Someone in upper management will hear about it at a conference and come back asking the team whether the company needs to consider it. Soon a team will be formed to discuss implementing DevOps. In short, now is the right time to learn what DevOps is. We all remember what happened when Agile became the most popular development model: at the beginning, most people had no idea what Agile meant or how they needed to change the way they work.

What exactly is DevOps?
DevOps is a new way of thinking about development, and a new way of developing. There has always been a huge divide between the development team and the operations team. The development team takes complete control of development: it is their job to design, build and test the system, and then deliver it to the operations team. The operations team receives the fully developed, quality-tested product and learns how to use it. They then use it for their work and report any errors or mistakes that pop up to the development team.

DevOps, as the name suggests, combines development and operations. It is based on the understanding that building the most efficient systems requires involving the operations people in the development cycle. Too many of us have used software and wondered why it was designed so badly, or why we had to work so hard to mold our work around it. This disconnect between what users need and what the product delivers has long been a problem in the industry. DevOps recognizes that while operations staff cannot develop, developers cannot build the right thing either without the right inputs.

A very different development cycle
In DevOps, the operations team doesn’t move in once development is complete. Instead, they are there from the start: they help design the system, help with development, help with quality assurance and much more. The advantage is clear: the product that is built is exactly what is needed to work at the highest efficiency in real-world scenarios.

There’s also another huge advantage: the operations team becomes much better at using the product. Digital solutions are only useful if users know how to use the powerful toolset they offer. Because the operations team is included right from the start, they gain a deep understanding of how everything works, which allows them to exploit the solution to its full potential. Instead of waiting to be trained after the solution has been developed, they see it being built and come to understand not just the UI but the underlying infrastructure as well. DevOps is the future because it results in the best development and deployment.
