Big Data Interview


What do you think of the following big data functions?


Collaborating as part of a cross-functional Agile team to create and enhance software that enables state of the art, next generation Big Data applications and tools

Java is at the heart of the big data movement, and Agile development is widely practiced on Java teams. Agile methodologies are especially helpful on a complex implementation, so it is important to understand them and the team interaction they involve. Several online courses cover Agile methods, such as Agile in the Real World, Agile Fundamentals, and Agile Requirements Process: From Idea to Minimum Viable Product, and many books are available as well. Companies adopt different forms of Agile development, such as Scrum, and it is important to be comfortable with whichever form a team uses. Positive team interaction is key to delivering expected results.

Working with customers to creatively solve complex business problems with customized Big Data solutions

Big data implementations are high-profile projects, and it can be tricky to satisfy all of the different concerns. The solution has to fit the expectations of the normal user community; in other words, it has to be a viable tool for its users. It also has to provide value to the company. I have been fortunate to work on high-profile projects for many years. These are great projects that directly affect the success of the company. A background in accounting and finance, business administration, and knowledge of the business itself is helpful. I have helped credit card companies, banks, insurance companies, and pharmaceutical companies, and each project was critical to management's needs.


Building efficient storage protocols for unstructured data


Today is a great time to be in technology. Open source has matured, and storage has changed: you can now store any type of data on affordable hardware. As a design engineer, I can solve many problems without a large buildup of costly development.


Utilizing Hadoop modules such as YARN & MapReduce, and related Apache projects such as Hive, HBase, Pig, and Cassandra

Hortonworks and Cloudera are two companies that provide preconfigured development environments as virtual machine images. This makes it possible to stay up to speed on the versions of all the main modules. MapReduce is one area where it pays to put in extra time and organize your libraries so you are prepared to handle unique requirements. There is usually a need to get the data out. This sounds easy, but it is difficult when you need it fast. So someone who spends extra time on providing quick and accurate access is usually needed and hard to find. Because Hadoop is relatively new, it is hard to find someone with 5+ years of experience in this area. Developers who know how to write MapReduce jobs are also rare, which is another reason there is not a large supply of help in this area.
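To make the map, shuffle, and reduce phases concrete, here is a minimal in-memory sketch of the classic word-count job in plain Python. It is not the Hadoop API: the function names are hypothetical, and a real job would use Hadoop's Java `Mapper`/`Reducer` classes (or Hadoop Streaming) running across a cluster, with the framework itself performing the shuffle between the two phases.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group all values emitted for the same key, sorted by key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Reducer: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Chain the three phases the way a MapReduce job would."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "The end"]
    print(word_count(sample))
    # {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

Because each mapper call only looks at its own line and each reducer call only looks at one key, both phases can be spread across many machines; that independence is what lets MapReduce scale to huge files.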


Developing data-enabling software utilizing open source frameworks or projects such as Spring, AngularJS, Solr, Drools, etc.


This is another great area for open source developers. Solr and Elasticsearch are great tools for creating search front ends for users. AngularJS and Bootstrap help create responsive tools, meaning one design works on all devices. This site is built with AngularJS and Bootstrap, with WordPress for the blog. Java Spring has many open source CMS implementations to choose from. I have my favorites and usually prefer open source versions over costly proprietary ones. Building full-stack corporate CMS systems is a lot of fun today, and there are many resources available.
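Solr and Elasticsearch are both built on Apache Lucene, whose core data structure is an inverted index: a map from each term to the documents that contain it. The sketch below shows only that core idea with a simple AND query; the function names are hypothetical, and real engines add text analysis (stemming, stop words), relevance scoring, and on-disk index segments.

```python
from collections import defaultdict


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercase term to the set of document ids that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """AND query: return ids of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Copy the first posting set so the intersection never mutates the index.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    idx = build_index({
        1: "credit card fraud report",
        2: "card balance report",
        3: "insurance claims",
    })
    print(search(idx, "card report"))  # {1, 2}
    print(search(idx, "fraud"))        # {1}
```

Looking up a term is a single dictionary access, so a query touches only the documents that match instead of scanning every document; that is why a search front end can stay fast as the data grows.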


Determining your level of effort to develop code, and owning your commitments to your team


You have to be responsible for your development. Ownership is everything. If you see a project without ownership, you know it will fail.


Leveraging development frameworks such as continuous integration and test driven development to enable the rapid delivery of working code


I have never had the luxury of delivering late. I was doing test-driven development before there were books about it. OOP made it possible to deliver powerful solutions, but it also allows software to have hundreds of ways to do the same thing.


Performing unit tests and conducting reviews with other team members to make sure your code is rigorously designed, elegantly coded, and effectively tuned for performance


This is great to do. I start unit testing right away, and a team review is great when one is available.
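As an illustration of starting with unit tests, here is a small test-first example using Python's standard `unittest` module. The function under test, `dedupe_mailing_list`, is hypothetical: a simplified merge/purge step that treats two mailing records as duplicates when their trimmed, lowercased name and their ZIP code match. In a test-first workflow the test class is written before the function body, and the function is then filled in until every test passes.

```python
import unittest


def dedupe_mailing_list(records: list[dict]) -> list[dict]:
    """Hypothetical merge/purge step: keep the first record for each
    (normalized name, ZIP) pair, preserving the original order."""
    seen: set[tuple[str, str]] = set()
    unique: list[dict] = []
    for rec in records:
        key = (rec["name"].strip().lower(), rec["zip"].strip())
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


class DedupeMailingListTest(unittest.TestCase):
    # These tests describe the required behavior before the code exists.

    def test_removes_case_and_whitespace_duplicates(self):
        records = [
            {"name": "Jane Doe", "zip": "19104"},
            {"name": " jane doe ", "zip": "19104"},
            {"name": "John Roe", "zip": "19104"},
        ]
        result = dedupe_mailing_list(records)
        self.assertEqual([r["name"] for r in result], ["Jane Doe", "John Roe"])

    def test_same_name_different_zip_is_kept(self):
        records = [
            {"name": "Jane Doe", "zip": "19104"},
            {"name": "Jane Doe", "zip": "10001"},
        ]
        self.assertEqual(len(dedupe_mailing_list(records)), 2)

    def test_empty_input(self):
        self.assertEqual(dedupe_mailing_list([]), [])
```

Saved as a module, the suite can be run with `python -m unittest <module_name>`; the same tests then serve as a checklist during a team code review.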


Your experience probably includes a few of these, or if not, you’re ready to start learning:

BS in Computer Science or related field, or professional experience in either developing software or developing code for data movement


Both! I earned an electrical engineering degree from Drexel University at a time when the emphasis was on computer design and software development, and all of my classes were related to software design. I was at the bleeding edge of technology, to the point of teaching my instructors, and I was consulting while attending school full time.

As the industry evolved, I helped large companies take advantage of new technologies. I have always been on the leading edge, but I temper that with practicality: I want to use the best technology without taking unnecessary risks.

Every high-profile software project I have worked on involved corporate data. I have done it all. I have reported to upper management on marketing results, merge/purge direct mail support, human resources, finance, accounting, budgeting, dashboards, and most business intelligence projects. When I had extra time, I helped with operations-related projects. I am an expert with the commodity hardware used today. I have worked with credit card data for many different companies, and I was always surprised at how many people did not understand how to get clean and accurate data.

I am an expert SQL developer, and I work with MongoDB and other NoSQL databases. I stay current with subscription services and constantly learn new build methodologies. I have a knack for seeing what no one else sees in data. I have built many memory-oriented servers for performance and have the background to tackle any technology.


Experience working both independently and collaboratively to solve problems and deliver high quality results in a fast-paced, unstructured environment

This is what I live for! This is what makes it fun. If anyone could do it, it wouldn't be as much fun.


Experience developing software solutions to build out capabilities on a Big Data Platform

I like JSON as a vehicle to move data around, so I look for tools that use JSON by default. I also spend time on MapReduce because I know it is critically needed when files are huge and people are waiting for data. I prefer to work with known, established interfaces. I find it hard to locate companies that want to be software houses.
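One common way to use JSON for moving large volumes of data is JSON Lines (newline-delimited JSON): one complete JSON object per line. Each line is an independent record, so a file can be split and streamed without parsing it as a whole, which fits line-oriented readers such as Hadoop's text input and Spark's JSON source (which expects one object per line by default). The helper names below are hypothetical; this is a minimal sketch using only the standard library.

```python
import io
import json
from typing import Iterable, Iterator, TextIO


def write_json_lines(records: Iterable[dict], out: TextIO) -> int:
    """Write one compact JSON object per line; return how many were written."""
    count = 0
    for rec in records:
        out.write(json.dumps(rec, separators=(",", ":")) + "\n")
        count += 1
    return count


def read_json_lines(src: TextIO) -> Iterator[dict]:
    """Stream records back one line at a time, skipping blank lines."""
    for line in src:
        line = line.strip()
        if line:
            yield json.loads(line)


if __name__ == "__main__":
    buf = io.StringIO()  # stands in for a real file or network stream
    write_json_lines([{"id": 1, "amount": 12.5}, {"id": 2, "amount": 7.0}], buf)
    buf.seek(0)
    for rec in read_json_lines(buf):
        print(rec)
```

Because the reader is a generator, memory use stays flat no matter how large the file is; only one record is held at a time.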


Experience with the various tools & frameworks that enable capabilities within the Hadoop ecosystem (MapReduce, YARN, Pig, Hive, HBase) and NoSQL


I have worked with all the above. I have some of my own favorites not listed.


Experience developing front-end and back-end solutions written in Java


I am a full-stack developer. Front-end web work is mostly HTML, CSS, and JavaScript, and I can create any front end with these tools. I use the latest technologies and have the full Adobe suite for graphics and web development. I am doing fewer Java GUI front ends these days. I have access to continuous training and stay up to speed on Java full-stack development. I also work with all the JVM-based languages; my favorite is Scala.


Proficiency with scripting languages (Python, Perl, JavaScript, Shell)


I have worked with Python and Django on OpenStack projects. I work with Perl a little. I am a full-stack JavaScript developer with Node.js and the MEAN stack. I also write shell scripts.


Experience designing, developing, and implementing ETL and relational database systems


Expert. I have been working with ETL and relational databases from the beginning.
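The three ETL stages can be sketched with nothing but the Python standard library: extract rows from CSV, transform them (trim and normalize fields, convert units, reject incomplete records), and load them into a relational table. The sample data, table name, and cleaning rules below are hypothetical; SQLite stands in for a production database.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract: note the stray whitespace, mixed case, and a missing amount.
RAW_CSV = """customer_id,state,amount
101, pa ,19.99
102,NY,
103,nj,5.00
"""


def extract(text: str) -> list[dict]:
    """Extract: parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict]) -> list[tuple[int, str, int]]:
    """Transform: normalize state codes, store money as integer cents,
    and reject rows with no amount instead of loading bad data."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue
        clean.append((
            int(row["customer_id"]),
            row["state"].strip().upper(),
            round(float(amount) * 100),
        ))
    return clean


def load(rows: list[tuple[int, str, int]], conn: sqlite3.Connection) -> int:
    """Load: create the target table if needed and insert the clean rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales ("
        "customer_id INTEGER PRIMARY KEY, state TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(load(transform(extract(RAW_CSV)), conn))  # 2
    print(conn.execute("SELECT * FROM sales ORDER BY customer_id").fetchall())
    # [(101, 'PA', 1999), (103, 'NJ', 500)]
```

Storing amounts as integer cents avoids floating-point rounding drift in later sums, and rejecting incomplete rows during the transform step keeps the target tables clean; in practice rejected rows would usually be logged for review rather than silently dropped.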


Experience working with automated build and continuous integration systems (Maven, Jenkins) and test-driven development and unit testing frameworks (jUnit, xUnit, Nose)


You have to be up to speed on all build tools. You have to be willing to put up with their shortcomings and lack of documentation. The same goes for testing frameworks.


Experience with UNIX/Linux including basic commands, shell scripting and system administration/configuration


Linux is my primary operating system. I maintain my own web site and offer hosting services for clients, and I use virtualization to keep operations manageable. My favorite distribution is Ubuntu, which I have used since 2007. I use Apache along with Tomcat.


Experience with data mining, machine learning, statistical modeling tools or underlying algorithms


Spark offers libraries for all of these in Scala and Python, and this is where I spend most of my time. I am happy to see how well Spark handles most standard queries. I also spend time with R.


Robust portfolio of shipped code on GitHub and/or open source contributions of which you are proud to share


I am a heavy GitHub user. However, legal restrictions usually limit what I can share.


Your interests:

You geek out over obscure sports statistics. You ponder what drove your streaming music service’s algorithms to accurately predict that you’d like the song you’re listening to. Nate Silver asks you who’s going to win the next election. You love data.


I love to figure it out, make the data tell me something, and make it affect the bottom line. After I had collected data and delivered it to upper management, they asked me to tell them if I saw anything they were missing. I thought they were kidding: my primary role was to create a start-to-finish secure data collection system, and I was not brought in as a financial analyst. This is a great story. I saved them an obscene amount of money, and after that I was asked to consult on all data collection.


You get a thrill out of using large data sets, some of them slightly messy, to answer real-world questions


Same as previous answer. I want to see if something of value is in there. I want it to hit the bottom line in some way.


You yearn to be a part of cutting edge, high profile projects and are motivated by delivering world-class solutions on an aggressive schedule


That's me! I am just as interested in using the tools that will do the job. I like to think I can stay bleeding edge without being out on a limb. So, I guess I am not reckless.


You are passionate about finding refined solutions to complex coding challenges and helping the entire team meet its commitments


I have always been the lead technical guy. I have the serious design and software background to handle anything.


You love learning new technologies and mentoring more junior developers


I am the coach! I wrote a book on mentoring.


Humor and fun are a natural part of your flow

That is my pick.


#ilovedata #bigdata #transforminganalytics

Basic Qualifications:

Bachelor’s Degree in Computer Science, Computer Engineering, or military experience

Electrical engineering degree, Drexel University


At least 1 year of professional work experience coding in data management, data warehousing, or unstructured data environments


20+ years

Preferred Qualifications:

2+ years in-depth experience with the Hadoop stack (MapReduce, Pig, Hive, HBase)


2+ years using the Hadoop stack in development environments


2+ years of experience with NoSQL implementation (Cassandra a plus)


I have worked with both, mostly on the configuration side.


5+ years experience developing Java based software solutions


5+ years, used intermittently; full training for the OCJA, OCJP, and OCJD certifications, plus continuing education


5+ years experience in at least one scripting language (Python, Perl, JavaScript, Shell)


Many scripting languages, including all of the above plus PHP


5+ years experience developing software solutions to solve complex business problems

20+ years


5+ years experience with Relational Database Systems and SQL


20+ years; SQL expert


5+ years experience designing, developing, and implementing ETL


Many years designing my own ETL processes; I am also aware of some great open source ETL tools.


5+ years experience with UNIX/Linux including basic commands and shell scripting


Linux is my primary operating system and my operating system of choice, with an easy 5+ years supporting all operations functions related to hosting web sites.