The following are some of the most frequently asked technology questions in Big Data interviews.


1). What is the largest data set you have processed? How did you process it? What were the results?

2). Tell me about two of your analytics or computer science projects. How did you measure the results?

3). What is: lift, key performance indicator (KPI), robustness, model fitting, design of experiments, the 80/20 rule?

4). What is: collaborative filtering, n-grams, MapReduce, cosine distance?

5). Should clickstream data be processed in real time? Why? Which parts should be processed in real time?

6). Which do you think is better: good data or a good model? How do you define "good"? Is there a universal model that is good in all cases? Are there models that are definitely not so good?

7). What is probabilistic merging (also known as fuzzy merging)? Is it easier to handle with SQL or with other languages? Which language would you choose for processing semi-structured data?

8). How do you deal with missing data? Which techniques do you recommend for handling it?

9). Have you participated in database design and data modeling?

10). Have you been involved in designing dashboards and selecting metrics? What are your thoughts on business intelligence and reporting tools?

11). Which features of the Teradata (TD) database do you like?

12). Suppose you are about to send one million emails in a marketing campaign. How do you optimize delivery? How do you optimize the response rate? Can you optimize the two separately?

13). How would you make a web crawler run faster, extract better information, and summarize data better to produce a cleaner database?

14). How would you design a solution to detect plagiarism?

15). How would you detect whether an individual paid account is being shared by multiple users?

16). Suppose several clients query an Oracle database very inefficiently. Why might that be? What can you do to improve speed by more than a factor of 10 and handle much larger outputs?

17). How do you convert unstructured data into structured data? Is this conversion really necessary? Is it better to store data in flat text files than in a relational database?

18). What is a hash table collision attack? How can it be avoided? How often does it occur?

19). What are the differences between SAS, R, Python, and Perl?

20). What is the curse of big data?

21). How can you tell whether a MapReduce process has good load balancing? What is load balancing?

22). What is a star schema? What is a lookup table?

23). Can you build a logistic regression model in Excel? If so, how? Explain the building process.

24). Have you optimized code or algorithms for speed in SQL, Perl, C++, Python, or other languages? How, and by how much?

25). What is your favorite programming language? Why?

26). For your favorite statistical software, give three things you like and three things you do not like.

27). Why does naive Bayes perform poorly? How would you improve a crawler detection algorithm that uses naive Bayes?

28). Have you worked with whitelists? What were the main rules? (In the context of fraud or crawler detection.)

29). How do you define and measure the predictive power of a metric?

30). How would you create a keyword taxonomy?

31). Is it better to spend 5 days on a solution that is 90% accurate, or 10 days on one that is 100% accurate? What does it depend on?

32). Define: QA (quality assurance), Six Sigma, design of experiments. Can you give examples of good and bad experimental designs?

33). How would you create a new anonymous digital account?

34). Have you ever thought about starting your own business? Around what kind of idea?

35). Please explain how MapReduce works. In which scenarios does it work well? What are the security issues with the cloud?

36). Assuming both fit in memory, is it better to have 100 small hash tables or one large hash table, in terms of in-memory access speed? What do you think of in-database analytics?

37). How does Zillow's algorithm work?

38). Which data scientists do you admire most? Which startups?

39). What is a botnet? How can one be detected?

40). How do you prove that an improvement you made to an algorithm is really effective compared with making no change? Have you done A/B testing?

41). What is sensitivity analysis? Is it better to have low sensitivity (i.e., better robustness) and low predictive power, or the opposite? How do you use cross-validation? What do you think of the idea of injecting noise into the data set to test the sensitivity of the model?

42). Compare logistic regression, decision trees, and neural networks. How have these techniques improved over the past 15 years?

43). Besides principal component analysis, do you use other data reduction techniques? What do you think of stepwise regression? Which stepwise techniques are you familiar with? When is the full data set better than reduced data or a sample?

44). What are the drawbacks of the ordinary linear regression model? Do you know other regression models?

45). Do you think 50 small decision trees are better than one large tree? Why?

46). Do you think login boxes with a username and password will disappear? What could replace them?

47). Have you used time series models? Cross-correlations with time lags? Correlograms? Spectral analysis? Signal processing and filtering techniques? In what context?

48). Is actuarial science a branch of statistics? If not, why not?

49). Give examples of data that follows neither a Gaussian nor a log-normal distribution. Give examples of data with a very chaotic distribution.

50). Why is MSE not a good measure of model performance? Which alternative metrics would you suggest?

51). How would you construct a nonparametric confidence interval?

52). Are you familiar with extreme value theory, Monte Carlo simulation, or other statistical methods for correctly estimating the probability of a rare event?

53). What is attribution analysis? How do you distinguish causation from correlation? Give examples.

54). What is the computational complexity of a good, fast clustering algorithm? What is a good clustering algorithm? How do you decide on the number of clusters?

55). Are you familiar with price optimization, price elasticity, inventory management, and competitive intelligence? Give an example of each.

56). Do you have experience using APIs? What kind of APIs? Google or Amazon APIs, or software as a service?

57). When is writing your own code better than using a well-developed data science package?

58). Which tools do you use for visualization? For plotting, how do you rate Tableau? R? SAS? How would you effectively show five dimensions in one chart?

59). Do you know any "rules of thumb" used in statistics or computer science? Or in business analytics?

60). What is a proof of concept?

61). Which clients have you mainly worked with: internal, external, sales, finance, marketing, or IT departments? Do you have consulting experience? Have you dealt with vendors, including vendor selection and testing?

62). What makes a chart misleading or difficult to understand or interpret? What features should a useful chart have?

63). Are you familiar with the software life cycle? And with the IT project life cycle, from gathering requirements to maintenance?

64). What is a cron job?

65). How did you become interested in data science?

66). What is an efficiency curve? What are its drawbacks, and how do you overcome them?

67). What is a recommendation engine? How does it work?

68). What is an exact test? How and when can simulation help us when we do not use an exact test?

69). What do you think it takes to become a good data scientist?

70). Do you think a data scientist is an artist or a scientist?

71). Tell me about some "best practices" in data science.
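
As a small illustration of two of the terms in question 4 (n-grams and cosine distance), here is a minimal Python sketch. The function names `word_ngrams` and `cosine_distance`, the whitespace tokenization, and the sample sentences are assumptions made for this example; it is one possible way to approach the question, not a model answer.

```python
import math
from collections import Counter


def word_ngrams(text, n=2):
    """Count the word n-grams in a text (lowercased, split on whitespace)."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # convention assumed here: an empty vector is maximally distant
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical sample texts: they share two of their three bigrams.
doc1 = word_ngrams("big data interview questions")
doc2 = word_ngrams("big data interview answers")
print(cosine_distance(doc1, doc2))
```

Representing each text as a sparse `Counter` of n-grams keeps the sketch short; for large corpora a candidate would probably discuss hashed or matrix-based representations instead.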




