
Introduction

This notebook explains how to compute social smells for a given open source project. Before we begin, it is important to understand what data is required to compute social smells, and what Kaiaulu will and will not do for you.

Social smell metrics require both collaboration and communication data. Collaboration data is extracted from version control systems, while communication data can be obtained from whatever channel the project of interest uses for developers to communicate.

Obtaining collaboration data is relatively painless: you need only clone the version control repository locally. Currently Kaiaulu only supports analysis of Git, but in the future it may support other version control systems. Obtaining communication data, however, requires more effort on your part depending on the project you choose, because open source projects use a large variety of communication media and archive types.

Broadly, open source projects use mailing lists, issue tracker comments, or both for communication.

Examples of mailing list archive types are GNU Mailman, Google Groups, The Mail Archive, Apache’s mod_mbox, FreeLists, Discourse, etc. In March 2021, GitHub even launched its own built-in communication medium, Discussions.

Examples of issue tracker types include JIRA, GitHub’s built-in Issue Tracker, GitLab’s Issue Tracker, Monorail (used in Google’s Chromium), Trac, Bugzilla, etc. You may also be interested in including discussion that occurs in GitHub pull requests / GitLab merge requests. Of the above, GitHub Issue Tracker comments, GitHub Pull Request comments (download_github_comments.Rmd), and JIRA comments (download_jira_data.Rmd) are currently supported. See the associated vignettes for details on how to download the data.

It is of course not viable for Kaiaulu to implement interfaces to every single archive type out there. Therefore, to calculate social smells, we expect you to obtain a .mbox representation of the mailing list of interest. This may be available from the open source project directly (e.g. Apache projects’ mod_mbox (see download_mod_mbox.Rmd) and Pipermail (see download_pipermail() in R/download.R)), or via a crawler someone already made. For example, gg_scraper outputs an mbox file from Google Groups mailing lists (although it can obtain only partial information from the e-mails, as Google Groups truncates them, which may limit some steps of the analysis, such as the identity matching discussed below).

The bottom line is that the effort required to obtain the mailing list data will vary with the open source project of interest, as projects may even transition through different archive types over time. Once you have both the git log and at least one source of communication data (e.g. mbox, JIRA, or GitHub) available on your computer, you are ready to proceed with the social smells analysis in this notebook.

Libraries

Please ensure the following R packages are installed on your computer.

Project Configuration File (Parameters Needed)

The parameters necessary for analysis are kept in a project configuration file to ensure reproducibility. In this notebook, we will use Apache’s Helix open source project. Refer to the conf/ folder in Kaiaulu’s git repository for the Helix and other project configuration files. It is in this project configuration file that we specify where Kaiaulu can find the git log and communication sources for Helix. We also specify filters of interest here: for example, whether Kaiaulu should ignore test files, or anything that is not source code.

Within the scope of this notebook, only the first branch (top) specified in the project configuration file will be analyzed. Refer to the CLI interface if you are interested in executing this analysis over multiple branches.

We also provide the path to tools.yml. Kaiaulu does not implement all available functionality from scratch, but it will also not expect all dependencies to be installed: every function defined in the API expects as a parameter the filepath to the external dependency binary. tools.yml is a convenience file that stores all the binary paths, so they can be set once during setup and reused across analyses. You can find an example of tools.yml in the root directory of Kaiaulu’s GitHub repo. For this notebook, you will need to install Perceval (use version 0.12.24) and OSLOM. Instructions to do so are available in the Kaiaulu README.md. Once you are finished, set the “perceval”, “oslom_dir”, and “oslom_undir” paths in your tools.yml. The line metrics section at the end of this notebook additionally uses scc (the “scc” path).

tools_path <- "../tools.yml"
conf_path <- "../conf/helix.yml"

tool <- yaml::read_yaml(tools_path)
scc_path <- tool[["scc"]]

oslom_dir_path <- tool[["oslom_dir"]]
oslom_undir_path <- tool[["oslom_undir"]]

conf <- yaml::read_yaml(conf_path)

perceval_path <- tool[["perceval"]]
git_repo_path <- conf[["version_control"]][["log"]]
git_branch <- conf[["version_control"]][["branch"]][1]

start_commit <- conf[["analysis"]][["window"]][["start_commit"]]
end_commit <- conf[["analysis"]][["window"]][["end_commit"]]
window_size <- conf[["analysis"]][["window"]][["size_days"]]

mbox_path <- conf[["mailing_list"]][["mbox"]]
github_replies_path <- conf[["issue_tracker"]][["github"]][["replies"]]
jira_issue_comments_path <- conf[["issue_tracker"]][["jira"]][["issue_comments"]]




# Filters
file_extensions <- conf[["filter"]][["keep_filepaths_ending_with"]]
substring_filepath <- conf[["filter"]][["remove_filepaths_containing"]]

The remainder of this notebook does not require modifications. If you encounter an error in any code block below, chances are one or more of the parameters above has been specified incorrectly, or the project of choice is an outlier case. Please open an issue if you encounter an error, or, if unsure, post in the Discussions of Kaiaulu’s GitHub. E-mailing bug reports is discouraged, as they are hard to track.

Parsing Input Data

As stated in the introduction, we need both git log and at least one communication source (here named replies) to compute social smells. Therefore, the first step is to parse the raw data.

Parse Gitlog

To get started, we use the parse_gitlog function to extract a table from the git log. You can inspect the project_git variable to see what information is available from the git log.

git_checkout(git_branch,git_repo_path)
## [1] "Your branch is up to date with 'origin/master'."
project_git <- parse_gitlog(perceval_path,git_repo_path)
project_git <- project_git  %>%
  filter_by_file_extension(file_extensions,"file_pathname")  %>% 
  filter_by_filepath_substring(substring_filepath,"file_pathname")

Parse Replies

Next, we parse the various communication channels the project uses. As with parse_gitlog, the returned object is a table, which we can inspect directly in R to see what information is available.

We also have to parse and normalize the timezones across the different sources. Since one of the social metrics in the quality framework is the count of distinct timezones, we separate out the timezone information before normalizing the timestamps.

project_git$author_tz <- sapply(stringi::stri_split(project_git$author_datetimetz,
                                                          regex=" "),"[[",6)
project_git$author_datetimetz <- as.POSIXct(project_git$author_datetimetz,
                                            format = "%a %b %d %H:%M:%S %Y %z", tz = "UTC")


project_git$committer_tz <- sapply(stringi::stri_split(project_git$committer_datetimetz,
                                                          regex=" "),"[[",6)
project_git$committer_datetimetz <- as.POSIXct(project_git$committer_datetimetz,
                                            format = "%a %b %d %H:%M:%S %Y %z", tz = "UTC")
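As a concrete illustration of the splitting and normalization above, here is how a single hypothetical git timestamp behaves, using base R's strsplit in place of stringi:

```r
# Hypothetical git author timestamp: local time plus a UTC offset
ts <- "Mon Feb 5 14:02:33 2018 +0100"

# The 6th whitespace-separated token is the timezone offset
tz_offset <- strsplit(ts, " ")[[1]][6]
tz_offset  # "+0100"

# Normalizing with the same format string as above converts to UTC
utc <- as.POSIXct(ts, format = "%a %b %d %H:%M:%S %Y %z", tz = "UTC")
format(utc, "%H:%M:%S")  # "13:02:33", one hour earlier than local time
```

Note that separating the offset first matters: after as.POSIXct normalizes everything to UTC, the original timezone information is gone.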

We apply the same logic above to the available communication channels. Not all of the communication channels may be in use in a given project, but you will want to ensure you accurately choose all the development communication channels used in the project. You can usually find a “How to contribute” page on the project website, which specifies the sources used.

Remember: Kaiaulu will not throw errors if you omit relevant sources of developer communication; instead, the computed smells will be higher than they should be (developers will be deemed not to have communicated simply because the source where they communicate was not included in the analysis!).

project_mbox <- NULL
project_jira <- NULL
project_github_replies <- NULL



if(!is.null(mbox_path)){
  project_mbox <- parse_mbox(perceval_path,mbox_path)

  # Split the mbox (not git) timestamps to extract the timezone offset
  project_mbox$reply_tz <- sapply(stringi::stri_split(project_mbox$reply_datetimetz,
                                                            regex=" "),"[[",6)

  project_mbox$reply_datetimetz <- as.POSIXct(project_mbox$reply_datetimetz,
                                        format = "%a, %d %b %Y %H:%M:%S %z", tz = "UTC")
}
if(!is.null(jira_issue_comments_path)){
  project_jira <- parse_jira_replies(parse_jira(jira_issue_comments_path)) 
  
  # Timezone is embedded in a separate field. All times are shown in UTC.
  project_jira$reply_tz <- "0000"
  
  project_jira$reply_datetimetz <- as.POSIXct(project_jira$reply_datetimetz,
                                        format = "%Y-%m-%dT%H:%M:%S.000+0000", tz = "UTC")
}
if(!is.null(github_replies_path)){
  project_github_replies <- parse_github_replies(github_replies_path)  
  
  
  # Timezone is not available in the GitHub timestamp; all times are in UTC
  project_github_replies$reply_tz <- "0000"
  
  project_github_replies$reply_datetimetz <- as.POSIXct(project_github_replies$reply_datetimetz,
                                        format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")

}

We then combine the available reply sources into a single table. While we sort both tables here for clarity when exploring them, sorting is not assumed in any of the remaining analysis.

# All replies are combined into a single reply table. 
project_reply <- rbind(project_mbox,
                      project_jira,
                      project_github_replies)

project_git <- project_git[order(author_datetimetz)]
 
project_reply <- project_reply[order(reply_datetimetz)]

#project_reply <- project_reply[reply_datetimetz >= start_date & reply_datetimetz <= end_date]

Smells

Having parsed both the git log and replies, we are ready to start computing the social smells. Social smells are computed at a “time window” granularity. For example, we may ask: “between January 2020 and April 2020, how many organizational silos are identified in Helix?”. This means we inspect both the git log and mailing list for the associated time period, perform the necessary transformations on the data, and compute the number of organizational silos.

So we begin by specifying how large our time window should be, in days:

window_size # 90 days, as read from the project configuration file

Kaiaulu will then iterate over non-overlapping time windows (here, 90 days, i.e. roughly 3 months) of git log and mailing list history to identify the organizational silo, missing link, and radio silence social smells.

We “slice” the git log and mailing list tables parsed earlier into window_size chunks, and iterate over each “slice” in a for loop.
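As a minimal sketch of the slicing (with made-up dates), seq.POSIXt generates the window boundaries; each consecutive pair of boundaries [i, i+1) is one slice:

```r
# Hypothetical one-year analysis period, sliced into 90-day windows
start <- as.POSIXct("2020-01-01", tz = "UTC")
end   <- as.POSIXct("2020-12-31", tz = "UTC")
boundaries <- seq.POSIXt(from = start, to = end, by = "90 day")
length(boundaries)  # 5 boundaries, i.e. 4 complete 90-day slices
# The trailing partial window after the last boundary is discarded
```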

Within a slice, we do the following:

  1. Apply identity matching (the same person may have multiple identities) both within and across sources.
  2. Construct a network representation of both the git log and the mailing list.
  3. Apply, if necessary, a projection of the networks (depending on the network function used).
  4. Apply community detection algorithms, necessary to calculate some of the social smells.
  5. Compute the social smells.

There is a large variety of customization available in the above 5 steps. We discuss each briefly here, as these choices directly impact the quality of your results.

Identity Matching

It is very common for authors to have multiple names and e-mails within and across git logs, mailing lists, issue trackers, etc. There is no perfect way to identify all identities of an individual, only heuristics. Kaiaulu has a large number of unit tests, each capturing a different example of how people use their e-mails. However, this is not magic: cases have been encountered where software used by an open source project may entirely compromise the analysis if done blindly. For example, a common heuristic for identity matching is to consider two accounts to be the same identity if either the first+last name OR the e-mail is an exact match. In one open source project we analyzed, we found that a tool was in use that masked all users with commit access as “”. Using this heuristic blindly would therefore have compromised the entire dataset.
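To make the heuristic concrete, here is a toy base R sketch (not Kaiaulu's implementation) of name-OR-e-mail matching over three hypothetical accounts:

```r
# Three hypothetical accounts belonging to one person
accounts <- data.frame(
  name  = c("Jane Doe", "Jane Doe", "jdoe"),
  email = c("jane@a.org", "jane@b.org", "jane@b.org"),
  stringsAsFactors = FALSE
)

# Assign each account the smallest identity id of any account it shares
# a name OR an e-mail with (one pass suffices for this tiny example;
# longer transitive chains would require iterating to a fixed point)
identity_id <- seq_len(nrow(accounts))
for (i in seq_len(nrow(accounts))) {
  match_idx <- which(accounts$name == accounts$name[i] |
                     accounts$email == accounts$email[i])
  identity_id[match_idx] <- min(identity_id[match_idx])
}
identity_id  # all three accounts collapse to identity 1
```

Accounts 1 and 2 share a name, and accounts 2 and 3 share an e-mail, so all three are transitively merged. It also shows the failure mode described above: if every account had the same masked name, everything would merge into one identity.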

Using the code below, you can manually inspect in R the identities assigned to the various users in project_git and project_reply. I strongly encourage you to do so. It is also possible to instruct the identity_match() function to consider only names, instead of names and e-mails, to avert the example above. If you find an edge case where an identity is incorrectly assigned, please open an issue so we can add the edge case. You may also manually correct the identity numbers before executing the remaining code blocks, to improve the accuracy of the results.

At this point, it is also important to consider how the author name and e-mail (if available) are obtained from the various sources. In particular, Kaiaulu can currently obtain name data only from JIRA. For GitHub issue and pull request comments, only users who have at least one commit will have a name and e-mail available (otherwise the user name will be their GitHub ID). These limitations stem directly from the sources’ respective APIs. If you believe additional information can be obtained, please open an issue with suggestions!

#Identity matching
project_log <- list(project_git=project_git,project_reply=project_reply)
project_log <- identity_match(project_log,
                                name_column = c("author_name_email","reply_from"),
                                assign_exact_identity,
                                use_name_only=TRUE,
                                label = "raw_name")

project_git <- project_log[["project_git"]]
project_reply <- project_log[["project_reply"]]

Remember: social smells rely heavily on patterns of collaboration and communication. If identities are poorly assigned, the social smells will not correctly reflect the project status (since, in essence, several people considered to be communicating with one another are the same individual!).

Constructing a Network Representation

As mentioned in the introduction, there are multiple types of mailing list archives out there, and depending on the project it may be more sensible to use an issue tracker instead of a mailing list, or a combination of both. Besides data types, there are also different transformations that can be applied when turning the data into networks. This notebook uses the bipartite transformation (see parse_gitlog_network()). It is also possible to use a temporal transformation (see parse_gitlog_temporal_network()). The choice of transformation affects the direction and overall type of network that will be generated, so it is important you understand how this impacts your research conclusions. A similar transformation could be applied to mailing lists, but it is not yet implemented. Because we use the bipartite transformation in the code block below, we also perform a bipartite projection. These are well-known operations in graph theory that also impact the interpretability of the results.

Another choice is whether the analysis should be done on files or on entities (e.g. functions). See parse_gitlog_entity_network() and parse_gitlog_entity_temporal_network() for entities. You may choose functions, classes, and other more specific kinds of entities depending on the language of interest (e.g. typedef structs for C).

A third choice we make here is whether the collaboration being analyzed is by authors or committers. Normally an open source project has both. In the code block below, we analyze authors. If you are interested in committers, or potentially their interaction, see the available parameters of parse_gitlog().

Community Detection

For some social smells, such as radio silence and primma donna, community detection must be applied to the constructed networks. Do consider the implications of the algorithm chosen below for your results.

# Define all timestamp in number of days since the very first commit of the repo 
# Note here the start_date and end_date are in respect to the git log.

# Transform commit hashes into datetime so window_size can be used
start_date <- get_date_from_commit_hash(project_git,start_commit)
end_date <- get_date_from_commit_hash(project_git,end_commit)
datetimes <- project_git$author_datetimetz
reply_datetimes <- project_reply$reply_datetimetz

# Format time window for posixT
window_size_f <- stringi::stri_c(window_size," day")

# Note: if the span from start_date to end_date is not a multiple of window_size
# (and it likely will not be), the final incomplete window is discarded so the
# metrics are not calculated over a smaller interval
time_window <- seq.POSIXt(from=start_date,to=end_date,by=window_size_f)

# Create a list where each element is the social smells calculated for a given commit hash
seed <- 1 # seed for OSLOM community detection; any fixed value ensures reproducibility
smells <- list()
size_time_window <- length(time_window)
for(j in 2:size_time_window){
  
   # Initialize
  commit_interval <- NA
  start_day <- NA
  end_day <- NA
  org_silo <- NA
  missing_links <- NA
  radio_silence <- NA
  primma_donna <- NA
  st_congruence <- NA
  communicability <- NA
  num_tz <- NA
  code_only_devs <- NA
  code_files <- NA
  ml_only_devs <- NA
  ml_threads <- NA
  code_ml_both_devs <- NA
  
  i <- j - 1

  # If the time window is of size 1, then there has been less than "window_size_f"
  # days from the start date.
  if(length(time_window)  == 1){
    # Below 3 month size
    start_day <- start_date
    end_day <- end_date 
  }else{
    start_day <- time_window[i]
    end_day <- time_window[j] 
  }
  
  
  # Note: The start and end commits in your project config file should be set so 
  # that the dates cover overlapping date ranges in both project_git_slice and project_reply_slice.
  # Double-check your project_git and project_reply to ensure this is the case if an error arises.
  
  # Obtain all commits from the gitlog which are within a particular window_size
  project_git_slice <- project_git[(author_datetimetz >= start_day) & 
                     (author_datetimetz < end_day)]
  
  # Obtain all email posts from the reply which are within a particular window_size
  project_reply_slice <- project_reply[(reply_datetimetz >= start_day) & 
                       (reply_datetimetz < end_day)]
  
  # Check if slices contain data
  gitlog_exist <- (nrow(project_git_slice) != 0)
  ml_exist <- (nrow(project_reply_slice) != 0)
  
 
  # Create Networks 
  if(gitlog_exist){
    i_commit_hash <- data.table::first(project_git_slice[author_datetimetz == min(project_git_slice$author_datetimetz,na.rm=TRUE)])$commit_hash

    j_commit_hash <- data.table::first(project_git_slice[author_datetimetz == max(project_git_slice$author_datetimetz,na.rm=TRUE)])$commit_hash
    
    # Parse networks edgelist from extracted data
    network_git_slice <- transform_gitlog_to_bipartite_network(project_git_slice,
                                                               mode="author-file")
    
    # Community Smells functions are defined base of the projection networks of 
    # dev-thread => dev-dev, and dev-file => dev-dev. This creates both dev-dev via graph projections
    
    git_network_authors <- bipartite_graph_projection(network_git_slice,
                                                      mode = TRUE,
                                                      weight_scheme_function = weight_scheme_sum_edges)
    
    code_clusters <- community_oslom(oslom_undir_path,
                                     git_network_authors,
                                     seed=seed,
                                     n_runs = 1000,
                                     is_weighted = TRUE)
    
  }
  if(ml_exist){
    network_reply_slice <- transform_reply_to_bipartite_network(project_reply_slice)
    
    
    reply_network_authors <- bipartite_graph_projection(network_reply_slice,
                                                      mode = TRUE,
                                                      weight_scheme_function = weight_scheme_sum_edges)    
    
    # Community Detection
    
    mail_clusters <- community_oslom(oslom_undir_path,
              reply_network_authors,
              seed=seed,
              n_runs = 1000,
              is_weighted = TRUE)
      
  }
  # Metrics #
  
  if(gitlog_exist){
    commit_interval <- stri_c(i_commit_hash,"-",j_commit_hash)
    # Social Network Metrics 
    code_only_devs <- length(unique(project_git_slice$identity_id))
    code_files <- length(unique(project_git_slice$file_pathname))
    
  }  
  if(ml_exist){
    # Smell
    
    radio_silence <- length(smell_radio_silence(mail.graph=reply_network_authors, 
                                                          clusters=mail_clusters))
    
    # Social Technical Metrics
    ml_only_devs <- length(unique(project_reply_slice$identity_id))
    ml_threads <- length(unique(project_reply_slice$reply_subject))
  }
  if (ml_exist & gitlog_exist){
    # Smells 
    org_silo <- length(smell_organizational_silo(mail.graph=reply_network_authors,
                                                          code.graph=git_network_authors))

    missing_links <- length(smell_missing_links(mail.graph=reply_network_authors,
                                                code.graph=git_network_authors))
    # Social Technical Metrics
    st_congruence <- smell_sociotechnical_congruence(mail.graph=reply_network_authors,
                                                     code.graph=git_network_authors)
#    communicability <- community_metric_mean_communicability(reply_network_authors,git_network_authors)
    num_tz <- length(unique(c(project_git_slice$author_tz,
                              project_git_slice$committer_tz,
                              project_reply_slice$reply_tz)))
    code_ml_both_devs <- length(intersect(unique(project_git_slice$identity_id),
                                          unique(project_reply_slice$identity_id)))
    
  }

  # Aggregate Metrics
  smells[[stringi::stri_c(start_day,"|",end_day)]] <- data.table(commit_interval,
                                                                 start_datetime = start_day,
                                                                 end_datetime = end_day,
                                                                 org_silo,
                                                                 missing_links,
                                                                 radio_silence,
                                                                 #primma_donna,
                                                                 st_congruence,
                                                                 #communicability,
                                                                 num_tz,
                                                                 code_only_devs,
                                                                 code_files,
                                                                 ml_only_devs,
                                                                 ml_threads,
                                                                 code_ml_both_devs)
}
smells_interval <- rbindlist(smells)

Community Inspection per Time Slice

This shows the last slice of the loop above:

project_collaboration_network <- recolor_network_by_community(git_network_authors,code_clusters)

gcid <- igraph::graph_from_data_frame(d=project_collaboration_network[["edgelist"]], 
                      directed = FALSE,
                      vertices = project_collaboration_network[["nodes"]])

visIgraph(gcid,randomSeed = 1)

It may appear counter-intuitive that the 5 connected nodes are not considered a single community; however, recall the algorithm also considers the edge weights among the nodes. In this case, the algorithm did not deem the weights among the nodes sufficiently high to form a community.

You can also observe the identity match algorithm in action and its potential implications: different identities matched to the same author are separated by “|”. Had it not performed as intended, each identity would appear as a separate, and very likely connected, node, thus biasing the social metrics.

project_collaboration_network <- recolor_network_by_community(reply_network_authors,mail_clusters)

gcid <- igraph::graph_from_data_frame(d=project_collaboration_network[["edgelist"]], 
                      directed = FALSE,
                      vertices = project_collaboration_network[["nodes"]])

visIgraph(gcid,randomSeed = 1)

Other Metrics

The remainder of this notebook does not compute any social smells. It provides some popular metrics commonly reported in the software engineering literature, which may be useful when interpreting the social smells. While their granularity is not naturally at the “time window” level, they are computed as such here so they can be aggregated and placed in the same table as the social smells.

Churn
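Churn is commonly defined as lines added plus lines removed per change; a minimal base R sketch under that assumption (the lines_added / lines_removed columns are hypothetical, not the exact schema used by metric_churn_per_commit_interval):

```r
# Hypothetical per-file-change table; churn = lines added + lines removed
changes <- data.frame(lines_added   = c(10, 3, 0),
                      lines_removed = c(2, 0, 5))
churn_total <- sum(changes$lines_added + changes$lines_removed)
churn_total  # 20
```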

churn <- list()

for(j in 2:length(time_window)){
  i <- j - 1
  
  # If the time window is of size 1, then there has been less than "window_size_f"
  # days from the start date.
  if(length(time_window)  == 1){
    # Below 3 month size
    start_day <- start_date
    end_day <- end_date 
  }else{
    start_day <- time_window[i]
    end_day <- time_window[j] 
  }
  
  # Obtain all commits from the gitlog which are within a particular window_size
  project_git_slice <- project_git[(author_datetimetz >= start_day) & 
                     (author_datetimetz < end_day)]
  
  gitlog_exist <- (nrow(project_git_slice) != 0)
  if(gitlog_exist){
    # The start and end commit
    i_commit_hash <- data.table::first(project_git_slice[author_datetimetz == min(project_git_slice$author_datetimetz,na.rm=TRUE)])$commit_hash
    j_commit_hash <- data.table::first(project_git_slice[author_datetimetz == max(project_git_slice$author_datetimetz,na.rm=TRUE)])$commit_hash

    
    # The start and end datetime
    #start_datetime <- first(project_git_slice)$author_datetimetz
    #end_datetime <- last(project_git_slice)$author_datetimetz
    
    commit_interval <- stri_c(i_commit_hash,"-",j_commit_hash)
    churn[[commit_interval]] <- data.table(commit_interval,
                                #           start_datetime,
                                #           end_datetime,
                        churn=metric_churn_per_commit_interval(project_git_slice),
                        n_commits = length(unique(project_git_slice$commit_hash)))  
  }
}
churn_interval <- rbindlist(churn)

Line Metrics

time_window <- seq.POSIXt(from=start_date,to=end_date,by=window_size_f)
#time_window <- seq(from=start_daydiff,to=end_daydiff,by=window_size)
line_metrics <- list()
for(j in 2:length(time_window)){
  i <- j - 1
  
  # If the time window is of size 1, then there has been less than "window_size_f"
  # days from the start date.
  if(length(time_window)  == 1){
    # Below 3 month size
    start_day <- start_date
    end_day <- end_date 
  }else{
    start_day <- time_window[i]
    end_day <- time_window[j] 
  }
  
  # Obtain all commits from the gitlog which are within a particular window_size
  project_git_slice <- project_git[(author_datetimetz >= start_day) & 
                     (author_datetimetz < end_day)]
  
  gitlog_exist <- (nrow(project_git_slice) != 0)
  if(gitlog_exist){
  
    i_commit_hash <- data.table::first(project_git_slice[author_datetimetz == min(project_git_slice$author_datetimetz,na.rm=TRUE)])$commit_hash
    # Use the ending hash of that window_size to calculate the flaws
    j_commit_hash <- data.table::first(project_git_slice[author_datetimetz == max(project_git_slice$author_datetimetz,na.rm=TRUE)])$commit_hash

    # Checkout to commit of interest
    git_checkout(j_commit_hash, git_repo_path)
    # Run line metrics against the checkedout commit
    commit_interval <- stri_c(i_commit_hash,"-",j_commit_hash)
    line_metrics[[commit_interval]] <- parse_line_metrics(scc_path,git_repo_path)
    line_metrics[[commit_interval]]$commit_interval <- commit_interval
    line_metrics[[commit_interval]]$git_checkout <- j_commit_hash
    line_metrics[[commit_interval]] <- line_metrics[[commit_interval]][,.(commit_interval,
                                                                          git_checkout,
                                                                          Location,
                                                                          Lines,
                                                                          Code,
                                                                          Comments,
                                                                          Blanks,
                                                                          Complexity)]
    
    # Filter Files
    line_metrics[[commit_interval]] <- line_metrics[[commit_interval]]  %>%
    filter_by_file_extension(file_extensions,"Location")  %>% 
    filter_by_filepath_substring(substring_filepath,"Location")
  }
}
# Reset Repo to HEAD
git_checkout(git_branch,git_repo_path)
## [1] "Your branch is up to date with 'origin/master'."
line_metrics_file <- rbindlist(line_metrics)
line_metrics_interval <- line_metrics_file[,.(Lines = sum(Lines),
                                 Code = sum(Code),
                                 Comments = sum(Comments),
                                 Blanks = sum(Blanks),
                                 Complexity = sum(Complexity)),
                              by = c("commit_interval","git_checkout")]

Merge Churn, Smells, and Line Metrics

dt <- merge(smells_interval,churn_interval,by="commit_interval")
dt <- merge(dt,line_metrics_interval,by="commit_interval")
kable(dt)
| commit_interval | start_datetime | end_datetime | org_silo | missing_links | radio_silence | st_congruence | num_tz | code_only_devs | code_files | ml_only_devs | ml_threads | code_ml_both_devs | churn | n_commits | git_checkout | Lines | Code | Comments | Blanks | Complexity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1103fecb67def5e610a7b22636ba4ac25e23777b-371b972565aedda872d0190a9c5c36eb682882fb | 2018-09-16 18:19:46 | 2018-12-15 18:19:46 | 7 | 13 | 25 | 0.1333333 | 4 | 7 | 146 | 29 | 228 | 5 | 8159 | 67 | 371b972565aedda872d0190a9c5c36eb682882fb | 104236 | 65975 | 25949 | 12312 | 10287 |
| 16df4ccd05923879fc4bd0cd6d0017996ac766bc-fc6009feca610756dd134eb070d07b0c4918c1ba | 2016-06-28 18:19:46 | 2016-09-26 18:19:46 | 2 | 2 | 20 | 0.3333333 | 4 | 3 | 104 | 20 | 44 | 2 | 13420 | 87 | fc6009feca610756dd134eb070d07b0c4918c1ba | 73059 | 45547 | 18599 | 8913 | 7011 |
| 1737cd2aca3246967b89e7a3767e29cf0c8a0fb9-594d94aca6d2bcc461e809c53d4ae2ee2d96cd0a | 2016-09-26 18:19:46 | 2016-12-25 18:19:46 | 0 | 0 | 1 | 1.0000000 | 4 | 2 | 54 | 21 | 96 | 2 | 5337 | 43 | 594d94aca6d2bcc461e809c53d4ae2ee2d96cd0a | 70191 | 43639 | 18032 | 8520 | 6689 |
| 1a37d5c692c85db20d0c61e2f72659f3724e125e-52b47197a9ef8bf52a227ce6fdc6e512d161b408 | 2021-06-02 18:19:46 | 2021-08-31 18:19:46 | 0 | 1 | 13 | 0.6666667 | 2 | 6 | 57 | 13 | 96 | 6 | 2841 | 29 | 52b47197a9ef8bf52a227ce6fdc6e512d161b408 | 142453 | 88002 | 38289 | 16162 | 12925 |
| 1f683b863df23f16bd893fc675f88ed8b7f3d3b8-0f7c3e42080ba8e2b17e36ca1c5c51c6209b0f03 | 2016-03-30 18:19:46 | 2016-06-28 18:19:46 | 1 | 1 | 15 | 0.0000000 | 4 | 2 | 64 | 15 | 30 | 1 | 6301 | 43 | 0f7c3e42080ba8e2b17e36ca1c5c51c6209b0f03 | 70911 | 44098 | 18209 | 8604 | 6775 |
| 310457c2462345127f6a1e5e133bd8d19c4e5482-384978a2e16ab0f4adb388e32c7e448c77996ca2 | 2016-12-25 18:19:46 | 2017-03-25 18:19:46 | 5 | 9 | 11 | 0.1000000 | 4 | 7 | 125 | 26 | 61 | 5 | 14003 | 82 | 384978a2e16ab0f4adb388e32c7e448c77996ca2 | 74134 | 46175 | 18903 | 9056 | 7125 |
| 350691015253443577dc7e3d34a08d81cd511189-7f05db2c1c2be03379d58f86723bc549863deb0a | 2021-03-04 18:19:46 | 2021-06-02 18:19:46 | 1 | 4 | 19 | 0.3333333 | 3 | 9 | 74 | 19 | 203 | 8 | 4676 | 38 | 7f05db2c1c2be03379d58f86723bc549863deb0a | 140356 | 86645 | 37802 | 15909 | 12760 |
| 39bf0ad429660797d70a0a1b53a2105e4ec1cd50-6f5ca159af4236ad1ad709133bc5cb9ce823dc83 | 2020-06-07 18:19:46 | 2020-09-05 18:19:46 | 0 | 23 | 9 | 0.2812500 | 3 | 11 | 752 | 27 | 391 | 11 | 13321 | 85 | 6f5ca159af4236ad1ad709133bc5cb9ce823dc83 | 135864 | 83532 | 36777 | 15555 | 12272 |
| 4079396612e8b92b7da161c3befbf6fa10800592-79f94cedae2dca5f57113bc80f7cd1af91b0ce31 | 2015-10-02 18:19:46 | 2015-12-31 18:19:46 | 1 | 1 | 15 | 0.0000000 | 4 | 2 | 27 | 19 | 39 | 1 | 6720 | 15 | 79f94cedae2dca5f57113bc80f7cd1af91b0ce31 | 64611 | 40113 | 16621 | 7877 | 6070 |
| 58f407a7229b44b0a4849511f9831c3de13e88ca-b6b89de5cf00c1d1d1cba2cd09fcd25054b2247d | 2015-12-31 18:19:46 | 2016-03-30 18:19:46 | 1 | 2 | 28 | 0.0000000 | 4 | 3 | 69 | 28 | 52 | 2 | 8567 | 37 | b6b89de5cf00c1d1d1cba2cd09fcd25054b2247d | 66723 | 41509 | 17106 | 8108 | 6327 |
| 6930d1dcf19c8acbc1476f68c4e589179b6dacc0-0c3ac37b0b442f20d08eaba86da7d94ec1494d1f | 2018-06-18 18:19:46 | 2018-09-16 18:19:46 | 0 | 8 | 24 | 0.1111111 | 3 | 5 | 119 | 32 | 341 | 5 | 11323 | 68 | 0c3ac37b0b442f20d08eaba86da7d94ec1494d1f | 101269 | 63803 | 25377 | 12089 | 10061 |
| 75904ef969af01d052c9d3b649c2af5da1412154-912e7943e3c2a6e70f2427bc3295e52828f37b73 | 2019-12-10 18:19:46 | 2020-03-09 18:19:46 | 14 | 29 | 8 | 0.0333333 | 4 | 14 | 400 | 27 | 290 | 11 | 34100 | 89 | 912e7943e3c2a6e70f2427bc3295e52828f37b73 | 123218 | 76510 | 32404 | 14304 | 11470 |
| 7aaa89d8abe5eb8b1002df32662d74338cde9dd4-3870470761745a00e73650cf589f852be356a3a6 | 2017-06-23 18:19:46 | 2017-09-21 18:19:46 | 0 | 4 | 21 | 0.2000000 | 4 | 5 | 130 | 21 | 57 | 5 | 17205 | 63 | 3870470761745a00e73650cf589f852be356a3a6 | 86192 | 54453 | 21224 | 10515 | 8499 |
| 7b3e525f4b3fd9a52d5fb92a5d1365dedc8ee2ba-f9c710b5dcedd0e1182ca2d612a55d3827f86a39 | 2017-03-25 18:19:46 | 2017-06-23 18:19:46 | 1 | 6 | 4 | 0.3333333 | 4 | 6 | 129 | 26 | 103 | 5 | 487499 | 95 | f9c710b5dcedd0e1182ca2d612a55d3827f86a39 | 246888 | 157773 | 74473 | 14642 | 21012 |
| 8a279a366d4e6c43366a1c5867a02f26768e5627-8819220738b18c54652e4b32b9677ea78d585da2 | 2015-04-05 18:19:46 | 2015-07-04 18:19:46 | 0 | 1 | 32 | 0.0000000 | 3 | 2 | 15 | 32 | 65 | 2 | 815 | 6 | 8819220738b18c54652e4b32b9677ea78d585da2 | 63537 | 39527 | 16267 | 7743 | 5962 |
| 8f6091b05f1d9995bc95cf819d91e11e71292c2c-b43681634820ea5990851efae941e1123c5b4c54 | 2018-12-15 18:19:46 | 2019-03-15 18:19:46 | 8 | 13 | 11 | 0.0714286 | 4 | 8 | 124 | 28 | 189 | 6 | 7737 | 48 | b43681634820ea5990851efae941e1123c5b4c54 | 106047 | 67102 | 26476 | 12469 | 10495 |
| 8fa068344e0f8029a9fee6f41c1c3838ba86cdf1-6bb6e2c010055c9df2e1945cea428cfe660d7a24 | 2013-04-15 18:19:46 | 2013-07-14 18:19:46 | 12 | 13 | 9 | 0.0000000 | 4 | 7 | 57 | 30 | 379 | 3 | 6047 | 38 | 6bb6e2c010055c9df2e1945cea428cfe660d7a24 | 66622 | 45062 | 13860 | 7700 | 4934 |
| 9a1c205d986ca6a60609b2467d264222f01daa16-c37d1e81f5dddcf552ac6d896c7def45e42e7053 | 2014-04-10 18:19:46 | 2014-07-09 18:19:46 | 1 | 1 | 18 | 0.0000000 | 3 | 2 | 132 | 18 | 175 | 1 | 19689 | 29 | c37d1e81f5dddcf552ac6d896c7def45e42e7053 | 60366 | 37631 | 15343 | 7392 | 5676 |
| a18deb06c28e25e79a9a773069a79ce21a2399b3-bdf4dbd01a42570f267d8d604a4e192755af18d4 | 2020-12-04 18:19:46 | 2021-03-04 18:19:46 | 1 | 3 | 15 | 0.5000000 | 3 | 8 | 67 | 15 | 95 | 7 | 2474 | 26 | bdf4dbd01a42570f267d8d604a4e192755af18d4 | 139897 | 86187 | 37818 | 15892 | 12676 |
| a92a11d6a969bd848964cdf834ea3e3505269c7b-8c5e63ab263d2cbdf1f17bb98335afb69974be99 | 2014-07-09 18:19:46 | 2014-10-07 18:19:46 | 1 | 1 | 10 | 0.0000000 | 3 | 2 | 33 | 41 | 325 | 1 | 1503 | 16 | 8c5e63ab263d2cbdf1f17bb98335afb69974be99 | 61359 | 38297 | 15573 | 7489 | 5786 |
| ad0c2edb31a4d0e5c455d1fa0f96658bac1382d5-cf7b7a50d9ef38e039123d004a85ce7b087dc196 | 2019-06-13 18:19:46 | 2019-09-11 18:19:46 | 7 | 19 | 9 | 0.1363636 | 5 | 11 | 156 | 29 | 442 | 8 | 11264 | 67 | cf7b7a50d9ef38e039123d004a85ce7b087dc196 | 114540 | 71994 | 29192 | 13354 | 11083 |
| ae2e3cb4d40f15cd89ba9b99831d0feb1b64bdd8-0616e972b318c66fdd8f5ce787fc8670aa9459dd | 2019-03-15 18:19:46 | 2019-06-13 18:19:46 | 2 | 2 | 5 | 0.5000000 | 4 | 7 | 78 | 19 | 57 | 5 | 6159 | 65 | 0616e972b318c66fdd8f5ce787fc8670aa9459dd | 107977 | 68389 | 26915 | 12673 | 10627 |
| b23f983ae2f2bf92cf7b3a12ed24b0cc9eb4e9a6-c92428023a6b8456c0e0ecce0649e61ea2575863 | 2013-10-12 18:19:46 | 2014-01-10 18:19:46 | 1 | 1 | 11 | 0.0000000 | 4 | 2 | 77 | 33 | 387 | 1 | 10151 | 23 | c92428023a6b8456c0e0ecce0649e61ea2575863 | 60097 | 37045 | 15636 | 7416 | 5491 |
| b65f6ec92b62c1aed180c93b90f989a9dcd44f69-310d47660b2c31b569ddfb98058ce5877441095f | 2017-09-21 18:19:46 | 2017-12-20 18:19:46 | 11 | 12 | 9 | 0.0769231 | 4 | 8 | 333 | 20 | 77 | 3 | 190844 | 65 | 310d47660b2c31b569ddfb98058ce5877441095f | 89403 | 56462 | 22136 | 10805 | 8909 |
| b72ff29d1fc2845affb9ee943396424c5a7e5721-4965bec738d3c748ff1acb7760770ccad90c3bd8 | 2015-07-04 18:19:46 | 2015-10-02 18:19:46 | 1 | 2 | 24 | 0.0000000 | 4 | 3 | 25 | 27 | 51 | 2 | 842 | 9 | 4965bec738d3c748ff1acb7760770ccad90c3bd8 | 63704 | 39638 | 16307 | 7759 | 5972 |
| ba1628e76e43555c376ebef2293c88131a4f8c85-58b78f6a4f16913053f7250c667a91bb31fd6225 | 2013-07-14 18:19:46 | 2013-10-12 18:19:46 | 2 | 2 | 26 | 0.0000000 | 3 | 3 | 399 | 43 | 582 | 1 | 47147 | 32 | 58b78f6a4f16913053f7250c667a91bb31fd6225 | 60111 | 37181 | 15432 | 7498 | 5375 |
| d5687fa41d091420f78023d950a9dc33f5e769ab-e94a9f5f90099a248181d6dc50314aec0e8d9512 | 2015-01-05 18:19:46 | 2015-04-05 18:19:46 | 3 | 3 | 35 | 0.0000000 | 4 | 4 | 30 | 38 | 161 | 2 | 2125 | 16 | e94a9f5f90099a248181d6dc50314aec0e8d9512 | 63379 | 39408 | 16250 | 7721 | 5928 |
| de238d68efbc63374470aa15ca552e81147a1cc9-e2e3fec2daaba859f8b9ff40c9bb08720f63db02 | 2014-10-07 18:19:46 | 2015-01-05 18:19:46 | 0 | 1 | 17 | 0.6666667 | 3 | 3 | 73 | 47 | 160 | 3 | 2195 | 16 | e2e3fec2daaba859f8b9ff40c9bb08720f63db02 | 62320 | 38812 | 15929 | 7579 | 5838 |
| e43c65b797842d2a1f01405c73cc040c44670b5e-914b6780267fb029b05b774f253ed0dc628067fb | 2012-10-17 18:19:46 | 2013-01-15 18:19:46 | 3 | 3 | 14 | 0.0000000 | 6 | 3 | 708 | 14 | 307 | 0 | 143232 | 35 | 914b6780267fb029b05b774f253ed0dc628067fb | 57465 | 39404 | 11534 | 6527 | 4201 |
| e49d26b73cf6b8c7d31d4d43bc1d473a29558dca-751d746abb6cb6a1fc6d8b6717d96ab06a68af00 | 2019-09-11 18:19:46 | 2019-12-10 18:19:46 | 12 | 17 | 22 | 0.0555556 | 5 | 11 | 299 | 22 | 236 | 8 | 9571 | 78 | 751d746abb6cb6a1fc6d8b6717d96ab06a68af00 | 109819 | 69543 | 27422 | 12854 | 10836 |
| e5728469e02f196690654b4f7f2ed8ca9130a631-5e4e26cc8bfcc04236e87a721a39070f7be238c8 | 2018-03-20 18:19:46 | 2018-06-18 18:19:46 | 5 | 13 | 17 | 0.1333333 | 4 | 8 | 124 | 37 | 246 | 7 | 7931 | 74 | 5e4e26cc8bfcc04236e87a721a39070f7be238c8 | 98024 | 61761 | 24533 | 11730 | 9666 |
| e690aff80eae7120a7392e430ce3b9e9c83a1a49-6b70b77838287dadc2bfc6717f8043bd87a2c0ef | 2013-01-15 18:19:46 | 2013-04-15 18:19:46 | 10 | 10 | 23 | 0.0000000 | 4 | 5 | 130 | 29 | 285 | 0 | 19368 | 68 | 6b70b77838287dadc2bfc6717f8043bd87a2c0ef | 63768 | 43214 | 13242 | 7312 | 4726 |
| e9428536d940852c927dac59667017dcf1fa9a56-c1ab0b5ed27c777ef63bfc7247415b6928e72906 | 2017-12-20 18:19:46 | 2018-03-20 18:19:46 | 2 | 9 | 9 | 0.1818182 | 4 | 8 | 95 | 38 | 159 | 6 | 5117 | 49 | c1ab0b5ed27c777ef63bfc7247415b6928e72906 | 91484 | 57723 | 22733 | 11028 | 9104 |
| f678f79f34be5061c91a687b7ff826198d1daca9-2a37f3493c2600c3530ca5fe2373663809f3fd54 | 2014-01-10 18:19:46 | 2014-04-10 18:19:46 | 2 | 2 | 26 | 0.0000000 | 4 | 3 | 64 | 26 | 382 | 1 | 6560 | 21 | 2a37f3493c2600c3530ca5fe2373663809f3fd54 | 63521 | 39554 | 16059 | 7908 | 5845 |
| f901b2f4e1bffcfe19a2d7c5d088f3661a682b02-2e30348d1447f0a107e6c19f59e38d37662787fa | 2020-09-05 18:19:46 | 2020-12-04 18:19:46 | 0 | 7 | 9 | 0.5882353 | 4 | 9 | 98 | 27 | 269 | 9 | 5680 | 64 | 2e30348d1447f0a107e6c19f59e38d37662787fa | 138186 | 85030 | 37408 | 15748 | 12477 |
| fcf78cdb1b114a5424ecb179536d9c2931e9bba1-842bf1f9290cf30e946ff2bc8377b3e2f6014554 | 2020-03-09 18:19:46 | 2020-06-07 18:19:46 | 8 | 28 | 18 | 0.2000000 | 3 | 12 | 147 | 25 | 288 | 10 | 10846 | 82 | 842bf1f9290cf30e946ff2bc8377b3e2f6014554 | 133950 | 83077 | 35456 | 15417 | 12256 |