Package: RecordLinkage 0.4-12.4

RecordLinkage: Record Linkage Functions for Linking and Deduplicating Data Sets

Provides functions for linking and deduplicating data sets. Methods based on a stochastic approach are implemented as well as classification algorithms from the machine learning domain. For details, see our paper "The RecordLinkage Package: Detecting Errors in Data" Sariyar M / Borg A (2010) <doi:10.32614/RJ-2010-017>.

Authors: Murat Sariyar [aut, cre], Andreas Borg [aut]

Source: RecordLinkage_0.4-12.4.tar.gz
Windows binaries: RecordLinkage_0.4-12.4.zip (r-4.5, r-4.4, r-4.3)
macOS binaries: RecordLinkage_0.4-12.4.tgz (r-4.4-x86_64, r-4.4-arm64, r-4.3-x86_64, r-4.3-arm64)
Linux binaries (Ubuntu noble): RecordLinkage_0.4-12.4.tar.gz (r-4.5-noble, r-4.4-noble)
WebAssembly binaries: RecordLinkage_0.4-12.4.tgz (r-4.4-emscripten, r-4.3-emscripten)
Manual: RecordLinkage.pdf | RecordLinkage.html
API: RecordLinkage/json
NEWS

# Install 'RecordLinkage' in R:
install.packages('RecordLinkage', repos = c('https://sym33.r-universe.dev', 'https://cloud.r-project.org'))
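Once installed, a deduplication run follows the pattern used throughout the package documentation: build comparison patterns for candidate record pairs, compute weights, then classify with a threshold. The sketch below assumes the bundled RLdata500 data set, whose columns 1 to 4 hold name components and columns 5 to 7 the birth year, month and day; the blocking rule and the threshold 0.5 are illustrative choices, not recommended settings.

```r
library(RecordLinkage)

# RLdata500: 500 synthetic person records containing errors;
# identity.RLdata500 holds the true entity IDs, so results can be evaluated
data(RLdata500)

# Compare only pairs that agree on column 1 (first name) or on all of
# columns 5:7 (date of birth); strcmp = TRUE replaces exact agreement
# with Jaro-Winkler string similarity values
rpairs <- compare.dedup(RLdata500,
                        blockfld = list(1, 5:7),
                        strcmp = TRUE,
                        identity = identity.RLdata500)

# EpiLink weights lie in [0, 1]; pairs weighted above the threshold
# are classified as links
rpairs <- epiWeights(rpairs)
result <- epiClassify(rpairs, 0.5)

summary(result)
getTable(result)  # contingency table of true vs. predicted status
```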

This package does not link to any Github/Gitlab/R-forge repository. No issue tracker or development information is available.

73 exports, 6 stars, 3.21 score, 47 dependencies, 7 dependents, 11 mentions, 430 scripts, 3.2k downloads

Last updated 2 years ago from b324521498. Checks: 7 OK, 2 NOTE. Indexed: yes.

Target              Result  Date
Doc / Vignettes     OK      Sep 18 2024
R-4.5-win-x86_64    NOTE    Sep 18 2024
R-4.5-linux-x86_64  NOTE    Sep 18 2024
R-4.4-win-x86_64    OK      Sep 18 2024
R-4.4-mac-x86_64    OK      Sep 18 2024
R-4.4-mac-aarch64   OK      Sep 18 2024
R-4.3-win-x86_64    OK      Sep 18 2024
R-4.3-mac-x86_64    OK      Sep 18 2024
R-4.3-mac-aarch64   OK      Sep 18 2024

Exports: [.RecLinkData, [.RecLinkResult, [.RLBigData, [.RLResult, %append%, begin, blockfldfun, classifySupv, classifyUnsup, clear, clone, compare.dedup, compare.linkage, countpattern, deleteNULLs, editMatch, emClassify, emWeights, epiClassify, epiWeights, errorMeasures, fsClassify, fsWeights, genSamples, getColumnNames, getErrorMeasures, getExpectedSize, getFalse, getFalseNeg, getFalsePos, getFrequencies, getMatchCount, getMinimalTrain, getNACount, getNonMatchCount, getPairs, getPairsBackend, getParetoThreshold, getPatternCounts, getSQLStatement, getTable, getThresholds, gpdEst, hasWeights, init_sqlite_extensions, isFALSE, jarowinkler, levenshteinDist, levenshteinSim, loadRLObject, makeBlockingPairs, mrl, mygllm, nextPairs, optimalThreshold, plotMRL, print.summaryRLBigDataDedup, print.summaryRLBigDataLinkage, print.summaryRLResult, resample, RLBigDataDedup, RLBigDataLinkage, saveRLObject, soundex, splitData, summary.RecLinkData, summary.RecLinkResult, summary.RLBigDataDedup, summary.RLBigDataLinkage, summary.RLResult, texSummary, trainSupv, unorderedPairs

Dependencies: ada, bit, bit64, blob, cachem, class, cli, codetools, cpp11, data.table, DBI, diagram, digest, e1071, evd, fastmap, ff, future, future.apply, globals, glue, ipred, KernSmooth, lattice, lava, lifecycle, listenv, MASS, Matrix, memoise, nnet, numDeriv, parallelly, pkgconfig, plogr, prodlim, progressr, proxy, Rcpp, rlang, rpart, RSQLite, shape, SQUAREM, survival, vctrs, xtable

Classes for record linkage of big data sets

Rendered from BigData.rnw using knitr::knitr on Sep 18 2024.

Last update: 2020-04-09
Started: 2012-01-11
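For data sets too large for in-memory comparison, the big data classes take the place of compare.dedup; the package stores their data and comparison patterns in an SQLite database (RSQLite is among its dependencies). The sketch below assumes the RLdata500 layout (name components in columns 1 to 4, date of birth in columns 5 to 7) and restricts string comparison to the name columns by index; the blocking rule and threshold are illustrative.

```r
library(RecordLinkage)
data(RLdata500)

# Same blocking idea as for compare.dedup: pairs must agree on the
# first name (column 1) or on the full date of birth (columns 5:7);
# string comparison is applied to the four name columns only
rpairs <- RLBigDataDedup(RLdata500,
                         identity = identity.RLdata500,
                         blockfld = list(1, 5:7),
                         strcmp = 1:4)

# Weights are computed and stored for the big data object,
# classification returns an object of class "RLResult"
rpairs <- epiWeights(rpairs)
result <- epiClassify(rpairs, 0.5)

summary(result)
getTable(result)
```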

Record Linkage with Extreme Value Theory

Rendered from EVT.rnw using knitr::knitr on Sep 18 2024.

Last update: 2020-04-09
Started: 2012-01-11

Supervised Classification

Rendered from Supervised.rnw using knitr::knitr on Sep 18 2024.

Last update: 2020-04-09
Started: 2012-01-11
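Supervised classification trains a model on labelled record pairs and then applies it to all candidate pairs. A minimal sketch, assuming the bundled RLdata500 data with its known identity vector as the source of labels; getMinimalTrain draws a small training set with examples per distinct comparison pattern, and "bagging" is one of several methods accepted by trainSupv (others include "rpart", "svm", "ada", "nnet" and "bumping").

```r
library(RecordLinkage)
data(RLdata500)

# Candidate pairs must agree on at least one of the columns
# 1 (first name), 3 (last name), 5, 6 or 7 (birth year, month, day)
pairs <- compare.dedup(RLdata500,
                       identity = identity.RLdata500,
                       blockfld = list(1, 3, 5, 6, 7))

# Small labelled training set: example pairs per comparison pattern,
# with matching status taken from identity.RLdata500
train <- getMinimalTrain(pairs)

# Fit a bagged tree classifier on the training pairs
model <- trainSupv(train, method = "bagging")

# Classify all candidate pairs with the trained model
result <- classifySupv(model, newdata = pairs)
summary(result)
```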

Weight-based deduplication

Rendered from WeightBased.rnw using knitr::knitr on Sep 18 2024.

Last update: 2022-11-08
Started: 2012-01-11
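Weight-based deduplication in the stochastic (Fellegi-Sunter) framework estimates match and non-match probabilities per field with the EM algorithm, sums the resulting weights per pair and classifies with an upper and a lower threshold; pairs between the two are marked as possible links for clerical review. The sketch assumes RLdata500 and binary comparison patterns; the thresholds 10 and 0 are hypothetical placeholders that should be chosen after inspecting the weight distribution.

```r
library(RecordLinkage)
data(RLdata500)

# Binary agreement patterns (no string comparison) for EM estimation
rpairs <- compare.dedup(RLdata500,
                        blockfld = list(1, 5:7),
                        identity = identity.RLdata500)

# Fellegi-Sunter weights with m- and u-probabilities estimated by EM
rpairs <- emWeights(rpairs)

# Inspect the weight distribution before fixing thresholds
summary(rpairs)

# Hypothetical thresholds: above upper -> link, below lower -> non-link,
# in between -> possible link
result <- emClassify(rpairs, threshold.upper = 10, threshold.lower = 0)
summary(result)

# Pairs in the possible-link weight range, for manual review
review <- getPairs(rpairs, min.weight = 0, max.weight = 10)
```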

Readme and manuals

Help Manual

Help page: Topics
Concatenate comparison patterns or classification results: %append% %append%,RecLinkData,RecLinkData-method %append%,RecLinkResult,RecLinkResult-method %append%-methods
Supervised Classification: classifySupv classifySupv,RecLinkClassif,RecLinkData-method classifySupv,RecLinkClassif,RLBigData-method classifySupv-methods
Unsupervised Classification: classifyUnsup
Serialization of record linkage object: clone clone,RLBigData-method clone,RLResult-method clone-methods loadRLObject saveRLObject saveRLObject,RLBigData-method saveRLObject,RLResult-method saveRLObject-methods
Compare Records: compare.dedup compare.linkage
Remove NULL Values: deleteNULLs
Edit Matching Status: editMatch editMatch,RecLinkData-method editMatch,RLBigData-method editMatch-methods
Weight-based Classification of Data Pairs: emClassify emClassify,RecLinkData,ANY,ANY-method emClassify,RecLinkData,missing,missing-method emClassify,RLBigData,ANY,ANY-method emClassify,RLBigData,missing,missing-method emClassify,RLBigData-method
Calculate weights: emWeights emWeights,RecLinkData-method emWeights,RLBigData-method emWeights-methods
Classify record pairs with EpiLink weights: epiClassify epiClassify,RecLinkData-method epiClassify,RLBigData-method epiClassify-methods
Calculate EpiLink weights: epiWeights epiWeights,RecLinkData-method epiWeights,RLBigData-method epiWeights-methods
Class "ff_vector": ff_vector-class
Class "ffdf": ffdf-class
Generate Training Set: genSamples
Calculate Error Measures: errorMeasures getErrorMeasures getErrorMeasures,RecLinkResult-method getErrorMeasures,RLResult-method getErrorMeasures-methods
Estimate number of record pairs: getExpectedSize getExpectedSize,data.frame-method getExpectedSize,RLBigDataDedup-method getExpectedSize,RLBigDataLinkage-method getExpectedSize-methods
Get attribute frequencies: getFrequencies getFrequencies,RLBigData-method getFrequencies-methods
Create a minimal training set: getMinimalTrain getMinimalTrain,RecLinkData-method getMinimalTrain,RLBigData-method getMinimalTrain-methods
Extract Record Pairs: getFalse getFalseNeg getFalsePos getPairs getPairs,RecLinkData-method getPairs,RecLinkResult-method getPairs,RLBigData-method getPairs,RLResult-method getPairs-methods
Estimate Threshold from Pareto Distribution: getParetoThreshold getParetoThreshold,RecLinkData-method getParetoThreshold,RLBigData-method getParetoThreshold-methods
Build contingency table: getTable getTable,RecLinkResult-method getTable,RLResult-method getTable-methods
Estimate Threshold from Pareto Distribution: gpdEst
Check for FALSE: isFALSE
Generalized Log-Linear Fitting: mygllm
Optimal Threshold for Record Linkage: optimalThreshold optimalThreshold,RecLinkData-method optimalThreshold,RLBigData-method optimalThreshold-methods
Phonetic Code: phonetics soundex
Class "RecLinkClassif": RecLinkClassif RecLinkClassif-class
Class "RecLinkData": RecLinkData-class
Record Linkage Data Object: RecLinkData RecLinkData.object
Class "RecLinkResult": RecLinkResult-class
Record Linkage Result Object: RecLinkResult
Safe Sampling: resample
Class "RLBigData": RLBigData-class
Constructors for big data objects: RLBigDataDedup RLBigDataLinkage
Class "RLBigDataDedup": RLBigDataDedup-class
Class "RLBigDataLinkage": RLBigDataLinkage-class
Test data for Record Linkage: identity.RLdata10000 identity.RLdata500 RLdata10000 RLdata500
Class "RLResult": RLResult-class
Show a RLBigData object: show show,RLBigData-method
Split Data: splitData
Stochastic record linkage: fsClassify fsClassify,RecLinkData-method fsClassify,RLBigData-method fsClassify-methods fsWeights fsWeights,RecLinkData-method fsWeights,RLBigData-method fsWeights-methods
String Metrics: jaro jarowinkler levenshtein levenshteinDist levenshteinSim strcmp winkler
Subset operator for record linkage objects: [.RecLinkData [.RecLinkResult [.RLBigData [.RLResult
Print Summary of Record Linkage Data: summary.RecLinkData summary.RecLinkResult
summary methods for "RLBigData" objects: print.summaryRLBigDataDedup print.summaryRLBigDataLinkage summary.RLBigData summary.RLBigDataDedup summary.RLBigDataLinkage
Summary method for "RLResult" objects: print.summaryRLResult summary,RLResult-method summary.RLResult
Train a Classifier: trainSupv
Create Unordered Pairs: unorderedPairs
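The string metrics and phonetic code listed above can also be called directly on character vectors, which helps when deciding which comparison function suits a given field. A short sketch; the example names are arbitrary, and levenshteinSim is taken to be one minus the edit distance divided by the length of the longer string.

```r
library(RecordLinkage)

# Jaro-Winkler similarity in [0, 1]; 1 for identical strings
jarowinkler("Andreas", c("Andreas", "Andres", "Markus"))

# Levenshtein edit distance: insertions, deletions, substitutions
levenshteinDist("kitten", "sitting")  # 3

# Normalised similarity: 1 - distance / length of the longer string
levenshteinSim("kitten", "sitting")

# Soundex maps spelling variants of a surname to a single code
soundex(c("Meier", "Mayer", "Maier"))
```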