
Filter multiple occurrences based on group [duplicate]

  • hk2 · Tech Community · 1 week ago

    I have the following dataset:

    df <- data.frame(
      Supplier_id = c("1","2","7","7","7","4","5","8","12","7"),
      Supplier = c("Tian","Yan","Goldy","Goldy","Goldy","Amy","Lauren","Cassy","Shaan","Goldy"),
      Date = c("1/17/2019","4/30/2019","11/29/2018","11/29/2018","11/29/2018","5/21/2018","5/23/2018","5/24/2018","6/15/2018","6/20/2018"),
      Buyer = c("Unclassified","Unclassified","Kelly","Kelly","Kelly","Kelly","Amanda","Echo","Shao","Shao"))
    # Supplier_id was entered as strings; convert it to numeric
    df$Supplier_id <- as.numeric(as.character(df$Supplier_id))
    
    

    So `df` looks like this:

    
    | Supplier_id | Supplier | Date       | Buyer        |
    |-------------|----------|------------|--------------|
    | 1           | Tian     | 1/17/2019  | Unclassified |
    | 2           | Yan      | 4/30/2019  | Unclassified |
    | 7           | Goldy    | 11/29/2018 | Kelly        |
    | 7           | Goldy    | 11/29/2018 | Kelly        |
    | 7           | Goldy    | 11/29/2018 | Kelly        |
    | 4           | Amy      | 5/21/2018  | Kelly        |
    | 5           | Lauren   | 5/23/2018  | Amanda       |
    | 8           | Cassy    | 5/24/2018  | Echo         |
    | 12          | Shaan    | 6/15/2018  | Shao         |
    | 7           | Goldy    | 6/20/2018  | Shao         |
    
    

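    Now I want to keep only the rows whose supplier ID occurs more than once for the same buyer, i.e. drop any supplier ID that appears only once for a given buyer. For example, in the dataset above, supplier IDs "1" and "2" both belong to the buyer "Unclassified", but they are different IDs that each appear only once, so I don't want them in the final output. Buyer "Kelly", on the other hand, has two supplier IDs, "7" and "4": "7" appears 3 times while "4" appears only once, so the output should contain the records with supplier ID 7. The grouping must be based on "Buyer": supplier ID "7" occurs for both "Kelly" and "Shao", but it has to be counted separately for each buyer rather than pooled. For "Shao", IDs "12" and "7" each appear once, so neither is kept.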
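    To make the rule concrete, here is a minimal base-R sketch that counts how often each (Buyer, Supplier_id) pair occurs; only pairs with a count above 1 should survive the filter. It assumes the `df` from the question, rebuilt here with `Supplier_id` entered directly as numbers, and `counts` is just an illustrative name.

```r
# Rebuild the question's data (assumption: Supplier_id already numeric)
df <- data.frame(
  Supplier_id = c(1, 2, 7, 7, 7, 4, 5, 8, 12, 7),
  Supplier = c("Tian", "Yan", "Goldy", "Goldy", "Goldy",
               "Amy", "Lauren", "Cassy", "Shaan", "Goldy"),
  Date = c("1/17/2019", "4/30/2019", "11/29/2018", "11/29/2018", "11/29/2018",
           "5/21/2018", "5/23/2018", "5/24/2018", "6/15/2018", "6/20/2018"),
  Buyer = c("Unclassified", "Unclassified", "Kelly", "Kelly", "Kelly",
            "Kelly", "Amanda", "Echo", "Shao", "Shao")
)

# Count rows per (Buyer, Supplier_id) pair; FUN = length counts the rows in each group
counts <- aggregate(Supplier ~ Buyer + Supplier_id, data = df, FUN = length)
names(counts)[names(counts) == "Supplier"] <- "n"

# Pairs that occur more than once: only Kelly / 7 in this sample
counts[counts$n > 1, ]
```

    Note that Kelly/7 and Shao/7 show up as separate pairs, which is exactly the per-buyer grouping the question asks for.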

    The expected output is:

    | Supplier_id | Supplier | Date       | Buyer |
    |-------------|----------|------------|-------|
    | 7           | Goldy    | 11/29/2018 | Kelly |
    | 7           | Goldy    | 11/29/2018 | Kelly |
    | 7           | Goldy    | 11/29/2018 | Kelly |
    

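    I tried `group_by` and `filter`, but that didn't work because each buyer has different supplier IDs. I also tried `duplicated`, but I'm not sure how to group the supplier IDs per buyer: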

    df <- df %>% group_by(Buyer) %>% filter(Supplier_id > 1)  # tests the ID value, not how often it occurs
    

    and also this:

    df2 <- df[duplicated(df[1]) | duplicated(df[1], fromLast = TRUE), ]  # df[1] is Supplier_id only; Buyer is ignored
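    The `duplicated()` attempt is close: it only looks at `df[1]`, the `Supplier_id` column, so Kelly's 7 and Shao's 7 are treated as the same key. A sketch of the same idea that passes both key columns, so a row is kept when its (Supplier_id, Buyer) pair appears anywhere else in the data. `df` is rebuilt with numeric `Supplier_id`, and `key`/`keep` are illustrative names.

```r
df <- data.frame(
  Supplier_id = c(1, 2, 7, 7, 7, 4, 5, 8, 12, 7),
  Supplier = c("Tian", "Yan", "Goldy", "Goldy", "Goldy",
               "Amy", "Lauren", "Cassy", "Shaan", "Goldy"),
  Date = c("1/17/2019", "4/30/2019", "11/29/2018", "11/29/2018", "11/29/2018",
           "5/21/2018", "5/23/2018", "5/24/2018", "6/15/2018", "6/20/2018"),
  Buyer = c("Unclassified", "Unclassified", "Kelly", "Kelly", "Kelly",
            "Kelly", "Amanda", "Echo", "Shao", "Shao")
)

# The grouping key is the pair of columns, not Supplier_id alone
key <- df[c("Supplier_id", "Buyer")]

# duplicated() on a data frame compares whole rows of `key`;
# checking from both ends flags every copy, including the first one
keep <- duplicated(key) | duplicated(key, fromLast = TRUE)
df2 <- df[keep, ]
df2
```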
    

    Edit: The original dataset has many such cases, with N different supplier IDs per buyer. What other approaches could produce the desired output?

  • Shree · 1 week ago

    I think you need:

    library(dplyr)
    df %>% group_by(Supplier_id, Buyer) %>% filter(n() > 1)
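    For the edit's question about other approaches, here is a package-free alternative that works for any number N of supplier IDs per buyer. It is a base-R sketch using `ave()` to attach to every row the size of its (Buyer, Supplier_id) group, then keeping rows where that size exceeds 1. It preserves the original row order and columns. `df` is rebuilt with numeric `Supplier_id`, and `n_in_group`/`result` are illustrative names.

```r
df <- data.frame(
  Supplier_id = c(1, 2, 7, 7, 7, 4, 5, 8, 12, 7),
  Supplier = c("Tian", "Yan", "Goldy", "Goldy", "Goldy",
               "Amy", "Lauren", "Cassy", "Shaan", "Goldy"),
  Date = c("1/17/2019", "4/30/2019", "11/29/2018", "11/29/2018", "11/29/2018",
           "5/21/2018", "5/23/2018", "5/24/2018", "6/15/2018", "6/20/2018"),
  Buyer = c("Unclassified", "Unclassified", "Kelly", "Kelly", "Kelly",
            "Kelly", "Amanda", "Echo", "Shao", "Shao")
)

# For each row, the number of rows sharing its (Buyer, Supplier_id) pair;
# ave() splits by the interaction of both grouping vectors
n_in_group <- ave(seq_len(nrow(df)), df$Buyer, df$Supplier_id, FUN = length)

# Keep rows whose pair occurs more than once (the base-R analogue of n() > 1)
result <- df[n_in_group > 1, ]
result
```

    Unlike the grouped `dplyr` pipeline, `result` is a plain data frame, so there is no grouping to drop afterwards.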